2015-06-27 65 views

UnicodeEncodeError when scraping data with Python and beautifulsoup4

I want to scrape data from the PGA website to get a list of all the golf courses in the United States, and write the scraped data to a CSV file. My problem is that after running my script I get the error below. Can anyone help resolve this error, and show me how I can extract the data?

Here is the error message:

File "/Users/AGB/Final_PGA2.py", line 44, in <module>
    writer.writerow(row)

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 35: ordinal not in range(128)
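The failing character, u'\u201c', is a left curly quotation mark scraped from the page text. Python 2's csv module writes byte strings, so unicode fields are implicitly encoded with the default ascii codec, which cannot represent that character. A minimal reproduction of the mechanism (shown in Python 3 syntax):

```python
# U+201C (left curly quote) has no ASCII representation, so encoding
# it with the ascii codec fails exactly as in the traceback above.
try:
    '\u201c'.encode('ascii')
except UnicodeEncodeError as e:
    print(e.reason)  # ordinal not in range(128)
```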

The script is below:

import csv 
import requests 
from bs4 import BeautifulSoup 

courses_list = [] 
for i in range(906):  # Number of pages plus one 
    url = "http://www.pga.com/golf-courses/search?page={}&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(i) 
    r = requests.get(url) 
    soup = BeautifulSoup(r.content) 

    g_data2 = soup.find_all("div", {"class": "views-field-nothing"}) 

    for item in g_data2: 
        try: 
            name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text 
            print name 
        except: 
            name = '' 
        try: 
            address1 = item.contents[1].find_all("div", {"class": "views-field-address"})[0].text 
        except: 
            address1 = '' 
        try: 
            address2 = item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text 
        except: 
            address2 = '' 
        try: 
            website = item.contents[1].find_all("div", {"class": "views-field-website"})[0].text 
        except: 
            website = '' 
        try: 
            Phonenumber = item.contents[1].find_all("div", {"class": "views-field-work-phone"})[0].text 
        except: 
            Phonenumber = '' 

        course = [name, address1, address2, website, Phonenumber] 
        courses_list.append(course) 


with open('PGA_Final.csv', 'a') as file: 
    writer = csv.writer(file) 
    for row in courses_list: 
        writer.writerow(row) 
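Since Python 2's csv module only handles byte strings, one direct workaround for the script above is to UTF-8-encode every field before it reaches writer.writerow(). A sketch of that idea (encode_row is an illustrative helper, not part of the original script; shown in Python 3 syntax, where str fields encode to bytes):

```python
def encode_row(row, encoding='utf-8'):
    # Encode each text field individually; the row itself is a list
    # and has no .encode() method. (Illustrative helper, not in the script.)
    return [field.encode(encoding) for field in row]

row = ['Caf\u00e9 \u201cLinks\u201d Golf Club', '1 Fairway Dr']
print(encode_row(row)[0])  # b'Caf\xc3\xa9 \xe2\x80\x9cLinks\xe2\x80\x9d Golf Club'
```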

Could you edit your post so it displays properly? If you indent the whole thing by 4 spaces, it will render as a code block instead of unformatted text. –


I edited the post; it is awaiting approval. – Leb


http://stackoverflow.com/questions/30551429/error-writing-data-to-csv-due-to-ascii-error-in-python/30551550#30551550 –

Answers

with open('PGA_Final.csv', 'a') as file: 
    writer = csv.writer(file) 
    for row in courses_list: 
        writer.writerow(row) 

Change it to:

with open('PGA_Final.csv', 'a') as file: 
    writer = csv.writer(file) 
    for row in courses_list: 
        # Encode each field; the row list itself has no .encode() method. 
        writer.writerow([field.encode('utf-8') for field in row]) 

Or:

import codecs 
.... 
with codecs.open('PGA_Final.csv', 'a', encoding='utf-8') as file: 
    writer = csv.writer(file) 
    for row in courses_list: 
        writer.writerow(row) 
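On Python 3 the same idea is built into `open()` itself: pass an explicit `encoding` and the csv module only ever deals with text, so no manual encode step is needed. A minimal sketch; the file path and course row here are illustrative:

```python
import csv
import os
import tempfile

# Hypothetical output path and row, for demonstration only.
path = os.path.join(tempfile.mkdtemp(), 'pga.csv')
rows = [['Pebble Beach \u201cLinks\u201d', 'CA']]

# With encoding='utf-8' on open(), csv.writer accepts unicode text directly.
with open(path, 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(rows)

with open(path, encoding='utf-8') as f:
    print(f.read().strip())  # Pebble Beach “Links”,CA
```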

You can also use [`codecs.open`](https://docs.python.org/2/library/codecs.html#codecs.open), which works like the ordinary `open` but also accepts an `encoding` kwarg. –


I added another solution based on your suggestion. – Leb


AttributeError: 'list' object has no attribute 'encode' with the first option – Gonzalo68


You shouldn't get the error under Python 3. Below is a code example that also fixes some unrelated issues in your code. It parses the specified fields on a given web page and saves them to CSV:

#!/usr/bin/env python3 
import csv 
from urllib.request import urlopen 
import bs4 # $ pip install beautifulsoup4 

page = 905 
url = ("http://www.pga.com/golf-courses/search?page=" + str(page) + 
     "&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0" 
     "&course_type=both&has_events=0") 
with urlopen(url) as response: 
    field_content = bs4.SoupStrainer('div', 'views-field-nothing') 
    soup = bs4.BeautifulSoup(response, 'html.parser', parse_only=field_content) 

fields = [bs4.SoupStrainer('div', 'views-field-' + suffix) 
          for suffix in ['title', 'address', 'city-state-zip', 'website', 'work-phone']] 

def get_text(tag, default=''): 
    return tag.get_text().strip() if tag is not None else default 

with open('pga.csv', 'w', newline='') as output_file: 
    writer = csv.writer(output_file) 
    for div in soup.find_all(field_content): 
        writer.writerow([get_text(div.find(field)) for field in fields])