
I am using BeautifulSoup4 and requests to scrape information from a website, and I want Python to skip a character it cannot encode and move on to the next item in the looped list.

I store the information I need in lists; there are two lists, one for each of the two different types of information scraped from the page.

try:
    for i in range(0, 1000):
        location = dive_data1[((9*i)-7)].text
        locations.append(location)
        location = dive_data2[((9*i)-7)]
        locations.append(location)
        depth = dive_data1[((9*i)-6)].text
        depths.append(depth)
        depth = dive_data2[((9*i)-6)].text
        depths.append(depth)

except:
    pass

After that, I pass those lists to another loop that writes the contents to a CSV file.

try:
    writer = csv.writer(dive_log)
    writer.writerow(("Locations and depths"))
    writer.writerow(("Sourced from:", str(url_page)))
    writer.writerow(("Location", "Depth"))
    for i in range(len(locations)):
        writer.writerow((locations[i], depths[i]))

When I run the script I get this error:

writer.writerow((locations[i], depths[i])) 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 65-66:  ordinal not in range(128) 
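
For reference, a minimal sketch that reproduces the same error outside the scraper (assuming Python 2, which the print statements in the full script below imply; repro.csv is only an illustrative file name): Python 2's csv module implicitly encodes unicode values with the ASCII codec, so any non-ASCII character raises this exception.

import csv

with open("repro.csv", "wb") as f:
    # u"Caf\xe9" contains a non-ASCII character, so the implicit
    # ASCII encoding inside the csv module fails and raises
    # UnicodeEncodeError.
    csv.writer(f).writerow([u"Caf\xe9", u"18 m"])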

I tried to skip the characters it cannot encode like this:

writer = csv.writer(dive_log)
writer.writerow(("Locations and depths"))
writer.writerow(("Sourced from:", str(url_page)))
writer.writerow(("Location", "Depth"))
for i in range(len(locations)):
    try:
        writer.writerow((locations[i], depths[i]))

    except:
        pass

When I run this, only the lines before the for loop are written; every iteration of the for loop falls through to the except and is skipped.

The full code of my script is copied below, in case the problem is related to something in the rest of it that I am not seeing.

import csv
from bs4 import BeautifulSoup
import requests

dive_log = open("divelog.csv", "wt")
url_page = "https://en.divelogs.de/log/Mark_Gosling"
r = requests.get(url_page)
soup = BeautifulSoup(r.content)

dive_data1 = soup.find_all("tr", {"class": "td2"})
dive_data2 = soup.find_all("td", {"class": "td"})
locations = []
depths = []

try:
    for i in range(0, 1000):
        location = dive_data1[((9*i)-7)].text
        locations.append(location)
        location = dive_data2[((9*i)-7)]
        locations.append(location)
        depth = dive_data1[((9*i)-6)].text
        depths.append(depth)
        depth = dive_data2[((9*i)-6)].text
        depths.append(depth)

except:
    pass

try:
    writer = csv.writer(dive_log)
    writer.writerow(("Locations and depths"))
    writer.writerow(("Sourced from:", str(url_page)))
    writer.writerow(("Location", "Depth"))
    for i in range(len(locations)):
        try:
            writer.writerow((locations[i], depths[i]))

        except:
            pass

finally:
    dive_log.close()

print open("divelog.csv", "rt").read()
print "\n\n"
print locations

This should skip the characters it can't encode: soup = BeautifulSoup(response.content.decode('utf-8', 'ignore')) – yedpodtrzitko


Don't ignore anything unless you can afford to lose data; figure out the correct encoding and use that. The data is UTF-8 encoded anyway, so the problem lies elsewhere. Also, don't use a blanket except; catch what you expect and log/print the errors. –
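
As a sketch of the second half of that advice, a drop-in variant of the write loop from the script above (not code from the original question): catch only the exception you expect and report it, instead of a bare except that hides every error.

for i in range(len(locations)):
    try:
        writer.writerow((locations[i], depths[i]))
    except UnicodeEncodeError as err:
        # Report which row failed instead of silently discarding it.
        print "Row %d not written: %s" % (i, err)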

Answers


As @yedpodtrzitko pointed out, you can skip the characters it cannot decode.

Instead of:

soup = BeautifulSoup(r.content) 

do this:

soup = BeautifulSoup(r.content.decode('utf-8', 'ignore')) 

You need to encode to UTF-8 in the loop where you write the rows:

for i in range(len(locations)):
    writer.writerow((locations[i].encode("utf-8"), depths[i].encode("utf-8")))
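
For completeness, a minimal sketch of how that fits into the writing half of the script in the question, keeping its variable names and assuming every entry in locations and depths is a unicode string (the dive_data2[...] values are appended without .text in the question's loop and would need the same treatment):

writer = csv.writer(dive_log)
# A bare string handed to writerow() is iterated character by
# character, so wrap single header values in a list.
writer.writerow(["Locations and depths"])
writer.writerow(["Sourced from:", str(url_page)])
writer.writerow(["Location", "Depth"])
for loc, dep in zip(locations, depths):
    # Python 2's csv module only handles byte strings, so encode the
    # unicode values to UTF-8 before writing.
    writer.writerow([loc.encode("utf-8"), dep.encode("utf-8")])
dive_log.close()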