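Scrape website and move data into multiple CSV columns

I am scraping several categories from a web page into a CSV file. I managed to get the first category into a column, but the data for the second column is not written to the CSV. The code I am using: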
import urllib2
import csv
from bs4 import BeautifulSoup

url = "http://digitalstorage.journalism.cuny.edu/sandeepjunnarkar/tests/jazz.html"
page = urllib2.urlopen(url)
soup_jazz = BeautifulSoup(page)

all_years = soup_jazz.find_all("td",class_="views-field views-field-year")
all_category = soup_jazz.find_all("td",class_="views-field views-field-category-code")

with open("jazz.csv", 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow([u'Year Won', u'Category'])
    for years in all_years:
        year_won = years.string
        if year_won:
            csv_writer.writerow([year_won.encode('utf-8')])
    for categories in all_category:
        category_won = categories.string
        if category_won:
            csv_writer.writerow([category_won.encode('utf-8')])
It writes the column headers, but category_won is not written into the second column.
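A minimal sketch of why the second column stays empty, assuming made-up sample values in place of the scraped cells and Python 3 with an in-memory buffer instead of jazz.csv (the function names are hypothetical): each writerow call emits one complete row, so two separate loops produce single-column rows stacked one after the other, while pairing the values puts both into the same row.

```python
import csv
import io

# Hypothetical sample values standing in for the scraped table cells.
years = ["1959", "1960"]
categories = ["Best Jazz Performance", "Best Jazz Composition"]


def stacked_rows(years, categories):
    """Mimic the two separate loops: every writerow call is its own row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Year Won", "Category"])
    for year in years:
        writer.writerow([year])        # one-element row: first column only
    for category in categories:
        writer.writerow([category])    # also lands in the first column
    return buf.getvalue()


def paired_rows(years, categories):
    """Pair the values so each row carries both columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Year Won", "Category"])
    for year, category in zip(years, categories):
        writer.writerow([year, category])
    return buf.getvalue()


print(stacked_rows(years, categories))
print(paired_rows(years, categories))
```

Note that zip stops at the shorter of the two lists, so this pairing only lines up if both lists have one entry per table row.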
Following your suggestion, I have changed it to read:
with open("jazz.csv", 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow([u'Year Won', u'Category'])
    for years, categories in zip(all_years, all_category):
        year_won = years.string
        category_won = categories.string
        if year_won and category_won:
            csv_writer.writerow([year_won.encode('utf-8'), category_won.encode('utf-8')])
But now I get the following error:
csv_writer.writerow([year_won.encode('utf-8'), category_won.encode('utf-8')])
ValueError: I/O operation on closed file
Just tried it, and now I get the error I listed above. – user1922698
@user1922698: Then you are trying to run the loop *outside* of the 'with' statement. –
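The point about the 'with' statement can be sketched as follows, assuming Python 3 and a hypothetical file demo.csv in a temporary directory: the file is closed as soon as the with block ends, so a writerow call placed after the block raises ValueError, while the same call indented inside the block succeeds.

```python
import csv
import os
import tempfile

# Hypothetical output path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    # Inside the block the file is still open, so this write succeeds.
    writer.writerow(["Year Won", "Category"])

# The with block has ended, so f is closed; writing through it now fails.
try:
    writer.writerow(["1959", "Best Jazz Performance"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Moving the loop back under the with line, at the same indentation as the header writerow call, keeps every write inside the open file.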
But the code above outputs the same category over and over again, even though they are all different categories. – user1922698