Beautiful Soup error: list index out of range

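I am a **very** new Python programmer, working on a web crawler using urllib and BeautifulSoup. Please ignore the while loop at the top and my increment; I am only running this test version for a single page, but it will eventually cover a whole set. My problem is that this gets the soup but then raises an error. I am not sure I am collecting the table data correctly, but I want this code to ignore the links and write the text to a .csv file. For now I am focused on getting the text to print to the screen correctly.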

line 17, in <module> 
    uspc = col[0].string 
IndexError: list index out of range

Here is the code:

import urllib
from bs4 import BeautifulSoup

i=125
while i==125:
    url = "http://www.uspto.gov/web/patents/classification/cpc/html/us" + str(i) + "tocpc.html"
    print url + '\n'
    i += 1
    data = urllib.urlopen(url).read()
    print data
    #get the table data from dump
    #append to csv file
    soup = BeautifulSoup(data)
    table = soup.find("table", width='80%')
    for row in table.findAll('tr')[1:]:
     col = row.findAll('td')
     uspc = col[0].string
     cpc1 = col[1].string
     cpc2 = col[2].string
     cpc3 = col[3].string
     record = (uspc, cpc1, cpc2, cpc3)
     print "|".join(record)

Source

2013-04-09 Super-cluser

[BeautifulSoup for row loop only runs once?](http://stackoverflow.com/questions/15908604/beautifulsoup-for-row-loop-only-runs-once) – gauden 2013-04-09 18:10:16

In the end, I solved the problem by changing the following line:

for row in table.findAll('tr')[1:]:

to:

for row in table.findAll('tr')[2:]:

The error occurred because the first row of the table has split columns, so `row.findAll('td')` came back empty for that row and `col[0]` raised the IndexError.

Source

2013-04-12 17:39:59
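
Skipping a fixed number of rows works for this page, but it may break again if another page in the set has a different header layout. A possibly more robust variant is to check how many cells a row actually has before indexing into it. The following is only a minimal sketch under some assumptions: it is written for Python 3 (the question uses Python 2), the helper names `cells_to_record` and `write_records` are hypothetical, and the sample rows are made-up stand-ins for the scraped table, not real USPTO data.

```python
import csv
import io


def cells_to_record(cells, expected=4):
    """Return the first `expected` cell texts as a tuple, or None for a short row.

    `cells` is a list of cell texts; an entry may be None.
    """
    if len(cells) < expected:
        return None  # header, spacer, or split row: not safe to index
    # Replace None with "" so the record can be joined or written out.
    return tuple((text or "").strip() for text in cells[:expected])


def write_records(rows, out):
    """Write every usable row to `out` as CSV and return how many were written."""
    writer = csv.writer(out)
    written = 0
    for cells in rows:
        record = cells_to_record(cells)
        if record is None:
            continue  # skip rows that do not have enough cells
        writer.writerow(record)
        written += 1
    return written


# Hypothetical sample rows standing in for the scraped table.
sample_rows = [
    [],                                           # row with no <td> cells
    ["USPC", "CPC"],                              # row with split or merged columns
    ["125/1", "B28D 1/00", None, " B28D 1/02 "],  # ordinary data row
]
buffer = io.StringIO()
count = write_records(sample_rows, buffer)
print(count)
print(buffer.getvalue())
```

With BeautifulSoup, the list for each row could be built as `[td.string for td in row.findAll('td')]`. Note that `.string` is `None` when a cell contains more than one child node (for example a link plus text), which is why the sketch substitutes an empty string before writing; otherwise `"|".join(record)` would fail on such a cell.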
