2014-03-26 126 views
-1

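I keep getting a "list index out of range" error when I try to run this code. The code loops through the pages, parses the site's table, and writes the data into an Excel sheet. IndexError: list index out of range?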

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
from openpyxl import Workbook 
from openpyxl.cell import get_column_letter 
import datetime 


now = datetime.datetime.now() 

wb = Workbook() 
dest_filename = r'iOS Top Grossing Data.xlsx' 
ws = wb.active 
ws = wb.create_sheet() 
ws.title = now.strftime("%m-%d-%y") 
sh = wb.get_sheet_by_name('Sheet') 
wb.remove_sheet(sh) 

ws['A1'] = "REVENUE" 
ws.column_dimensions['A'].width = 11 
ws.cell('A1').style.alignment.horizontal = 'center' 
ws.cell('A1').style.font.bold = True 

ws['B1'] = "FREE" 
ws.column_dimensions['B'].width = 7 
ws.cell('B1').style.alignment.horizontal = 'center' 
ws.cell('B1').style.font.bold = True 

ws['C1'] = "PAID" 
ws.column_dimensions['C'].width = 7 
ws.cell('C1').style.alignment.horizontal = 'center' 
ws.cell('C1').style.font.bold = True 

ws['D1'] = "GAME" 
ws.column_dimensions['D'].width = 27 
ws.cell('D1').style.alignment.horizontal = 'center' 
ws.cell('D1').style.font.bold = True 

ws['E1'] = "PRICE" 
ws.column_dimensions['E'].width = 7 
ws.cell('E1').style.alignment.horizontal = 'center' 
ws.cell('E1').style.font.bold = True 

ws['F1'] = "REVENUE" 
ws.column_dimensions['F'].width = 11 
ws.cell('F1').style.alignment.horizontal = 'center' 
ws.cell('F1').style.font.bold = True 

ws['G1'] = "ARPU INDEX" 
ws.column_dimensions['G'].width = 15 
ws.cell('G1').style.alignment.horizontal = 'center' 
ws.cell('G1').style.font.bold = True 

ws['H1'] = "DAILY NEW USERS" 
ws.column_dimensions['H'].width = 17 
ws.cell('H1').style.alignment.horizontal = 'center' 
ws.cell('H1').style.font.bold = True 

ws['I1'] = "DAILY ACTIVE USERS" 
ws.column_dimensions['I'].width = 19 
ws.cell('I1').style.alignment.horizontal = 'center' 
ws.cell('I1').style.font.bold = True 

ws['J1'] = "ARPU" 
ws.column_dimensions['J'].width = 7 
ws.cell('J1').style.alignment.horizontal = 'center' 
ws.cell('J1').style.font.bold = True 

ws['K1'] = "RANK CHANGE" 
ws.column_dimensions['K'].width = 14 
ws.cell('K1').style.alignment.horizontal = 'center' 
ws.cell('K1').style.font.bold = True 

page = 0 

while page < 6: 
     page += 1 
     url = "http://thinkgaming.com/app-sales-data/?page=" + str(page) 
     html = str(urlopen(url).read()) 

     soup = BeautifulSoup(html) 
     table = soup.find("table") 

     counter = 0 

     while counter < 51:  
         rows = table.findAll('tr')[counter] 
         cols = rows.findAll('td') 

         revenue = cols[0].string 
         revenue = revenue.replace('\\n', '') 
         revenue = revenue.replace(' ', '') 

         free = cols[1].string 
         free = free.replace('\\n', '') 
         free = free.replace(' ', '') 

         paid = cols[2].string 
         paid = paid.replace('\\n', '') 
         paid = paid.replace(' ', '') 

         game = cols[3].string 

         price = cols[4].string 
         price = price.replace('\\n', '') 
         price = price.replace(' ', '') 

         revenue2 = cols[5].string 
         revenue2 = revenue2.replace('\\n', '') 
         revenue2 = revenue2.replace(' ', '') 

         dailynewusers = cols[6].string 
         dailynewusers = dailynewusers.replace('\\n', '') 
         dailynewusers = dailynewusers.replace(' ', '') 

         cell_location = counter 
         cell_location += 1 

         ws['A'+str(cell_location)] = revenue 

         counter += 1 

wb.save(filename = dest_filename)    

Here is the traceback:

Traceback (most recent call last):
  File "C:\Users\shiver_admin\Desktop\script.py", line 89, in <module>
    revenue = cols[0].string
IndexError: list index out of range
+0

Please post your traceback. – Manhattan

+0

Traceback (most recent call last): File "C:\Users\shiver_admin\Desktop\script.py", line 89, in <module> revenue = cols[0].string IndexError: list index out of range – Droster

+1

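You get that error because cols is evidently empty. 0 refers to the first index, and the only way it can be out of range is if the list has no first element, i.e. it has no elements at all. –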

Answers

3

As the comments say, you are simply not getting any <td> tags.

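The error is raised at revenue = cols[0].string because the cells do not exist, so there is nothing at index [0]. The first <tr> tag in that table looks like this:

[image: the table's first <tr> row]

If you look, it contains header cells rather than <td> data cells. Basically, you should start your counter at 1 instead of 0.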
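As a rough illustration of the point above, here is a minimal Python 3 sketch that skips any row without <td> cells instead of hard-coding the starting index. The SAMPLE_HTML markup and the extract_rows helper are hypothetical stand-ins made up for this example; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table: a header row without <td>
# cells, followed by data rows. The real markup may differ.
SAMPLE_HTML = """
<table class="table">
  <tr><th>Revenue</th><th>Free</th><th>Game</th></tr>
  <tr class="odd"><td> 1 </td><td> 10 </td><td>Game A</td></tr>
  <tr class="even"><td> 2 </td><td> 25 </td><td>Game B</td></tr>
</table>
"""


def extract_rows(html):
    """Return the stripped text of every <td> cell, one list per data row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    data = []
    for row in table.find_all("tr"):
        cols = row.find_all("td")
        if not cols:
            # Header row (no <td> cells): cols[0] would raise IndexError here.
            continue
        data.append([td.get_text(strip=True) for td in cols])
    return data


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Checking for an empty cols list also avoids assuming a fixed count of 51 rows per page, which would fail in the same way on a shorter page.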

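Another way to make sure you get the right rows is to check whether they have a class. If you look, the proper <tr> rows have classes (odd, even). You can use something like table.find_all("tr", class_=True) to get them.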

Sample code (note: written in Python 2.7, but easily modified to suit Python 3.x):

import requests as rq 
from bs4 import BeautifulSoup as bsoup 

url = "http://thinkgaming.com/app-sales-data/?page=1" 
r = rq.get(url) 
soup = bsoup(r.content) 

table = soup.find("table", class_="table") 

rows = table.find_all("tr", class_=True) 
cols = [td.get_text().strip().encode("utf-8") for td in rows[0].find_all("td")] 

print cols 

Result:

['1', '10', '-', 'Clash of Clans', 'Free', 'n/a', '44,259'] 
[Finished in 2.8s] 
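For Python 3, the same class-based filtering might look roughly like the sketch below. To stay runnable offline it parses an inline snippet rather than fetching the page. SAMPLE_HTML and the data_rows helper are assumptions made for illustration, with one data row modelled on the result shown above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the result above; the header row has no
# class attribute, while the data row does.
SAMPLE_HTML = """
<table class="table">
  <tr><th>Revenue</th><th>Free</th><th>Paid</th><th>Game</th>
      <th>Price</th><th>Revenue</th><th>Daily New Users</th></tr>
  <tr class="odd"><td> 1 </td><td> 10 </td><td> - </td><td>Clash of Clans</td>
      <td> Free </td><td> n/a </td><td> 44,259 </td></tr>
</table>
"""


def data_rows(html):
    """Return cell text for every <tr> that carries a class attribute."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table")
    # class_=True matches any <tr> that has a class, which skips the header row.
    rows = table.find_all("tr", class_=True)
    return [[td.get_text().strip() for td in row.find_all("td")] for row in rows]


for row in data_rows(SAMPLE_HTML):
    print(row)
```

In Python 3, get_text() already returns str, so the .encode("utf-8") step from the 2.7 version is not needed when printing.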

Let us know if this helps.
