2012-08-17 25 views

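List index error (even though the value exists)?

I'm running a for loop to scrape the contents of some XML, and it works fine until I reach the 29th iteration. At that point it gives me this error: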

File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 572, in dispatch 
    return self.handle_exception(e, self.app.debug) 
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 570, in dispatch 
    return method(*args, **kwargs) 
File "J:\Art & Graphic Design\Graphic Design\Websites\lawvoter-dev\cron_congressman.py", line 64, in get 
    birthday  = re.findall("<birthday>(.*)</birthday>",element)[0] 
IndexError: list index out of range 

The code is:

for element in members:
    title = re.findall("<title>(.*)</title>", element)[0]
    role = re.findall("<role_type_label>(.*)</role_type_label>", element)[0]
    name_sortable = re.findall("<name_sortable>(.*)</name_sortable>", element)[0]
    firstname = re.findall("<firstname>(.*)</firstname>", element)[0]
    lastname = re.findall("<lastname>(.*)</lastname>", element)[0]
    gender = re.findall("<gender_label>(.*)</gender_label>", element)[0]
    birthday = re.findall("<birthday>(.*)</birthday>", element)[0]
    party = re.findall("<party>(.*)</party>", element)[0]
    state = re.findall("<state>(.*)</state>", element)[0]
    description = re.findall("<description>(.*)</description>", element)[0]
    start_date = re.findall("<startdate>(.*)</startdate>", element)[0]
    end_date = re.findall("<enddate>(.*)</enddate>", element)[0]
    website = re.findall("<website>(.*)</website>", element)[0]
    bioguideid = re.findall("<bioguideid>(.*)</bioguideid>", element)[0]
    osid = re.findall("<osid>(.*)</osid>", element)[0]
    pvsid = re.findall("<pvsid>(.*)</pvsid>", element)[0]
    twitterid = re.findall("<twitterid>(.*)</twitterid>", element)[0]
    youtubeid = re.findall("<youtubeid>(.*)</youtubeid>", element)[0]

    member = Congressman(
        title=title, role=role, name_sortable=name_sortable,
        firstname=firstname, lastname=lastname, gender=gender,
        birthday=birthday, party=party, state=state,
        description=description, start_date=start_date, end_date=end_date,
        website=website, bioguideid=bioguideid, osid=osid, pvsid=pvsid,
        twitterid=twitterid, youtubeid=youtubeid)
    member.put()

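I really don't know why this error pops up. It always works fine for the first 29 iterations. Just in case it matters, every element in the data model is also set to "default=None". But when I look at the XML itself and go to the exact spot where the error occurs, the value is actually there. Does anyone know why it would give an error even though the value exists?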

Answer

It looks like

birthday  = re.findall("<birthday>(.*)</birthday>",element)[0] 

returns an empty list, and you are trying to extract the first element of a list that has none, which throws

IndexError: list index out of range 

Like here:

>>> l = [] 
>>> l[0] 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
IndexError: list index out of range 
>>> 
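As a rough illustration of how `re.findall` can come back empty even though a birthday appears to be present, consider the sketch below. The sample strings are made up and are only a guess at what the failing record might look like, not data from the real feed. By default `.` does not match a newline, so a value that is split across lines produces no match, and neither does a self-closing tag.

```python
import re

pattern = "<birthday>(.*)</birthday>"

# Hypothetical sample records, not taken from the real XML.
ok = "<birthday>1984-10-20</birthday>"
multiline = "<birthday>\n1984-10-20\n</birthday>"
self_closing = "<birthday/>"

print(re.findall(pattern, ok))            # ['1984-10-20']
print(re.findall(pattern, multiline))     # []  ('.' stops at the newline)
print(re.findall(pattern, self_closing))  # []  (no opening/closing pair)

# With re.DOTALL, '.' also matches newlines; the lazy '.*?' keeps
# the match from running on to a later closing tag.
print(re.findall("<birthday>(.*?)</birthday>", multiline, re.DOTALL))
```

If the failing record turns out to look like one of these cases, indexing `[0]` on the result would raise exactly the `IndexError` shown above.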

Edit:

import logging
import re

def findelement(item, element):
    # re.findall returns an empty list when the pattern does not match
    matches = re.findall(item, element)
    if not matches:
        logging.info('no item found for %s with element %s', item, element)
        return ''
    return matches[0]


for element in members:
    title = findelement("<title>(.*)</title>", element)
    ...
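If the regular expressions keep proving brittle, an XML parser is an alternative to the helper above. This is only a minimal sketch, assuming each item in `members` is a well-formed XML fragment with a single root element; the `<person>` wrapper, the `field` helper, and the sample data are made up for illustration. `findtext` from the standard-library `xml.etree.ElementTree` returns a default value instead of raising when a tag is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; the second one has no <birthday> tag.
members = [
    "<person><firstname>Ann</firstname><birthday>1984-10-20</birthday></person>",
    "<person><firstname>Bob</firstname></person>",
]

def field(root, tag, default=""):
    # findtext returns `default` when no child element named `tag` exists
    return root.findtext(tag, default=default)

birthdays = []
for member_xml in members:
    root = ET.fromstring(member_xml)  # parse one record
    birthdays.append(field(root, "birthday"))

print(birthdays)  # ['1984-10-20', '']
```

The missing tag becomes an empty string here, which mirrors what `findelement` returns when nothing matches.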

That's what I thought, but when I print out that line, the iteration prints a list with a value in it. Something like: >>> a = ['1984-10-20'] >>> a[0] IndexError – glitchbox 2012-08-17 13:22:07


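Similar or exactly? You are iterating over members, so it may be that the element returns an empty list in that particular iteration. Try logging the result. – aschmid00 2012-08-17 13:26:30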
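The logging suggested here could look roughly like the following. It is a sketch with made-up sample data, and `find_missing` is a hypothetical helper, not part of the original code. It records the index of every member where the pattern finds nothing, so the offending record can be inspected directly.

```python
import logging
import re

def find_missing(members, pattern):
    """Return the indexes of members where the pattern has no match."""
    missing = []
    for index, element in enumerate(members):
        if not re.findall(pattern, element):
            # Log a truncated copy of the record for inspection
            logging.warning("no match at index %d: %r", index, element[:200])
            missing.append(index)
    return missing

# Hypothetical sample data; only the last record lacks a <birthday> tag.
members = [
    "<birthday>1984-10-20</birthday>",
    "<birthday></birthday>",
    "<name>x</name>",
]
print(find_missing(members, "<birthday>(.*)</birthday>"))  # [2]
```

Note that an empty tag such as `<birthday></birthday>` still matches (the captured group is an empty string), so only records without the tag pair are reported.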


When I look at the XML, it is displayed. I threw in a print after "birthday", and after the 29th iteration a "Status: 500" error appears. – glitchbox 2012-08-17 13:31:31