List index out of range when parsing a website with .lower()

I am parsing a website to count the number of lines that mention a keyword. Everything runs fine with the code below:

import time
import urllib2
from urllib2 import urlopen
import datetime

website = 'http://www.dailyfinance.com/2014/11/13/market-wrap-seventh-dow-record-in-eight-days/#!slide=3077515'
topSplit = 'NEW YORK -- '
bottomSplit = "<div class=\"knot-gallery\""

# Count mentions on newlines
def main():
    try:
        x = 0
        sourceCode = urllib2.urlopen(website).read()
        sourceSplit = sourceCode.split(topSplit)[1].split(bottomSplit)[0]
        content = sourceSplit.split('\n')  # provides an array

        for line in content:
            if 'gain' in line:
                x += 1

        print x

    except Exception, e:
        print 'Failed in the main loop'
        print str(e)

main()

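However, I want to account for every mention of the keyword regardless of case (in this case 'gain' or 'Gain'). To do that, I added .lower() when reading the source code: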

sourceCode = urllib2.urlopen(website).read().lower()

However, this gives me the error:

Failed in the main loop

list index out of range

Assuming .lower() is throwing off the index, why does this happen?


2015-04-07 Chuck

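You are lowercasing the whole string (that is what lower() does), but you then try to split on topSplit = 'NEW YORK -- '. That upper-case text no longer occurs in the lowercased source, so split() returns a list with a single item.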

You then try to access index 1, which never exists in that list:

sourceCode.split(topSplit)[1]
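Here is a minimal sketch of the failure. It uses a short made-up string in place of the downloaded page, so the sample text and the snake_case variable names are assumptions, not the real page content:

```python
# Hypothetical stand-in for the downloaded page source.
sample = 'header NEW YORK -- Stocks gain again'
top_split = 'NEW YORK -- '

# Original case: the separator is found, so split() yields two pieces.
parts_original = sample.split(top_split)

# After lower(): the upper-case separator no longer occurs in the text,
# so split() returns the whole string as a single-item list.
parts_lowered = sample.lower().split(top_split)

print(len(parts_original))  # 2
print(len(parts_lowered))   # 1, so parts_lowered[1] raises IndexError
```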

To handle both cases, take a look at regular expressions with the re module. Here is an example:

>>> import re
>>> string = "some STRING lol"
>>> re.split("string", string, flags=re.IGNORECASE)
['some ', ' lol']
>>> re.split("STRING", string, flags=re.IGNORECASE)
['some ', ' lol']
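Applied to the question, the same idea might look roughly like the sketch below. The function name count_keyword_lines and the sample text are hypothetical, and the page is replaced by a hardcoded string so the snippet runs without network access:

```python
import re

def count_keyword_lines(source, top_split, bottom_split, keyword):
    """Count lines between two markers that mention keyword, ignoring case."""
    # re.escape keeps any regex metacharacters in the markers literal.
    after_top = re.split(re.escape(top_split), source, flags=re.IGNORECASE)
    if len(after_top) < 2:
        return 0  # top marker not found, nothing to count
    body = re.split(re.escape(bottom_split), after_top[1],
                    flags=re.IGNORECASE)[0]
    keyword = keyword.lower()
    return sum(1 for line in body.split('\n') if keyword in line.lower())

# Hypothetical sample standing in for the downloaded page.
sample = ('nav\nNEW YORK -- Stocks Gain.\nNo change.\nBonds gain too.\n'
          '<div class="knot-gallery">\ngain in footer')
result = count_keyword_lines(sample, 'new york -- ',
                             '<div class="knot-gallery"', 'gain')
print(result)  # 2
```

Checking the length of the split result before indexing avoids the IndexError when the top marker is missing from the page.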


2015-04-07 11:27:52

Great answer. Following your suggestion, I used topSplit = 'NEW YORK -- '.lower() to get it running. I will also take a look at the re module. Thanks for the help. - Chuck
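For reference, the fix described in this comment can be sketched as follows, assuming both separators are lowercased the same way as the source. The sample string is made up and stands in for the downloaded page:

```python
# Hypothetical sample standing in for the downloaded page.
sample = ('nav\nNEW YORK -- Stocks Gain.\nBonds gain too.\n'
          '<div class="knot-gallery">')

# Lowercase the separators as well as the text so they still match.
top_split = 'NEW YORK -- '.lower()
bottom_split = '<div class="knot-gallery"'.lower()
source = sample.lower()

body = source.split(top_split)[1].split(bottom_split)[0]
count = sum(1 for line in body.split('\n') if 'gain' in line)
print(count)  # 2
```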
