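List index out of range when parsing a website using .lower()
I am parsing a website to count the number of lines that mention a keyword. Everything runs fine with the code below: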
import time
import urllib2
from urllib2 import urlopen
import datetime

website = 'http://www.dailyfinance.com/2014/11/13/market-wrap-seventh-dow-record-in-eight-days/#!slide=3077515'
topSplit = 'NEW YORK -- '
bottomSplit = "<div class=\"knot-gallery\""

# Count mentions on newlines
def main():
    try:
        x = 0
        sourceCode = urllib2.urlopen(website).read()
        sourceSplit = sourceCode.split(topSplit)[1].split(bottomSplit)[0]
        content = sourceSplit.split('\n')  # provides an array
        for line in content:
            if 'gain' in line:
                x += 1
        print x
    except Exception, e:
        print 'Failed in the main loop'
        print str(e)

main()
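However, I want to count every mention of a particular keyword regardless of case (in this case 'gain' or 'Gain'). To do that, I added .lower() to the line that reads the source code: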
sourceCode = urllib2.urlopen(website).read().lower()
However, this gives me the error:
Failed in the main loop
list index out of range
I assume .lower() is throwing off the index. Why does this happen?
Great answer. Following your suggestion, I used topSplit = 'NEW YORK -- '.lower() and it now runs. I will also look at the 're' module, thanks for your help. – Chuck
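
For anyone landing here later, the fix mentioned in the comment above can be sketched roughly as follows. Once the page source is lowercased, the mixed-case marker 'NEW YORK -- ' no longer occurs in it, so split() returns a one-element list and [1] raises "list index out of range". The sample text, the constant names and both function names below are made up for illustration (the real page is not reproduced in the post), and the code avoids Python 2 only syntax so it should run on either version. The second function is one possible way to use the 're' module instead of lowercasing; it is an assumption, not necessarily what the answer proposed.

```python
import re

# Hypothetical stand-in for the downloaded page; the real HTML is not shown in the post.
SAMPLE = (
    "<p>NEW YORK -- Stocks posted a gain.\n"
    "Gains were broad.\n"
    "Oil fell.\n"
    "<div class=\"knot-gallery\"></div>"
)

TOP_SPLIT = 'NEW YORK -- '
BOTTOM_SPLIT = '<div class="knot-gallery"'


def count_keyword_lines(source, keyword='gain'):
    """Count lines between the two markers that mention keyword, ignoring case."""
    lowered = source.lower()
    # Lowercase the markers too. Otherwise split() finds no match in the
    # lowercased text and returns a one-element list, so [1] fails.
    parts = lowered.split(TOP_SPLIT.lower())
    if len(parts) < 2:
        return 0  # top marker not found at all
    body = parts[1].split(BOTTOM_SPLIT.lower())[0]
    return sum(1 for line in body.split('\n') if keyword.lower() in line)


def count_keyword_lines_re(source, keyword='gain'):
    """Same count, but leave the source as-is and match case-insensitively."""
    body = source.split(TOP_SPLIT)[1].split(BOTTOM_SPLIT)[0]
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return sum(1 for line in body.split('\n') if pattern.search(line))


print(count_keyword_lines(SAMPLE))
print(count_keyword_lines_re(SAMPLE))
```

The first version keeps the original split-based approach and only changes the markers. The second keeps the markers and the source untouched, which may be preferable if the extracted text needs to retain its original casing later.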