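I wrote the scraper below for NFL play-by-play data. It writes the results to a CSV file and does everything I need, except that I can't figure out how to add a column to each row of the CSV showing which team actually has the ball.
I can grab the text from the "home" and "away" tags to show who is playing in the game (for query purposes), but I need the scraper to recognize when possession changes (from home to away or vice versa). I'm fairly new to Python and have tried different indentation, but I don't think that is the problem. Any help would be greatly appreciated; I feel the answer is beyond my current understanding. I am parsing the page with lxml in Python.
I also realize my code is probably not the most Pythonic, but I'm still learning. I am using Python 2.7.9.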
import lxml
from lxml import html
import csv
import urllib2
import re

game_date = raw_input('Enter game date: ')
data_html = 'http://www.cbssports.com/nfl/gametracker/playbyplay/[email protected]'
url = urllib2.urlopen(data_html).read()
data = lxml.html.fromstring(url)

# Rows are selected by id: the plays, plus the home/away team markers.
plays = data.cssselect('tr#play')
home = data.cssselect('tr#home')
away = data.cssselect('tr#away')

csvfile = open('C:\\DATA\\PBP.csv', 'a')
writer = csv.writer(csvfile)

for play in plays:
    frame = []
    play = play.text_content()

    # Each field below stays None when its pattern does not match.
    down = re.search(r'\d', play)
    if down is not None:
        down = down.group()

    dist = re.search(r'-(\d+)', play)
    if dist is not None:
        dist = dist.group(1)

    field_end = re.search(r'[A-Z]+', play)
    if field_end is not None:
        field_end = field_end.group()

    yard_line = re.search(r'[A-Z]+([\d]+)', play)
    if yard_line is not None:
        yard_line = yard_line.group(1)

    desc = re.search(r'\s(.*)', play)
    if desc is not None:
        desc = desc.group()

    time = re.search(r'\((..*\d)\)\s', play)
    if time is not None:
        time = time.group(1)

    # Team names: these loops keep only the last 'away' and 'home' row,
    # so every play gets the same two names regardless of possession.
    for team in away:
        teamA = team.text_content()
        teamA = re.search(r'(\w+)\s', teamA)
        teamA = teamA.group(1)
        teamA = teamA.upper()

    for team in home:
        teamH = team.text_content()
        teamH = re.search(r'(\w+)\s', teamH)
        teamH = teamH.group(1)
        teamH = teamH.upper()

    frame.append(game_date)
    frame.append(down)
    frame.append(dist)
    frame.append(field_end)
    frame.append(yard_line)
    frame.append(time)
    frame.append(teamA)
    frame.append(teamH)
    frame.append(desc)
    writer.writerow(frame)

csvfile.close()
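I don't think I explained this very well, so I apologize. As the for loop iterates through the list, I want it to return the home team name when it encounters the 'home' tag, and keep returning the home team name until it encounters the 'away' tag. Then it should do the same for the away team. Hopefully that makes it a bit clearer. – RushNMob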
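The behaviour described in this comment can be sketched as a single pass over all rows in page order, remembering the most recent team marker. This is only a minimal sketch under assumptions: the function name `tag_possession` is hypothetical, the sample rows are invented for illustration, and it assumes the `home`/`away` rows appear in the page immediately before the plays they own, which the question does not confirm.

```python
import re


def tag_possession(rows):
    """Yield (team, play_text) for every play row.

    `rows` is an iterable of (row_id, text) pairs in page order, where
    row_id is 'home', 'away' or 'play' (the ids used in the question).
    """
    current = None  # team named by the most recent home/away marker
    for row_id, text in rows:
        if row_id in ('home', 'away'):
            # Same name extraction as the question: first word, upper-cased.
            match = re.search(r'(\w+)\s', text)
            current = match.group(1).upper() if match else text.strip().upper()
        elif row_id == 'play':
            yield current, text


# Invented sample data, for illustration only.
sample_rows = [
    ('away', 'Packers drive '),
    ('play', '1-10-GB20 (15:00) run up the middle for 4 yards'),
    ('play', '2-6-GB24 (14:25) pass incomplete'),
    ('home', 'Bears drive '),
    ('play', '1-10-CHI20 (13:50) run left end for 3 yards'),
]

result = list(tag_possession(sample_rows))
for team, play in result:
    print(team + ' | ' + play)
```

With lxml, the input pairs could probably be built from one query that keeps the rows in document order, for example `data.xpath('//tr[@id="home" or @id="away" or @id="play"]')`, passing `(row.get('id'), row.text_content())` for each row. Selecting the three kinds of rows separately, as the question's code does, loses the ordering that possession tracking depends on.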