2016-02-04

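Python parsing with lxml

I've created the following scraper for NFL play-by-play data. It writes the results to a csv file and does everything I need, except that I don't know how to add a column to each row of the csv file showing which team actually has possession of the ball.
I can grab the text from the "home" and "away" tags to show who is playing in the game for querying purposes, but I need the scraper to recognize when possession changes (from home to away or vice versa). I'm fairly new to Python and have tried different indentations, but I don't think that's the issue. Any help would be greatly appreciated. I feel like the answer is beyond my scope of understanding.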

I also realize that my code probably isn't the most Pythonic, but I'm still learning. I'm using Python 2.7.9.

import lxml 
from lxml import html 
import csv 
import urllib2 
import re 

game_date = raw_input('Enter game date: ') 

data_html = 'http://www.cbssports.com/nfl/gametracker/playbyplay/[email protected]' 

url = urllib2.urlopen(data_html).read() 

data = lxml.html.fromstring(url) 


plays = data.cssselect('tr#play') 
home = data.cssselect('tr#home') 
away = data.cssselect('tr#away') 

# Python 2's csv module expects the file to be opened in binary mode on Windows
csvfile = open('C:\\DATA\\PBP.csv', 'ab')
writer = csv.writer(csvfile) 

for play in plays:

    frame = []
    play = play.text_content()

    # Down: the first digit in the play text
    down = re.search(r'\d', play)
    if down is not None:
        down = down.group()

    # Distance: the digits that follow a hyphen
    dist = re.search(r'-(\d+)', play)
    if dist is not None:
        dist = dist.group(1)

    # Side of the field: the first run of capital letters
    field_end = re.search(r'[A-Z]+', play)
    if field_end is not None:
        field_end = field_end.group()

    # Yard line: the digits directly after a run of capital letters
    yard_line = re.search(r'[A-Z]+([\d]+)', play)
    if yard_line is not None:
        yard_line = yard_line.group(1)

    # Description: everything from the first whitespace to the end of the line
    desc = re.search(r'\s(.*)', play)
    if desc is not None:
        desc = desc.group()

    # Clock: the text inside parentheses, ending in a digit
    time = re.search(r'\((..*\d)\)\s', play)
    if time is not None:
        time = time.group(1)

    # Away team: first word of the last 'away' row, upper-cased
    for team in away:
        teamA = team.text_content()
        teamA = re.search(r'(\w+)\s', teamA)
        teamA = teamA.group(1)
        teamA = teamA.upper()

    # Home team: first word of the last 'home' row, upper-cased
    for team in home:
        teamH = team.text_content()
        teamH = re.search(r'(\w+)\s', teamH)
        teamH = teamH.group(1)
        teamH = teamH.upper()

    frame.append(game_date)
    frame.append(down)
    frame.append(dist)
    frame.append(field_end)
    frame.append(yard_line)
    frame.append(time)
    frame.append(teamA)
    frame.append(teamH)
    frame.append(desc)

    writer.writerow(frame)

csvfile.close() 

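I did a poor job of explaining what I want, so I apologize for that. As the for loop iterates through the list, I'd like it to return the home team name when it encounters the 'home' tag, and then keep returning the home team name until it encounters the 'away' tag. Then do the same thing for the away team. Hopefully that makes it a little clearer. – RushNMob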
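
The behaviour described in the comment above (carry the most recent team marker forward onto every play until the next marker appears) can be sketched without lxml. This is only an illustration under stated assumptions: `label_possession`, the `rows` list and the team names `Alpha` and `Beta` are made up here, and `rows` stands in for the page's home, away and play rows already collected in page order, which the original scraper does not do.

```python
def label_possession(rows):
    """Pair each play with the most recently seen team marker.

    `rows` is a hypothetical list of (kind, text) tuples in page order,
    where kind is 'home', 'away' or 'play'.
    """
    current = None  # no team marker seen yet
    labelled = []
    for kind, text in rows:
        if kind in ('home', 'away'):
            # A team row starts a new possession: remember its first word.
            current = text.split()[0].upper()
        else:
            # A play row inherits whichever team marker came last.
            labelled.append((current, text))
    return labelled


# Made-up sample rows, for illustration only.
rows = [
    ('home', 'Alpha drive'),
    ('play', '1-10 ALP20 (15:00) run for 3 yards'),
    ('play', '2-7 ALP23 (14:25) pass incomplete'),
    ('away', 'Beta drive'),
    ('play', '1-10 BET35 (13:50) pass for 12 yards'),
]

for team, text in label_possession(rows):
    print('%s,%s' % (team, text))
```

With real data, the team returned for each play could be appended to `frame` before `writer.writerow(frame)` is called.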

Answers


I'm guessing you need to append another value to the frame, for each row, that indicates whether possession has changed.

After:

frame.append(desc) 

Add:

if teamA == teamH: 
    frame.append("Same possession") 
else: 
    frame.append("Changed possession") 

(Note that this assumes the team names are consistent, with no extra whitespace, padding or formatting in the teamA/teamH values.)

You don't have to use strings; for example, you could use 0 for no change and 1 for a change of possession.
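
One caveat: in the question's code, `teamA` and `teamH` hold the away and home team names, so comparing them to each other would most likely report a change on every row. A 0/1 flag probably needs to compare the team in possession on the current row with the team on the previous row instead. The sketch below assumes that per-row possession value is already available; `possession_flags` and the sample team names are hypothetical.

```python
def possession_flags(teams):
    """Return one flag per row: 1 if the team differs from the previous row, else 0."""
    flags = []
    previous = None
    for team in teams:
        # The first row has nothing to compare against, so it counts as no change.
        changed = previous is not None and team != previous
        flags.append(1 if changed else 0)
        previous = team
    return flags


# Made-up possession values, one per play, for illustration only.
print(possession_flags(['ALPHA', 'ALPHA', 'BETA', 'BETA', 'ALPHA']))
# prints [0, 0, 1, 0, 1]
```

Each flag could then be appended to the matching row's `frame` in place of the "Same possession" / "Changed possession" strings.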

HTH, Barney