2014-12-25

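Scraping Wikipedia using Python

I want to retrieve three columns (NFL team, player name, college team) from the Wikipedia page below. I am new to Python and have been trying to use BeautifulSoup for this. I only need the rows for QBs, but I can't even get all the rows regardless of position. This is what I have so far; it outputs nothing, and I'm not entirely sure why. I believe the problem is one of the tags, but I don't know what to change. Any help would be greatly appreciated.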

wiki = "http://en.wikipedia.org/wiki/2008_NFL_draft" 
header = {'User-Agent': 'Mozilla/5.0'} #Needed to prevent 403 error on Wikipedia 
req = urllib2.Request(wiki,headers=header) 
page = urllib2.urlopen(req) 
soup = BeautifulSoup(page) 

rnd = "" 
pick = "" 
NFL = "" 
player = "" 
pos = "" 
college = "" 
conf = "" 
notes = "" 

table = soup.find("table", { "class" : "wikitable sortable" }) 

#print table 

#output = open('output.csv','w') 

for row in table.findAll("tr"): 
    cells = row.findAll("href") 
    print "---" 
    print cells.text 
    print "---" 
    #For each "tr", assign each "td" to a variable. 
    #if len(cells) > 1: 
     #NFL = cells[1].find(text=True) 
     #player = cells[2].find(text = True) 
     #pos = cells[3].find(text=True) 
     #college = cells[4].find(text=True) 
     #write_to_file = player + " " + NFL + " " + college + " " + pos 
     #print write_to_file 

    #output.write(write_to_file) 

#output.close() 

I know a lot of it is commented out; that is because I was trying to find where the failure is.

Answer


Here is what I would do:

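  • Find the Player selections heading
  • Get the next wikitable after it using find_next_sibling()
  • Find all tr tags inside that table
  • For every row, find the td and th tags and get the desired cells by index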
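The steps above can be tried offline first. The following is a minimal Python 3 sketch that runs them against a tiny invented HTML fragment rather than the live page. The fragment, the seven-column layout, and the function name `quarterback_picks` are assumptions made for illustration; the real page's markup may differ or may have changed since this was written.

```python
from bs4 import BeautifulSoup

# Invented stand-in for the real page: a heading span followed by a wikitable
# whose columns are (blank, round, pick, NFL team, player, position, college).
SAMPLE_HTML = """
<h2><span id="Player_selections">Player selections</span></h2>
<table class="wikitable sortable">
<tr><th></th><th>Rnd.</th><th>Pick</th><th>NFL team</th><th>Player</th><th>Pos.</th><th>College</th></tr>
<tr><td></td><th>1</th><td>3</td><td>Atlanta Falcons</td><td>Matt Ryan</td><td>QB</td><td>Boston College</td></tr>
<tr><td></td><th>1</th><td>4</td><td>Oakland Raiders</td><td>Darren McFadden</td><td>RB</td><td>Arkansas</td></tr>
<tr><td colspan="7">A row with too few cells, skipped below</td></tr>
</table>
"""


def quarterback_picks(html, position="QB"):
    """Return (team, player, college) tuples for rows matching `position`."""
    soup = BeautifulSoup(html, "html.parser")
    # Step 1: find the heading that precedes the table.
    heading = soup.find("span", id="Player_selections").parent
    # Step 2: take the next sibling table carrying the "wikitable" class.
    table = heading.find_next_sibling("table", class_="wikitable")
    picks = []
    # Step 3: walk every row except the header row.
    for row in table.find_all("tr")[1:]:
        # Step 4: collect td and th cells in document order, then index them.
        cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        if len(cells) < 7:
            continue  # not a full pick row
        team, player, pos, college = cells[3:7]
        if pos == position:
            picks.append((team, player, college))
    return picks


print(quarterback_picks(SAMPLE_HTML))
```

Passing `class_="wikitable"` matches a table whose class attribute is `wikitable sortable`, because BeautifulSoup compares against each individual class.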

Here is the code, run against the soup object from the question:

filter_position = 'QB'

# Locate the "Player selections" heading, then the first wikitable after it.
player_selections = soup.find('span', id='Player_selections').parent
table = player_selections.find_next_sibling('table', class_='wikitable')

for row in table.find_all('tr')[1:]:  # skip the header row
    cells = row.find_all(['td', 'th'])

    try:
        nfl_team, name, position, college = cells[3].text, cells[4].text, cells[5].text, cells[6].text
    except IndexError:
        # Row has too few cells; skip it.
        continue

    if position != filter_position:
        continue

    print nfl_team, name, position, college

Here is the output (filtered to quarterbacks only):

Atlanta Falcons Ryan, MattMatt Ryan† QB Boston College 
Baltimore Ravens Flacco, JoeJoe Flacco QB Delaware 
Green Bay Packers Brohm, BrianBrian Brohm QB Louisville 
Miami Dolphins Henne, ChadChad Henne QB Michigan 
New England Patriots O'Connell, KevinKevin O'Connell QB San Diego State 
Minnesota Vikings Booty, John DavidJohn David Booty QB USC 
Pittsburgh Steelers Dixon, DennisDennis Dixon QB Oregon 
Tampa Bay Buccaneers Johnson, JoshJosh Johnson QB San Diego 
New York Jets Ainge, ErikErik Ainge QB Tennessee 
Washington Redskins Brennan, ColtColt Brennan QB Hawaiʻi 
New York Giants Woodson, Andre'Andre' Woodson QB Kentucky 
Green Bay Packers Flynn, MattMatt Flynn QB LSU 
Houston Texans Brink, AlexAlex Brink QB Washington State
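
The player names in this output are doubled ("Ryan, MattMatt Ryan†"), most likely because each name cell also carries a hidden sort key that `.text` picks up along with the visible name. Below is a hedged Python 3 sketch of one way to drop it. The `sortkey` class name, the sample cell, and the helper `clean_player_name` are assumptions for illustration; inspect the real cell's HTML and adjust the selector accordingly.

```python
from bs4 import BeautifulSoup


def clean_player_name(cell):
    """Return the visible player name from a table cell (modifies the cell)."""
    # Assumption: the hidden sort key sits in <span class="sortkey">.
    for hidden in cell.find_all("span", class_="sortkey"):
        hidden.decompose()  # remove the hidden span from the tree
    # Strip surrounding whitespace and any trailing dagger marker.
    return cell.get_text(strip=True).rstrip("†")


# Invented sample cell that mimics the doubled-name pattern in the output.
sample = BeautifulSoup(
    '<td><span class="sortkey" style="display:none">Ryan, Matt</span>'
    '<a href="/wiki/Matt_Ryan">Matt Ryan</a><sup>†</sup></td>',
    "html.parser",
)
print(clean_player_name(sample.td))
```

In the loop above, this would replace `cells[4].text` with `clean_player_name(cells[4])`.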