2017-03-16

So I'm trying to scrape the box score of an NBA game from ESPN. I want to get the player names first, but I'm having a hard time getting rid of the HTML tags.

I tried get_text(), .text() and .string_strip(), but they keep giving me errors.

Here is the code I'm using:

from bs4 import BeautifulSoup
import requests

url = "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

name = []
for row in soup.find_all('tr')[1:]:
    player_name = row.find('td', attrs={'class': 'name'})
    name.append(player_name)
print(name)

You mention errors. What errors? –

Answers


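Using player_name.text should work, but the problem is that for some rows row.find('td', attrs={'class': 'name'}) finds nothing and returns None, and None has no .text attribute. Try this: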

for row in soup.find_all('tr')[1:]:
    player_name = row.find('td', attrs={'class': 'name'})
    if player_name:
        name.append(player_name.text)
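To see why the guard matters without hitting the live ESPN page, here is a minimal offline sketch. The HTML below is hypothetical, simplified markup standing in for the box score table (the real page's structure may differ); it only assumes that player rows have a td with class "name" and that header or total rows do not.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the ESPN box score table.
html = """
<table>
  <tr><th>STARTERS</th></tr>
  <tr><td class="name">LeBron James</td><td>30</td></tr>
  <tr><td>TEAM</td><td>5</td></tr>
  <tr><td class="name">Kevin Love</td><td>21</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The header row has no <td class="name">, so find() returns None here;
# header_cell.text would raise AttributeError.
header_cell = soup.find("tr").find("td", attrs={"class": "name"})
print(header_cell)

names = []
for row in soup.find_all("tr"):
    cell = row.find("td", attrs={"class": "name"})
    if cell is not None:  # skip rows without a player-name cell
        names.append(cell.get_text())
print(names)
```

Checking "is not None" is equivalent to the answer's "if player_name:" here, since a found Tag is always truthy; the explicit form just makes the None case obvious.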

This worked! Thanks – jhaywoo8


I solved the problem like this:

from bs4 import BeautifulSoup
import requests

url = "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

name = []
for row in soup.find_all('tr')[1:]:
    try:
        # text of the first <span> inside <td class="name">
        player_name = row.select('td.name span')[0].text
        name.append(player_name)
    except IndexError:
        # rows without a player-name cell give an empty list from select()
        pass
print(name)
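The try/except in this answer is needed because select(...)[0] raises IndexError on rows that contain no match. A minimal sketch of an alternative, assuming the same "td.name span" structure the selector implies: select_one() returns the first match or None, so a plain None check replaces the exception handling. The HTML is hypothetical, simplified markup, not the actual ESPN page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes the full name sits in the first <span>
# inside <td class="name">, as the selector 'td.name span' implies.
html = """
<table>
  <tr><th>STARTERS</th></tr>
  <tr><td class="name"><span>LeBron James</span><span class="abbr">L. James</span></td></tr>
  <tr><td>TEAM</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
for row in soup.find_all("tr"):
    # select_one() returns the first matching element, or None if there is none
    span = row.select_one("td.name span")
    if span is not None:
        names.append(span.get_text(strip=True))
print(names)
```

get_text(strip=True) also trims surrounding whitespace, which helps when the real markup has line breaks inside the cell.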

Here is my code, for reference:

import requests
from pyquery import PyQuery as pyq

url = "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
doc = pyq(r.content)
# .items() yields each element matching .abbr wrapped as a PyQuery object
print([h.text() for h in doc('.abbr').items()])