2016-09-25 67 views

How to get the value inside a span tag with Beautiful Soup 3

<tr><td>Modu</td><td><span class="comments">90</span></td></tr> 
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr> 

I want only 90, then 88, and so on. Here is what I tried:

#2.7 version python 
#link I used as input: http://python-data.dr-chuck.net/comments_283660.html 
import urllib 
from BeautifulSoup import * 
url = raw_input('Enter - ') 
html = urllib.urlopen(url).read() 
soup = BeautifulSoup(html) 
r=0; 
t=0 
tags = soup('span') 
for tag in tags: 
    #print tag.get('class', None) 
    #print tag.get('class="comments">', None) 
    print 'Contents:',tag.contents 

The output is:

Contents: [u'100'] 
Contents: [u'100'] 
Contents: [u'97'] 
Contents: [u'95'] 
.... 

How can I avoid the "u" and get only 100, 100, 97, 95, ...?

Answer


You can index the contents list with print 'Contents:', tag.contents[0] or, better, just pull the text from each span:

tags = soup('span') 
for tag in tags: 
    print('Contents:',tag.text) 

Which, using your link, gives you:

('Contents:', u'100') 
('Contents:', u'100') 
('Contents:', u'97') 
('Contents:', u'95') 
('Contents:', u'95') 
('Contents:', u'94') 
('Contents:', u'93') 
('Contents:', u'92') 
('Contents:', u'84') 
('Contents:', u'78') 
('Contents:', u'78') 
('Contents:', u'76') 
('Contents:', u'69') 
('Contents:', u'64') 
('Contents:', u'60') 
('Contents:', u'58') 
('Contents:', u'53') 
('Contents:', u'51') 
('Contents:', u'49') 
('Contents:', u'49') 
('Contents:', u'45') 
('Contents:', u'45') 
('Contents:', u'45') 
('Contents:', u'44') 
('Contents:', u'39') 
('Contents:', u'38') 
('Contents:', u'37') 
('Contents:', u'35') 
('Contents:', u'34') 
('Contents:', u'33') 
('Contents:', u'32') 
('Contents:', u'32') 
('Contents:', u'30') 
('Contents:', u'29') 
('Contents:', u'28') 
('Contents:', u'27') 
('Contents:', u'21') 
('Contents:', u'19') 
('Contents:', u'16') 
('Contents:', u'16') 
('Contents:', u'15') 
('Contents:', u'13') 
('Contents:', u'13') 
('Contents:', u'12') 
('Contents:', u'11') 
('Contents:', u'9') 
('Contents:', u'6') 
('Contents:', u'2') 
('Contents:', u'1') 
('Contents:', u'1') 

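The u just means you have Unicode strings. It shows up here because, under Python 2, print(...) with two arguments prints a tuple, and a tuple displays the repr of each item. You can call str(tag.text) if you really want to get rid of it, or, if you want integers, you will have to call int(tag.text). I would also suggest upgrading to bs4.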
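For reference, here is a minimal sketch of the same extraction after moving to bs4, assuming Python 3 and the beautifulsoup4 package are installed. The inline HTML is just the two sample rows from the question wrapped in a table, not the live page, and the name counts is only illustrative.

```python
# Python 3 sketch using bs4 (assumes: pip install beautifulsoup4).
from bs4 import BeautifulSoup

# The two sample rows from the question; a real run would download the page instead.
html = """
<table>
<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
</table>
"""

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, "html.parser")

# Keep only spans with class "comments" and turn their text into integers.
counts = [int(span.get_text()) for span in soup.find_all("span", class_="comments")]

for count in counts:
    print("Contents:", count)
```

With the sample rows above this prints Contents: 90 and then Contents: 88. Because the values are converted with int(), they can also be summed or sorted directly.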