2013-05-08 162 views
-1

Searching for a tag by its content with BeautifulSoup

Given the following HTML:

<div class="rating-list"> 
<ul class="recommend"> 
<li> 
<span class="recommend-titleInline">Stayed April 2013, traveled as a couple</span> 
<ul class="recommend-column first"> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Value</li> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Location</li> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Sleep Quality</li> 
</ul> 
<ul class="recommend-column"> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Rooms</li> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Cleanliness</li> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Service</li> 
</ul> 
</li> 
</ul> 
</div> 

I have already retrieved the enclosing tag with BeautifulSoup, and now I want to get the 'li' tags like this:

valueRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Value') 
locationRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Location') 
sleepRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Sleep Quality') 
roomRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Rooms') 
cleanRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Cleanliness') 
serviceRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Service') 

But this seems to fail: all six variables are None, which is not what I expect. What should I do?

Answers

0

Would passing a regular expression as the text argument help?

subRatingListTags[i].find(text=re.compile("Location")) 

The newlines are probably what makes the exact text match fail here.

+0

That way I only get the string 'Location', not the tag that contains both "5 of 5 stars" and "Location". – haipeng31 2013-05-08 06:53:19
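One possible way to get from the matched string back to its enclosing tag is to walk up the tree with find_parent. The sketch below assumes BeautifulSoup 4 with the built-in "html.parser" and uses a trimmed copy of the HTML from the question. The variable names are illustrative only. Recent bs4 releases call the argument string; older ones call it text.

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (two of the six items).
html = """
<ul class="recommend-column first">
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Value</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Location</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# An exact match fails because the text node is "\nLocation", not "Location".
exact = soup.find(string="Location")

# A regex match returns the text node itself (a NavigableString), not the tag.
text_node = soup.find(string=re.compile("Location"))

# Walk up from the text node to the <li> that contains it.
location_tag = text_node.find_parent("li")

print(exact)                       # None
print(repr(text_node))             # the raw text node, newline included
print(location_tag.img["alt"])     # rating taken from the <img> inside the <li>
```

From location_tag the rating is reachable through the nested img element, so the string match and the tag lookup can be combined in one step.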

0

It is not clear what you want. Anyway:

>>> lis = [t for t in soup.find_all('li', 'recommend-answer')] 
>>> lis[0].text 
'\n\n\n\nValue' 
>>> lis[1].text 
'\n\n\n\nLocation' 
>>> lis[0].img['alt'] 
'5 of 5 stars' 

You should preprocess the HTML to remove all newlines before you start parsing it.
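As an alternative to stripping newlines from the raw HTML, the whitespace can also be ignored at lookup time. The following is a minimal sketch, assuming BeautifulSoup 4 and the built-in "html.parser". The helper find_rating_tag and the ratings dictionary are hypothetical names introduced for illustration and are not part of the original answers.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (two of the six items).
html = """
<ul class="recommend-column first">
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Value</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Sleep Quality</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")


def find_rating_tag(root, label):
    """Return the first <li> whose whitespace-stripped text equals label (hypothetical helper)."""
    return root.find(
        lambda tag: tag.name == "li" and tag.get_text(strip=True) == label
    )


# Look up a single item by its label.
value_tag = find_rating_tag(soup, "Value")

# Or collect every label and its rating in one pass.
ratings = {}
for li in soup.find_all("li", class_="recommend-answer"):
    ratings[li.get_text(strip=True)] = li.img["alt"]

print(value_tag.img["alt"])
print(ratings)
```

Here get_text(strip=True) strips each text fragment inside the li and drops the empty ones, so only the label remains and the comparison no longer depends on how the source HTML is wrapped.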