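Parsing with BeautifulSoup and getting the result in a specific format

I'm a beginner just starting to work with BeautifulSoup and Python, and I would like to get my results as plain text, without any HTML tags or other non-text elements.

This is what I'm doing in Python: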

#!/usr/bin/env python 

import urllib2 
from bs4 import BeautifulSoup 

html_content = urllib2.urlopen("http://www.demo.com/index.php") 

soup = BeautifulSoup(html_content, "lxml") 

# COMMENTS COUNT 
count_comment = soup.find("span", "sidebar-comment__label") 
count_comment 
count_comment_final = count_comment.find_next("meta") 


# READ COUNT 
count_read = soup.find("span", "sidebar-read__label js-read") 
count_read 
count_read_final = count_read.find_next("meta") 

# PRINT RESULT 
print count_comment_final 
print count_read_final

My HTML looks like this:

<div class="box"> 
     <span class="sidebar-comment__label">Comments</span> 
     <meta itemprop="interactionCount" content="Comments:115"> 
</div> 


<div class="box"> 
     <span class="sidebar-read__label js-read">Read</span> 
     <meta itemprop="interactionCount" content="Read:10"> 
</div>

With that code, I get this:

<meta content="Comments:115" itemprop="interactionCount"/> 
<meta content="Read:10" itemprop="interactionCount"/>

What I would like to get is this:

You've 115 comments 
You've 10 read

First, is this possible?

Second, is my code any good?

Third, can you help me? ;-)

2014-09-25 TwinyTwice

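As the output shows, count_comment_final and count_read_final are tags. You need to extract the value of the content attribute from each of them. This is done with count_comment_final['content'], which gives Comments:115. Then use split(':') to separate the label from the number; the number is the element at index 1.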

#!/usr/bin/env python 

import urllib2 
from bs4 import BeautifulSoup 

html_content = urllib2.urlopen("http://www.demo.com/index.php") 

soup = BeautifulSoup(html_content, "lxml") 

# COMMENTS COUNT 
count_comment = soup.find("span", "sidebar-comment__label") 
count_comment 
count_comment_final = count_comment.find_next("meta") 


# READ COUNT 
count_read = soup.find("span", "sidebar-read__label js-read") 
count_read 
count_read_final = count_read.find_next("meta") 

# PRINT RESULT 
print count_comment_final['content'].split(':')[1] 
print count_read_final['content'].split(':')[1]
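
The code above prints only the bare numbers. As a minimal sketch of how the "You've 115 comments" format from the question could be produced, the following runs against the sample HTML from the question instead of fetching a URL. It is written for Python 3 and uses the built-in "html.parser" rather than lxml; both of those choices, and the helper name count_after, are assumptions made for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, so no network request is needed.
html = """
<div class="box">
     <span class="sidebar-comment__label">Comments</span>
     <meta itemprop="interactionCount" content="Comments:115">
</div>
<div class="box">
     <span class="sidebar-read__label js-read">Read</span>
     <meta itemprop="interactionCount" content="Read:10">
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def count_after(soup, css_class):
    """Hypothetical helper: read the number from the <meta> tag
    that follows the <span> carrying the given CSS class."""
    label = soup.find("span", class_=css_class)
    meta = label.find_next("meta")
    # meta["content"] looks like "Comments:115"; the number is at index 1.
    return int(meta["content"].split(":")[1])


comments = count_after(soup, "sidebar-comment__label")
reads = count_after(soup, "sidebar-read__label")

print("You've {} comments".format(comments))
print("You've {} read".format(reads))
```

Converting the value with int() is optional; it simply makes the sketch fail loudly if the attribute ever holds something other than a number.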

2014-09-25 05:10:18 nu11p01n73R

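Almost there, but it displays "Comments" and "Read" rather than "115" and "10". – TwinyTwice 2014-09-25 05:18:41

Use split(':')[1]. Sorry. – nu11p01n73R 2014-09-25 05:20:17

count_comment_final and count_read_final are tag elements. To strip the Comments: prefix, first read the content attribute: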

count_comment_final.get('content')

This gives output like this:

'Comments:115'

So you can get the comment count like this:

count_comment_final.get('content').split(':')[1]

The same applies to count_read_final:

count_read_final.get('content').split(':')[1]
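
To make the indexing concrete, here is a small sketch that needs only the standard library. It shows which part of the string sits at index 0 and which at index 1, and then builds the sentence asked for in the question. The function name format_count is hypothetical, and the code targets Python 3.

```python
def format_count(content):
    """Hypothetical helper: turn 'Comments:115' into "You've 115 comments"."""
    parts = content.split(":")
    label = parts[0]   # 'Comments' (what index 0 returns)
    value = parts[1]   # '115'      (what index 1 returns)
    return "You've {} {}".format(int(value), label.lower())


# The strings below are the content attribute values from the question.
print(format_count("Comments:115"))
print(format_count("Read:10"))
```

Index 0 holds the label, which is why an index of 0 displays "Comments" and "Read" rather than the numbers.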

2014-09-25 05:14:27
