2017-05-04 82 views

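Extracting text from an HTML document into a word list

Using BeautifulSoup, I've extracted the comments on a web page from that page's HTML document. With this code I've been able to print out the comments: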

import urllib2
from bs4 import BeautifulSoup

url = "http://songmeanings.com/songs/view/3530822107858560012/"
response = urllib2.build_opener(urllib2.HTTPCookieProcessor).open(url)
html_doc = response.read()
soup = BeautifulSoup(html_doc, 'html.parser')

def loop_until(text, first_elem):
    try:
        text += first_elem.string
        if first_elem.next == first_elem.find_next('div'):
            return text
        else:
            return loop_until(text, first_elem.next.next)
    except TypeError:
        pass

wordList = []

for strong_tag in soup.find_all('strong'):
    next_elem = strong_tag.next_sibling
    print loop_until("", next_elem)

Now I need to take all the words from this output and add them to wordList. How would I go about doing this?

Answers

1

Change your last line to use append:

wordList.append(loop_until("", next_elem)) 
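Note that the `append` call stores each whole comment as one string, and `loop_until` returns `None` whenever it hits the `TypeError` branch. If the goal is individual words, something along these lines might work. This is only a sketch: `add_words` and `sample_comments` are made-up names for illustration, the sample strings stand in for whatever `loop_until("", next_elem)` actually returns, and the regular expression is just one possible definition of a "word".

```python
import re

# One possible notion of a "word": letters/digits, optionally joined by apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")

def add_words(word_list, comment_text):
    """Extend word_list with the words found in comment_text.

    None or empty text is skipped, since loop_until can return None.
    (add_words is a hypothetical helper, not part of the original code.)
    """
    if not comment_text:
        return word_list
    word_list.extend(WORD_RE.findall(comment_text))
    return word_list

wordList = []

# Hypothetical stand-ins for the values returned by loop_until("", next_elem).
sample_comments = ["I think it's about loss.", None, "Great song!"]

for comment in sample_comments:
    add_words(wordList, comment)

print(wordList)
```

In the original loop, the last line would then become `add_words(wordList, loop_until("", next_elem))`. `extend` is used instead of `append` so that the list holds words rather than nested lists.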

Lol! I don't know why that didn't cross my mind. Thanks!