Extracting text from an HTML document into a word list

Using BeautifulSoup, I extracted the comments on a web page from that page's HTML document. With this code I have been able to print the comments:

import urllib2

from bs4 import BeautifulSoup

url = "http://songmeanings.com/songs/view/3530822107858560012/"
response = urllib2.build_opener(urllib2.HTTPCookieProcessor).open(url)
html_doc = response.read()
soup = BeautifulSoup(html_doc, 'html.parser')


def loop_until(text, first_elem):
    # Accumulate text node by node until the next element is a <div>.
    try:
        text += first_elem.string
        if first_elem.next == first_elem.find_next('div'):
            return text
        else:
            return loop_until(text, first_elem.next.next)
    except TypeError:
        # A node whose .string is None cannot be concatenated;
        # in that case the function returns None.
        pass


wordList = []

for strong_tag in soup.find_all('strong'):
    next_elem = strong_tag.next_sibling
    print loop_until("", next_elem)

Now I need to take all the words from that selection and add them to a word list. How would I go about doing this?


2017-05-04 Otis Cheng

Change your last line to use append:

wordList.append(loop_until("", next_elem))
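Note that append stores each comment as one whole string. If the goal is a list of individual words, one possible follow-up step is to split each collected string, as in the minimal sketch below. The helper name words_from_comments and the sample data are hypothetical and not part of the original code; the regular expression \w+ is just one simple choice for what counts as a word.

```python
import re


def words_from_comments(comments):
    """Flatten a list of comment strings into a flat list of words."""
    word_list = []
    for comment in comments:
        # loop_until returns None when it hits a TypeError, so skip empties.
        if not comment:
            continue
        # \w+ matches runs of letters, digits and underscores,
        # which drops punctuation and whitespace.
        word_list.extend(re.findall(r"\w+", comment))
    return word_list


# Hypothetical sample standing in for the strings returned by loop_until.
sample = ["Great song, really.", None, "Love the lyrics!"]
print(words_from_comments(sample))
```

Here extend is used instead of append so that the words end up in one flat list instead of a list of lists. With this pattern, a contraction such as "don't" would be split at the apostrophe, so a different pattern may be preferable depending on the text.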


2017-05-04 11:41:38

LOL! I don't know why that didn't occur to me. Thanks!
