What does this function involving urllib2 and BeautifulSoup do in Python?

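I asked a question earlier about retrieving high scores from an HTML page, and another user gave me the code below to help. I am new to Python and BeautifulSoup, so I am trying to work through some other code piece by piece. I understand most of it, but I don't understand what this piece of code is or what it does: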

def parse_string(el):
    text = ''.join(el.findAll(text=True))
    return text.strip()

Here is the whole code:

from urllib2 import urlopen 
from BeautifulSoup import BeautifulSoup 
import sys 

URL = "http://hiscore.runescape.com/hiscorepersonal.ws?user1=" + sys.argv[1] 

# Grab page html, create BeautifulSoup object
html = urlopen(URL).read() 
soup = BeautifulSoup(html) 

# Grab the <table id="mini_player"> element 
scores = soup.find('table', {'id':'mini_player'}) 

# Get a list of all the <tr>s in the table, skip the header row 
rows = scores.findAll('tr')[1:] 

# Helper function to return concatenation of all character data in an element 
def parse_string(el): 
    text = ''.join(el.findAll(text=True)) 
    return text.strip() 

for row in rows: 

    # Get all the text from the <td>s 
    data = map(parse_string, row.findAll('td')) 

    # Skip the first td, which is an image 
    data = data[1:] 

    # Do something with the data... 
    print data


2009-06-14 Alex

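el.findAll(text=True) returns all the text contained within an element and its sub-elements. By text I mean everything that is not inside a tag; so in <b>hello</b>, "hello" would be the text, but <b> and </b> would not.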

Therefore, the function joins together all the text found beneath the given element and strips whitespace from the front and back.
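To make that concrete, here is a rough sketch that mimics the same behavior using only the Python 3 standard library, so it can be run without BeautifulSoup installed. It is not how BeautifulSoup works internally; TextCollector and parse_string_stdlib are hypothetical names made up for this illustration, and the sample markup is invented.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records character data and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags, at any nesting depth.
        self.chunks.append(data)


def parse_string_stdlib(markup):
    # Stand-in for parse_string(el): gather every text node, join them, strip the ends.
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return ''.join(collector.chunks).strip()


# Invented sample cell with nested tags and stray whitespace.
cell = '<td> <b>hello</b> <i>world</i>\n</td>'
print(repr(parse_string_stdlib(cell)))  # 'hello world'
```

Note that only the leading and trailing whitespace is removed; the space between "hello" and "world" is itself a text node and survives the join.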

Here is a link to the findAll documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html#arg-text


2009-06-14 02:13:37

Use backticks for the HTML. :) – 2009-06-14 02:18:33
