2013-03-15

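lxml: find all div and span tags

How can I find all div and span tags while preserving their order in the document? With BeautifulSoup this is very simple: soup.findAll(name=['span', 'div']). However, I recently switched to lxml because it is much faster than BeautifulSoup.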

Answers

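import lxml.html
from lxml.cssselect import CSSSelector

# `result` is assumed to be an open file-like response (e.g. from urlopen)
content = result.read()
page_html = lxml.html.fromstring(content)

# Option 1: XPath matching any element that is a div or a span
elements = page_html.xpath('//*[self::div or self::span]')

# Option 2: the equivalent CSS selector, compiled once and reusable
sd_selector = CSSSelector('span,div')
elements = sd_selector(page_html)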

Thank you, this did the trick. Which method is faster? I assume the first one. – vericule 2013-03-15 16:15:35
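The speed question in the comment is probably best settled by measuring on your own pages. Below is a minimal timing sketch, not a definitive benchmark: the SAMPLE document, the helper name by_xpath and the run count of 50 are all made up for illustration, and the CSS variant is only timed if the separate cssselect package is installed.

```python
import timeit

import lxml.html

try:
    # lxml.cssselect depends on the separately installed cssselect package
    from lxml.cssselect import CSSSelector
except ImportError:
    CSSSelector = None

# Synthetic document (an assumption for this sketch): 1000 div/span pairs
SAMPLE = "<html><body>" + "<div>a</div><span>b</span>" * 1000 + "</body></html>"
page_html = lxml.html.fromstring(SAMPLE)


def by_xpath():
    """Select every div and span with a single XPath expression."""
    return page_html.xpath('//*[self::div or self::span]')


# Time 50 runs of the XPath approach
timings = {"xpath": timeit.timeit(by_xpath, number=50)}

if CSSSelector is not None:
    # Compile the selector once, outside the timed call
    sd_selector = CSSSelector('span,div')
    timings["cssselect"] = timeit.timeit(lambda: sd_selector(page_html), number=50)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 50 runs")
```

The absolute numbers depend on the machine, the lxml version and the shape of the document, so treat them as a rough comparison only.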

import lxml.html as LH

content = '''\
<tr>
<div>idend</div>
<span>Green</span>
</tr>
'''
root = LH.fromstring(content)
# Select every element that is either a div or a span, in document order
for tag in root.xpath('//*[self::div or self::span]'):
    print(tag)

which prints (the memory addresses will differ on each run):

<Element div at 0xb751f23c> 
<Element span at 0xb751f11c>
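To check that the order of the original markup is really preserved, it can help to print tag names and text instead of element reprs. Here is a small sketch under stated assumptions: the sample markup and the variable name found are invented for illustration, and a full html document is used so that lxml.html.fromstring returns the document root rather than a wrapped fragment.

```python
import lxml.html as LH

# Made-up sample markup: div and span elements interleaved and nested
content = '''\
<html><body>
<span>first</span>
<div>second <span>third</span></div>
<span>fourth</span>
</body></html>
'''
root = LH.fromstring(content)

# Collect (tag name, leading text) pairs; XPath returns nodes in document order
found = [(el.tag, (el.text or '').strip())
         for el in root.xpath('//*[self::div or self::span]')]
print(found)
# [('span', 'first'), ('div', 'second'), ('span', 'third'), ('span', 'fourth')]
```

Note that the nested span appears right after its parent div, which matches the order in which the opening tags occur in the source.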