BeautifulSoup tag order

Consider the following situation in BeautifulSoup:

tag1 = soup.find(**data_attrs) 
tag2 = soup.find(**delim_attrs)

Is there a way to find out which of these two tags occurs "first" in the page?

Clarification:

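For my purposes, the order is the same one that BeautifulSoup's findNext method uses. (I'm currently relying on this fact to "work around" my problem, though it's messy.)
The goal here is basically to accumulate the tags that are not separated by a "delimiter tag". Maybe there's a better way to do this?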
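A minimal sketch of a findNext-based order check like the one described above (bs4 spells the method find_next). This is not the asker's code; the helper name comes_before is hypothetical. It relies on find_next() accepting a function as its filter and walking forward in parse order from tag1:

```python
from bs4 import BeautifulSoup

def comes_before(tag1, tag2):
    """Return True if tag2 is reached when walking forward from tag1 in document order."""
    # find_next() scans the elements that follow tag1 in parse order
    # (including tag1's own descendants); the function matches tag2 by identity.
    return tag1.find_next(lambda el: el is tag2) is not None

soup = BeautifulSoup("<div><p id='a'>A</p><p id='b'>B</p></div>", "html.parser")
a, b = soup.find(id="a"), soup.find(id="b")
print(comes_before(a, b), comes_before(b, a))  # True False
```

Each call may scan the rest of the document, so this is fine for a one-off comparison but not for ordering many tags.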


2014-12-28 Khodeir

No, BeautifulSoup tags don't track their position in the page. You'd have to loop over all the tags again and locate your two tags in that list.

Using the standard sample BeautifulSoup tree:

>>> tag1 = soup.find(id='link1') 
>>> tag2 = soup.find(id='link2') 
>>> tag1, tag2 
(<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>) 
>>> all_tags = soup.find_all(True) 
>>> all_tags.index(tag1) 
6 
>>> all_tags.index(tag2) 
7
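One caveat with list.index(): bs4 considers two tags equal when they have the same name, attributes, and contents, so index() can return the position of an earlier tag that merely looks identical. A minimal sketch that compares by identity instead (the helper names document_position and first_in_document are hypothetical):

```python
from bs4 import BeautifulSoup

def document_position(soup, tag):
    """Index of `tag` among all tags in document order, matched by identity."""
    for i, el in enumerate(soup.find_all(True)):
        if el is tag:
            return i
    raise ValueError("tag is not part of this soup")

def first_in_document(soup, *tags):
    """Return whichever of `tags` appears earliest in the document."""
    return min(tags, key=lambda t: document_position(soup, t))

# Two <p> tags with identical name, attributes, and contents.
soup = BeautifulSoup("<div><p>same</p><span>x</span><p>same</p></div>", "html.parser")
first_p, span, second_p = soup.find_all(["p", "span"])

print(soup.find_all(True).index(second_p))      # 1: == matches the earlier, equal-looking <p>
print(document_position(soup, second_p))        # 3
print(first_in_document(soup, second_p, span))  # <span>x</span>
```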

Instead, I'd use tag.find_all() with a function that matches both tag types; that way you get a single list of tags and can see their relative order:

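# Called on each Tag; find_all() returns every match in document order
tag_match = lambda el: (
    getattr(el, 'name', None) in ('tagname1', 'tagname2') and
    el.attrs.get('attributename') == 'something' and
    'classname' in el.attrs.get('class', [])  # default avoids "in None" when class is absent
)
tags = soup.find_all(tag_match)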
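Applied to the question's goal, the same single find_all() pass can be split into groups at each delimiter. A minimal sketch, assuming for illustration that delimiters are hr tags and data tags are p tags with class "item"; is_delimiter, is_data, and group_between_delimiters are hypothetical names:

```python
from bs4 import BeautifulSoup

def is_delimiter(el):
    # Hypothetical delimiter test: treat every <hr> as a separator.
    return el.name == "hr"

def is_data(el):
    # Hypothetical data test: collect <p class="item"> tags.
    return el.name == "p" and "item" in el.get("class", [])

def group_between_delimiters(soup):
    """Return lists of data tags, split wherever a delimiter tag appears."""
    groups, current = [], []
    # One find_all() pass returns both kinds of tag in document order.
    for el in soup.find_all(lambda t: is_delimiter(t) or is_data(t)):
        if is_delimiter(el):
            if current:  # skip empty groups between consecutive delimiters
                groups.append(current)
            current = []
        else:
            current.append(el)
    if current:
        groups.append(current)
    return groups

html = """
<p class="item">a</p><p class="item">b</p>
<hr>
<p class="item">c</p>
<hr>
<hr>
<p class="item">d</p><p>skip</p>
"""
soup = BeautifulSoup(html, "html.parser")
result = [[p.get_text() for p in g] for g in group_between_delimiters(soup)]
print(result)  # [['a', 'b'], ['c'], ['d']]
```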

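Alternatively, you could use the .next_siblings iterator to walk over all the elements under the same parent and see what comes after each delimiter, and so on.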
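A minimal sketch of the .next_siblings idea, assuming the data tags and the delimiters share one parent; h2 as the delimiter and the helper name siblings_until_delimiter are illustrative assumptions:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

def siblings_until_delimiter(start, is_delimiter):
    """Collect the Tag siblings after `start` up to (not including) the next delimiter."""
    collected = []
    for sib in start.next_siblings:
        if not isinstance(sib, Tag):
            continue  # skip text / whitespace nodes between tags
        if is_delimiter(sib):
            break
        collected.append(sib)
    return collected

html = "<div><h2>One</h2><p>a</p><p>b</p><h2>Two</h2><p>c</p></div>"
soup = BeautifulSoup(html, "html.parser")
first_heading = soup.find("h2")
section = siblings_until_delimiter(first_heading, lambda t: t.name == "h2")
print([p.get_text() for p in section])  # ['a', 'b']
```

Unlike find_all(), this only looks at one level of the tree, so it suits flat layouts where delimiters and data are siblings.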


2014-12-28 11:23:26
