Can I combine two 'findAll' search blocks into one?

Edit: I am looking for approaches other than merging the loops the way Yacoby did.

for tag in soup.findAll(['script', 'form']): 
    tag.extract() 

for tag in soup.findAll(id="footer"): 
    tag.extract()

Can several blocks like these also be combined into one:

for tag in soup.findAll(id="footer"): 
    tag.extract() 

for tag in soup.findAll(id="content"): 
    tag.extract() 

for tag in soup.findAll(id="links"): 
    tag.extract()

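or is there some lambda expression in which I can check whether the id is in an array, or any other simpler method?

Also, how do I find tags by their class attribute, given that class is a reserved keyword in Python?

Edit: this part is solved by soup.findAll(attrs={'class': 'noprint'}). The form below does not work, because class is reserved: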

for tag in soup.findAll(class="noprint"): 
    tag.extract()


2009-12-01 Priyank Bolia

You will get better results if you ask only one question per post. – hop 2009-12-01 10:03:26

You can pass a function to .findAll() like this:

soup.findAll(lambda tag: tag.name in ['script', 'form'] or tag.get('id') == "footer")

But you are probably better off building a list of tags first and then iterating over it:

tags = soup.findAll(['script', 'form']) 
tags.extend(soup.findAll(id="footer")) 

for tag in tags: 
    tag.extract()

If you want to filter on several ids, you can use:

for tag in soup.findAll(lambda tag: tag.has_key('id') and 
            tag['id'] in ['footer', 'content', 'links']): 
    tag.extract()

A more specific approach is to pass a lambda as the id argument:

for tag in soup.findAll(id=lambda value: value in ['footer', 'content', 'links']): 
    tag.extract()
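
As a rough sketch of how the same idea might look with the newer bs4 package (an assumption; the answer above was written against BeautifulSoup 3), a single named predicate can cover both the tag names and the ids. The names HTML, REMOVE_NAMES, REMOVE_IDS and should_remove are made up for illustration, and the sample markup is invented:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, only for illustration
HTML = """
<html><body>
<script>var x = 1;</script>
<form><input name="q"></form>
<div id="content">main</div>
<div id="links">links</div>
<div id="footer">footer</div>
<p id="keep">kept</p>
</body></html>
"""

REMOVE_NAMES = {"script", "form"}
REMOVE_IDS = {"footer", "content", "links"}


def should_remove(tag):
    """Return True for tags matched by name or by id."""
    # tag.get() returns None when the attribute is missing,
    # so tags without an id do not raise KeyError
    return tag.name in REMOVE_NAMES or tag.get("id") in REMOVE_IDS


soup = BeautifulSoup(HTML, "html.parser")

# find_all() builds the full result list first, so extracting
# tags while looping over it is safe
for tag in soup.find_all(should_remove):
    tag.extract()

print(soup.find(id="keep"))
```

Keeping the names and ids in two sets means a new entry only needs to be added in one place, rather than adding another loop.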


2009-12-01 10:41:37 hop

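I get an error: SyntaxError: invalid syntax – 2009-12-01 10:47:08

SyntaxError? Strange... you should be getting a TypeError. – hop 2009-12-01 10:51:27

Fixed the TypeError in soup.findAll – hop 2009-12-01 11:03:08

I don't know whether BeautifulSoup can do this more elegantly, but you can merge the two loops like this: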

for tag in soup.findAll(['script', 'form']) + soup.findAll(id="footer"): 
    tag.extract()

You can search by class like this (Documentation):

for tag in soup.findAll(attrs={'class': 'noprint'}): 
    tag.extract()


2009-12-01 10:05:26 Yacoby

It works well, but chaining a long loop as ... + ... + ... + ... does not look clean. Is there a better way? – 2009-12-01 10:33:30

The answer to the second part of your question is right there in the documentation:
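
For the long chain of + terms raised in the comment above, one possible alternative is a CSS selector group. This is only a sketch and assumes the newer bs4 package, whose select() method accepts comma-separated selectors; it is not part of the original answer. The sample markup is invented:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, only for illustration
html = (
    '<div><script>1</script><form></form>'
    '<p class="noprint">x</p><div id="footer">f</div><p>keep</p></div>'
)
soup = BeautifulSoup(html, "html.parser")

# One selector group covers tag names, an id and a class
for tag in soup.select("script, form, #footer, .noprint"):
    tag.extract()

print(soup)
```

Adding another target then means extending the selector string rather than adding another findAll call.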

Searching by CSS class

The attrs argument would be a pretty obscure feature were it not for one thing: CSS. It's very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, class, is also a Python reserved word.

You could search by CSS class with soup.find("tagName", { "class" : "cssClass" }), but that's a lot of code for such a common operation. Instead, you can pass a string for attrs instead of a dictionary. The string will be used to restrict the CSS class.

from BeautifulSoup import BeautifulSoup 
soup = BeautifulSoup("""Bob's <b>Bold</b> Barbeque Sauce now available in 
 <b class="hickory">Hickory</b> and <b class="lime">Lime</a>""") 

soup.find("b", { "class" : "lime" }) 
# <b class="lime">Lime</b> 

soup.find("b", "hickory") 
# <b class="hickory">Hickory</b>


2009-12-01 10:09:45 hop

With bs4 we can pass class_ to filter on class values, for example links = soup.find_all('a', class_='external'):

from bs4 import BeautifulSoup 
from urllib.request import urlopen 

# Download the page, then parse it with the lxml parser
with urlopen('http://www.espncricinfo.com/') as f: 
    raw_data = f.read() 

soup = BeautifulSoup(raw_data, 'lxml') 

# class_ (with a trailing underscore) avoids the reserved word class
links = soup.find_all('a', class_='external') 
for link in links: 
    print(link)
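
To try the class_ filter without a network request, a small offline sketch along these lines may help. The markup and the variable names by_keyword and by_attrs are invented for illustration; the example assumes bs4 with the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, only for illustration
html = '<p class="noprint wide">a</p><p class="wide">b</p><p>c</p>'
soup = BeautifulSoup(html, "html.parser")

# Keyword form: class_ stands in for the reserved word class
by_keyword = soup.find_all(class_="noprint")

# Dictionary form: the same search expressed through attrs
by_attrs = soup.find_all(attrs={"class": "noprint"})

for tag in by_keyword:
    print(tag)
```

Both forms match a tag when the given class is one of several classes on it, so the first paragraph is found even though it also carries the class wide.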


2018-01-26 17:53:42
