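Difference between .contents and .children

I have read that .contents returns a tag's direct children, and that if we want to iterate over those children we should use .children. But I tried both and got the same output. What is the difference between .contents and .children?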

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head> 
<body> 
<p class="title"><b>The Dormouse's story</b></p> 

<p class="story">Once upon a time there were three little sisters; and their names were 
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>, 
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and 
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; 
and they lived at the bottom of a well.</p> 

<p class="story">...</p></body></html> 
""" 
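soup = BeautifulSoup(html_doc, "html.parser")
title_tag = soup.title

# Iterate over the direct children via the .children iterator
for child in title_tag.children:
    print(child)
# Iterate over the same children via the .contents list
for child in title_tag.contents:
    print(child)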


2017-04-05 Hamza

I get "NameError: name 'title_tag' is not defined". How can I make this example work? – tdelaney

Sorry. OK, done! – Hamza

The documentation is a bit more subtle than that. It says

Instead of getting them as a list, you can iterate over a tag’s children using the .children generator

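But you can iterate over a list directly in a for loop, and you can get an iterator from it by calling iter(), so it seems kind of pointless to even have a .children property. Looking closer, here is how children is implemented.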

#Generator methods 
@property 
def children(self): 
    # return iter() to make the purpose of the method clear 
    return iter(self.contents) # XXX This seems to be untested.

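Yes, it is completely pointless. The two pieces of code do the same thing, except that for child in title_tag.contents obtains an iterator from the list, while for child in title_tag.children uses the iterator it is handed.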
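To make the comparison concrete without needing BeautifulSoup installed, here is a minimal sketch that mirrors the quoted property. FakeTag is a hypothetical stand-in written for illustration, not the real bs4 class; it only assumes that contents is a plain list and children returns iter(self.contents), as in the snippet above.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag; not the real class."""

    def __init__(self, contents):
        self.contents = list(contents)  # direct children stored as a list

    @property
    def children(self):
        # Same idea as the quoted snippet: an iterator over the list.
        return iter(self.contents)


tag = FakeTag(["The Dormouse's story"])

# Both loops visit exactly the same items.
from_contents = [child for child in tag.contents]
from_children = [child for child in tag.children]
same_output = from_contents == from_children

# A list can be indexed, measured, and looped over repeatedly.
first_child = tag.contents[0]
child_count = len(tag.contents)

# An iterator is single-pass: once exhausted, it yields nothing more.
children = tag.children
first_pass = list(children)
second_pass = list(children)

print(same_output, first_child, child_count, first_pass, second_pass)
```

So the practical difference in this sketch is only what you can do with the returned object: the list supports indexing and len(), while the iterator supports a single pass.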


2017-04-05 15:38:28 tdelaney

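Assuming you are talking about BeautifulSoup (you should have given us some context!)...

As said here, the main difference is that with .contents you get a list, whereas with .children you get a generator.

It may not seem to make any difference, since you can iterate over both, but when you work with large amounts of data you should always prefer generators to save your computer's memory.

Imagine this: you have a 10K text file and you need to work on each line. When you use a list (e.g. with open('t.txt') as f: lines = f.readlines()), you fill a large part of your memory with data you will not work on right away, which just sits there taking up space (not to mention that, depending on your environment, you may not have enough memory...). When you use a generator, you get one line at a time as needed, without that memory cost...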
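The file analogy can be sketched as follows. This is only an illustration under assumptions: the file is a temporary one generated on the spot with made-up contents, and the line count of 10,000 is arbitrary. It contrasts readlines(), which builds the whole list up front, with iterating over the file object, which yields lines one by one.

```python
import os
import tempfile

# Write a small sample file (made-up data) so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(10_000)))
    path = tmp.name

# Eager: readlines() builds a list holding every line at once.
with open(path) as f:
    lines = f.readlines()
eager_count = len(lines)

# Lazy: iterating over the file object yields one line at a time.
lazy_count = 0
with open(path) as f:
    for line in f:
        lazy_count += 1

os.remove(path)
print(eager_count, lazy_count)
```

Both approaches see the same lines; the difference is that the eager version keeps all of them in memory at the same time, while the lazy loop only holds the current line.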


2017-04-05 15:39:26
