2017-06-07 214 views
0

Tag content not returned with BeautifulSoup

I have the following string that I am trying to extract data from:

<item> 
<dc:creator><![CDATA[Chris M]]></dc:creator> 
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate> 
</item> 

I am trying to get the name Chris M, and the other authors, with this:

soup = BeautifulSoup(response, "lxml")
items = soup.findAll("item")
for i in items:
    author = i.find('dc:creator')
    print(author)

This outputs:

<dc:creator></dc:creator> 

How do I get the name out of the tag?

Have you tried 'creator' instead of 'dc:creator'? – codekaizer

@codekaizer Yes, it doesn't return anything. – Atma

Answers

0

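This works for me using Python 3 - https://repl.it/languages/python3

Specify the xml parser instead of "lxml" (the "xml" parser still requires lxml to be installed):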

import bs4 as bs 
content=""" 
<collection> 
    <item><dc:creator><![CDATA[Chris M]]></dc:creator></item> 
    <item><dc:creator><![CDATA[Harris A]]></dc:creator></item> 
</collection> 
""" 

soup = bs.BeautifulSoup(content, 'xml') 

items = soup.findAll("item") 
for i in items: 
    author = i.find('creator') 
    print(author.string) 

Output:

Chris M 
Harris A 
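
The same approach can be wrapped in a small helper for a whole feed. This is only a minimal sketch under stated assumptions: the sample feed, its xmlns:dc declaration, and the names local_name and extract_authors are made up for illustration and do not come from the question. Because the way the dc: prefix shows up in the tag name may depend on the parser and on whether the namespace is declared, the sketch matches on the local part of the name rather than on one fixed spelling.

```python
from bs4 import BeautifulSoup

# Hypothetical sample feed. The xmlns:dc declaration is an assumption;
# the snippet in the question does not show the enclosing document.
FEED = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item><dc:creator><![CDATA[Chris M]]></dc:creator></item>
    <item><dc:creator><![CDATA[Harris A]]></dc:creator></item>
    <item><title>No author here</title></item>
  </channel>
</rss>
"""


def local_name(tag):
    """Return the tag name without any namespace prefix (hypothetical helper)."""
    return tag.name.split(":")[-1]


def extract_authors(xml_text):
    """Collect the text of every creator element found inside an item."""
    # The "xml" parser needs lxml to be installed.
    soup = BeautifulSoup(xml_text, "xml")
    authors = []
    for item in soup.find_all("item"):
        # Match on the local name so both "creator" and "dc:creator" qualify.
        creator = item.find(lambda tag: local_name(tag) == "creator")
        if creator is not None:  # skip items that have no author
            authors.append(creator.get_text(strip=True))
    return authors


print(extract_authors(FEED))
```

Checking for None before reading the text keeps the loop from failing on items that carry no author element.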

0

BeautifulSoup represents CDATA sections with CData, a subclass of NavigableString, so you can check whether a node is an instance of it.

>>> from bs4 import BeautifulSoup, CData
>>> text = """<item>
... <dc:creator><![CDATA[Chris M]]></dc:creator>
... <pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
... </item>"""
>>> # Name the parser explicitly: as the question shows, "lxml" drops the CDATA text.
>>> soup = BeautifulSoup(text, "html.parser")
>>> for item in soup.findAll(text=True):
...     if isinstance(item, CData):
...         print(item)
...
Chris M
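
To see which tag each CDATA section belongs to, the isinstance check can be combined with the node's parent. The following is a rough sketch, assuming the built-in "html.parser" is used; the helper name cdata_by_tag is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, CData

# Sample markup copied from the question.
SNIPPET = """<item>
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>"""


def cdata_by_tag(markup):
    """Return (parent tag name, CDATA text) pairs (hypothetical helper)."""
    # "html.parser" is named explicitly so the result does not depend on
    # which optional parsers happen to be installed.
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for node in soup.descendants:
        if isinstance(node, CData):
            pairs.append((node.parent.name, str(node)))
    return pairs


for tag_name, value in cdata_by_tag(SNIPPET):
    print(tag_name, value)
```

Ordinary text nodes, such as the date inside pubDate, are skipped because they are plain NavigableString objects rather than CData.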