2016-10-25 48 views

Getting untagged elements with BeautifulSoup

Consider the following HTML fragment:

<div class="mapCopy"> 
    <b> 
     <a href="someurl.com"> 
      URL Text 
     </a> 
    </b> 
    <br/> 
     Address Line 1 
    <br/> 
     Address Line 2 
    <br/> 
     City, State, Zip 
    <p> 
     Phone: (123) 456-7890 
    <br/> 
     Fax: (123) 456-7890 
    </p> 
</div> 

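How can I extract only Address Line 1, Address Line 2, and City, State, Zip? I believe I should be able to iterate over the div and exclude any element that has a <b> tag, but I'm not sure of the necessary syntax.

Answer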

You can extract all direct children of the <div> that are plain text rather than tags:

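>>> from bs4 import BeautifulSoup, NavigableString
>>> S = BeautifulSoup("<div...", "html.parser")
>>> [child.strip() for child in S.find('div').children
...  if isinstance(child, NavigableString)  # bare text node, not a tag
...  and child.strip()                      # skip whitespace-only nodes
... ]
['Address Line 1', 'Address Line 2', 'City, State, Zip']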
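
For reference, here is a self-contained sketch of the same idea that can be run as-is. It inlines the HTML from the question and asks BeautifulSoup for only the direct string children of the div via find_all(string=True, recursive=False). The helper name direct_text and the choice of the built-in "html.parser" are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# The fragment from the question, inlined so the example is runnable.
HTML = """
<div class="mapCopy">
    <b>
     <a href="someurl.com">
      URL Text
     </a>
    </b>
    <br/>
     Address Line 1
    <br/>
     Address Line 2
    <br/>
     City, State, Zip
    <p>
     Phone: (123) 456-7890
    <br/>
     Fax: (123) 456-7890
    </p>
</div>
"""


def direct_text(html, css_class="mapCopy"):
    """Return the stripped text nodes that are direct children of the div.

    Hypothetical helper: the name and the default class are illustrative.
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=css_class)
    # recursive=False limits the search to direct children, so text nested
    # inside <b> or <p> is not returned; string=True matches text nodes only.
    strings = div.find_all(string=True, recursive=False)
    # Drop the whitespace-only nodes that sit between the tags.
    return [s.strip() for s in strings if s.strip()]


print(direct_text(HTML))
# ['Address Line 1', 'Address Line 2', 'City, State, Zip']
```

Because the phone and fax lines sit inside the <p> element, they are not direct children of the div and are left out without any explicit filtering on tag names.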