Extracting content from multiple span tags in BeautifulSoup

I am trying to extract the string content from multiple span tags. A snapshot of the HTML page:

<div class="secondary-attributes"> 
    <span class="neighborhood-str-list"> 
     Southeast 
    </span> 
    <address> 
     1234 Python Blvd S<br>Somewhere, NV 98765 
    </address> 
    <span class="biz-phone"> 
     (555) 123-4567 
    </span> 
</div>

Specifically, I want to extract the phone number, which sits between the <span class="biz-phone"></span> tags. I tried to do this with the following code:

import requests 
from bs4 import BeautifulSoup 

res = requests.get(url) 
soup = BeautifulSoup(res.text, "html.parser") 

phone_number_results = [phone_numbers for phone_numbers in soup.find_all('span','biz-phone')]

The code runs without any syntax errors, but it does not quite give me the result I was hoping for:

['<span class="biz-phone">\n  (702) 476-5050\n </span>',
 '<span class="biz-phone">\n  (702) 253-7296\n </span>',
 '<span class="biz-phone">\n  (702) 385-7912\n </span>',
 '<span class="biz-phone">\n  (702) 776-7061\n </span>',
 '<span class="biz-phone">\n  (702) 221-7296\n </span>',
 '<span class="biz-phone">\n  (702) 252-7296\n </span>',
 '<span class="biz-phone">\n  (702) 659-9101\n </span>',
 '<span class="biz-phone">\n  (702) 355-9445\n </span>',
 '<span class="biz-phone">\n  (702) 396-3333\n </span>',
 '<span class="biz-phone">\n  (702) 643-9851\n </span>',
 '<span class="biz-phone">\n  (702) 222-1441\n </span>']

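My question has two parts:

Why do the span tags show up when I run the program?
How do I get rid of them? I could edit the strings by hand, but I feel that would not be making full use of the BeautifulSoup package. Is there a more elegant way?

Note: the page contains many more HTML snippets like the one shown above, so there are more instances of <span class="biz-phone"> (555) 123-4567 </span> (i.e. more phone numbers) to extract, which is why I am considering find_all().

Thanks in advance.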

Source

2016-10-30 daOnlyBG

Use 'phone_numbers.text' or even 'phone_numbers.text.strip()' – furas

Thanks @furas, that did the trick! – daOnlyBG

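find_all() returns a list of tags (bs4.element.Tag objects), not a list of strings. That is why the span markup appears when you print the result.
As @furas pointed out, access each tag's text attribute to extract the text inside the tag: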

phone_number_results = [phone_numbers.text.strip() for phone_numbers in soup.find_all('span', 'biz-phone')]

(You will probably also want to call strip() to remove the surrounding whitespace and newlines.)
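
To see the difference without fetching a page, here is a minimal sketch that parses an inline copy of the snippet from the question. The `html` string and the second phone number are made up for illustration; in the original code the markup comes from `res.text`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (res.text in the question).
html = """
<div class="secondary-attributes">
    <span class="neighborhood-str-list">
     Southeast
    </span>
    <address>
     1234 Python Blvd S<br>Somewhere, NV 98765
    </address>
    <span class="biz-phone">
     (555) 123-4567
    </span>
</div>
<div class="secondary-attributes">
    <span class="biz-phone">
     (555) 765-4321
    </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("span", "biz-phone")

# Each element is a Tag, so printing it shows the full <span> markup.
print(type(tags[0]).__name__)  # Tag

# .text returns only the text inside the tag; strip() trims the whitespace.
phone_number_results = [tag.text.strip() for tag in tags]
print(phone_number_results)  # ['(555) 123-4567', '(555) 765-4321']
```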

Source

2016-10-30 20:53:58 dmcc

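Thanks, '.text' did the trick! I did not know about that attribute. I tried a few others (e.g. '.contents'), but they did not seem to help. Your solution works, though. – daOnlyBG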
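
As a side note on why '.contents' did not help: it returns the list of a tag's child nodes rather than a plain string. A small sketch, assuming a single hand-written span as input (the `markup` string is illustrative), which also shows `get_text(strip=True)` as an alternative to `.text.strip()`:

```python
from bs4 import BeautifulSoup

# Illustrative input: one span like those in the question.
markup = '<span class="biz-phone">\n (555) 123-4567\n</span>'
soup = BeautifulSoup(markup, "html.parser")
tag = soup.find("span", "biz-phone")

# .contents is a list of child nodes (here a single text node),
# so the whitespace is still there and the result is not a plain string.
children = tag.contents
print(len(children))

# get_text(strip=True) returns the text with surrounding whitespace removed.
text = tag.get_text(strip=True)
print(text)
```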
