2017-07-26 77 views
-1

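Web scraping a real-estate agent page: can't print the phone numbers using Python and BeautifulSoup

I'm trying to extract data from a real-estate agent page. I can get the name and role of every agent, but the phone number of only a few of them.

Here is my code: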

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

my_url = 'https://www.raywhite.com/contact/?type=People&target=people&suburb=Sydney%2C+NSW+2000&radius=5&firstname=&lastname=&_so=people' 

# opening connection 
uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 

page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll("div",{"class":"card horizontal-split vcard"}) 

for container in containers: 
    agent_name = container.findAll("li", {"class":"agent-name"}) 
    name = agent_name[0].text 

    agent_role = container.findAll("li", {"class":"agent-role"}) 
    role = agent_role[0].text 

    phone = container.find("a").text 

    print("name: " + name) 
    print("role: " + role) 
    print("phone: " + phone) 

Here is a sample of the first few results printed; only the first two agents have their phone numbers listed:

name: Mark Constantine 
role: Principal 
phone: 0418 222 643 
name: Dawn Veloskey 
role: Operations Manager 
phone: 0418 449 600 
name: Yvonne Lau 
role: Sales 
phone: 

name: Anthony Cavallaro 
role: Managing Director | Selling Principal 
phone: 

name: Ciara OConnor 
role: Sales Executive 
phone: 

name: Michael Buium 
role: Commercial Sales Manager and Auctioneer 
phone: 

name: Albert Hui 
role: Senior Commercial Property Manager 
phone: 

name: Jessie Yee 
role: Associate Director, Commercial Leasing & Management 
phone: 

I'm not sure why the other phone numbers aren't being printed. Any suggestions are much appreciated.

+1

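Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error, and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve).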

Answers

3

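That's because the first two agents have no photo. For everyone else, the photo link is the first "a" tag in the card, so container.find("a") returns that link, and its text is empty.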
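A minimal sketch of the effect, using made-up markup. The card layout, the agent name, the URL, and the phone number below are all assumptions for illustration and are not copied from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical agent card: a photo link comes before the phone link.
SAMPLE_HTML = """
<div class="card horizontal-split vcard">
  <a href="/agents/example-agent/"><img src="photo.jpg" alt=""></a>
  <ul>
    <li class="agent-name">Example Agent</li>
    <li class="agent-role">Sales</li>
  </ul>
  <a href="tel:0400000000">0400 000 000</a>
</div>
"""

card = BeautifulSoup(SAMPLE_HTML, "html.parser").find("div", class_="vcard")

# find("a") returns only the first anchor, which here wraps the photo.
first_link = card.find("a")
print(repr(first_link.text))   # the photo link contains no text
print(first_link["href"])      # and it does not point at a tel: URI
```

If the real cards are laid out like this, the empty "phone:" lines in the question are the text of the photo link.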

Replace:

phone = container.find("a").text 

with:

filterfn = lambda x: 'href' in x.attrs and x['href'].startswith("tel") 
phones = map(lambda x: x.text,filter(filterfn,container.findAll("a"))) 

for phone in phones: 
    print("phone number: " + phone) 
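The same filter can probably be written with a CSS attribute selector instead of the lambda. This is only a sketch: extract_phones is a hypothetical helper name, and the sample markup and numbers are invented for the demonstration rather than taken from the site.

```python
from bs4 import BeautifulSoup


def extract_phones(container):
    """Return the text of every link in the card whose href starts with "tel"."""
    return [a.get_text(strip=True) for a in container.select('a[href^="tel"]')]


# Hypothetical card with a photo link and two phone links.
SAMPLE_HTML = """
<div class="card horizontal-split vcard">
  <a href="/agents/example-agent/"><img src="photo.jpg" alt=""></a>
  <a href="tel:0400000000">0400 000 000</a>
  <a href="tel:0200000000">02 0000 0000</a>
</div>
"""

card = BeautifulSoup(SAMPLE_HTML, "html.parser").find("div", class_="vcard")
phones = extract_phones(card)

for phone in phones:
    print("phone number: " + phone)
```

Returning a real list, rather than a map object, also makes the result easy to reuse or print more than once.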
+0

Thanks @XingzhouLiu, but when I try to run it I get this error: name: Mark Constantine role: Principal Traceback (most recent call last): File "C:\Users\Toby\Desktop\Webscrape\scraped - mark4.py", line 28, in <module> print("phone: " + phone) TypeError: must be str, not list – Oren

+0

I kept phones as a list b/c some agents have more than one phone number. You can also use print("phone: " + repr(phones)), or see the edit above.
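One way to avoid the TypeError from the comment above while still printing a single "phone:" line is to join the numbers into one string first. A small sketch, where format_phones is a hypothetical helper and the numbers are placeholders:

```python
def format_phones(phones):
    """Join any iterable of phone strings into one printable string."""
    phones = list(phones)  # also accepts a map or filter object
    return ", ".join(phones) if phones else "(none listed)"


# Placeholder numbers, not real data from the page.
print("phone: " + format_phones(["0400 000 000", "02 0000 0000"]))
print("phone: " + format_phones([]))
```

Concatenating a str with a list raises TypeError, so the join has to happen before the "+".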