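BeautifulSoup - misspelled class
I'm trying to scrape location data from https://www.wellstar.org/locations/pages/default.aspx. When I looked at the page source, I noticed that the class for the hospital address is sometimes spelled with an extra 'd': 'adddress' instead of 'address'. Is there a way to handle this discrepancy in the code below? I tried adding an if statement that tests the length of the address object, but I only get the addresses associated with the 'adddress' class. I feel like I'm close, but I'm out of ideas.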
import urllib
import urllib.request
from bs4 import BeautifulSoup
import re

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://www.wellstar.org/locations/pages/default.aspx")

for table in soup.findAll("table", class_="s4-wpTopTable"):
    for type in table.findAll("h3"):
        type = type.get_text()
    for name in table.findAll("div", class_="PurpleBackgroundHeading"):
        name = name.get_text()
    address = ""
    for address in table.findAll("div", class_="WS_Location_Adddress"):
        address = address.get_text(separator=" ")
    if len(address) == 0:
        for address in table.findAll("div", class_="WS_Location_Address"):
            address = address.get_text(separator=" ")
    print(type, name, address)
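One possible way to cope with both spellings is to match the class with a regular expression rather than an exact string, since BeautifulSoup's `class_` filter accepts a compiled pattern. The following is only a sketch: `ADDRESS_CLASS`, `extract_addresses`, and the `SAMPLE_HTML` snippet are made-up names and data for illustration, and the sample markup is an assumption about the page's structure, not a copy of it.

```python
import re
from bs4 import BeautifulSoup

# Matches "WS_Location_Address", "WS_Location_Adddress",
# and any other spelling with one or more d's.
ADDRESS_CLASS = re.compile(r"^WS_Location_Ad+ress$")


def extract_addresses(table):
    """Return the text of every address div in `table`, whichever spelling it uses."""
    divs = table.find_all("div", class_=ADDRESS_CLASS)
    return [div.get_text(separator=" ", strip=True) for div in divs]


# Hypothetical markup standing in for the real page (assumed structure).
SAMPLE_HTML = """
<table class="s4-wpTopTable"><tr><td>
  <div class="WS_Location_Adddress">1 Example St<br>Anytown, GA</div>
</td></tr></table>
<table class="s4-wpTopTable"><tr><td>
  <div class="WS_Location_Address">2 Sample Ave</div>
</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for table in soup.find_all("table", class_="s4-wpTopTable"):
    print(extract_addresses(table))
```

If regular expressions feel like overkill, passing a list such as `class_=["WS_Location_Address", "WS_Location_Adddress"]` should also match either class, at the cost of having to list every misspelling explicitly.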
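Two great options. I'm curious about regular expressions, and intimidated by them, to be honest. This might be a good reason to take some time to learn the operators. – Daniel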