BeautifulSoup - misspelled class

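I am trying to scrape location data from https://www.wellstar.org/locations/pages/default.aspx. When I looked at the page source, I noticed that the class for the hospital address is sometimes spelled with an extra 'd': 'adddress' as well as 'address'. Is there a way to handle this discrepancy in the code below? I tried adding an if statement that tests the length of the address object, but I only get the addresses associated with the 'adddress' class. I feel like I'm close, but I'm out of ideas.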

import urllib
import urllib.request
from bs4 import BeautifulSoup
import re

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://www.wellstar.org/locations/pages/default.aspx")

for table in soup.findAll("table", class_="s4-wpTopTable"):
    for type in table.findAll("h3"):
        type = type.get_text()
    for name in table.findAll("div", class_="PurpleBackgroundHeading"):
        name = name.get_text()
    address = ""
    for address in table.findAll("div", class_="WS_Location_Adddress"):
        address = address.get_text(separator=" ")
    if len(address) == 0:
        for address in table.findAll("div", class_="WS_Location_Address"):
            address = address.get_text(separator=" ")
            print(type, name, address)


2016-09-30 Daniel

BeautifulSoup is quite accommodating here; you can use a regular expression:

for address in table.find_all("div", class_=re.compile(r"WS_Location_Ad{2,}ress")):

where d{2,} matches the letter d two or more times.
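To see what the pattern accepts without touching the live site, here is a minimal sketch using only the standard `re` module. The candidate class names are made up for illustration; only the first two are reported in the question.

```python
import re

# Same pattern as in the answer: "WS_Location_A", then two or more "d", then "ress".
pattern = re.compile(r"WS_Location_Ad{2,}ress")

# Hypothetical class names; only the first two are known to occur on the page.
candidates = [
    "WS_Location_Address",   # correct spelling, two d's
    "WS_Location_Adddress",  # misspelling with three d's
    "WS_Location_Adress",    # one d only, should not match
    "WS_Location_Name",      # unrelated class
]

# Keep the names in which the pattern is found.
matches = [name for name in candidates if pattern.search(name)]
print(matches)
```

The pattern is unanchored, so it would also match a longer class name that merely contains this text. If that matters, wrapping it as `^WS_Location_Ad{2,}ress$` is one way to tighten it.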

Alternatively, you can specify a list of classes:

for address in table.find_all("div", class_=["WS_Location_Address", "WS_Location_Adddress"]):
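The list approach can be tried on a small inline document. The sketch below assumes a simplified, invented HTML fragment rather than the real page markup; the hospital names and addresses are placeholders, and only the class names come from the question.

```python
from bs4 import BeautifulSoup

# Invented stand-in for the real page: one table per location,
# with the address class spelled both ways.
html = """
<table class="s4-wpTopTable">
  <tr><td>
    <div class="PurpleBackgroundHeading">Hospital A</div>
    <div class="WS_Location_Adddress">1 Main St<br/>Springfield</div>
  </td></tr>
</table>
<table class="s4-wpTopTable">
  <tr><td>
    <div class="PurpleBackgroundHeading">Hospital B</div>
    <div class="WS_Location_Address">2 Oak Ave<br/>Shelbyville</div>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# A div matches if its class is any one of the listed spellings.
both_spellings = ["WS_Location_Address", "WS_Location_Adddress"]

results = []
for table in soup.find_all("table", class_="s4-wpTopTable"):
    name = table.find("div", class_="PurpleBackgroundHeading").get_text(strip=True)
    for div in table.find_all("div", class_=both_spellings):
        # separator=" " joins the text on either side of <br/> with a space.
        results.append((name, div.get_text(separator=" ", strip=True)))

for name, address in results:
    print(name, "-", address)
```

Because a single `find_all` call covers both spellings, the separate fallback branch with the length check is no longer needed.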


2016-09-30 19:27:42 alecxe

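Two great options. To be honest, I'm curious about regular expressions but also intimidated by them. This may be a good reason to spend some time learning the operators. – Daniel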
