python
  • web-scraping
  • beautifulsoup
  • python-requests
  • 2016-03-13

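    Scraping with BeautifulSoup: object has no attribute

    I want to use BeautifulSoup to extract data such as company names and addresses from a website. However, I get the following error: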

    Calgary's Notary Public 
    Traceback (most recent call last):
      File "test.py", line 16, in <module>
        print item.find_all(class_='jsMapBubbleAddress').text
    AttributeError: 'ResultSet' object has no attribute 'text'
    

    The HTML snippet is below. I want to extract all of the text information and convert it to a CSV file. Could anyone please help me?

    <div class="listing__right article hasIcon"> 
        <h3 class="listing__name jsMapBubbleName" itemprop="name"><a data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1","lk_relevancy":"1","lk_name":"busname","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/bus/Alberta/Calgary/Calgary-s-Notary-Public/100971374.html?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true" title="See detailed information for Calgary's Notary Public">Calgary's Notary Public</a> </h3> 
        <div class="listing__address address mainLocal"> 
         <em class="itemCounter">1</em> 
         <span class="listing__address--full" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress"> 
         <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>, <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>, <span class="jsMapBubbleAddress" itemprop="addressRegion">AB</span> <span class="jsMapBubbleAddress" itemprop="postalCode">T3G 0B4</span></span> 
         <a class="listing__direction" data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1a","lk_relevancy":"1","lk_name":"directions-step1","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/merchant/directions/100971374?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true" rel="nofollow" title="Get direction to Calgary's Notary Public">Get directions »</a> 
        </div> 
        <div class="listing__details"> 
         <p class="listing__details__teaser" itemprop="description">We offer you a convenient, quick and affordable solution for your Notary Public or Commissioner for Oaths in Calgary needs.</p> 
        </div> 
        <div class="listing__ratings--root"> 
         <div class="listing__ratings ratingWarp" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating"> 
         <meta content="5" itemprop="ratingValue"/> 
         <meta content="1" itemprop="ratingCount"/> 
         <span class="ypStars" data-analytics-group="stars" data-clicksent="false" data-rating="rating5" title="Ratings: 5 out of 5 stars"> 
         <span class="star1" data-analytics-name="stars" data-label="Optional : Why did you hate it?" title="I hated it"></span> 
         <span class="star2" data-analytics-name="stars" data-label="Optional : Why didn't you like it?" title="I didn't like it"></span> 
         <span class="star3" data-analytics-name="stars" data-label="Optional : Why did you like it?" title="I liked it"></span> 
         <span class="star4" data-analytics-name="stars" data-label="Optional : Why did you really like it?" title="I really liked it"></span> 
         <span class="star5" data-analytics-name="stars" data-label="Optional : Why did you love it?" title="I loved it"></span> 
         </span><a class="listing__ratings__count" data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1","lk_relevancy":"1","lk_name":"read_yp_reviews","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/bus/Alberta/Calgary/Calgary-s-Notary-Public/100971374.html?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true#ypgReviewsHeader" rel="nofollow" title="1 of Review for Calgary's Notary Public">1<span class="hidden-phone"> YP review</span></a> 
         </div> 
        </div> 
        <div class="listing__details detailsWrap"> 
         <ul> 
         <li><a href="/search/si/1/Notaries/Calgary%2C+AB" title="Notaries">Notaries</a> 
          , 
         </li> 
         <li><a href="/search/si/1/Notaries+Public/Calgary%2C+AB" title="Notaries Public">Notaries Public</a></li> 
         </ul> 
        </div> 
    </div> 
    

    There are many `div`s with the class `listing__right article hasIcon`. I am using a `for` loop to extract the information.
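A side note on the class filter, offered as an editorial sketch rather than part of the original question: passing the full string `'listing__right article hasIcon'` to `find_all` matches only tags whose `class` attribute is written exactly that way, in that order. A CSS selector through `soup.select()` matches any tag carrying the single class `listing__right`, whatever the order. The two-`div` HTML below is made up purely for illustration.

```python
from bs4 import BeautifulSoup

# Two illustrative listing blocks whose class attribute lists the same
# classes in a different order (made-up markup, not from the real page).
html = """
<div class="listing__right article hasIcon"><h3>First</h3></div>
<div class="hasIcon listing__right article"><h3>Second</h3></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact string match: only matches a class attribute written exactly this way.
exact = soup.find_all("div", attrs={"class": "listing__right article hasIcon"})

# CSS selector: matches any div that has the listing__right class.
by_class = soup.select("div.listing__right")

print([d.h3.get_text() for d in exact])     # ['First']
print([d.h3.get_text() for d in by_class])  # ['First', 'Second']
```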

    The Python code I have written so far is:

    import requests 
    from bs4 import BeautifulSoup 
    
    url = 'http://www.yellowpages.ca/search/si-rat/1/Notary/Calgary%2C+AB' 
    response = requests.get(url) 
    content = response.content 
    
    soup = BeautifulSoup(content) 
    g_data=soup.find_all('div', attrs={'class': 'listing__right article hasIcon'}) 
    
    for item in g_data: 
        print item.find('h3').text 
        #print item.contents[2].find_all('em', attrs={'class': 'itemCounter'})[1].text 
        print item.find_all(class_='jsMapBubbleAddress').text 
    

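    `find_all` returns a `ResultSet` (a subclass of `list`), and a list in Python has no `text` attribute. Try iterating over the list returned on the last line of your code. – MrPyCharm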


    I only want the first matching element. –


    print item.find_all(class_='jsMapBubbleAddress')[0].text –
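The comments above can be illustrated with a small offline sketch. The markup is a trimmed copy of the question's address spans, parsed with Python's built-in `html.parser` (an assumption; the original code uses the default or `lxml`). `find_all` returns a `ResultSet`, which subclasses `list` and has no `.text`. Indexing with `[0]` gives the first `Tag`, while `find` returns the first matching `Tag` directly, or `None` if nothing matches.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the address markup from the question.
html = """
<span class="listing__address--full">
<span class="jsMapBubbleAddress">340-600 Crowfoot Cres NW</span>,
<span class="jsMapBubbleAddress">Calgary</span>,
<span class="jsMapBubbleAddress">AB</span>
<span class="jsMapBubbleAddress">T3G 0B4</span>
</span>
"""
soup = BeautifulSoup(html, "html.parser")

matches = soup.find_all(class_="jsMapBubbleAddress")
print(type(matches).__name__)  # ResultSet: a list of Tag objects, with no .text
print(len(matches))            # 4

# Option 1: index the ResultSet (raises IndexError if nothing matched).
print(matches[0].text)         # 340-600 Crowfoot Cres NW

# Option 2: find() returns the first matching Tag, or None if there is none.
first = soup.find(class_="jsMapBubbleAddress")
print(first.text if first is not None else "no match")
```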

    Answer


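    `find_all` returns a list, which has no `text` attribute, so you get an error. I'm not sure what output you are looking for, but this code seems to work fine: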

    import requests
    from bs4 import BeautifulSoup

    url = 'http://www.yellowpages.ca/search/si-rat/1/Notary/Calgary%2C+AB'
    response = requests.get(url)
    content = response.content

    # The "lxml" parser requires the lxml package to be installed.
    soup = BeautifulSoup(content, "lxml")
    g_data = soup.find_all('div', attrs={'class': 'listing__right article hasIcon'})

    for item in g_data:
        print(item.find('h3').text)
        # find_all returns a ResultSet (a list of Tags), so loop over it
        # and read .text from each Tag individually.
        addresses = item.find_all(class_='jsMapBubbleAddress')
        for address in addresses:
            print(address.text)
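Since the question also asks for CSV output, here is one possible sketch using the standard `csv` module. Assumptions: it parses a trimmed copy of the question's HTML rather than the live page (whose markup may well have changed since 2016), it uses each address span's `itemprop` value as the column name instead of relying on span position, and `listing_rows` / `write_csv` are hypothetical helper names.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed copy of one listing from the question's HTML snippet.
html = """
<div class="listing__right article hasIcon">
  <h3 class="listing__name jsMapBubbleName" itemprop="name"><a href="#">Calgary's Notary Public</a> </h3>
  <div class="listing__address address mainLocal">
    <span class="listing__address--full" itemprop="address">
      <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>,
      <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>,
      <span class="jsMapBubbleAddress" itemprop="addressRegion">AB</span>
      <span class="jsMapBubbleAddress" itemprop="postalCode">T3G 0B4</span>
    </span>
  </div>
</div>
"""

# CSV columns; the address columns mirror the itemprop values in the markup.
FIELDS = ["name", "streetAddress", "addressLocality", "addressRegion", "postalCode"]


def listing_rows(page_html):
    """Yield one dict per listing div (hypothetical helper)."""
    soup = BeautifulSoup(page_html, "html.parser")
    for listing in soup.select("div.listing__right"):
        h3 = listing.find("h3")
        row = {"name": h3.get_text(strip=True) if h3 else ""}
        # Each address part carries an itemprop naming its role.
        for span in listing.find_all(class_="jsMapBubbleAddress"):
            row[span.get("itemprop", "")] = span.get_text(strip=True)
        yield row


def write_csv(page_html, out):
    """Write a header plus one row per listing; unknown keys are ignored."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in listing_rows(page_html):
        writer.writerow(row)


buffer = io.StringIO()
write_csv(html, buffer)
print(buffer.getvalue())
```

To run it against a fetched page, pass `response.text` to `write_csv` together with a file opened as `open('listings.csv', 'w', newline='', encoding='utf-8')`. The `csv` module documentation recommends `newline=''` so that no blank rows appear between records on Windows.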
    