2016-12-02 119 views
0

I am using BeautifulSoup in Python and want to get all the links: how can I get all the links from the page's DOM?

links = soup.select('.cover > .card-click-target')
print(links)

But it gives me an array of elements and string values rather than the link URLs.

My HTML code:

<div class="cover"> 
    <div class="cover-image-container"> 
    <div class="cover-outer-align"> 
     <div class="cover-inner-align"> 
     <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true"> 
     </div> 
    </div> 
    </div> 
    <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite  "> 
    <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
     <span class="preordered-label">Предзаказ</span> 
    </span> 
    <span class="preview-overlay-container"> </span> 
    </a> 
</div> 

<div class="cover"> 
    <div class="cover-image-container"> 
    <div class="cover-outer-align"> 
     <div class="cover-inner-align"> 
     <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true"> 
     </div> 
    </div> 
    </div> 
    <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite  "> 
    <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
     <span class="preordered-label">Предзаказ</span> 
    </span> 
    <span class="preview-overlay-container"> 
    </span> 
    </a> 
</div> 
+1

It's really hard to help without seeing the actual source of the page, but if you are looking for links (which are `a` tags), you should use `find_all('a')`. – Dekel

+0

Please look at the question again, I have made changes – MisterPi

+0

I don't see any changes – Dekel

Answers

1
link_tags = soup.find_all('a', class_="card-click-target") 
links = [i.get('href') for i in link_tags] 

Out:

['/s/kate_new_6', '/s/kate_new_6'] 

The `select` version:

link_tags = soup.select('.cover .card-click-target') 
links = [i.get('href') for i in link_tags]
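
To check both selector forms against the markup from the question, here is a minimal self-contained sketch. It assumes `bs4` is installed and uses the built-in `html.parser`; the `html` string is a trimmed-down copy of the question's markup, not the real page.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (assumption: the
# real page has the same .cover / a.card-click-target structure).
html = """
<div class="cover">
  <div class="cover-image-container"><img class="cover-image" src=""></div>
  <a class="card-click-target" href="/s/kate_new_6"></a>
</div>
<div class="cover">
  <div class="cover-image-container"><img class="cover-image" src=""></div>
  <a class="card-click-target" href="/s/kate_new_6"></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (a space): matches at any depth below .cover.
descendant_links = [a.get("href") for a in soup.select(".cover .card-click-target")]

# Child combinator (>): matches only direct children of .cover.
# Both class names need their leading dot.
child_links = [a.get("href") for a in soup.select(".cover > .card-click-target")]

print(descendant_links)
print(child_links)
```

`select` always returns `Tag` objects, so the `.get('href')` step is what turns the result into plain URL strings.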
+0

Thanks, but how do I specify the parent? '.cover> card-click-target' – MisterPi

1

I wouldn't fully trust BeautifulSoup's CSS selectors. A quick search turns up this answer here, where updating BeautifulSoup fixed the asker's problem.

I would strongly suggest that you write a function to do the work:

links = soup.find_all(lambda tag: tag.parent.get('class', None) == ['cover']
                      and tag.get('class', None) == ['card-click-target'])

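The anonymous lambda function searches for all tags whose class is card-click-target and makes sure that each of those tags has a parent whose class is cover.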
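
The same idea can be written as a named function. The sketch below is one possible variant, not the answer's exact code: it tests class membership instead of list equality, so a parent such as `class="cover featured"` still matches. The name `is_card_link` and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

def is_card_link(tag):
    """Hypothetical filter: True for a tag with class card-click-target
    whose direct parent has class cover."""
    parent = tag.parent
    if parent is None:
        return False
    # Tag.get('class', []) yields a list of class names, so `in` tolerates
    # extra classes on either element.
    return ("card-click-target" in tag.get("class", [])
            and "cover" in parent.get("class", []))

# Illustrative markup (assumption): one matching parent, one non-matching.
html = """
<div class="cover featured">
  <a class="card-click-target" href="/s/kate_new_6"></a>
</div>
<div class="other">
  <a class="card-click-target" href="/s/ignored"></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all accepts a function and keeps the tags for which it returns True.
links = [tag.get("href") for tag in soup.find_all(is_card_link)]
print(links)
```

With the equality test from the lambda (`== ['cover']`), the first `div` here would not match because it carries a second class.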

0

Check this example:

>>> s = """ <div class="cover"> 
     <div class="cover-image-container"> 
     <div class="cover-outer-align"> 
      <div class="cover-inner-align"> 
      <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true"> 
      </div> 
     </div> 
     </div> 
     <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite  "> 
     <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
      <span class="preordered-label">Предзаказ</span> 
     </span> 
     <span class="preview-overlay-container"> </span> 
     </a> 
    </div> 

    <div class="cover"> 
     <div class="cover-image-container"> 
     <div class="cover-outer-align"> 
      <div class="cover-inner-align"> 
      <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true"> 
      </div> 
     </div> 
     </div> 
     <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite  "> 
     <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
      <span class="preordered-label">Предзаказ</span> 
     </span> 
     <span class="preview-overlay-container"> 
     </span> 
     </a> 
    </div>""" 
>>> from bs4 import BeautifulSoup
>>> sp = BeautifulSoup(s, "html.parser")
>>> sp.select(".cover > a.card-click-target") 
[<a aria-label=" Kate Mobile Lite  " class="card-click-target" href="/s/kate_new_6"> 
<span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
<span class="preordered-label">Предзаказ</span> 
</span> 
<span class="preview-overlay-container"> </span> 
</a>, 
<a aria-label=" Kate Mobile Lite  " class="card-click-target" href="/s/kate_new_6"> 
<span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
<span class="preordered-label">Предзаказ</span> 
</span> 
<span class="preview-overlay-container"> 
</span> 
</a>] 

>>> len(sp.select(".cover > a.card-click-target")) 
2 
+0

I still get zero from `len(sp.select(".cover > a.card-click-target"))` – MisterPi

+0

With this **exact** complete code? Or are you only using the `len(sp` part? – Dekel

+0

Yes, I fetch the complete HTML code of the page and apply the rule to it – MisterPi
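
When a selector works on the pasted snippet but returns zero on the fetched page, one way to narrow it down is to count matches for progressively looser selectors on the HTML string you actually received. This is only a rough diagnostic sketch: `selector_counts` is a hypothetical helper, and `sample` stands in for the fetched page source.

```python
from bs4 import BeautifulSoup

def selector_counts(html, selectors):
    """Hypothetical helper: map each CSS selector to its number of matches."""
    soup = BeautifulSoup(html, "html.parser")
    return {sel: len(soup.select(sel)) for sel in selectors}

# Stand-in for the HTML string that was actually downloaded (assumption).
sample = '<div class="cover"><a class="card-click-target" href="/s/kate_new_6"></a></div>'

counts = selector_counts(
    sample,
    [".cover", ".card-click-target", ".cover > a.card-click-target"],
)
print(counts)
```

If even `.card-click-target` comes back as 0 on the real page, the downloaded HTML probably does not contain those elements at all, so the selector itself is not the problem.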
