2015-04-16 104 views

How to scrape a website

Part of the HTML looks like this block:

<div id="block-hubs3d-hub-hub-specialties" class="block block-hubs3d-hub first odd"> 
     <h3 class="block-title">Specialties</h3> 

<div class="field field-name-field-hub-specialties field-type-taxonomy-term-reference field-label-hidden"> 
    <div class="field-items"> 
      <div class="field-item item-1 even">ABS+PLA+Nylon+Flexible</div> 
      <div class="field-item item-2 odd">Custom Finishing</div> 
      <div class="field-item item-3 even">DLP - SLA Technology</div> 
      <div class="field-item item-4 odd">Makerjuice G+</div> 
     </div> 
</div> 

How can I get it into a format like this:

specialties: ABS+PLA+Nylon+Flexible, Custom Finishing, DLP - SLA Technology, Makerjuice G+

So far, I only know how to use BS4 to get all of the text:

import bs4
import requests
response = requests.get('https://www.3dhubs.com/new-york/hubs/peerless')
soup = bs4.BeautifulSoup(response.text, 'html.parser')

Read the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/ – taesu

Answers


Find the divs by class:

import bs4 

h = """ 
<div id="block-hubs3d-hub-hub-specialties" class="block block-hubs3d-hub first odd"> 
     <h3 class="block-title">Specialties</h3> 

<div class="field field-name-field-hub-specialties field-type-taxonomy-term-reference field-label-hidden"> 
    <div class="field-items"> 
      <div class="field-item item-1 even">ABS+PLA+Nylon+Flexible</div> 
      <div class="field-item item-2 odd">Custom Finishing</div> 
      <div class="field-item item-3 even">DLP - SLA Technology</div> 
      <div class="field-item item-4 odd">Makerjuice G+</div> 
     </div> 
</div> 
""" 

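b = bs4.BeautifulSoup(h, "html.parser")

# Matching on "field-item" selects each item div but not the "field-items" wrapper.
specialties = [div.text for div in b.find_all("div", {"class": "field-item"})]
print(", ".join(specialties))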

Output:

ABS+PLA+Nylon+Flexible, Custom Finishing, DLP - SLA Technology, Makerjuice G+
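
If you want the exact "specialties: ..." line from the question, one possible approach is to read the label from the block's h3 title and restrict the search to that block. This is only a sketch: the helper name format_specialties is made up for illustration, and it assumes the page keeps the same id and class names as the snippet in the question.

```python
import bs4


def format_specialties(html):
    """Return 'label: item1, item2, ...' for the specialties block, or None if it is absent."""
    soup = bs4.BeautifulSoup(html, "html.parser")

    # Assumption: the block keeps the id shown in the question's HTML.
    block = soup.find("div", id="block-hubs3d-hub-hub-specialties")
    if block is None:
        return None

    # Use the block title as the label, falling back to a fixed string.
    title = block.find("h3", class_="block-title")
    label = title.get_text(strip=True).lower() if title else "specialties"

    # Only look inside the block, so other "field-item" divs on the page are ignored.
    items = [div.get_text(strip=True)
             for div in block.find_all("div", class_="field-item")]
    return "{}: {}".format(label, ", ".join(items))


sample = """
<div id="block-hubs3d-hub-hub-specialties" class="block block-hubs3d-hub first odd">
  <h3 class="block-title">Specialties</h3>
  <div class="field field-name-field-hub-specialties">
    <div class="field-items">
      <div class="field-item item-1 even">ABS+PLA+Nylon+Flexible</div>
      <div class="field-item item-2 odd">Custom Finishing</div>
      <div class="field-item item-3 even">DLP - SLA Technology</div>
      <div class="field-item item-4 odd">Makerjuice G+</div>
    </div>
  </div>
</div>
"""

print(format_specialties(sample))
```

To run it against the live page, pass response.text from the requests call in the question instead of sample.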