2013-05-28 93 views
2

How do I get the href attribute of an anchor tag?

I am trying to scrape http://expo.getbootstrap.com/.

The site's HTML looks like this:

<div class="col-span-4"> 
    <p> 
    <a class="thumbnail" target="_blank" href="https://www.getsentry.com/"> 
     <img src="/screenshots/sentry.jpg"> 
    </a> 
    </p> 
</div> 

My Nokogiri code:

url = "http://expo.getbootstrap.com/" 
doc = Nokogiri::HTML(open(url)) 
puts doc.css("title").text 
doc.css(".col-span-4").each do |site| 
    title=site.css("h4 a").text 
    href = site.css("a.thumbnail")[0]['href'] 
end 

The goal is simple: get the href of the link wrapping each <img> tag and the site's title, but it keeps reporting:

undefined method `[]' for nil:NilClass

on this line:

href = site.css("a.thumbnail")[0]['href'] 

This is driving me crazy, because the same code actually works in another case.

What happens if you try site.css("a.thumbnail")['href'] or site.css("a.thumbnail")['href'][0]? – Bala

It reports 'can't convert String into Integer' in both cases – cqcn1991
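
Both error messages in this thread can be reproduced without Nokogiri or network access. The sketch below uses a plain empty Array as a stand-in for the node set that css("a.thumbnail") returns when nothing matches. That substitution is an assumption made for illustration, and the names EMPTY_MATCHES and error_class are hypothetical. An integer index past the end returns nil rather than raising, so the failure only appears on the following ['href'] call, while indexing the collection itself with a string fails because it expects an integer.

```ruby
# Returns the class of the exception raised by the block, or nil if none.
def error_class
  yield
  nil
rescue StandardError => e
  e.class
end

# Assumption: an empty Array stands in for a css("a.thumbnail") result
# that matched nothing.
EMPTY_MATCHES = [].freeze

# Indexing past the end does not raise; it simply returns nil.
puts EMPTY_MATCHES[0].inspect                   # nil

# Calling ['href'] on that nil is what produces "undefined method `[]'".
puts error_class { EMPTY_MATCHES[0]['href'] }   # NoMethodError

# Indexing the collection with a String fails the String-to-Integer conversion.
puts error_class { EMPTY_MATCHES['href'] }      # TypeError
```

So the line in the question fails only for those divs where the selector matches nothing.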

Answers

1

Not all of the .col-span-4 divs contain a thumbnail. This should work:

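require 'nokogiri' 
require 'open-uri' 

url = "http://expo.getbootstrap.com/" 
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs 
doc = Nokogiri::HTML(URI.open(url)) 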
puts doc.css("title").text 
doc.css(".col-span-4").each do |site| 
    title = site.css("h4 a").text 
    thumbnail = site.css("a.thumbnail") 
    next if thumbnail.empty? 
    href = thumbnail[0]['href'] 
end 

'next if thumbnail.empty?; href = thumbnail[0]['href']' can be written more concisely as 'href = thumbnail[0]['href'] unless thumbnail.empty?' –
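
The two forms in the comment above are not fully interchangeable once the block does more than assign href. A minimal sketch of the difference follows, using arrays of hashes as hypothetical stand-ins for the thumbnails found in each div (DIVS, hrefs_with_next and hrefs_with_unless are illustrative names, not part of the original code). With next, the rest of the block is skipped for an empty div. With the trailing unless, href is simply left as nil and the following lines still run.

```ruby
# Hypothetical data: one div with a thumbnail link, one without.
DIVS = [[{ 'href' => 'https://www.getsentry.com/' }], []].freeze

def hrefs_with_next(divs)
  result = []
  divs.each do |thumbnail|
    next if thumbnail.empty?          # skip everything below for this div
    result << thumbnail[0]['href']
  end
  result
end

def hrefs_with_unless(divs)
  result = []
  divs.each do |thumbnail|
    # href stays nil when the div has no thumbnail
    href = thumbnail[0]['href'] unless thumbnail.empty?
    result << href                    # still runs for the empty div
  end
  result
end

p hrefs_with_next(DIVS)    # ["https://www.getsentry.com/"]
p hrefs_with_unless(DIVS)  # ["https://www.getsentry.com/", nil]
```

If href is the last thing the block computes, the shorter form is fine; otherwise later code has to cope with a nil href.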

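This works! Actually, I had already considered the case you describe here, but I didn't write it down above. I used 'if title' to decide whether to fetch the href. But 'if title' is not strict enough. 'if !title.empty?' is the working solution – cqcn1991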
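
The reason 'if title' is not strict enough is that Ruby treats only nil and false as falsy, so an empty string still passes the check, and .text on a selector that matched nothing gives back an empty string rather than nil. A small sketch of the two checks, with hypothetical helper names:

```ruby
# Mirrors `if title`: true for any value other than nil or false.
def passes_plain_if?(title)
  title ? true : false
end

# Mirrors `if !title.empty?`: true only when the string has content.
def passes_empty_check?(title)
  !title.empty?
end

p passes_plain_if?("")           # true  (empty string is truthy)
p passes_empty_check?("")        # false
p passes_plain_if?("Sentry")     # true
p passes_empty_check?("Sentry")  # true
```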

2

I would do something like this:

require 'nokogiri' 
require 'open-uri' 
require 'pp' 

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs 
doc = Nokogiri::HTML(URI.open('http://expo.getbootstrap.com/')) 

thumbnails = doc.search('a.thumbnail').map{ |thumbnail| 
    { 
    href: thumbnail['href'], 
    src: thumbnail.at('img')['src'], 
    title: thumbnail.parent.parent.at('h4 a').text 
    } 
} 

pp thumbnails 

Which, after running, outputs:

# => [ 
    { 
    :href => "https://www.getsentry.com/", 
    :src => "/screenshots/sentry.jpg", 
    :title => "Sentry" 
    }, 
    { 
    :href => "http://laravel.com", 
    :src => "/screenshots/laravel.jpg", 
    :title => "Laravel" 
    }, 
    { 
    :href => "http://gruntjs.com", 
    :src => "/screenshots/gruntjs.jpg", 
    :title => "Grunt" 
    }, 
    { 
    :href => "http://labs.bittorrent.com", 
    :src => "/screenshots/bittorrent-labs.jpg", 
    :title => "BitTorrent Labs" 
    }, 
    { 
    :href => "https://www.easybring.com/en", 
    :src => "/screenshots/easybring.jpg", 
    :title => "Easybring" 
    }, 
    { 
    :href => "http://developers.kippt.com/", 
    :src => "/screenshots/kippt-developers.jpg", 
    :title => "Kippt Developers" 
    }, 
    { 
    :href => "http://www.learndot.com/", 
    :src => "/screenshots/learndot.jpg", 
    :title => "Learndot" 
    }, 
    { 
    :href => "http://getflywheel.com/", 
    :src => "/screenshots/flywheel.jpg", 
    :title => "Flywheel" 
    } 
]