Ruby Mechanize: getting the href attribute value

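I'm fairly new to Ruby, but I've been muddling my way through writing a scraper. I'm using Mechanize, and so far it has gone pretty well. Now, though, I'm stuck on grabbing the href attribute of a bunch of links.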

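I need to get the href attribute so that I can open each page and pull more information from it.

Is this possible?

Here is an example of what I have.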

all_results.search("table.mcsResultsTable tr").each do |tablerow|
    installer_link = tablerow.search("td:first-child a").href
    puts installer_link + "\n"
end


2013-09-27 David

Here is an example showing how to extract the href attribute:

require 'nokogiri' 

doc = Nokogiri::HTML.parse <<-eot 
<a name="html" href = "http://foo">HTML Tutorial</a><br> 
<a name="css" href = "http://fooz">CSS Tutorial</a><br> 
<a name="xml" href = "http://fiz">XML Tutorial</a><br> 
<a href="/js/">JavaScript Tutorial</a> 
eot 

doc.search("//a").class # => Nokogiri::XML::NodeSet 
doc.search("//a").each {|nd| puts nd['href'] } 
doc.search("//a").map(&:class) 
# => [Nokogiri::XML::Element, Nokogiri::XML::Element, Nokogiri::XML::Element, 
# Nokogiri::XML::Element]

Output:

http://foo
http://fooz
http://fiz
/js/

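Basically, doc.search("//a") gives you a NodeSet, which is nothing more than a collection of Nokogiri::XML::Node objects. You can use the method Nokogiri::XML::Node#[] to get the value of an attribute on any particular node. Nokogiri exposes the attribute/value pairs much like a hash. See below: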

require 'nokogiri' 

doc = Nokogiri::HTML.parse <<-eot 
<a target="_blank" class="tryitbtn" href="tryit.asp?filename=try_methods">Try it yourself &raquo;</a> 
eot 

doc.at('a').keys 
# => ["target", "class", "href"] 
doc.at('a').values 
# => ["_blank", "tryitbtn", "tryit.asp?filename=try_methods"] 
doc.at('a')['target'] # => "_blank" 
doc.at('a')['class'] # => "tryitbtn"


2013-09-27 10:01:47

I don't want every a tag on the page, though. – David

This did work, though: tablerow.search("//a").each { |nd| puts nd['href'] } – David

@David Yes, that is the trick I wanted to show you. –
