2012-04-20 112 views

Parsing with Mechanize in Ruby

I'm trying to parse a website with the Mechanize gem. Here is what I have so far:

require 'mechanize'
agent = Mechanize.new
page = agent.get("http://www.greatgiftsformen.com/price-range-under-c-131_142.html?page=all")
item = page.parser.xpath('//tr[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//*[contains(concat(" ", @class, " "), concat(" ", "productListing-data", " "))]')[5]

This gives me the following product element back:

=> #<Nokogiri::XML::Element:0x81c175ec name="td" attributes=[#<Nokogiri::XML::Attr:0x81c17d58 name="valign" value="top">, #<Nokogiri::XML::Attr:0x81c17eac name="align" value="center">, #<Nokogiri::XML::Attr:0x81c17ec0 name="class" value="productListing-data">] children=[#<Nokogiri::XML::Element:0x805fa174 name="a" attributes=[#<Nokogiri::XML::Attr:0x81c13794 name="href" value="http://www.greatgiftsformen.com/gas-pump-retro-liquor-dispenser-p-249.html?osCsid=05f5dbb816874ece6db883c2c48d7ae1">] children=[#<Nokogiri::XML::Element:0x8068e270 name="img" attributes=[#<Nokogiri::XML::Attr:0x81c115ac name="src" value="product_thumb.php?img=images/prod/liquordisp-gas.jpg&w=160&h=160">, #<Nokogiri::XML::Attr:0x81c115c0 name="width" value="160">, #<Nokogiri::XML::Attr:0x81c115d4 name="height" value="160">, #<Nokogiri::XML::Attr:0x81c11714 name="border" value="0">, #<Nokogiri::XML::Attr:0x81c11728 name="alt" value="Gas Pump Retro Liquor Dispenser">, #<Nokogiri::XML::Attr:0x81c11750 name="title" value="Gas Pump Retro Liquor Dispenser">, #<Nokogiri::XML::Attr:0x81c11764 name="class" value="fotgal">]>]>]> 

But when I try to get the href, I get nil back:

url = item.attributes['href'] 
=> nil 

Answer


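The element you selected is the td, and it has no href attribute of its own. The href belongs to the a element that is its first child, so you need to go through the child node: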

url = item.children[0].attributes['href'].to_s