2013-08-31 63 views
1

How do I scrape HTML with Nokogiri?

I am unable to scrape the product price. For every product, the output I get for the price is the following:

<div class="pu-final"> 
    <span class="fk-font-17 fk-bold">Rs. 1999</span> 
</div> 

My code is:

require 'rubygems' 
require 'nokogiri' 
require 'open-uri' 

url = "http://www.flipkart.com/mens-footwear/shoes/casual-shoes/pr?sid=osp,cil,nit,e1f" 
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs 
doc = Nokogiri::HTML(URI.open(url)) 
puts doc.at_css("title").text 
doc.css(".gu4,.browse-product").each do |item| 
    title = item.at_css(".fk-display-block,.title").text 
    puts title 
    puts "=================" 
    price = item.at_css(".pu-final") 
    puts price 
end 

Why did you close the post? I was writing an answer.

Answers

2

I tried the same code with one small change and it works well. Give it a try.

Change

price = item.at_css(".pu-final") 

to

price = item.at_css(".pu-final").text unless item.at_css(".pu-final").nil? 

It works well and does what I need. Thanks. – shamshul2007

0

You can do it as follows:

require 'nokogiri' 

doc = Nokogiri::HTML::Document.parse <<-eotl 
<div class="pu-final"> 

        <span class="fk-font-17 fk-bold">Rs. 1999</span> 
</div> 
eotl 

doc.at_css('div.pu-final > span.fk-font-17.fk-bold').class 
# => Nokogiri::XML::Element 
doc.at_css('div.pu-final > span.fk-font-17.fk-bold').text 
# => "Rs. 1999" 

doc.at_css('div.pu-final') gives you a Nokogiri::XML::Node (here, a Nokogiri::XML::Element). You then have to call Nokogiri::XML::Node#text to get the text inside the element.

Using XPath

doc.xpath("normalize-space(//div[contains(@class,'pu-final')]/span[contains(@class,'fk-font-17')])") 
# => "Rs. 1999" 

Complete code

require 'nokogiri' 
require 'open-uri' 

url = "http://www.flipkart.com/mens-footwear/shoes/casual-shoes/pr?sid=osp,cil,nit,e1f" 
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs 
doc = Nokogiri::HTML(URI.open(url)) 

doc.css("div.pu-details.lastUnit").each do |dv| 
    product_name = dv.at_css('div.pu-title a').text.strip 
    product_price = dv.xpath("normalize-space(.//div[contains(@class,'pu-final')]/span)").to_s 
    print product_name," <-----> ",product_price,"\n" 
end 

Output

Fila Storm Zender Sneakers <-----> Rs. 1819 
Puma Future Cat M1 Big 102 O Sneakers <-----> Rs. 3849 
Fila Filamotor V4 Sneakers <-----> Rs. 1449 
Adidas Volantis Hiking Shoes <-----> Rs. 2999 
Fila Varsity Sneakers <-----> Rs. 1249 
Puma Evo Speed F1 Low BMW Sneakers <-----> Rs. 2609 
Lee Cooper Running and Walking Shoes <-----> Rs. 1329 
Lee Cooper Running and Walking Shoes <-----> Rs. 1329 
United Colors of Benetton Sneakers <-----> Rs. 2799 
United Colors of Benetton Party Wear Shoes <-----> Rs. 2449 
Timberland 6 In Premium Boots <-----> Rs. 8490 
Timberland Ek Mid Boots <-----> Rs. 8490 
Clarks Montacute Lord Boots <-----> Rs. 3249 
Clarks Latch Mast Corporate Casuals <-----> Rs. 1999 
Levi's Boots <-----> Rs. 2999 

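I tried it, but it gives the same value for all products. How can I tie each price to its specific product so that it prints alongside the product name? – shamshul2007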


@shamshul2007 What do you mean by the same value? Can you post more of the relevant HTML? For the HTML you gave, my answer should work for you.


I want to scrape the page http://www.flipkart.com/mens-footwear/shoes/casual-shoes/pr?sid=osp,cil,nit,e1f and the output should look like: product name = product price – shamshul2007