2012-07-05 39 views
5

Parsing HttpResponse with Nokogiri (Ruby)

Hi, I am having trouble parsing HttpResponse objects with Nokogiri.

I use this function to fetch the website:

def fetch(uri_str, limit = 10)
    # You should choose better exception.
    raise ArgumentError, 'HTTP redirect too deep' if limit == 0

    url = URI.parse(URI.encode(uri_str.strip))
    puts url

    # get path
    req = Net::HTTP::Get.new(url.path, headers)
    # start TCP/IP
    response = Net::HTTP.start(url.host, url.port) { |http|
        http.request(req)
    }
    case response
    when Net::HTTPSuccess
        # print final redirect to a file
        puts "this is location" + uri_str
        puts "this is the host #{url.host}"
        puts "this is the path #{url.path}"

        return response
    when Net::HTTPRedirection
        # if you get a 302 response
        puts "this is redirect" + response['location']
        return fetch(response['location'], aFile, limit - 1)
    else
        response.error!
    end
end




When I do this:

    html = fetch("http://www.somewebsite.com/hahaha/")
    puts html
    noko = Nokogiri::HTML(html)

The website prints out as a whole bunch of gibberish, and Nokogiri complains that "node_set must be a Nokogiri::XML::NodeSet".

If anyone could help, it would be much appreciated.

+1

You should use Mechanize instead of this hot mess. It takes care of redirects and handles the encoding for you. – pguardiario 2012-07-05 23:15:58

Answers

4

First things first. Your fetch method returns a Net::HTTPResponse object, not just the body. You should pass the body to Nokogiri.

response = fetch("http://www.somewebsite.com/hahaha/") 
puts response.body 
noko = Nokogiri::HTML(response.body) 
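If you want to guard against mixing the two up again, a small helper along these lines might do. This is only a sketch: `html_source` is a name I made up (it is not part of Net::HTTP or Nokogiri), and `FakeResponse` is a stand-in so the example runs without a network call.

```ruby
# Hypothetical helper (not part of Net::HTTP or Nokogiri): return the HTML
# source as a String, whether it is handed a response object or a String.
def html_source(response_or_string)
  if response_or_string.respond_to?(:body)
    response_or_string.body.to_s # the body can be nil, so coerce to a String
  else
    response_or_string.to_s
  end
end

# Stand-in for a Net::HTTPResponse so the sketch runs offline (assumption).
FakeResponse = Struct.new(:body)

puts html_source(FakeResponse.new("<html><body>hi</body></html>"))
puts html_source("<p>already a string</p>")
```

With your script that would be `Nokogiri::HTML(html_source(fetch(url)))`, so Nokogiri always receives a String.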

I have updated your script so that it runs (below). A few things were undefined.

require 'nokogiri'
require 'net/http'
require 'uri'

def fetch(uri_str, limit = 10)
    # You should choose a better exception.
    raise ArgumentError, 'HTTP redirect too deep' if limit == 0

    # URI.encode was removed in Ruby 3.0, so this assumes uri_str is already escaped.
    url = URI.parse(uri_str.strip)
    puts url

    # Build the GET request; `headers` was undefined in the original script.
    headers = {}
    # request_uri keeps the query string and is never empty, unlike url.path.
    req = Net::HTTP::Get.new(url.request_uri, headers)
    # Open the connection (with TLS for https URLs) and send the request.
    response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http|
        http.request(req)
    }

    case response
    when Net::HTTPSuccess
        puts "this is location " + uri_str
        puts "this is the host #{url.host}"
        puts "this is the path #{url.path}"

        return response
    when Net::HTTPRedirection
        # Any 3xx response: follow the Location header.
        puts "this is redirect " + response['location']
        # The undefined `aFile` argument has been dropped.
        return fetch(response['location'], limit - 1)
    else
        response.error!
    end
end

response = fetch("http://www.google.com/")
puts response
noko = Nokogiri::HTML(response.body)
puts noko

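This script runs without errors and prints the content. The Nokogiri error you are getting is probably caused by the content you receive. A common problem I have run into with Nokogiri is character encoding. Without the exact error, it is impossible to say what is going on.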
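If encoding does turn out to be the culprit, something along these lines may help. This is only a sketch under assumptions: `body_to_utf8` is a name I made up, and it assumes the server declares its charset in the Content-Type header, falling back to UTF-8 when it does not.

```ruby
# Hypothetical helper: turn a raw response body into valid UTF-8.
# `content_type` is the Content-Type header value, e.g. response['content-type'].
def body_to_utf8(raw_body, content_type, fallback = 'UTF-8')
  # Pull the charset label out of a header such as "text/html; charset=ISO-8859-1".
  charset = content_type.to_s[/charset=["']?([^;"'\s]+)/i, 1] || fallback
  source = begin
    Encoding.find(charset)
  rescue ArgumentError
    Encoding.find(fallback) # unknown charset label, use the fallback
  end

  # Relabel the bytes with the declared encoding, then transcode to UTF-8.
  text = raw_body.dup.force_encoding(source)
  text = text.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
  # scrub catches invalid bytes when source and target are both UTF-8.
  text.scrub('?')
end

latin1_bytes = "caf\xE9".b # "café" encoded as ISO-8859-1
puts body_to_utf8(latin1_bytes, 'text/html; charset=ISO-8859-1')
```

You would then call `Nokogiri::HTML(body_to_utf8(response.body, response['content-type']))`. Invalid bytes are replaced with `?` rather than raising, which is lossy but keeps the parser from choking.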

For the encoding problem, I recommend looking at the following StackOverflow questions:

ruby 1.9: invalid byte sequence in UTF-8 (especially this answer)

How to convert a Net::HTTP response to a certain encoding in Ruby 1.9.1?
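A side note on the redirect branch: the script passes `response['location']` straight back into fetch, which only works when the server sends an absolute URL. Some servers send a relative one. A minimal sketch of resolving it against the URL you requested, where `redirect_target` is a hypothetical name of mine:

```ruby
require 'uri'

# Hypothetical helper: resolve a Location header against the requested URL,
# so relative redirects such as "/login" become absolute.
def redirect_target(request_url, location)
  URI.join(request_url, location).to_s
end

puts redirect_target('http://www.somewebsite.com/hahaha/', '/login')
puts redirect_target('http://www.somewebsite.com/hahaha/', 'next.html')
puts redirect_target('http://www.somewebsite.com/hahaha/', 'http://other.example/x')
```

In the script that would become `fetch(redirect_target(uri_str, response['location']), limit - 1)`.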

+0

Thanks, but Nokogiri still gives me this error. – 2012-07-05 14:18:16

+0

Thank you very much, Mr. Simard, I will look into character encoding. – 2012-07-05 16:02:14

+0

How can I see more detailed debug messages? The only error Nokogiri gives me is this: node_set must be a Nokogiri::XML::NodeSet – 2012-07-05 17:54:59