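You have everything you need to build the URLs for the detail pages. Grab the relative URL (I'll call it the path), append it to the base URL, and make a new request.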
require 'mechanize'
agent = Mechanize.new
agent.pluggable_parser.default = Mechanize::Page
base = 'http://lookbook.nu'
page = agent.get(base + '/north-america')
detail_pages = page.search("//div[contains(@class, 'look_meta_container')]/p/a[1]/@href").map(&:text)
# ["/user/1069907-Veronica-P", "/elliott_alexzander", "/neno", "/skirtsofurban", "/tovogueorbust", "/dthutt", "/ryapie", "/lovebetweentheracks", "/lonleyboy", "/bobbyraffin", "/tsangtastic", "/user/737385-Katia-H"]
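detail_pages.each do |path|
  # Fetch each detail page by appending its path to the base URL
  page = agent.get(base + path)
  name = page.search("//div[@id='userheader']//h1/a").text
  # The fan count is the first span under the same parent as the 'Fans' label
  fans = page.search("//span[contains(text(), 'Fans')]/../span[1]").text
  puts name + " have " + fans + " fans"
end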
=>
Veronica P have 26,044 fans
Elliott Alexzander have 3,409 fans
Neno Neno have 15,304 fans
Laura P have 975 fans
Alexandra G. have 620 fans
Dayeanne Hutton have 336 fans
Mariah Alysz have 288 fans
Lina Dinh have 11,675 fans
Talal Amine have 882 fans
Bobby Raffin have 72,469 fans
Jenny Tsang have 8,909 fans
Katia H. have 282 fans
Note: I used #pluggable_parser.default to get a Mechanize::Page response. Normally you don't need to do this, but they don't set the content type correctly.
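As a side note on the "append the path to the base URL" step: plain string concatenation works here because every path starts with a slash. A minimal sketch of an alternative, assuming you would rather let Ruby's standard library resolve the path (the two sample paths are copied from the list above; the variable names paths and urls are just for illustration):

```ruby
require 'uri'

base = 'http://lookbook.nu'
# Sample paths taken from the detail_pages list in the answer.
paths = ['/user/1069907-Veronica-P', '/neno']

# URI.join resolves each path against the base URL and returns a URI object;
# to_s turns it back into a string that can be passed to agent.get.
urls = paths.map { |path| URI.join(base, path).to_s }

puts urls
```

This is not required for the code above to work; it may be a bit more forgiving if the site ever returns paths without a leading slash.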
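Unless you are using a very old version of Ruby, you don't need to require 'rubygems'. You don't need to require 'nokogiri' either, since it is already a dependency of Mechanize. Also, you probably don't need to require 'open-uri', because Mechanize provides its own mechanism for fetching pages.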