2016-03-08 60 views

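Mechanize: scrape Google URLs

I have a program that uses Mechanize and takes a keyword (or keywords) as an argument, searches Google for it, and scrapes the result URLs.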

For example, `pull_sites.rb "testing"` returns these sites:

https://en.wikipedia.org/wiki/Software_testing 
http://en.wikipedia.org/wiki/Test_automation 
http://www.istqb.org/about-istqb.html 
http://softwaretestingfundamentals.com/test-plan/ 
https://en.wikipedia.org/wiki/Software_testing 
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:9qU2GDLzZzEJ:https://en.wikipedia.org/wiki/Software_testing%252Btesting%26gbv%3D1%26%26ct%3Dclnk 
https://en.wikipedia.org/wiki/Test_strategy 
https://en.wikipedia.org/wiki/Category:Software_testing 
https://en.wikipedia.org/wiki/Test_automation 
https://en.wikipedia.org/wiki/Portal:Software_testing 
https://en.wikipedia.org/wiki/Test 
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:R94CAo00wOYJ:https://en.wikipedia.org/wiki/Test%252Btesting%26gbv%3D1%26%26ct%3Dclnk 
https://en.wikipedia.org/wiki/Unit_testing 
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:G9V8uRLkPjIJ:https://en.wikipedia.org/wiki/Unit_testing%252Btesting%26gbv%3D1%26%26ct%3Dclnk 
https://testing.byu.edu/ 
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:d9bGrCHr9fsJ:https://testing.byu.edu/%252Btesting%26gbv%3D1%26%26ct%3Dclnk 
https://www.test.com/ 
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:S92tylTr1V8J:https://www.test.com/%252Btesting%26gbv%3D1%26%26ct%3Dclnk 
http://ddce.utexas.edu/disability/using-testing-accommodations/ 
http://blogs.vmware.com/virtualblocks/2015/07/06/vsan-vs-nutanix-head-to-head-performance-testing-part-4-exchange/ 
http://www.networkforgood.com/nonprofitblog/testing-101-4-steps-optimizing-your-fundraising-approach/ 
http://www.auslea.com/software-testing-training.html 
http://academy.littletonpublicschools.net/Default.aspx%3Ftabid%3D12807%26articleType%3DArticleView%26articleId%3D2400 
https://golang.org/pkg/testing/ 
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:EALG7Jlm9eoJ:https://golang.org/pkg/testing/%252Btesting%26gbv%3D1%26%26ct%3Dclnk 
http://www.speedtest.net/ 
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:M47_v0xF3m8J:http://www.speedtest.net/%252Btesting%26gbv%3D1%26%26ct%3Dclnk 
https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html 
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:1sMSoJBXydoJ:https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html%252Btesting%26gbv%3D1%26%26ct%3Dclnk 
http://www.act.org/content/act/en/products-and-services/the-act/test-preparation.html 
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:pAzlNJl3YY4J:http://www.act.org/content/act/en/products-and-services/the-act/test-preparation.html%252Btesting%26gbv%3D1%26%26ct%3Dclnk 

It works as expected, but it only scrapes the first page of Google results. Is it possible to search, say, pages 1-5?

Here is the scraping source (if you are using Google Chrome or Firefox, open the developer tools to inspect the page):

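    require 'mechanize'
    require 'colorize' # assumed gem; provides String#green

    # Assumed: SEARCH holds the keyword(s) given when the program is run
    SEARCH = ARGV.join(' ')

    def get_urls
      puts "Searching...".green
      agent = Mechanize.new
      page = agent.get('http://www.google.com/')
      google_form = page.form('f')
      google_form.q = SEARCH
      page = agent.submit(google_form, google_form.buttons.first)
      File.open("links.txt", "a+") do |file|
        page.links.each do |link|
          href = link.href.to_s
          # Result links look like /url?q=<target>&sa=...; skip everything else
          next unless href =~ /url\?q=/
          # Split on "=" and "&" and keep the value of q (the target URL)
          file.puts(href.split(/=|&/)[1])
        end
      end
    end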
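Splitting the href on `=` and `&` leaves the target URL percent-encoded, which is why the webcache lines in the output above still contain `%3F` and `%26`. A minimal sketch that decodes the `q` parameter instead; `extract_result_url` is a hypothetical helper name, and it assumes Google's result links keep the `/url?q=<target>&...` shape seen in the output:

```ruby
require 'uri'

# Hypothetical helper: given a Google result href such as
# "/url?q=https://example.com/page&sa=U", return the decoded target URL,
# or nil if the href is not a "/url?q=" redirect link.
def extract_result_url(href)
  path, query = href.to_s.split('?', 2)
  return nil unless path&.end_with?('/url') && query

  # decode_www_form percent-decodes each value (%3F -> ?, %26 -> &)
  pair = URI.decode_www_form(query).assoc('q')
  pair && pair[1]
end
```

Inside the question's loop, `url = extract_result_url(link.href)` followed by `next unless url` would replace both the regex check and the split.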

Yes, it's possible. Have you tried clicking or navigating to the other pages? – kjprice


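@kjprice How do I click through to another page from within the program while it is already running? My question is whether the program itself can search those pages, not whether I can click 2, 3, or 4 myself. – 13aal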


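@13aal Yes, you can tell Mechanize to click the page links at the bottom of the results page, then scrape those pages, and so on. Is that what you're asking how to do? – bkunzi01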

Answers


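OK. This will help you identify the links you want to click automatically. When you do a Google search and scroll to the bottom, you'll see the page links. Using your browser's developer tools, find out which classes or ids Google assigns to those page-number links. Then use Mechanize's `click` method to follow them. For example, if the link is labeled "Next", you could use something as simple as this: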

    page2 = page1.link_with(:text => "Next").click

I'm answering from my phone, so search for "click link" with Mechanize for more details.
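A minimal sketch of the "keep clicking Next" idea, assuming the pagination link's visible text is exactly "Next" (verify this in the developer tools; it may differ by locale or layout). The method name `collect_result_hrefs` is hypothetical. It only calls `links`, `link_with(text: ...)`, `Link#href`, and `Link#click`, which are standard Mechanize methods, and it applies the same `url?q` filter as the question:

```ruby
# Sketch: starting from a results page, collect result hrefs and follow the
# "Next" link, visiting at most max_pages pages in total.
def collect_result_hrefs(first_page, max_pages: 5, next_text: 'Next')
  hrefs = []
  page = first_page
  max_pages.times do
    # Keep only result links (/url?q=...), as the question's loop does
    page.links.each do |link|
      href = link.href.to_s
      hrefs << href if href =~ /url\?q=/
    end
    next_link = page.link_with(text: next_text) # nil when there is no "Next"
    break if next_link.nil?
    page = next_link.click # fetches the following results page
  end
  hrefs
end
```

With the question's setup, `collect_result_hrefs(agent.submit(google_form, google_form.buttons.first), max_pages: 5)` would gather hrefs from up to five pages. Adding a short `sleep` between clicks is a reasonable precaution, since Google may throttle or block rapid automated requests.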


For example: `page_1 = "http://google.com"`; `page_2 = page_1.link_with(:text => "search").click`; `page_3 = page_2.link_with(:text => "search").click` would click pages 1, 2, and 3? – 13aal


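You've got the concept, but in your example I don't think you want the link with the text "search". I think you want the link named "Next", since that takes you from the first page to page 2. But if you're sure the link has the text "search", then you're fine. – bkunzi01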


This is a GET form, so it's easier to just make the requests yourself:

https://www.google.com/search?q=foo 
https://www.google.com/search?q=foo&start=10 
https://www.google.com/search?q=foo&start=20
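A small sketch that generates those URLs for the first N pages, assuming 10 results per page as the `start` offsets above imply (Google may change this). `google_page_urls` is a hypothetical name; `URI.encode_www_form` takes care of escaping the query:

```ruby
require 'uri'

# Build search URLs for result pages 1..pages, using the "start" offset.
# Page 1 has no start parameter, matching the first URL above.
def google_page_urls(query, pages: 5, per_page: 10)
  (0...pages).map do |i|
    params = { 'q' => query }
    params['start'] = i * per_page if i > 0
    "https://www.google.com/search?#{URI.encode_www_form(params)}"
  end
end
```

Each URL can then be fetched with `agent.get(url)` and its links filtered exactly as in the question's `page.links.each` loop.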