2012-11-15 63 views
0

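Mechanize random user agent

I'm trying to make mechanize use a random user agent each time I open a URL. Can someone point me in the right direction? I've searched everywhere and can't find a reference.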

Thanks!

What have you tried so far? How did it break for you? Please show us the code you tried so that we can help you.

Answers

0

I had the same problem when I was writing a web crawler. This is the solution I used:

import urllib.request
from http.cookiejar import CookieJar  # Python 2 names: cookielib and urllib2


class URLOpener(object):
    def opener(self, user_agent):
        """Build a cookie-aware opener that sends the given User-Agent."""
        cj = CookieJar()
        # Process handlers
        opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
        opener.addheaders = [
            ('User-Agent', user_agent),
            ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
            ('Accept-Language', 'en-gb,en;q=0.5'),
            ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'),
            ('Keep-Alive', '115'),
            ('Connection', 'keep-alive'),
            ('Cache-Control', 'max-age=0'),
        ]
        return opener

    # Openers with different User-Agents
    def opener_list(self, f_path):
        # f_path is the path to a file listing one browser User-Agent per line
        with open(f_path, 'r') as f:
            # Strip line endings and skip blank lines
            user_agent_list = [line.strip() for line in f if line.strip()]
        return [self.opener(user_agent) for user_agent in user_agent_list]
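
The class above builds one opener per user agent but does not show the random pick itself. A minimal sketch of that step follows, written as a standalone example with the Python 3 module names (`urllib.request` and `http.cookiejar`, the counterparts of `urllib2` and `cookielib`). The helper names `build_opener_for` and `pick_opener` and the two sample agents are made up for illustration.

```python
import random
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical sample agents; in practice these would come from the file.
SAMPLE_AGENTS = [
    "Opera/9.52 (X11; Linux i686; U; en)",
    "Mozilla/5.0 (Linux; U; Android 2.3; en-us) AppleWebKit/999+ (KHTML, like Gecko) Safari/999.9",
]


def build_opener_for(user_agent):
    """Return a cookie-aware opener that sends the given User-Agent."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def pick_opener(openers, rng=random):
    """Pick one opener at random; call this before every request."""
    return rng.choice(openers)


openers = [build_opener_for(agent) for agent in SAMPLE_AGENTS]
opener = pick_opener(openers)
# response = opener.open("http://example.com/")  # network call, left commented out
```

Each opener here has its own cookie jar, so cookies are not shared between user agents. Whether that is what you want depends on the site being crawled.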

The user-agent file I created looks something like this:

Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 
Mozilla/5.0 (Linux; U; Android 4.0.3; de-ch; HTC Sensation Build/IML74K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 
Mozilla/5.0 (Linux; U; Android 2.3; en-us) AppleWebKit/999+ (KHTML, like Gecko) Safari/999.9 
Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; HTC_IncredibleS_S710e Build/GRJ90) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 
Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; HTC Vision Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 
Mozilla/5.0 (Linux; U; Android 2.3.4; fr-fr; HTC Desire Build/GRJ22) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 
Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; T-Mobile myTouch 3G Slide Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 
Mozilla/5.0 (Linux; U; Android 2.3.3; zh-tw; HTC_Pyramid Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 
Mozilla/5.0 (Linux; U; Android 2.3.3; zh-tw; HTC_Pyramid Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari 
Mozilla/5.0 (Linux; U; Android 2.3.3; zh-tw; HTC Pyramid Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 
1

This link gives you a reference list of sample user agents. Here is some sample code to demonstrate:

from random import choice 
user_agents = ['Mozilla/5.0 (X11; U; Linux; i686; en-US; rv:1.6) Gecko Debian/1.6-7','Konqueror/3.0-rc4; (Konqueror/3.0-rc4; i686 Linux;;datecode)','Opera/9.52 (X11; Linux i686; U; en)'] 
random_user_agent = choice(user_agents) 

You can add as many user agents from the link above to the user_agents variable as you like.

Now just pass random_user_agent to Mechanize by adding it as a header when you initialize the browser.
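
As a rough sketch of that last step: `mechanize.Browser` exposes an `addheaders` list of `(name, value)` tuples, so a random agent can be swapped in before each request. The helper names `with_random_user_agent` and `open_url` below are made up for illustration, and the example assumes the third-party `mechanize` package is installed.

```python
import random


def with_random_user_agent(browser, user_agents, rng=random):
    """Replace any User-Agent header on `browser` with a randomly chosen one."""
    agent = rng.choice(user_agents)
    # Drop the existing User-Agent entry (compared case-insensitively).
    kept = [(name, value) for name, value in browser.addheaders
            if name.lower() != "user-agent"]
    browser.addheaders = kept + [("User-Agent", agent)]
    return agent


def open_url(url, user_agents):
    """Open `url` with a fresh mechanize browser and a random User-Agent."""
    import mechanize  # third-party; imported here so the helper above works without it

    browser = mechanize.Browser()
    with_random_user_agent(browser, user_agents)
    return browser.open(url)
```

Calling `open_url(url, user_agents)` for every URL gives each request a freshly chosen agent. To keep one browser (and its cookies) across requests, call `with_random_user_agent(browser, user_agents)` before each `browser.open(...)` instead.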