2011-11-09 28 views
3

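Problem retrieving the raw HTTP request with urllib2 in Python

I am using Python 2.6.5 and trying to capture the raw HTTP request that is sent over the wire. This works fine except when I add a proxy handler into the mix. The situation is as follows: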

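  • HTTP and HTTPS requests work fine without the proxy handler: raw HTTP request captured
  • HTTP requests work fine with the proxy handler: proxy OK, raw HTTP request captured
  • HTTPS requests fail with the proxy handler: proxy OK, but the raw HTTP request is NOT captured!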

The following questions are close but do not solve my problem:

This is what I am doing:

import httplib
import urllib2

class MyHTTPConnection(httplib.HTTPConnection):
    def send(self, s):
        global RawRequest
        RawRequest = s  # Saving to global variable for Requester class to see
        httplib.HTTPConnection.send(self, s)

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        # Route plain HTTP requests through the recording connection class
        return self.do_open(MyHTTPConnection, req)

class MyHTTPSConnection(httplib.HTTPSConnection):
    def send(self, s):
        global RawRequest
        RawRequest = s  # Saving to global variable for Requester class to see
        httplib.HTTPSConnection.send(self, s)

class MyHTTPSHandler(urllib2.HTTPSHandler):
    def https_open(self, req):
        # Route HTTPS requests through the recording connection class
        return self.do_open(MyHTTPSConnection, req)

Requester class:

global RawRequest 
ProxyConf = { 'http':'http://127.0.0.1:8080', 'https':'http://127.0.0.1:8080' } 
# If ProxyConf = { 'http':'http://127.0.0.1:8080' }, then Raw HTTPS request captured BUT the proxy does not see the HTTPS request! 
# Also tried with similar results:  ProxyConf = { 'http':'http://127.0.0.1:8080', 'https':'https://127.0.0.1:8080' } 
ProxyHandler = urllib2.ProxyHandler(ProxyConf) 
urllib2.install_opener(urllib2.build_opener(ProxyHandler, MyHTTPHandler, MyHTTPSHandler)) 
urllib2.urlopen(urllib2.Request('http://www.google.com', None))  # global RawRequest updated
# This is the problem: global RawRequest NOT updated!?
urllib2.urlopen(urllib2.Request('https://accounts.google.com', None))

But if I remove the ProxyHandler, it works:

global RawRequest 
urllib2.install_opener(urllib2.build_opener(MyHTTPHandler, MyHTTPSHandler)) 
urllib2.urlopen(urllib2.Request('http://www.google.com', None))  # global RawRequest updated
urllib2.urlopen(urllib2.Request('https://accounts.google.com', None))  # global RawRequest updated

How can I add the ProxyHandler into the mix while keeping access to RawRequest?

Thank you in advance.

+0

If you are sure you have the answer, please post it as an answer rather than as a comment.

+0

Good point, Jonathan. I have just moved the comment to the answer section. Cheers.

Answers

1

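Answering my own question: this seems to be a bug in the underlying library, and making RawRequest a list solves the problem, because the raw HTTP request is then the first item. The custom HTTPS class is called several times, and the last call is empty, so a plain global ends up overwritten. The fact that the custom HTTP class is called only once suggests that this is a bug in Python, but the list workaround gets around it: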

RawRequest = s 

simply becomes:

RawRequest.append(s) 

with RawRequest initialized beforehand as RawRequest = [], and the raw request retrieved via RawRequest[0] (the first element of the list).
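
As a minimal offline sketch of that workaround: the connection class below appends every chunk passed to send() to a list instead of overwriting a single global. FakeSocket and the host example.invalid are hypothetical stand-ins so that nothing goes over the network, and the import fallback is only there so the sketch also runs on Python 3. It illustrates the mechanism only and does not reproduce the proxy setup.

```python
try:
    import httplib                   # Python 2, as used in the question
except ImportError:
    import http.client as httplib    # Python 3 name for the same module

RawRequest = []  # every chunk passed to send() is appended here


class MyHTTPSConnection(httplib.HTTPSConnection):
    def send(self, s):
        RawRequest.append(s)  # keep all chunks instead of overwriting
        httplib.HTTPSConnection.send(self, s)


class FakeSocket(object):
    """Hypothetical stand-in for a real socket so the demo runs offline."""

    def sendall(self, data):
        pass  # discard the data; nothing is actually transmitted


conn = MyHTTPSConnection('example.invalid')
conn.sock = FakeSocket()  # a socket is already set, so send() never connects

# First call carries the request, a later call is empty (as described above)
conn.send(b'GET / HTTP/1.1\r\nHost: example.invalid\r\n\r\n')
conn.send(b'')

first_chunk = RawRequest[0]  # still the request, despite the later empty call
```

With a plain `RawRequest = s` assignment the second, empty call would have replaced the captured request, which matches the symptom described in the question.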