2013-02-05 56 views

urlgrabber with gzip support

What is the recommended way to handle files served with Content-Encoding: gzip when using urlgrabber?

Right now I monkey-patch it like this:

from urlgrabber.grabber import URLGrabber, PyCurlFileObject

g = URLGrabber(http_headers=(("Accept-Encoding", "gzip"),))
g.is_compressed = False  # I don't know yet if the server will send me compressed data

# Back up the current header handler once, so re-running this doesn't wrap the wrapper
try:
    PyCurlFileObject.orig_hdr_retrieve
except AttributeError:
    PyCurlFileObject.orig_hdr_retrieve = PyCurlFileObject._hdr_retrieve

def hdr_retrieve(instance, buf):
    r = PyCurlFileObject.orig_hdr_retrieve(instance, buf)
    header = buf.lower()
    if "content-encoding" in header and "zip" in header:
        g.is_compressed = True
    return r
PyCurlFileObject._hdr_retrieve = hdr_retrieve

g.urlgrab(url, dest)

if g.is_compressed:
    pass  # ungzip file here

But it doesn't look very clean, and I'm afraid it isn't thread-safe either...

Answers


I think I've found a thread-safe solution:

import pycurl
from urlgrabber.grabber import URLGrabber, PyCurlFileObject

g = URLGrabber(http_headers=(("Accept-Encoding", "gzip"),))
g.opts._set_attributes(grabber=g)  # lets each file object reach its grabber through opts
try:
    PyCurlFileObject.orig_setopts
except AttributeError:
    # Patch only once: keep the original and install the wrapper defined below
    PyCurlFileObject.orig_setopts = PyCurlFileObject._set_opts

    def setopts(instance, opts={}):
        PyCurlFileObject.orig_setopts(instance, opts)
        grabber = instance.opts.grabber
        grabber.is_compressed = False

        # Per-instance header callback: the flag is set on this request's own grabber
        def hdr_retrieve(buf):
            r = PyCurlFileObject._hdr_retrieve(instance, buf)
            header = buf.lower()
            if "content-encoding" in header and "zip" in header:
                grabber.is_compressed = True
            return r

        instance.curl_obj.setopt(pycurl.HEADERFUNCTION, hdr_retrieve)
    PyCurlFileObject._set_opts = setopts

But it still doesn't feel very "clean" :)
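For the "ungzip file here" step that the question leaves as a comment, here is a minimal sketch that uses only the standard library. The helper names `looks_gzipped` and `gunzip_in_place` are made up for illustration and are not part of urlgrabber. It is written for Python 3 (`os.replace`); on Python 2, `os.rename` would be the closest substitute.

```python
import gzip
import os
import shutil
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def gunzip_in_place(path):
    """Replace the gzip file at `path` with its decompressed contents."""
    # Write to a temp file in the same directory so the final rename
    # stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as dst, gzip.open(path, "rb") as src:
            shutil.copyfileobj(src, dst)
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the partial temp file and leave the original untouched.
        os.remove(tmp_path)
        raise
```

With either patch above, the last step could then be `if g.is_compressed: gunzip_in_place(dest)`. Checking `looks_gzipped(dest)` after the download is another option that inspects the file itself instead of the response headers, at the cost of also matching files that are meant to stay gzipped on disk.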