Exceptional urllib.request.urlopen calls made from ThreadPoolExecutor leave file descriptors open

I'm trying to download a large amount of data from Yahoo Finance using multiple threads. I'm using concurrent.futures.ThreadPoolExecutor to speed things up. Everything works fine until I have used up all the available file descriptors (1024 by default).

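When urllib.request.urlopen() raises an exception, the file descriptor stays open (regardless of the socket timeout used). If I run everything from a single (main) thread, this file descriptor normally gets reused, so the problem doesn't occur. But when these exceptional urlopen() calls are made from ThreadPoolExecutor threads, the file descriptors stay open. The only solutions I've found so far are to switch to processes (ProcessPoolExecutor), which is quite cumbersome and inefficient, or to raise the number of allowed file descriptors to a very large value (and not all potential users of my library will do that anyway). There must be a smarter way to solve this.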
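For reference, the "raise the descriptor limit" workaround can be applied by the program itself on Unix through the standard resource module. The sketch below assumes that an unprivileged process may raise its soft RLIMIT_NOFILE up to, but not beyond, the hard limit. raise_fd_limit and target are hypothetical names used only for illustration.

```python
import resource


def raise_fd_limit(target=4096):
    """Raise this process's soft RLIMIT_NOFILE toward `target` (Unix only).

    Hypothetical helper: the target is capped at the hard limit, because an
    unprivileged process cannot go above it. Returns the resulting
    (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Only ever raise the soft limit; never lower an already higher one.
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

This only postpones running out of descriptors. The leaked sockets are still held open.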

I'd also like to know whether this is a bug in the Python library or whether I'm just doing something wrong...

I'm running Python 3.4.1 on Debian (testing, kernel 3.10-3-amd64).

Here is sample code that demonstrates this behavior:

import concurrent.futures
import os
import urllib.request
from time import sleep

import psutil


def fetchfun(url):
    # The returned response (if any) is discarded without being closed.
    urllib.request.urlopen(url)


def main():

    print(os.getpid())
    p = psutil.Process(os.getpid())
    print(p.num_fds())  # open file descriptors of this process (Unix only)

    # this url doesn't exist
    test_url = ('http://ichart.finance.yahoo.com/table.csv?s=YHOOxyz'
                '&a=00&b=01&c=1900&d=11&e=31&f=2019&g=d')

    with concurrent.futures.ThreadPoolExecutor(1) as executor:
        futures = []
        for i in range(100):
            futures.append(executor.submit(fetchfun, test_url))
        count = 0
        for future in concurrent.futures.as_completed(futures):
            count += 1
            print("{}: {} (ex: {})".format(count, p.num_fds(), future.exception()))

    print(os.getpid())
    sleep(60)  # keep the process alive so its descriptors can be inspected


if __name__ == "__main__":
    main()
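psutil is a third-party package. If it isn't installed and you are on Linux, a stdlib-only stand-in for p.num_fds() can count the entries in /proc/self/fd. This is a minimal sketch that assumes Linux; count_open_fds is a hypothetical helper name.

```python
import os


def count_open_fds():
    """Count this process's open file descriptors (assumes Linux /proc).

    The descriptor used to read /proc/self/fd is itself included in the
    listing, so compare counts over time rather than trusting the absolute
    number.
    """
    return len(os.listdir('/proc/self/fd'))
```

In the demo above, calling count_open_fds() in place of p.num_fds() should show the same growth.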


2014-09-19 Sebastian Jylanki

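When an HTTPError is raised, it keeps a reference to the HTTPResponse object used for the request, stored as the HTTPError's fp attribute. The exception is stored in its Future, and your futures list keeps every Future alive until the program ends. This means an HTTPResponse stays alive for the entire run of the program. As long as that reference exists, the socket used by the HTTPResponse stays open. One way to fix this is to explicitly close the HTTPResponse when you handle the exception: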

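# requires `import urllib.error` next to the other imports
with concurrent.futures.ThreadPoolExecutor(1) as executor:
    futures = []
    for i in range(100):
        futures.append(executor.submit(fetchfun, test_url))
    count = 0
    for future in concurrent.futures.as_completed(futures):
        count += 1
        exc = future.exception()
        print("{}: {} (ex: {})".format(count, p.num_fds(), exc))
        # Only HTTPError carries the response in .fp; other outcomes have none
        if isinstance(exc, urllib.error.HTTPError) and exc.fp is not None:
            exc.fp.close()  # Close the HTTPResponse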
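Here is a variant of the same fix. It assumes you would rather release the socket inside the worker thread, so that the exception kept in each Future no longer holds an open connection. The worker catches HTTPError, closes exc.fp, and re-raises. fetch, fetch_all, timeout and workers are illustrative names, not part of the original code.

```python
import concurrent.futures
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Download url; on an HTTP error, close its response before re-raising.

    The HTTPError still ends up stored in the Future, but its HTTPResponse
    (exc.fp) is already closed, so the socket is released immediately.
    """
    try:
        # The context manager also closes the response on success.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as exc:
        if exc.fp is not None:
            exc.fp.close()
        raise


def fetch_all(urls, workers=4):
    """Fetch urls concurrently; map each url to its bytes or its exception."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(workers) as executor:
        future_to_url = {executor.submit(fetch, u): u for u in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            exc = future.exception()
            results[url] = exc if exc is not None else future.result()
    return results
```

Unlike the caller-side fix, this also covers callers that never inspect future.exception(). Other exceptions, such as a plain URLError, have no response attached and are re-raised unchanged.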


2014-09-19 14:48:01 dano

That's exactly what I was looking for! This solved the problem! Another solution is to use urllib3. – 2014-09-19 15:16:38
