2016-10-01

How quickly does python-requests close its sockets? (python, requests, threading)

I'm trying to do this with Python requests. Here is my code:

import threading
import resource
import time
import sys
import requests

# Maximum open-file limit, used by the thread limiter.
maxOpenFileLimit = resource.getrlimit(resource.RLIMIT_NOFILE)[0] # For example, it shows 50.
threadLimiter = maxOpenFileLimit # Upper bound on simultaneously active threads.

# Will use one session for every Thread. 
requestSessions = requests.Session() 
# Making requests Pool bigger to prevent [Errno -3] when socket stacked in CLOSE_WAIT status. 
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit+100)) 
requestSessions.mount('http://', adapter) 
requestSessions.mount('https://', adapter) 

def threadAction(a1, a2): 
    global number 
    time.sleep(1) # My actions with Requests for each thread. 
    print number = number + 1 

number = 0 # Count of complete actions 

ThreadActions = [] # Action tasks. 
for i in range(50): # I have 50 websites I need to do in parallel threads. 
    a1 = i 
    for n in range(10): # Every website I need to hit from 10 threads.
        a2 = n
        ThreadActions.append(threading.Thread(target=threadAction, args=(a1, a2)))


for item in ThreadActions:
    # But I can't do more than 50 Threads at once, because of maxOpenFileLimit.
    while True:
        # Thread limiter, analogue of BoundedSemaphore.
        if (int(threading.activeCount()) < threadLimiter):
            item.start()
            break
        else:
            continue

for item in ThreadActions: 
    item.join() 

The thing is, once I reach 50 threads, the thread limiter starts waiting for some thread to finish its work. Here is the problem: after the script reaches the limiter, `lsof -i | grep python | wc -l` shows far fewer than 50 active connections, but before the limiter it showed all of the <= 50 processes. Why does this happen? Or should I use requests.close() instead of requests.session() to stop it from holding on to already-opened sockets?


Your thread limiter goes into a tight loop and consumes most of the processing time. Try slowing it down with something like `sleep(.1)`. Better still, use a queue limited to 50 requests and have your threads read from it. – tdelaney
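The queue approach suggested above can be sketched roughly like this (a Python 3 sketch with hypothetical names; the real per-site request logic would go where the comment is). A bounded `queue.Queue` makes the producer block instead of spin, and a fixed set of worker threads drains it:

```python
import queue
import threading

results = []
results_lock = threading.Lock()

def worker(q):
    while True:
        item = q.get()
        if item is None:        # sentinel: no more work for this worker
            q.task_done()
            break
        # ... do the HTTP request for `item` here ...
        with results_lock:      # list.append is atomic, but be explicit
            results.append(item)
        q.task_done()

q = queue.Queue(maxsize=50)     # at most 50 tasks queued at once
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(10)]
for t in threads:
    t.start()

for task in range(500):         # 50 sites x 10 tasks, as in the question
    q.put(task)                 # blocks when the queue is full -- no busy loop

for _ in threads:
    q.put(None)                 # one sentinel per worker
for t in threads:
    t.join()
```

Because `put()` blocks on a full queue, the main thread sleeps in the kernel rather than burning CPU the way the `while True` limiter does.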


For raising the OS-level limit for your user, look up [ulimit](http://stackoverflow.com/questions/6774724/why-python-has-limit-for-count-of-file-handles) and [fs.file-max](https://cs.uwaterloo.ca/~brecht/servers/openfiles.html). After that, for raising the limit from inside Python, look up [setrlimit](https://coderwall.com/p/ptq7rw/increase-open-files-limit-and-drop-privileges-in-python). And of course, make sure you are not needlessly running a busy while-loop and that your code multiplexes properly. – blackpen


Yes, I understand that, and in my real script I use a BoundedSemaphore. But why, after the script hits the limit, does `lsof -i | grep python | wc -l` show a much lower number? – passwd
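For reference, the BoundedSemaphore mentioned here can replace the busy while-loop entirely (a minimal Python 3 sketch; `limited_action` is a hypothetical stand-in for the real per-site work). The main thread blocks in `acquire()` until a running thread releases a slot:

```python
import threading

limit = threading.BoundedSemaphore(50)  # at most 50 threads in flight
done = []
done_lock = threading.Lock()

def limited_action(i):
    try:
        # ... the real HTTP work would happen here ...
        with done_lock:
            done.append(i)
    finally:
        limit.release()      # free the slot even if the work raises

threads = []
for i in range(200):
    limit.acquire()          # blocks instead of spinning on activeCount()
    t = threading.Thread(target=limited_action, args=(i,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()
```

Each `acquire()` is paired with exactly one `release()`, so `BoundedSemaphore` never raises its over-release `ValueError`.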

Answer


Your limiter is a tight loop that eats up most of your processing time. Use a thread pool to limit the number of workers instead.

import multiprocessing.pool
import time
import resource
import requests

maxOpenFileLimit = resource.getrlimit(resource.RLIMIT_NOFILE)[0]

# Will use one session for every Thread. 
requestSessions = requests.Session() 
# Making requests Pool bigger to prevent [Errno -3] when socket stacked in CLOSE_WAIT status. 
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit+100)) 
requestSessions.mount('http://', adapter) 
requestSessions.mount('https://', adapter) 

def threadAction(a1, a2): 
    global number 
    time.sleep(1) # My actions with Requests for each thread. 
    print number = number + 1 # DEBUG: this doesn't update number and wouldn't be
                              # thread safe if it did

number = 0 # Count of complete actions 

pool = multiprocessing.pool.ThreadPool(50) # chunksize is a map() argument, not a constructor argument.

ThreadActions = [] # Action tasks. 
for i in range(50): # I have 50 websites I need to do in parallel threads. 
    a1 = i 
    for n in range(10): # Every website I need to hit from 10 threads.
        a2 = n
        ThreadActions.append((a1, a2))

pool.map(lambda args: threadAction(*args), ThreadActions, chunksize=1)
pool.close()
pool.join()

Does multiprocessing work faster than threads? And how does it affect processor load? – passwd


It's a tradeoff... and Windows is different from Linux. With multiprocessing, data needs to be serialized between parent and child (and on Windows, typically more context needs to be serialized, because the child doesn't get a clone of the parent's memory space), but you don't have to worry about the GIL. Higher CPU load and/or lower data overhead make multiprocessing work better. But if you are mostly I/O bound, a thread pool is fine. – tdelaney
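For the I/O-bound case described here, a thread pool is usually enough, because blocking calls release the GIL while they wait. A minimal Python 3 sketch (`time.sleep` stands in for a blocking HTTP request; `io_bound_task` is a hypothetical name):

```python
from multiprocessing.pool import ThreadPool
import time

def io_bound_task(n):
    time.sleep(0.01)  # stands in for a blocking HTTP request
    return n * 2

# Blocking I/O releases the GIL, so 20 threads overlap their waits
# without the serialization cost of separate processes.
with ThreadPool(20) as pool:
    results = pool.map(io_bound_task, range(100))
```

`ThreadPool` shares the `multiprocessing.Pool` API (`map`, `imap`, `apply_async`), so switching between threads and processes for a CPU-bound workload is mostly a one-line change.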