2015-07-13

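Python releasing threads

I'm using threads from the Thread class for the first time, and they don't seem to be released after their function has run. I'm trying to run at most 5 threads at a time. Because each thread creates the next one there will be some overlap, but I'm seeing 2000+ threads running at the same time, and then I get the exception "can't start new thread".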

from threading import Thread 
import string 

URLS = ['LONG LIST OF URLS HERE'] 

currentThread = 0 
LASTTHREAD = len(URLS) - 1 
MAXTHREADS = 5 
threads = [None] * (LASTTHREAD + 1) 

def getURL(threadName, currentThread): 
    print('Thread Name = ' + threadName) 
    print('URL = ' + str(URLS[currentThread])) 
    if currentThread < LASTTHREAD:
        currentThread = currentThread + 1
        thisThread = currentThread
        try:
            threads[thisThread] = Thread(target = getURL, args = ('thread' + str(thisThread), currentThread,))
            threads[thisThread].start()
            threads[thisThread].join()
        except Exception,e:
            print "Error: unable to start thread"
            print str(e)

for i in range(0, MAXTHREADS): 
    currentThread = currentThread + 1 
    try:
        threads[i] = Thread(target = getURL, args = ('thread' + str(i), currentThread,))
        threads[i].start()
        threads[i].join()
    except Exception,e:
        print "Error: unable to start thread"
        print str(e)

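I'm open to any other cleanup I could do here as well, since I'm fairly new to Python and brand new to threading. For now I'm just trying to get the threads set up correctly. Eventually this will scrape the URLs.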


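It's rather unusual to have your spawned threads spawn threads themselves. I'd suggest, at a minimum, refactoring so that your main thread does all the spawning. – eddiewould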
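In the question's code each thread starts its successor and then calls join() on it, so no thread can finish until every later one has finished, which is why the count climbs into the thousands. The refactoring suggested in the comment could look roughly like the sketch below. It is only an illustration under assumptions: `get_url` is a hypothetical stand-in for the real download, the `url_N` strings are placeholders, and a `BoundedSemaphore` is one possible way (not the only one) to keep at most `MAX_THREADS` workers busy at a time.

```python
import threading

MAX_THREADS = 5
# Hypothetical placeholder list; substitute the real URLs.
URLS = ['url_' + str(i) for i in range(50)]

# One slot per allowed worker; acquire() blocks once all slots are taken.
slots = threading.BoundedSemaphore(MAX_THREADS)
processed = []
processed_lock = threading.Lock()


def get_url(url):
    """Hypothetical stand-in for the real download."""
    try:
        with processed_lock:
            processed.append(url)
    finally:
        # Free the slot so the main thread may start another worker.
        slots.release()


def main():
    threads = []
    # Only the main thread creates threads; workers never spawn anything.
    for url in URLS:
        slots.acquire()  # wait here while MAX_THREADS workers hold slots
        t = threading.Thread(target=get_url, args=(url,))
        t.start()
        threads.append(t)
    # Join only after everything has been started, so workers overlap.
    for t in threads:
        t.join()


main()
```

Each worker exits as soon as its own URL is done, so its thread is released instead of waiting on a chain of successors.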

Answers


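I'd suggest looking at a thread pool and having the threads take tasks from a suitable shared data structure (e.g. a queue), rather than starting new threads all the time.

Depending on what you actually want to do, if you're using CPython (if you don't know what I mean by CPython, you are), you may not really get any performance improvement from using threads (because of the Global Interpreter Lock). So you may be better off looking at the multiprocessing module.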

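try:
    from Queue import Queue  # Python 2
except ImportError:
    from queue import Queue  # Python 3 renamed the module
from threading import Thread

def worker():
    # Each worker loops forever, pulling one URL at a time off the shared queue.
    while True:
        item = q.get()
        do_work(item)
        q.task_done()  # tell q.join() that this item is finished

def do_work(url):
    print("Processing URL:" + url)

q = Queue()
# Start a fixed pool of 5 daemon threads; daemon threads do not keep the program alive at exit.
for i in range(5):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in ['url_' + str(i) for i in range(2000)]:
    q.put(item)

q.join()  # block until all tasks are done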
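
On Python 3 the same fixed-size pool pattern is also available in the standard library as `concurrent.futures.ThreadPoolExecutor`. The following is a minimal sketch rather than part of the original answer: `fetch` is a hypothetical stand-in for the real download, the `url_N` strings are placeholders, and the `peak_active` counter exists only to check that no more than `MAX_THREADS` workers ever run at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 5  # same limit the question asks for

# Hypothetical placeholder list; substitute the real URLs.
URLS = ['url_' + str(i) for i in range(200)]

_lock = threading.Lock()
_active = 0
peak_active = 0


def fetch(url):
    """Hypothetical stand-in for the real download; returns a result string."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    try:
        # Real code would perform the HTTP request here.
        return 'processed ' + url
    finally:
        with _lock:
            _active -= 1


def run_all(urls, max_threads=MAX_THREADS):
    # The executor creates at most max_threads threads and reuses them;
    # leaving the with-block waits for all tasks and shuts the pool down.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() yields results in the same order as the input URLs.
        return list(pool.map(fetch, urls))


results = run_all(URLS)
```

For I/O-bound work such as downloading pages, CPython releases the Global Interpreter Lock while a thread waits on the network, so a small thread pool like this can still overlap requests.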

See the example in the documentation: https://docs.python.org/2/library/queue.html (at the bottom) – eddiewould