2010-04-14 87 views

Annoying Twisted Python problem

Out of personal interest, I am trying to answer the following question: What is the fastest way to send 100,000 HTTP requests in Python?

This is what I have come up with so far, but I am running into something very strange.

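When installSignalHandlers is False, it just hangs. I can see the DelayedCall instances in reactor._newTimedCalls, but processResponse never gets called.

When installSignalHandlers is True, it raises an error and then works.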

from twisted.internet import reactor 
from twisted.web.client import Agent 
from threading import Semaphore, Thread 
import time 

concurrent = 100 
s = Semaphore(concurrent) 
reactor.suggestThreadPoolSize(concurrent) 
t=Thread(
    target=reactor.run, 
    kwargs={'installSignalHandlers':True}) 
t.daemon=True 
t.start() 


agent = Agent(reactor) 


def processResponse(response,url): 
    print response.code, url 
    s.release() 

def processError(response,url): 
    print "error", url 
    s.release() 

def addTask(url): 
    req = agent.request('HEAD', url) 
    req.addCallback(processResponse, url) 
    req.addErrback(processError, url) 


for url in open('urllist.txt'): 
    addTask(url.strip())  
    s.acquire() 
while s._Semaphore__value!=concurrent: 
    time.sleep(0.1)  

reactor.stop() 

Here is the error it raises when installSignalHandlers is True (note: this is expected behavior; the question is why it does not work when installSignalHandlers is False!):

Traceback (most recent call last): 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 396, in fireEvent 
    DeferredList(beforeResults).addCallback(self._continueFiring) 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 224, in addCallback 
    callbackKeywords=kw) 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 213, in addCallbacks 
    self._runCallbacks() 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks 
    self.result = callback(self.result, *args, **kw) 
--- <exception caught here> --- 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 409, in _continueFiring 
    callable(*args, **kwargs) 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1165, in _reallyStartRunning 
    self._handleSignals() 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1105, in _handleSignals 
    signal.signal(signal.SIGINT, self.sigInt) 
exceptions.ValueError: signal only works in main thread 
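
The ValueError on the last line of the traceback is not specific to Twisted. As a minimal sketch (written in Python 3 syntax, unlike the Python 2 code in the question), the same failure can be reproduced with only the standard library by calling signal.signal from a worker thread. The names install_handler and errors are made up for this illustration.

```python
import signal
import threading

errors = []


def install_handler():
    # The same kind of call Twisted makes in _handleSignals, but issued
    # from a thread that is not the main thread.
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        errors.append(str(exc))


worker = threading.Thread(target=install_handler)
worker.start()
worker.join()

# The exact wording of the message differs between Python versions.
print(errors)
```

This is why running reactor.run() in a secondary thread with installSignalHandlers set to True produces the traceback above.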

What am I doing wrong, and what is the right approach? I am new to Twisted.

@moshez: Thanks. It works now:

from twisted.internet import reactor, threads 
from urlparse import urlparse 
import httplib 
import itertools 


concurrent = 100 
finished=itertools.count(1) 
reactor.suggestThreadPoolSize(concurrent) 

def getStatus(ourl): 
    url = urlparse(ourl) 
    conn = httplib.HTTPConnection(url.netloc) 
    conn.request("HEAD", url.path) 
    res = conn.getresponse() 
    return res.status 

def processResponse(response,url): 
    print response, url 
    processedOne() 

def processError(error,url): 
    print "error", url#, error 
    processedOne() 

def processedOne(): 
    if finished.next()==added: 
        reactor.stop() 

def addTask(url): 
    req = threads.deferToThread(getStatus, url) 
    req.addCallback(processResponse, url) 
    req.addErrback(processError, url) 

added=0 
for url in open('urllist.txt'): 
    added+=1 
    addTask(url.strip()) 

try: 
    reactor.run() 
except KeyboardInterrupt: 
    reactor.stop() 

There is no reason to handle a KeyboardInterrupt raised from reactor.run(). C-c makes reactor.run() *return*; it does not raise an exception. – 2010-04-14 12:37:48

Answers


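You are making waaaaay too many "reactor calls" from the main thread (for example, there is a good chance that agent.request calls into the reactor). I am not sure whether this is your problem, but it is unsupported either way: the only reactor call you may make from a non-reactor thread is reactor.callFromThread.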

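Also, the whole architecture looks strange. Why aren't you running the reactor on the main thread? Even if you do everything at once, reading the whole file of 10,000 requests and splitting it up should not be a problem to do from the reactor.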

You could probably get by with a pure Twisted solution that does not use any threads.