
I want to use urllib3 in a simple thread pool to fetch a few wiki pages. The script creates one connection for every thread (I don't understand why) and hangs forever. Any tip, advice, or simple example of urllib3 with threading?

import threadpool 
from urllib3 import connection_from_url 

HTTP_POOL = connection_from_url(url, timeout=10.0, maxsize=10, block=True) 

def fetch(url, fields): 
    kwargs={'retries':6} 
    return HTTP_POOL.get_url(url, fields, **kwargs) 

pool = threadpool.ThreadPool(5) 
requests = threadpool.makeRequests(fetch, iterable) 
[pool.putRequest(req) for req in requests] 

@Lennart's script gets this error:

http://en.wikipedia.org/wiki/2010-11_Premier_League
http://en.wikipedia.org/wiki/List_of_MythBusters_episodes
http://en.wikipedia.org/wiki/List_of_Top_Gear_episodes
http://en.wikipedia.org/wiki/List_of_Unicode_characters
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
    result = request.callable(*request.args, **request.kwds)
  File "crawler.py", line 9, in fetch
    print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'

(The URL output was interleaved with the tracebacks because the threads print concurrently; the same traceback was printed once per thread.)

After adding import threadpool; import urllib3 and tpool = threadpool.ThreadPool(4) to @user318904's code, I get this error:

Traceback (most recent call last):
  File "crawler.py", line 21, in <module>
    tpool.map_async(fetch, urls)
AttributeError: ThreadPool instance has no attribute 'map_async'

Answers

Answer 1:

Obviously it creates one connection per thread; how else is each thread supposed to fetch a page? And you try to use the same connection, made from one URL, for all URLs. That can hardly be what you intended.

This code works just fine:

import threadpool 
from urllib3 import connection_from_url 

def fetch(url): 
    kwargs={'retries':6} 
    conn = connection_from_url(url, timeout=10.0, maxsize=10, block=True) 
    print url, conn.get_url(url) 
    print "Done!" 

pool = threadpool.ThreadPool(4) 
urls = ['http://en.wikipedia.org/wiki/2010-11_Premier_League', 
     'http://en.wikipedia.org/wiki/List_of_MythBusters_episodes', 
     'http://en.wikipedia.org/wiki/List_of_Top_Gear_episodes', 
     'http://en.wikipedia.org/wiki/List_of_Unicode_characters', 
     ] 
requests = threadpool.makeRequests(fetch, urls) 

[pool.putRequest(req) for req in requests] 
pool.wait() 
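
If your urllib3 version has no get_url (as the AttributeError above suggests), the same fetch can be written against the request() method instead; a minimal sketch, assuming a urllib3 release that provides HTTPConnectionPool.request():

import threadpool 
from urllib3 import connection_from_url 

def fetch(url): 
    conn = connection_from_url(url, timeout=10.0, maxsize=10, block=True) 
    # request() issues a GET and returns a response object with a .data attribute 
    r = conn.request('GET', url) 
    print url, len(r.data) 
    print "Done!" 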
Answer 2:

I use something like this:

# excluding setup for threadpool etc 

upool = urllib3.HTTPConnectionPool('en.wikipedia.org', block=True) 

urls = ['/wiki/2010-11_Premier_League', 
        '/wiki/List_of_MythBusters_episodes', 
        '/wiki/List_of_Top_Gear_episodes', 
        '/wiki/List_of_Unicode_characters', 
        ] 

def fetch(path): 
    # add error checking 
    return upool.get_url(path).data 

# ThreadPool here must provide map_async (e.g. multiprocessing.pool.ThreadPool); 
# the threadpool module's ThreadPool does not, hence the AttributeError above. 
tpool = ThreadPool() 

tpool.map_async(fetch, urls) 

# either wait on the result object or give map_async a callback function for the results 
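
A minimal sketch of those two options, assuming tpool is a multiprocessing.pool.ThreadPool (which, unlike the threadpool module's ThreadPool, does provide map_async):

from multiprocessing.pool import ThreadPool 

tpool = ThreadPool(4) 

# Option 1: keep the AsyncResult and block on it 
result = tpool.map_async(fetch, urls) 
pages = result.get()  # waits until every fetch has finished 

# Option 2: have map_async hand the list of results to a callback 
def done(pages): 
    print "fetched %d pages" % len(pages) 

tpool.map_async(fetch, urls, callback=done) 
tpool.close() 
tpool.join() 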
Answer 3:

Threaded programming is hard, so I wrote workerpool to make exactly what you're doing easy.

More specifically, see the Mass Downloader example.

To do the same thing with urllib3, it looks something like this:

import urllib3 
import workerpool 

# Shared urllib3 connection pool, kept distinct from the worker pool below 
http_pool = urllib3.connection_from_url("foo", maxsize=3) 

def download(url): 
    r = http_pool.get_url(url) 
    # TODO: Do something with r.data 
    print "Downloaded %s" % url 

# Initialize a pool, 5 threads in this case 
pool = workerpool.WorkerPool(size=5) 

# The ``download`` method will be called with a line from the second 
# parameter for each job (each line still carries its trailing newline). 
pool.map(download, open("urls.txt").readlines()) 

# Send shutdown jobs to all threads, and wait until all the jobs have been completed 
pool.shutdown() 
pool.wait() 

For more sophisticated code, have a look at workerpool.EquippedWorker (and the tests here for example usage). You can make the pool the toolbox you pass in.
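
A minimal sketch of that idea using only the APIs shown above, sharing one urllib3 pool across all jobs via a closure (the exact EquippedWorker/toolbox signature isn't shown here, so this sticks to plain WorkerPool):

import urllib3 
import workerpool 

# one shared connection pool acts as the "toolbox" for every job 
http_pool = urllib3.connection_from_url("http://en.wikipedia.org", maxsize=3) 

def make_fetch(toolbox): 
    # each job closes over the shared connection pool 
    def fetch(path): 
        return toolbox.get_url(path).data 
    return fetch 

pool = workerpool.WorkerPool(size=5) 
pool.map(make_fetch(http_pool), ["/wiki/List_of_Unicode_characters"]) 
pool.shutdown() 
pool.wait() 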