2013-07-07 183 views
-1

Get source code from URLs

I have the following code:

import urllib2 
from itertools import product 

with open('urllist.txt') as urllist: 
    urls=[line.strip() for line in urllist] 

for url in product(urls): 
    usock = urllib2.urlopen(url) 
    data = usock.read() 
    usock.close() 
    sourcecode=open('./sourcecode', 'w+') 
    sourcecode.write(data) 

When I run it, I get:

Traceback (most recent call last): 
    File "12.py", line 8, in <module> 
    usock = urllib2.urlopen(url) 
    File "/opt/python2.7.1/lib/python2.7/urllib2.py", line 126, in urlopen 
    return _opener.open(url, data, timeout) 
    File "/opt/python2.7.1/lib/python2.7/urllib2.py", line 383, in open 
    req.timeout = timeout 
AttributeError: 'tuple' object has no attribute 'timeout' 

Any idea how to fix it? Thanks a lot!

+4

So what are you trying to achieve by using 'product'? –

+0

I want to get the source code for each URL in a list. – Tom

+0

What does 'url' look like? – matino

Answers

3

itertools.product yields tuples, not the items themselves:

>>> from itertools import product 
>>> lis = ['a','b','c'] 
>>> for p in product(lis): 
...  print p 
...  
('a',) 
('b',) 
('c',) 
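To make the tuple behaviour a bit more concrete, here is a small sketch. The URLs and path suffixes are made-up placeholders, not anything from the question. It shows that product over a single iterable wraps each item in a 1-tuple, and that product is really intended for combining two or more iterables:

```python
from itertools import product

# Hypothetical URLs, used only for illustration.
urls = ['http://a.example', 'http://b.example']

# With a single iterable, product yields one 1-tuple per item.
singles = list(product(urls))

# Unpacking the 1-tuple recovers the original item, but at that
# point the call to product adds nothing over iterating directly.
unwrapped = [url for (url,) in product(urls)]

# product is meant for combinations of several iterables.
pairs = list(product(urls, ['/index', '/about']))
```

So passing each `p` from `product(urls)` to `urlopen` hands it a tuple rather than a string, which is consistent with the AttributeError in the traceback.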

Use a simple loop over the URLs instead:

for url in urls: 
    usock = urllib2.urlopen(url) 
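For completeness, a fuller version of that loop might look like the sketch below. The `fetch_sources` helper and its `opener` parameter are my own naming, not part of urllib2. The import fallback is there only so the same snippet also runs on Python 3, where `urlopen` lives in `urllib.request`:

```python
try:
    from urllib2 import urlopen  # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen  # Python 3 location

def fetch_sources(urls, opener=urlopen):
    """Return a list of (url, body) pairs, one per URL.

    `opener` is a hypothetical hook; it defaults to urlopen and
    only needs to return an object with read() and close().
    """
    results = []
    for url in urls:  # iterate directly, no product() needed
        usock = opener(url)
        try:
            results.append((url, usock.read()))
        finally:
            usock.close()  # close even if read() raises
    return results
```

You would call it as `fetch_sources(urls)` with the list read from urllist.txt. Note that the question's code reopens './sourcecode' with 'w+' on every iteration, so each page overwrites the previous one. Collecting the results first lets you decide how to store them, for example one file per URL.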

+0

Thanks! I've figured out another way to do it: just change 'for url in product(urls):' to 'for url in urls:'. – Tom

+2

@Tom I've already mentioned that in the answer. –