使用urlopen打开网址列表

我有一个python脚本来获取网页并对其进行镜像。它适用于一个特定的页面，但我不能让它工作多个。我以为我可以把多个网址变成一个列表，然后哺养的功能，但我得到这个错误：使用urlopen打开网址列表

Traceback (most recent call last): 
    File "autowget.py", line 46, in <module> 
    getUrl() 
    File "autowget.py", line 43, in getUrl 
    response = urllib.request.urlopen(url) 
    File "/usr/lib/python3.2/urllib/request.py", line 139, in urlopen 
    return opener.open(url, data, timeout) 
    File "/usr/lib/python3.2/urllib/request.py", line 361, in open 
    req.timeout = timeout 
AttributeError: 'tuple' object has no attribute 'timeout'

这里是有问题的代码：

url = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com'] 
def getUrl(*url): 
    response = urllib.request.urlopen(url) 
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file: 
     shutil.copyfileobj(response, out_file) 
getUrl()

我已经用尽谷歌试图找到如何用urlopen（）打开一个列表。我找到了一种这样的作品。它需要一个.txt文档并逐行阅读，将每一行作为URL提供，但我正在使用Python 3编写此文档，并且由于某种原因twillcommandloop将不会导入。另外，这种方法很笨拙，需要（据说）不必要的工作。

无论如何，任何帮助将不胜感激。

来源

2014-04-24 Eric Lagergren

你为什么不简单地用'for'循环迭代你的URL列表？ –

回复sheng的评论时才想起来！它会将特定部分作为字符串返回，对吗？ –

在代码中存在一些误区：

你定义可变参数列表（在错误的元组）getUrls;
您管理getUrls参数作为一个变量（列表，而不是）

您可以使用此代码

import urllib2 
import shutil 

urls = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com'] 
def getUrl(urls): 
    for url in urls: 
     #Only a file_name based on url string 
     file_name = url.replace('https://', '').replace('.', '_').replace('/','_') 
     response = urllib2.urlopen(url) 
     with open(file_name, 'wb') as out_file: 
     shutil.copyfileobj(response, out_file) 
getUrl(urls)

来源

2014-04-24 20:34:32

谢谢 - 在我的代码的早些时候，我将文件名设置为'/ path/to/directory''加上'domain'，其中'domain'是http：// www'和'.com'之间的字符串。 URL。脚本既FTP和推（通过Git到GitHub页），所以我必须有一个设置文件路径或否则Git部分将无法正常工作。无论如何，再次感谢！欣赏它！ –

它不支持的元组：

urllib.request.urlopen(url[, data][, timeout]) 
Open the URL url, which can be either a string or a Request object.

而且您的电话是不正确。它应该是：

getUrl(url[0],url[1],url[2])

而在函数内部，使用一个类似于“for u in url”的循环来遍历所有的URL。

来源

2014-04-24 20:21:34 Sheng

所以我将不得不创建多个变量？不幸的。谢谢。我很感激。但是有没有更容易通过一个列表？使用for循环而不是var [0]，var [0]等？ –

不，你应该使用'for'循环。 –

@LukasGraf是的，你是对的。忘了吧。 – Sheng

你应该使用for循环只是遍历你的URL尝试：

import shutil 
import urllib.request 


urls = ['https://www.example.org/', 'https://www.foo.com/'] 

file_name = 'foo.txt' 

def fetch_urls(urls): 
    for i, url in enumerate(urls): 
     file_name = "page-%s.html" % i 
     response = urllib.request.urlopen(url) 
     with open(file_name, 'wb') as out_file: 
      shutil.copyfileobj(response, out_file) 

fetch_urls(urls)

我假设你想把内容保存到单独的文件中，所以我用enumerate这里创建一个uniqe文件名，但是您明显可以使用hash(),uuid模块中的任何内容创建slugs。

来源

2014-04-24 20:39:19

使用urlopen打开网址列表

回答

相关问题