Problem with my code: downloading multiple files in a loop with Python
#!/usr/bin/env python3.1
import urllib.request;
# Disguise as a Mozilla browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)';
URL = "http://www.example.com/img";
req = urllib.request.Request(URL, headers={'User-Agent' : userAgent});
# Counter for the filename.
i = 0;
while True:
    fname = str(i).zfill(3) + '.png';
    # Reuse the same Request object, pointing it at the next file
    req.full_url = URL + fname;
    f = open(fname, 'wb');
    try:
        response = urllib.request.urlopen(req);
    except:
        break;
    else:
        f.write(response.read());
        i+=1;
        response.close();
    finally:
        f.close();
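The problem seems to come from the way I create the urllib.request.Request object (called req). I create it with a URL that doesn't exist, and then later change the URL to the one it should be. I do this so that I can reuse the same urllib.request.Request object instead of creating a new one on every iteration. There is probably a mechanism in Python for this, but I'm not sure what it is.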
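One standard-library mechanism for sharing settings across many requests is an opener: `urllib.request.build_opener()` returns an `OpenerDirector` whose `addheaders` list is added to every HTTP request it sends, and `opener.open()` accepts a plain URL string and builds a new `Request` internally each time. A minimal sketch, assuming the User-Agent string from the question; `make_opener` is a hypothetical helper name and the URL in the usage comment is the question's placeholder with an explicit `http://` scheme.

```python
import urllib.request

# Same User-Agent string as in the question.
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def make_opener(user_agent=USER_AGENT):
    """Build one opener that adds `user_agent` to every HTTP request it sends."""
    opener = urllib.request.build_opener()
    # Replace the default [('User-agent', 'Python-urllib/x.y')] header list.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


# Usage (hypothetical URL):
#   opener = make_opener()            # once, before the loop
#   response = opener.open('http://www.example.com/img' + fname)
```

The opener is created once and reused; only the URL string changes per iteration, so no `Request` object has to be mutated.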
EDIT: The error message is:
>>> response = urllib.request.urlopen(req);
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/urllib/request.py", line 121, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python3.1/urllib/request.py", line 356, in open
    response = meth(req, response)
  File "/usr/lib/python3.1/urllib/request.py", line 468, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.1/urllib/request.py", line 394, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.1/urllib/request.py", line 328, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.1/urllib/request.py", line 476, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
EDIT 2: My solution is below.
import urllib.request;
# Disguise as a Mozilla browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)';
# Counter for the filename.
i = 0;
while True:
    fname = str(i).zfill(3) + '.png';
    URL = "http://www.example.com/img" + fname;
    f = open(fname, 'wb');
    try:
        # Build a fresh Request for each file instead of reusing one
        req = urllib.request.Request(URL, headers={'User-Agent' : userAgent});
        response = urllib.request.urlopen(req);
    except:
        break;
    else:
        f.write(response.read());
        i+=1;
        response.close();
    finally:
        f.close();
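A likely explanation for the 403, inferred from the traceback rather than confirmed in the thread: in Python 3.1 the `Request` constructor splits the URL into `host` and `selector` once, so assigning `req.full_url` afterwards changes that attribute but not the path that is actually requested (the Python documentation notes that `full_url` became a property with a setter in 3.4). Building a fresh `Request` per file, as in the solution above, avoids this. The sketch below keeps that approach but catches only `urllib.error.HTTPError` instead of using a bare `except`, and opens the output file only after the request has succeeded, so no empty file is left behind for the first missing index. `make_request`, `download_all`, the `dest_dir` parameter and the `http://` base URL are assumptions for illustration; `urlopen` is a parameter only so the loop can be exercised without network access.

```python
import os
import urllib.error
import urllib.request

# Same User-Agent string as in the question.
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
# Hypothetical base URL; urllib needs the explicit "http://" scheme.
BASE_URL = 'http://www.example.com/img'


def make_request(index, base_url=BASE_URL, user_agent=USER_AGENT):
    """Return (filename, fresh Request) for image number `index`, e.g. 000.png."""
    fname = str(index).zfill(3) + '.png'
    req = urllib.request.Request(base_url + fname,
                                 headers={'User-Agent': user_agent})
    return fname, req


def download_all(base_url=BASE_URL, dest_dir='.', urlopen=urllib.request.urlopen):
    """Download 000.png, 001.png, ... until the server answers with an HTTP error.

    Returns the number of files written.
    """
    count = 0
    while True:
        fname, req = make_request(count, base_url)
        try:
            response = urlopen(req)
        except urllib.error.HTTPError:
            # e.g. 404 once we run past the last image; other errors propagate
            break
        # Open the output file only after the request has succeeded.
        with response, open(os.path.join(dest_dir, fname), 'wb') as f:
            f.write(response.read())
        count += 1
    return count
```

Called as `download_all()`, it writes 000.png, 001.png, ... into the current directory and returns how many files it saved.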
What is the error message? Also, Python doesn't need semicolons to end a line. – Dikei 2012-03-28 02:37:03
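I've added the error message. I know I don't need the semicolons, but I prefer to include them. The URL and the files exist. The only issue is that I create the req object with an invalid URL and then correct the URL before using req. That seems to be what causes the error. – s5s 2012-03-28 02:41:08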
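Yes, the URL is valid, which is why I suspect the way I reuse req is what causes the problem. I can also open the URL in a browser, fetch it with wget, and download it with Python when I don't use the loop, i.e. when I set the correct URL in the req object at the time I create it. – s5s 2012-03-28 02:44:11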