I am using Python 3.3.1. I have written a function called download_file() that downloads a file and saves it to disk. Why does downloading text files not work correctly?
#!/usr/bin/python3
# -*- coding: utf8 -*-

import datetime
import os
import urllib.error
import urllib.request


def download_file(*urls, download_location=os.getcwd(), debugging=False):
    """Downloads the files provided as multiple url arguments.

    Provide the url for files to be downloaded as strings. Separate the
    files to be downloaded by a comma.

    The function would download the files and save them in the folder
    provided as keyword-argument for download_location. If
    download_location is not provided, then the files would be saved in
    the current working directory. Folder for download_location would be
    created if it doesn't already exist. Do not worry about trailing
    slash at the end for download_location. The code would take care of
    it for you.

    If the download encounters an error it would alert about it and
    provide the information about the Error Code and Error Reason (if
    received from the server).

    Normal Usage:
    >>> download_file('http://localhost/index.html',
    ...               'http://localhost/info.php')
    >>> download_file('http://localhost/index.html',
    ...               'http://localhost/info.php',
    ...               download_location='/home/aditya/Download/test')
    >>> download_file('http://localhost/index.html',
    ...               'http://localhost/info.php',
    ...               download_location='/home/aditya/Download/test/')

    In Debug Mode, files are not downloaded, nor is there any
    attempt to establish the connection with the server. It just prints
    out the filename and its url that would have been attempted to be
    downloaded in Normal Mode.

    By Default, Debug Mode is inactive. In order to activate it, we
    need to supply a keyword-argument as 'debugging=True', like:
    >>> download_file('http://localhost/index.html',
    ...               'http://localhost/info.php',
    ...               debugging=True)
    >>> download_file('http://localhost/index.html',
    ...               'http://localhost/info.php',
    ...               download_location='/home/aditya/Download/test',
    ...               debugging=True)
    """
    # Append a trailing slash at the end of download_location if not
    # already present
    if download_location[-1] != '/':
        download_location = download_location + '/'

    # Create the folder for download_location if not already present
    os.makedirs(download_location, exist_ok=True)

    # Other variables
    time_format = '%Y-%b-%d %H:%M:%S'   # '2000-Jan-01 22:10:00'

    # "Request Headers" information for the file to be downloaded
    accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    accept_encoding = 'gzip, deflate'
    accept_language = 'en-US,en;q=0.5'
    connection = 'keep-alive'
    # Adjacent string literals are joined, so no indentation leaks into
    # the header value
    user_agent = ('Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:20.0) '
                  'Gecko/20100101 Firefox/20.0')
    headers = {'Accept': accept,
               'Accept-Encoding': accept_encoding,
               'Accept-Language': accept_language,
               'Connection': connection,
               'User-Agent': user_agent,
               }

    # Loop through all the files to be downloaded
    for url in urls:
        filename = os.path.basename(url)
        if not debugging:
            try:
                request_sent = urllib.request.Request(url, None, headers)
                response_received = urllib.request.urlopen(request_sent)
            except urllib.error.URLError as error_encountered:
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- The file could not be downloaded.')
                if hasattr(error_encountered, 'code'):
                    print(' ' * 22, 'Error Code -', error_encountered.code)
                if hasattr(error_encountered, 'reason'):
                    print(' ' * 22, 'Reason -', error_encountered.reason)
            else:
                # The response body is written to disk exactly as received
                read_response = response_received.read()
                output_file = download_location + filename
                with open(output_file, 'wb') as downloaded_file:
                    downloaded_file.write(read_response)
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- Downloaded successfully.')
        else:
            print(datetime.datetime.now().strftime(time_format),
                  ': Debugging :', filename, 'would be downloaded from :\n',
                  ' ' * 21, url)
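This function works fine for downloading PDFs, images and other formats, but it gives trouble with text documents such as HTML files. I suspect the problem has something to do with line endings at this line: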
with open(output_file, 'wb') as downloaded_file:
So I tried opening the file in wt mode, and also with just w mode, but that does not solve the problem.
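For what it's worth, the file mode is probably not the culprit. The body returned by read() is a bytes object, so 'wb' is the matching mode, and writing bytes to a file opened in text mode fails outright in Python 3. A minimal sketch of that behaviour, where the payload is made up and a temporary file stands in for the real download:

```python
import os
import tempfile

# Stand-in for response_received.read(); the content is made up.
payload = b"<html><body>hello</body></html>\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "index.html")

    # Binary mode writes the bytes exactly as they were received.
    with open(path, "wb") as binary_file:
        binary_file.write(payload)
    with open(path, "rb") as binary_file:
        round_trip_ok = binary_file.read() == payload

    # Text mode expects str, so passing bytes raises TypeError.
    try:
        with open(path, "w") as text_file:
            text_file.write(payload)
        text_mode_error = None
    except TypeError as error:
        text_mode_error = error
```

Binary mode reproduces the received bytes unchanged, so whatever is wrong with the saved file is likely already wrong in the bytes themselves.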
The other problem might have been the encoding, so I also included the second line:
# -*- coding: utf8 -*-
But this still does not work. What could be the problem, and how can I make the function work for both text and binary files?
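On the encoding point: as far as I understand, the coding line only tells Python how to decode the source file itself, and UTF-8 is already the default in Python 3, so it should have no effect on downloaded data. If the text ever had to be decoded, the charset would come from the response's Content-Type header instead. A sketch under that assumption, where decode_body is a hypothetical helper and a hand-built Message object stands in for response_received.headers:

```python
from email.message import Message


def decode_body(body, headers, fallback="utf-8"):
    """Decode body using the charset declared in Content-Type, if any.

    Hypothetical helper: 'headers' is expected to behave like the
    headers attribute of a urllib response (an email.message.Message).
    """
    charset = headers.get_content_charset() or fallback
    return body.decode(charset)


# Hand-built headers standing in for response_received.headers.
latin1_headers = Message()
latin1_headers["Content-Type"] = "text/html; charset=iso-8859-1"

decoded = decode_body("café".encode("iso-8859-1"), latin1_headers)

# No Content-Type at all: the fallback encoding is used.
decoded_fallback = decode_body("café".encode("utf-8"), Message())
```

When the bytes are simply written to disk in 'wb' mode, no decoding step is needed at all.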
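An example of what does not work:
>>> download_file("http://docs.python.org/3/tutorial/index.html")
When I open the downloaded file in Gedit, it shows as: (screenshot not preserved)
The same happens when I open it in Firefox: (screenshot not preserved)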
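One thing that may be worth ruling out: the request headers above advertise Accept-Encoding: gzip, deflate, and urllib.request does not decompress response bodies on its own. A server that honours the header could therefore return a gzip-compressed body, which would be written to disk unchanged and look like garbage in a text editor or browser. A minimal sketch of a check, assuming gzip is the encoding in play; maybe_gunzip is a hypothetical helper and the sample bytes are invented:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(body, content_encoding=None):
    """Return body decompressed when the server declared gzip, else unchanged.

    Hypothetical helper: content_encoding is the value of the response's
    Content-Encoding header, or None when the header is absent.
    """
    declared_gzip = (content_encoding or "").strip().lower() == "gzip"
    if declared_gzip and body.startswith(GZIP_MAGIC):
        return gzip.decompress(body)
    return body


# Invented sample data standing in for a real response body.
original = b"<html><body>tutorial</body></html>"
compressed = gzip.compress(original)

restored = maybe_gunzip(compressed, "gzip")     # decompressed
untouched = maybe_gunzip(original)              # no header: left alone
still_compressed = maybe_gunzip(compressed)     # e.g. a real .gz download
```

Inside download_file(), the header value would be available as response_received.headers.get('Content-Encoding'). Dropping the Accept-Encoding request header altogether would be another way to test the theory.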
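What exactly is the problem/error?
@StephaneRolland: It does not give any error. But when I open the document in a text editor, it reports a problem with the encoding. I will upload the images in a moment. – Aditya
Which text editor?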