2013-11-26 89 views

Download a file and name it according to its file type using Python's urllib

I have this code:

import urllib
from bs4 import BeautifulSoup

url = "http://www.downloadcrew.com/article/28976-flicflac"
pageurl = urllib.urlopen(url)
soup = BeautifulSoup(pageurl)

# The application name is taken from the article title
app_name = soup.find('div', {'id': 'articleTop'}).find('h1', {'id': 'articleTitle'}).contents[0].strip()
# The download path is the second comma-separated item in the link's href
download_link = "http://www.downloadcrew.com" + soup.find('div', {'class': 'downloadLink'}).find('a')['href'].split(',')[1].strip().strip("'")

source = urllib.urlopen(download_link).read()
print "Downloading: " + app_name
# The file name is the bare application name, with no extension
filename = app_name
files = open(filename, 'w')
files.write(source)
files.close()

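When I run this code, the downloaded file should be named 'flicflac.zip', but that is not what I get: the file is saved without an extension. How can I make it get the right name, as above, automatically?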

What is the output of the 'print' statement?

Answer

You can check the file's content type and add the extension accordingly:

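import urllib
from mimetypes import guess_extension

# download_link and app_name are the variables from the question's code
source = urllib.urlopen(download_link)
# Keep only the bare MIME type; the header may carry parameters after a ';'
content_type = source.info()['Content-Type'].split(';')[0].strip()
extension = guess_extension(content_type)
if extension:
    app_name += extension
else:
    # No extension is known for this content type; keep the bare name
    pass

# Later, call source.read() once and write the bytes to a file opened in binary mode ('wb')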
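
For readers on Python 3, where urllib.urlopen no longer exists, the same idea can be sketched as below. This is only a sketch under assumptions: name_with_extension and download are hypothetical helper names that do not appear in the thread, and urllib.request is used as the Python 3 counterpart of the call in the answer.

```python
import mimetypes
import urllib.request


def name_with_extension(app_name, content_type):
    """Append an extension guessed from a Content-Type header value.

    Hypothetical helper, not part of the original answer.
    """
    # Drop parameters such as "; charset=binary" before the lookup
    mime_type = content_type.split(";")[0].strip().lower()
    extension = mimetypes.guess_extension(mime_type)
    # guess_extension returns None for unknown types; keep the bare name then
    return app_name + extension if extension else app_name


def download(download_link, app_name):
    """Fetch download_link and save it under a name with a guessed extension."""
    with urllib.request.urlopen(download_link) as response:
        content_type = response.headers.get("Content-Type", "")
        filename = name_with_extension(app_name, content_type)
        data = response.read()  # read the body exactly once
    # Binary mode, so the bytes are written exactly as received
    with open(filename, "wb") as out:
        out.write(data)
    return filename
```

Calling name_with_extension("flicflac", "application/zip") would be expected to give "flicflac.zip" with the standard mimetypes table. The result depends on what the server actually sends in Content-Type, so a generic type such as application/octet-stream may not produce a useful extension.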

How can the file be named automatically, without my having to name it by hand?

I have updated my answer.

I'm confused. With source.read(), why is the archive corrupted?
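
The thread does not say why the archive came out corrupted. Two causes that can produce this symptom, offered as assumptions and not as a diagnosis: a file opened in text mode ('w') has its newline bytes rewritten by Python 2 on Windows, and a response body can be read only once, so a second read() returns nothing. A small Python 3 sketch, with an in-memory stream and made-up bytes standing in for the HTTP response:

```python
import io
import os
import tempfile

# Made-up bytes standing in for archive data (not a real zip file)
payload = b"PK\x03\x04\r\n\x00\n"

# A stream can be consumed only once; a second read() returns b""
response = io.BytesIO(payload)
first = response.read()
second = response.read()

# Writing in binary mode keeps every byte exactly as received
path = os.path.join(tempfile.mkdtemp(), "flicflac.zip")
with open(path, "wb") as out:
    out.write(first)
with open(path, "rb") as saved:
    round_trip = saved.read()
```

If either cause applies, reading the response once into a variable and writing that variable to a file opened with 'wb' should avoid it.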