2011-11-30 95 views
5

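Downloading large files via Python

I'm trying to download a backup file from my server to my local storage server every day, but I'm running into a problem.

I wrote this code (with the irrelevant parts, such as the email function, removed):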

import os 
from time import strftime 
from ftplib import FTP 
import smtplib 
from email.MIMEMultipart import MIMEMultipart 
from email.MIMEBase import MIMEBase 
from email.MIMEText import MIMEText 
from email import Encoders 

day = strftime("%d") 
today = strftime("%d-%m-%Y") 

link = FTP(ftphost) 
link.login(passwd = ftp_pass, user = ftp_user) 
link.cwd(file_path) 
link.retrbinary('RETR ' + file_name, open('/var/backups/backup-%s.tgz' % today, 'wb').write) 
link.delete(file_name) #delete the file from online server 
link.close() 
mail(user_mail, "Download database %s" % today, "Database successfully downloaded: %s" % file_name)
exit() 

I run it from a crontab entry like this:

40 23 * * * python /usr/bin/backup-transfer.py >> /var/log/backup-transfer.log 2>&1 

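It works for small files, but with the backup file (about 1.7 GB) it freezes: the downloaded file reaches about 1.2 GB and then never grows (I waited a day), and the log file is empty.

Any ideas?

P.S.: I'm using Python 2.6.5.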

+0

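To troubleshoot the problem further, maybe you could use the 'callback' argument of 'FTP.retrbinary' to gather more information about the download progress. Also, experimenting with 'maxblocksize' might reveal a network problem. – jcollado

Answers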
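The callback idea from this comment could look roughly like the sketch below: a callable that writes each block to the file and logs a timestamped byte count at intervals. `ProgressWriter` and its parameters are hypothetical names, not part of ftplib, and the code is written for Python 3; only the commented-out `retrbinary` call at the end would touch the network.

```python
import sys
import time


class ProgressWriter(object):
    """Write received blocks to a file and log progress every `report_every` bytes.

    Hypothetical helper for illustration; not part of ftplib.
    """

    def __init__(self, fileobj, total=None, report_every=50 * 1024 * 1024,
                 log=sys.stdout):
        self.fileobj = fileobj
        self.total = total
        self.report_every = report_every
        self.log = log
        self.written = 0
        self._next_report = report_every

    def __call__(self, block):
        # ftplib invokes the callback once for every block of bytes received.
        self.fileobj.write(block)
        self.written += len(block)
        if self.written >= self._next_report:
            stamp = time.strftime("%d-%m-%Y %H.%M.%S")
            if self.total:
                self.log.write("%s %d / %d bytes\n"
                               % (stamp, self.written, self.total))
            else:
                self.log.write("%s %d bytes\n" % (stamp, self.written))
            # Flush so a redirected cron log shows progress right away.
            self.log.flush()
            # Move the threshold past the current byte count.
            while self._next_report <= self.written:
                self._next_report += self.report_every


# Intended use (not executed here; `link`, `file_name` and `today` as in the question):
# with open('/var/backups/backup-%s.tgz' % today, 'wb') as out:
#     link.retrbinary('RETR ' + file_name, ProgressWriter(out))
```

If the last logged byte count stops advancing while the process is still alive, the transfer has stalled rather than finished, which is the situation described in the question.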

6

Sorry for answering my own question, but I found a solution.

I tried ftputil without success, then tried many other approaches, and finally this works:

from ftplib import FTP
from time import strftime

def debug(txt):
    print(txt)

def ftp_connect(path):
    link = FTP(host='example.com', timeout=5)  # keep the timeout low
    link.login(passwd='ftppass', user='ftpuser')
    debug("%s - Connected to FTP" % strftime("%d-%m-%Y %H.%M"))
    link.cwd(path)
    return link

# `path` and `filename` are the remote directory and file name, defined elsewhere
downloaded = open('/local/path/to/file.tgz', 'wb')

link = ftp_connect(path)
file_size = link.size(filename)

max_attempts = 5  # I don't want death loops.

while file_size != downloaded.tell():
    try:
        debug("%s while > try, run retrbinary\n" % strftime("%d-%m-%Y %H.%M"))
        if downloaded.tell() != 0:
            # Resume from the current offset. It must be passed as `rest`:
            # the third positional argument of retrbinary is the block size.
            link.retrbinary('RETR ' + filename, downloaded.write, rest=downloaded.tell())
        else:
            link.retrbinary('RETR ' + filename, downloaded.write)
    except Exception as myerror:
        if max_attempts != 0:
            debug("%s while > except, something going wrong: %s\n \tfile lenght is: %i > %i\n" %
                  (strftime("%d-%m-%Y %H.%M"), myerror, file_size, downloaded.tell()))
            link = ftp_connect(path)  # reconnect, then retry from the current offset
            max_attempts -= 1
        else:
            break
debug("Done with file, attempt to download m5dsum")
[...]

In my log file I found:

01-12-2011 23.30 - Connected to FTP 
01-12-2011 23.30 while > try, run retrbinary 
02-12-2011 00.31 while > except, something going wrong: timed out 
    file lenght is: 1754695793 > 1754695793 
02-12-2011 00.31 - Connected to FTP 
Done with file, attempt to download m5dsum 

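Sadly, I have to reconnect to the FTP server even when the file has been fully downloaded. In my case that's not a problem, because I also have to download the md5sum. As you can see, I wasn't able to detect the timeout and retry on the same connection, so when I get a timeout I simply reconnect. If anyone knows how to reconnect without creating a new ftplib.FTP instance, please let me know ;)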
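Since the md5sum is downloaded anyway, it can be used to check the archive once the loop has finished. Here is a minimal sketch, assuming the checksum file uses the usual `md5sum` output format (hex digest, whitespace, file name). `file_md5` and `matches_md5sum_line` are hypothetical helper names, not something from the answer above.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks so that
    a multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5sum_line(path, md5sum_line):
    """Compare a local file against one line of md5sum output.

    Assumes the line starts with the hex digest, as md5sum prints it.
    """
    expected = md5sum_line.split()[0].lower()
    return file_md5(path) == expected
```

A mismatch after a resumed transfer would suggest that the resume offset or the local file got out of sync, in which case deleting the partial file and downloading from scratch is the safer option.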

2

You could try setting a timeout. From the docs:

# timeout in seconds 
link = FTP(host=ftp_host, user=ftp_user, passwd=ftp_pass, acct='', timeout=3600)