Download multiple files in parallel? (Linux/Python?)

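I have a large list of remote file locations and the local paths where I want them to end up. Each file is small, but there are a lot of them. I am generating this list in Python.

I want to download all of these files as quickly as possible (in parallel) before unpacking and processing them. What is the best library or Linux command-line tool for this? I tried to implement it with multiprocessing.pool, but that did not work with the FTP library.

I looked at pycurl, which seems to be what I want, but I could not get it to run on Windows 7 x64.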

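Source

2013-04-25 Mike Furlender

Your question says you are using Linux, but you mention Windows 7. Which platform are you actually using, or do you need a cross-platform solution? – Aya 2013-04-25 16:07:37

What is wrong with ftplib? – tdelaney 2013-04-25 16:29:41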

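Try wget, a command-line utility that is installed on most Linux distributions and is also available on Windows via Cygwin.

You could also take a look at Scrapy, a library/framework written in Python.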
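Since the list is already generated in Python, one way to combine it with wget is to launch one wget process per file from a small thread pool. The following is only a minimal sketch under assumptions: the helper names (`build_wget_command`, `download_with_wget`), the `jobs` list of `(url, local_path)` pairs, and the pool size of 8 are made up for illustration, and `wget` is assumed to be on the PATH.

```python
import subprocess
from multiprocessing.pool import ThreadPool


def build_wget_command(url, local_path):
    """Return the argument list for one quiet wget download (hypothetical helper)."""
    # -q silences progress output; -O writes the download to the given local path.
    return ["wget", "-q", "-O", local_path, url]


def download_with_wget(jobs, num_workers=8, run=subprocess.call):
    """Run one wget per (url, local_path) pair, num_workers at a time.

    Returns a dict mapping each URL to wget's exit code (0 means success).
    The `run` parameter exists only so the launcher can be swapped out in tests.
    """
    def run_one(job):
        url, local_path = job
        return url, run(build_wget_command(url, local_path))

    # Threads are enough here: each one just waits on an external process.
    with ThreadPool(num_workers) as pool:
        return dict(pool.map(run_one, jobs))


# Example call (placeholder URL and path, not taken from the question):
#   exit_codes = download_with_wget([("ftp://example.com/pub/a.zip", "a.zip")])
```

Any URL whose exit code is non-zero can be collected and retried in a second pass.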

Source

2013-04-25 15:55:44 piokuc

What do you mean by 'wget' being a Perl tool? – Aya 2013-04-25 16:01:02

It is written in Perl, isn't it? – piokuc 2013-04-25 16:04:26

No. It is [written in C](http://en.wikipedia.org/wiki/Wget). – Aya 2013-04-25 16:05:05

I usually use pscp for this kind of thing and call it through subprocess.Popen

For example, call it like this:

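import subprocess

# Raw string so the backslashes in the Windows path are kept literally; fill in the <...> placeholders.
pscp_command = r'''"c:\program files\putty\pscp.exe" -pw <pwd> -p -scp -unsafe <file location on my linux machine including machine name and login, can use wildcards here> <where you want the files to go on a windows machine>'''
p = subprocess.Popen(pscp_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() reads both pipes and waits for pscp to exit, so no separate wait() is needed
stdout, stderr = p.communicate()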

Of course, I am assuming Linux -> Windows here.

Source

2013-04-25 15:59:32 Brad

urllib2 should be able to handle FTP if you use the Pool object from the multiprocessing module.

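import urllib2
from multiprocessing import Pool

def get_url(url):
    # url should start with 'ftp:'
    try:
        return url, urllib2.urlopen(url).read()
    except Exception:
        # add more meaningful exception handling if you need it, e.g. retry once
        return url, None

if __name__ == '__main__':
    # num_processes and url_list are assumed to be defined by the caller
    pool = Pool(processes=num_processes)
    result = pool.map_async(get_url, url_list)
    pool.close()
    pool.join()
    # workers run in separate processes, so they return their data instead of filling a global dict
    results = dict(result.get())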

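Of course, spawning processes has some serious overhead. If you can use a third-party module (such as twisted), non-blocking requests will almost certainly be faster. Whether the overhead is a serious problem depends on how each file's download time compares with the network latency.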

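You could try implementing it with Python threads rather than processes, but that gets a bit trickier. See the answers to this question for using urllib2 safely with threads. You would also need to use multiprocessing.pool.ThreadPool instead of the regular Pool.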
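The thread-based variant might look like the sketch below. It is written for Python 3, where urllib2's `urlopen` lives in `urllib.request`. That substitution, the `fetch` and `download_all` names, the 30-second timeout, and the default of 8 threads are assumptions for illustration, not part of the answer above. Each worker returns its outcome instead of writing to shared state, which sidesteps most of the thread-safety concerns.

```python
from multiprocessing.pool import ThreadPool
from urllib.request import urlopen


def fetch(job):
    """Download one (url, local_path) pair; return (url, error or None)."""
    url, local_path = job
    try:
        # urlopen handles ftp:// as well as http:// and file:// URLs.
        with urlopen(url, timeout=30) as response:
            data = response.read()
        with open(local_path, "wb") as out:
            out.write(data)
        return url, None
    except Exception as exc:  # broad on purpose; narrow it for real use
        return url, exc


def download_all(jobs, num_threads=8):
    """Fetch every job on a pool of threads.

    Returns a dict mapping each URL to None on success or to the
    exception that was raised for it.
    """
    with ThreadPool(num_threads) as pool:
        return dict(pool.map(fetch, jobs))


# Example call (placeholder URL and path):
#   errors = download_all([("ftp://example.com/pub/a.zip", "a.zip")])
```

Because the files are small and the work is I/O-bound, threads avoid the process start-up cost mentioned above while still overlapping the network waits.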

Source

2013-04-25 17:46:25 Felipe

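I know this is an old post, but there is a perfect Linux utility for this. If you are transferring files from a remote host, lftp is great! I mainly use it to quickly push things to my FTP server, but it works just as well for pulling things down with the mirror command. It also has an option to copy a user-defined number of files in parallel, as you wanted. If you want to copy some files from a remote path to a local path, your command line would look something like this: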

lftp 
open ftp://user:[email protected] 
cd some/remote/path 
lcd some/local/path 
mirror --parallel=2

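Be very careful with this command, though: as with other mirror commands, if you get it wrong you will delete files.

For more options and documentation on lftp, I used this site: http://lftp.yar.ru/lftp-man.html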
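If the paths are produced in Python anyway, the same lftp session can be started from a script. The sketch below is an illustration under assumptions: `build_lftp_script` and `run_lftp` are made-up helper names, the site URL and directories are placeholders supplied by the caller, and it relies on `lftp -c` running a command string and on `mirror` accepting a source and a target directory.

```python
import subprocess


def build_lftp_script(site_url, remote_dir, local_dir, parallel=2):
    """Return an lftp command string that downloads remote_dir into local_dir.

    Hypothetical helper; paths containing spaces would need extra quoting.
    """
    return "; ".join([
        "open {}".format(site_url),
        # Plain mirror downloads remote -> local (--reverse would upload instead);
        # --parallel sets how many files are transferred at once.
        "mirror --parallel={} {} {}".format(parallel, remote_dir, local_dir),
    ])


def run_lftp(script):
    """Run the command string with lftp and return its exit code."""
    return subprocess.call(["lftp", "-c", script])


# Example call (placeholder values):
#   run_lftp(build_lftp_script("ftp://example.com", "some/remote/path", "some/local/path"))
```

Building the command string separately makes it easy to print and inspect before anything is transferred, which helps given the warning about mirror above.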

Source

2014-11-18 00:29:36
