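Get files from a directory argument, sorted by size

I am writing a program that takes a command-line argument, scans the directory tree given by that argument, builds a list of every file in the tree, and then sorts the list by file length.

I'm not much of a scripting guy, but this is what I have, and it doesn't work: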

import sys 
import os 
from os.path import getsize 

file_list = [] 

#Get dirpath 
dirpath = os.path.abspath(sys.argv[0]) 
if os.path.isdir(dirpath): 
    #Get all entries in the directory 
    for root, dirs, files in os.walk(dirpath): 
     for name in files: 
      file_list.append(name) 
     file_list = sorted(file_list, key=getsize) 
     for item in file_list: 
      sys.stdout.write(str(file) + '\n') 

else: 
    print "not found"

Can anyone point me in the right direction?

Source

2013-11-27 wadda_wadda

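I suggest reading the help for the function [`os.walk`](http://docs.python.org/2/library/os.html#os.walk). It seems to be the right choice for handling a directory tree. If you look at the examples for this function, you will see that you are well on your way... – koffein

I think the line before your last one is unnecessary. In fact, that line is what causes the error... – koffein

@koffein I have updated my code, but it still gives me an error. –

Hope this function helps you (I am using Python 2.7):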

import os

def get_files_by_file_size(dirname, reverse=False):
    """ Return list of file paths in directory sorted by file size """

    # Get list of files
    filepaths = []
    for basename in os.listdir(dirname):
        filename = os.path.join(dirname, basename)
        if os.path.isfile(filename):
            filepaths.append(filename)

    # Re-populate list with filename, size tuples
    for i in xrange(len(filepaths)):
        filepaths[i] = (filepaths[i], os.path.getsize(filepaths[i]))

    # Sort list by file size
    # If reverse=True sort from largest to smallest
    # If reverse=False sort from smallest to largest
    filepaths.sort(key=lambda filename: filename[1], reverse=reverse)

    # Re-populate list with just filenames
    for i in xrange(len(filepaths)):
        filepaths[i] = filepaths[i][0]

    return filepaths

Source

2013-11-27 21:28:18

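I read it a few times and found that it works, but I also noticed that you haven't yet discovered all the little things that make Python code prettier and more readable. I hope you appreciate a few suggestions: whenever you think you need to write `for i in range(len(some_list))`, use [`enumerate`](http://docs.python.org/2/library/functions.html#enumerate) instead. If you want to re-populate a list, drop your "array thinking" and try something like this: `lst = [do_something(entry) for entry in lst]`... – koffein

But if you build a list only to re-populate it and don't need it any further, consider using generators. That way you don't have to iterate over the list again and again... it saves memory, time... If you are tired of reading this, watch this video... after years of Python programming, my jaw dropped! [Transforming Code into Beautiful, Idiomatic Python](https://www.youtube.com/watch?v=OSGv2VnC0go) – koffein

`dirname` is already the name of a function in `os.path`, so you shouldn't use it as a variable name in your script. Great function, BTW! – Gabriel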
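Putting the suggestions from these comments together, the function above could be condensed roughly as follows. This is only a sketch, not the answerer's code: the function name `files_by_size` and the parameter name `directory` (replacing `dirname`) are made up for illustration, and the code is written so that it also runs on Python 3.

```python
import os


def files_by_size(directory, reverse=False):
    """Return paths of the regular files directly inside `directory`, sorted by size."""
    # Generator expression: candidate paths are produced lazily, no intermediate list.
    candidates = (os.path.join(directory, name) for name in os.listdir(directory))
    # Keep regular files only (this skips subdirectories).
    files = (path for path in candidates if os.path.isfile(path))
    # sorted() evaluates the key function once per element.
    return sorted(files, key=os.path.getsize, reverse=reverse)
```

Like the original function, this looks only at the top level of the directory; it does not descend into subdirectories.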

You are reading the command itself, argv[0], rather than the first argument; use argv[1] instead:

dirpath = sys.argv[1] # argv[0] contains the command itself.
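To make the difference between argv[0] and argv[1] explicit, the argument handling could be wrapped in a small helper that fails clearly when the argument is missing. This is a sketch under assumptions: the name `directory_from_argv` and the error messages are hypothetical and not part of the question's code.

```python
import os
import sys


def directory_from_argv(argv):
    """Return the absolute path of the directory named by the first argument."""
    # argv[0] is the script name; the first real argument is argv[1].
    if len(argv) < 2:
        raise ValueError("usage: %s DIRECTORY" % argv[0])
    dirpath = os.path.abspath(argv[1])
    if not os.path.isdir(dirpath):
        raise ValueError("not a directory: %s" % dirpath)
    return dirpath


# Typical call in a script (commented out so the sketch has no side effects):
# dirpath = directory_from_argv(sys.argv)
```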

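For performance reasons, I suggest you prefetch the file sizes instead of asking the OS for the size of the same file repeatedly during sorting (as koffein suggested, os.walk is the way to go):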

files_list = []
# getsize is os.path.getsize, imported at the top of the question's script
for path, dirs, files in os.walk(dirpath):
    files_list.extend([(os.path.join(path, file), getsize(os.path.join(path, file))) for file in files])

Assuming you don't need the unsorted list, we can use the in-place sort() method:

import operator
files_list.sort(key=operator.itemgetter(1))  # sort by the second tuple element, the size
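Pieced together, the steps in this answer might look like the sketch below. The function name `walk_files_with_sizes` is hypothetical, and writing the code so that it also runs on Python 3 is an assumption beyond the original snippets.

```python
import operator
import os


def walk_files_with_sizes(dirpath):
    """Return (path, size) tuples for every file under dirpath, smallest first."""
    files_list = []
    for path, dirs, files in os.walk(dirpath):
        for name in files:
            full_path = os.path.join(path, name)
            # Look the size up once and store it next to the path.
            files_list.append((full_path, os.path.getsize(full_path)))
    # Sort in place by the second tuple element, the size.
    files_list.sort(key=operator.itemgetter(1))
    return files_list
```

Note that `os.path.getsize` raises `OSError` for paths it cannot stat, such as broken symbolic links; the sketch does not handle that case.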

Source

2013-11-27 21:26:12

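The `files` list is just a list of file names, isn't it? I think you have to join ... – koffein

Here is an approach that uses generators. It should be faster for a large number of files...

The beginning is the same for both examples: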

import os, operator, sys
# argv[1] is the directory argument; argv[0] would be the script itself
dirpath = os.path.abspath(sys.argv[1])
# make a generator for all file paths within dirpath
all_files = (os.path.join(basedir, filename) for basedir, dirs, files in os.walk(dirpath) for filename in files)

If you just want a list of the files without their sizes, you can use this:

sorted_files = sorted(all_files, key = os.path.getsize)

But if you want both the file paths and their sizes in the list, you can use this:

# make a generator for tuples of file path and size: ('/Path/to/the.file', 1024) 
files_and_sizes = ((path, os.path.getsize(path)) for path in all_files) 
sorted_files_with_size = sorted(files_and_sizes, key = operator.itemgetter(1))

Source

2013-11-27 21:52:38 koffein

Use `sorted_files_with_size.reverse()` to see the largest files first. This is very fast and useful for quickly finding out which files take up the most space. –
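One thing to keep in mind with the generator approach: a generator can be consumed only once, so `all_files` can feed either the first `sorted()` call or `files_and_sizes`, not both in the same run. A minimal sketch that wraps the pipeline in a function, so that a fresh generator is built on every call, is shown below. The name `largest_files` and the `limit` parameter are invented for this example.

```python
import operator
import os


def largest_files(dirpath, limit=None):
    """Return (path, size) tuples for files under dirpath, largest first."""
    # A fresh generator per call; it is exhausted once sorted() has consumed it.
    all_files = (os.path.join(basedir, filename)
                 for basedir, dirs, files in os.walk(dirpath)
                 for filename in files)
    files_and_sizes = ((path, os.path.getsize(path)) for path in all_files)
    # reverse=True sorts in descending order directly.
    ranked = sorted(files_and_sizes, key=operator.itemgetter(1), reverse=True)
    return ranked if limit is None else ranked[:limit]
```

Calling `largest_files(dirpath, limit=10)` would then return the ten biggest files, assuming `dirpath` names an existing directory.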
