Using os.walk to get the paths of directories that contain two types of files

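I want to use os.walk in Python 2.7 to list all folders that contain docx files. I managed to do that with the code below, but I would like to know whether the list can be limited to folders that contain two specific file types (for example "docx" and "pdf").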

import os

a = open("output.txt", "w")
for path, subdirs, files in os.walk(r'C:\Users\Stephen\Desktop'):
    for filename in files:
        if filename.endswith('.docx'):
            f = os.path.join(path, filename)
            # '\n' is enough: text mode converts it to the platform line ending
            a.write(f + '\n')
a.close()


2017-01-01 stjepan

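Just skip any directory that doesn't contain at least those two extensions. The list of files in each directory is finite, so it is cheap to test for specific extensions with any():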

for path, subdirs, files in os.walk(r'C:\Users\Stephen\Desktop'):
    if not (any(f.endswith('.pdf') for f in files) and
            any(f.endswith('.docx') for f in files)):
        # at least one of the two file types is missing here, skip
        continue
    # directory contains *both* PDF and Word documents
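To tie this back to the question, here is a minimal sketch that wraps the same any() test in a generator and writes the matching directories to a file. The function names dirs_with_pdf_and_docx and write_matches are made up for illustration; they are not part of the original answer.

```python
import os


def dirs_with_pdf_and_docx(root):
    """Yield each directory under root holding at least one .pdf and one .docx."""
    for path, subdirs, files in os.walk(root):
        has_pdf = any(f.endswith('.pdf') for f in files)
        has_docx = any(f.endswith('.docx') for f in files)
        if has_pdf and has_docx:
            yield path


def write_matches(root, out_name):
    """Write one matching directory path per line to out_name (hypothetical helper)."""
    with open(out_name, 'w') as out:
        for path in dirs_with_pdf_and_docx(root):
            out.write(path + '\n')
```

Calling write_matches(r'C:\Users\Stephen\Desktop', 'output.txt') would then produce the list of folders rather than the list of individual files.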

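When the list of extensions to test for gets a bit long, you may prefer to build a set of all the extensions present in the directory: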

for path, subdirs, files in os.walk(r'C:\Users\Stephen\Desktop'):
    extensions = {os.path.splitext(f)[-1] for f in files}
    if not extensions >= {'.pdf', '.docx', '.odt', '.wpf'}:
        # directory doesn't contain *all* required file types
        continue

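>= tests whether the right-hand set is a subset of the left-hand set (in other words, whether extensions is a superset of the right-hand set), so extensions must contain at least every extension named on the right: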

>>> {'.foo', '.docx', '.pdf', '.odt'} >= {'.pdf', '.docx', '.odt', '.wpf'} # missing .wpf 
False 
>>> {'.foo', '.wpf', '.docx', '.pdf', '.odt'} >= {'.pdf', '.docx', '.odt', '.wpf'} # complete 
True
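The set comparison can be packaged as a reusable helper. The sketch below is only one way to do it: the name dirs_with_all_extensions is hypothetical, and lowercasing the extensions (so that REPORT.PDF counts as a .pdf) is an added assumption, not something the answer above does.

```python
import os


def dirs_with_all_extensions(root, required):
    """Yield directories under root whose files cover every extension in required."""
    required = {ext.lower() for ext in required}
    for path, subdirs, files in os.walk(root):
        # splitext returns (stem, extension); the extension keeps its leading dot
        extensions = {os.path.splitext(f)[1].lower() for f in files}
        if extensions >= required:
            yield path
```

For the original question this would be called as dirs_with_all_extensions(r'C:\Users\Stephen\Desktop', {'.pdf', '.docx'}).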


2017-01-01 15:10:25

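Thanks! The second example works like a charm. – stjepan

Can I add a condition that the docx and pdf files must start with a digit, using the isdigit method? – stjepan

@stjepan: Sure you can. f[0].isdigit() tests whether the first character is a digit. You may need to rework the tests so that each one covers both aspects. –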
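One possible reading of that suggestion, sketched below, is that a directory qualifies when it holds at least one .pdf and at least one .docx whose names start with a digit. That interpretation and the helper names has_numbered and dir_qualifies are assumptions made for illustration.

```python
def has_numbered(files, extension):
    """True if some file name starts with a digit and ends with extension."""
    # f[:1] is the first character (or '' for an empty name), so this never raises
    return any(f[:1].isdigit() and f.endswith(extension) for f in files)


def dir_qualifies(files):
    """True if files include a digit-prefixed .pdf and a digit-prefixed .docx."""
    return has_numbered(files, '.pdf') and has_numbered(files, '.docx')
```

Inside the os.walk loop, the test would become "if not dir_qualifies(files): continue". Combining both checks in a single any() matters here: testing the digit and the extension separately would accept a directory where one file starts with a digit and a different file has the right extension.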

Like this?

import os

a = open("output.txt", "w")
for path, subdirs, files in os.walk(r'C:\Users\Stephen\Desktop'):
    docx = False
    pdf = False
    rest = True
    for filename in files:
        if filename.endswith('.docx'):
            docx = True
        elif filename.endswith('.pdf'):
            pdf = True
        else:
            # any other file type disqualifies the directory
            rest = False
            break
    if docx and pdf and rest:
        # filename is the last file seen in the loop above
        f = os.path.join(path, filename)
        # '\n' is enough: text mode converts it to the platform line ending
        a.write(f + '\n')
a.close()


2017-01-01 15:23:16 Organis

Thanks, this works too. I had to remove ", filename" from f = os.path.join(path, filename) to get just the directory path. – stjepan
