2013-07-09 197 views
2

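Python - Files with the same name but different contents

I have two folders, dir1 and dir2. I need to find the files that have the same name in both folders (or in their subfolders) but different contents.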

For example: so.1.0/P/Q/SEARCH.C and so.1.1/P/Q/SEARCH.C are different.

Any ideas?

I get the files I need like this:

import os, sys, fnmatch, filecmp

folder1 = sys.argv[1]
folder2 = sys.argv[2]

filelist1 = []

filelist2 = []

for root, dirs, files in os.walk(folder1):
    for filename in fnmatch.filter(files, '*.c'):
        filelist1.append(os.path.join(root, filename))

for root, dirs, files in os.walk(folder1):
    for filename in fnmatch.filter(files, '*.h'):
        filelist1.append(os.path.join(root, filename))

for root, dirs, files in os.walk(folder2):
    for filename in fnmatch.filter(files, '*.c'):
        filelist2.append(os.path.join(root, filename))

for root, dirs, files in os.walk(folder2):
    for filename in fnmatch.filter(files, '*.h'):
        filelist2.append(os.path.join(root, filename))
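The four loops above walk each tree twice. A minimal sketch that collects both `*.c` and `*.h` files in a single pass per folder; the helper name `collect_sources` and its `patterns` parameter are hypothetical, not part of the question:

```python
import fnmatch
import os


def collect_sources(folder, patterns=('*.c', '*.h')):
    """Return full paths of files under folder matching any of the patterns."""
    matches = []
    for root, dirs, files in os.walk(folder):
        for filename in files:
            # fnmatch.fnmatch is case-sensitive on POSIX, case-insensitive on Windows
            if any(fnmatch.fnmatch(filename, pattern) for pattern in patterns):
                matches.append(os.path.join(root, filename))
    return matches
```

Usage would be `filelist1 = collect_sources(folder1)`. Note that on POSIX systems a pattern of `*.c` does not match `SEARCH.C` from the example above, so `'*.C'` and `'*.H'` would need to be added to `patterns` if upper-case extensions occur.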

Now I want to compare the two lists, pick out the entries that have the same filename, and check whether their contents differ. What do you think?

+0

[What have you tried?](http://mattgemmell.com/2008/12/08/what-have-you-tried/) – stalk

Answers

1

For the traversal, as in @Martijn's answer, you can use os.walk():

for root, dirs, files in os.walk(path):
    for name in files:
        print(os.path.join(root, name))  # full path of each file

And for comparing the files, I would suggest filecmp:

>>> import filecmp 
>>> filecmp.cmp('undoc.rst', 'undoc.rst') 
True 
>>> filecmp.cmp('undoc.rst', 'index.rst') 
False 
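One caveat with `filecmp.cmp`: with the default `shallow=True`, two files whose `os.stat()` signatures (file type, size and modification time) match are reported equal without their contents being read. Passing `shallow=False` forces a real content comparison. A small sketch with two hypothetical same-size files in a temporary directory:

```python
import filecmp
import os
import tempfile

# Two hypothetical files with the same name in different folders
base = tempfile.mkdtemp()
paths = []
for folder, text in (('so.1.0', 'file one\n'), ('so.1.1', 'file two\n')):
    os.makedirs(os.path.join(base, folder))
    path = os.path.join(base, folder, 'search.c')
    with open(path, 'w') as f:
        f.write(text)
    paths.append(path)

# shallow=False compares the contents instead of trusting the stat() signature
same = filecmp.cmp(paths[0], paths[1], shallow=False)
print(same)  # False: same size, different bytes
```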

And for comparing the file contents, check out difflib.
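To see *how* two files differ rather than only *whether* they do, `difflib.unified_diff` produces `diff -u` style output from two lists of lines. A minimal sketch, assuming text files; the helper name `show_diff` is hypothetical:

```python
import difflib


def show_diff(path_one, path_two):
    """Return a unified diff of two text files as a list of lines."""
    with open(path_one) as f1, open(path_two) as f2:
        lines_one = f1.readlines()
        lines_two = f2.readlines()
    # fromfile/tofile only label the '---' and '+++' header lines
    return list(difflib.unified_diff(lines_one, lines_two,
                                     fromfile=path_one, tofile=path_two))
```

For example, `print(''.join(show_diff('so.1.0/P/Q/SEARCH.C', 'so.1.1/P/Q/SEARCH.C')))` prints a patch-like view; an empty list means the files have identical lines.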

2

Use os.walk() to generate the list of files in either directory, with paths relative to that directory's root:

import os

def relative_files(path):
    """Generate filenames with pathnames relative to the initial path."""
    for root, dirnames, files in os.walk(path):
        relroot = os.path.relpath(root, path)
        for filename in files:
            yield os.path.join(relroot, filename)

Build a set of paths from one of them:

root_one = 'so.1.0' # use an absolute path here 
root_two = 'so.1.1' # use an absolute path here 
files_one = set(relative_files(root_one)) 

Then find all the pathnames that are also present under the other root, using set intersection:

try:
    from itertools import izip_longest  # Python 2
except ImportError:
    from itertools import zip_longest as izip_longest  # Python 3 name

def different_files(root_one, root_two):
    """Yield files that differ between the two roots

    Generate pathnames relative to root_one and root_two that are present in both
    but have different contents.

    """
    files_one = set(relative_files(root_one))
    for same in files_one.intersection(relative_files(root_two)):
        # same is a relative path, so same file in different roots
        # binary mode: exact byte comparison, no decoding errors on Python 3
        with open(os.path.join(root_one, same), 'rb') as f1, open(os.path.join(root_two, same), 'rb') as f2:
            if any(line1 != line2 for line1, line2 in izip_longest(f1, f2)):
                # lines don't match, so files don't match!
                yield same

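itertools.izip_longest() loops over both files efficiently, pairing up their lines; if one file is longer than the other, its remaining lines are paired with None, which ensures the files are still detected as different.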
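To make the role of `izip_longest` concrete, here is a comparison with plain `zip` on two lists of lines standing in for file objects (an assumption for brevity), using the Python 3 name `zip_longest`:

```python
from itertools import zip_longest  # named izip_longest on Python 2

short = ['same\n']
longer = ['same\n', 'extra\n']

# zip() stops at the shorter input, so the extra line is never compared
print(any(a != b for a, b in zip(short, longer)))  # False

# zip_longest() pads the shorter input with None, so the extra line shows up
pairs = list(zip_longest(short, longer))
print(pairs)  # [('same\n', 'same\n'), (None, 'extra\n')]
print(any(a != b for a, b in pairs))  # True
```

With plain `zip`, a file that is a strict prefix of the other would wrongly be reported as identical.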

Demo:

$ mkdir -p /tmp/so.1.0/p/q 
$ mkdir -p /tmp/so.1.1/p/q 
$ echo 'file one' > /tmp/so.1.0/p/q/search.c 
$ echo 'file two' > /tmp/so.1.1/p/q/search.c 
$ echo 'file three' > /tmp/so.1.1/p/q/ignored.c 
$ echo 'matching' > /tmp/so.1.0/p/q/same.c 
$ echo 'matching' > /tmp/so.1.1/p/q/same.c 

>>> for different in different_files('/tmp/so.1.0', '/tmp/so.1.1'):
...     print(different)
...
p/q/search.c
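A self-contained Python 3 sketch of the same approach that rebuilds the demo tree above in a temporary directory instead of /tmp/so.1.x. It swaps the line-by-line loop for `filecmp.cmp(..., shallow=False)` (the comparison suggested in the other answer) and normalizes the `./` prefix that `relpath` produces for files directly under each root; both changes are choices of this sketch, not part of the original answer:

```python
import filecmp
import os
import tempfile


def relative_files(path):
    """Generate filenames with pathnames relative to the initial path."""
    for root, dirnames, files in os.walk(path):
        relroot = os.path.relpath(root, path)
        for filename in files:
            # normpath turns './name' (files at the top level) into 'name'
            yield os.path.normpath(os.path.join(relroot, filename))


def different_files(root_one, root_two):
    """Yield relative paths present under both roots whose contents differ."""
    common = set(relative_files(root_one)) & set(relative_files(root_two))
    for same in sorted(common):
        # shallow=False compares file contents, not just os.stat() signatures
        if not filecmp.cmp(os.path.join(root_one, same),
                           os.path.join(root_two, same), shallow=False):
            yield same


# Rebuild the demo tree from the answer in a temporary directory
base = tempfile.mkdtemp()
layout = {
    ('so.1.0', 'search.c'): 'file one\n',
    ('so.1.1', 'search.c'): 'file two\n',
    ('so.1.1', 'ignored.c'): 'file three\n',
    ('so.1.0', 'same.c'): 'matching\n',
    ('so.1.1', 'same.c'): 'matching\n',
}
for (top, name), text in layout.items():
    folder = os.path.join(base, top, 'p', 'q')
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, name), 'w') as f:
        f.write(text)

result = list(different_files(os.path.join(base, 'so.1.0'),
                              os.path.join(base, 'so.1.1')))
print(result)
```

As in the shell demo, `ignored.c` exists in only one tree and `same.c` matches, so only `p/q/search.c` is reported.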
+0

Too fast :) Nice –
