2013-07-25 249 views
0

Finding parent folders

I have a list of folders like this:

u'Magazines/testfolder1', 
u'Magazines/testfolder1/folder1/folder2/folder3', 
u'Magazines/testfolder1/folder1/', 
u'Magazines/testfolder1/folder1/folder2/', 
u'Magazines/testfolder2', 
u'Magazines/testfolder2/folder1/folder2/folder3', 
u'Magazines/testfolder2/folder1/', 
u'Magazines/testfolder2/folder1/folder2/', 
u'Magazines/testfolder3', 
u'Magazines/testfolder3/folder1/folder2/folder3', 
u'Magazines/testfolder3/folder1/', 
u'Magazines/testfolder3/folder1/folder2/', 

Now I want only the unique parent folders from this list.

That is, in the example above I want to reduce the list to:

u'Magazines/testfolder1', 
u'Magazines/testfolder2', 
u'Magazines/testfolder3', 

because these are the folders that contain all the subfolders.

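I add folders to my database recursively, so if I have testfolder1, the script automatically recurses into its subfolders. I therefore don't need a subfolder in the list if its parent is also in the list.

How can I do that?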

Answers

2

Use a set:

>>> list_of_folders = [ 
...  u'Magazines/testfolder1', 
...  u'Magazines/testfolder1/folder1/folder2/folder3', 
...  u'Magazines/testfolder1/folder1/', 
...  u'Magazines/testfolder1/folder1/folder2/', 
...  u'Magazines/testfolder2', 
...  u'Magazines/testfolder2/folder1/folder2/folder3', 
...  u'Magazines/testfolder2/folder1/', 
...  u'Magazines/testfolder2/folder1/folder2/', 
...  u'Magazines/testfolder3', 
...  u'Magazines/testfolder3/folder1/folder2/folder3', 
...  u'Magazines/testfolder3/folder1/', 
...  u'Magazines/testfolder3/folder1/folder2/', 
... ] 
>>> result = set() 
>>> for folder in list_of_folders: 
...  for parent in result: 
...   if folder.startswith(parent): 
...    break 
...  else: 
...   result.add(folder) 
... 
>>> result 
{'Magazines/testfolder3', 'Magazines/testfolder2', 'Magazines/testfolder1'} 

UPDATE

list_of_folders = [ 
    ... 
] 
result = set() 
for folder in list_of_folders: 
    if all(not folder.startswith(parent) for parent in result): 
        result.add(folder) 
print(result) 
+0

I get an empty set. I put a small script here: http://codepad.org/CRGLJC4R. Could you take a look? – fdsgds

+1

Outdent the final 'else'; it should pair with the 'for', not with the 'if'. And make sure you read the Python documentation to understand how this works. –
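To illustrate the point made in the comment above: an else attached to a for loop runs only when the loop finishes without hitting break. A minimal sketch, where the helper name is_covered is made up for this illustration and is not part of the answer:

```python
def is_covered(folder, parents):
    """Return True if folder starts with any entry of parents."""
    for parent in parents:
        if folder.startswith(parent):
            break  # a parent was found, so the else branch is skipped
    else:
        # Reached only when the loop ended without break.
        return False
    return True
```

If the else were indented to match the if, it would instead run once for every parent that does not match, which is a different behaviour from the answer's code.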

+0

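@UlrichEckhardt, thanks, but if the parent folder is not at the top, I don't get the desired result. I mean, if you put 'u'Magazines/testfolder1',' at the bottom, the result is different, e.g. this: http://codepad.org/0Te9aEmK – fdsgds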
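One possible way to make the set-based approach independent of input order is to process shorter paths first, since a parent is always shorter than any of its subfolders. The sketch below is an assumption about what would fix the reported problem, not the answerer's code; the function name unique_parents is hypothetical. It also compares against parent + '/' so that a sibling such as testfolder10 is not mistaken for a child of testfolder1.

```python
def unique_parents(folders):
    """Return the folders that have no parent elsewhere in the input."""
    # Normalise trailing slashes so 'a/b/' and 'a/b' count as the same folder.
    normalized = {f.rstrip('/') for f in folders}
    result = set()
    # Shorter paths first: a parent is always shorter than its subfolders.
    for folder in sorted(normalized, key=len):
        if all(not folder.startswith(parent + '/') for parent in result):
            result.add(folder)
    return result
```

With this ordering it should no longer matter whether 'Magazines/testfolder1' appears at the top or at the bottom of the list.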

0

How about using a regular expression?

import re 

l = [ 
    u'Magazines/testfolder1', 
    u'Magazines/testfolder1/folder1/folder2/folder3', 
    u'Magazines/testfolder1/folder1/', 
    u'Magazines/testfolder1/folder1/folder2/', 
    u'Magazines/testfolder2', 
    u'Magazines/testfolder2/folder1/folder2/folder3', 
    u'Magazines/testfolder2/folder1/', 
    u'Magazines/testfolder2/folder1/folder2/', 
    u'Magazines/testfolder3', 
    u'Magazines/testfolder3/folder1/folder2/folder3', 
    u'Magazines/testfolder3/folder1/', 
    u'Magazines/testfolder3/folder1/folder2/', 
] 

expect = [ 
    u'Magazines/testfolder1', 
    u'Magazines/testfolder2', 
    u'Magazines/testfolder3', 
] 

# Keep only paths with exactly two components ("root/folder"). 
result = [x for x in l if re.match(r'^[^/]+/[^/]+$', x)] 

assert expect == result 
0

Mate, I believe the following is what you are looking for:

lst = [ 
u'Magazines/testfolder1', 
u'Magazines/testfolder1/folder1/folder2/folder3', 
u'Magazines/testfolder1/folder1/', 
u'Magazines/testfolder1/folder1/folder2/', 
u'Magazines/testfolder2', 
u'Magazines/testfolder2/folder1/folder2/folder3', 
u'Magazines/testfolder2/folder1/', 
u'Magazines/testfolder2/folder1/folder2/', 
u'Magazines/testfolder3', 
u'Magazines/testfolder3/folder1/folder2/folder3', 
u'Magazines/testfolder3/folder1/', 
u'Magazines/testfolder3/folder1/folder2/' 
] 

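# Remove every entry that lies below another entry in the list.
for x in lst[:]:
    prefix = x.rstrip('/') + '/'
    for y in lst[:]:
        if y != x and y.startswith(prefix):
            lst.remove(y)
print(lst)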

Output

[u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3'] 

This solution repeatedly removes the subfolders from the list, leaving only the parent folders.

0
l =[u'Magazines/testfolder1', 
    u'Magazines/testfolder1/folder1/folder2/folder3', 
    u'Magazines/testfolder1/folder1/', 
    u'Magazines/testfolder1/folder1/folder2/', 
    u'Magazines/testfolder2', 
    u'Magazines/testfolder2/folder1/folder2/folder3', 
    u'Magazines/testfolder2/folder1/', 
    u'Magazines/testfolder2/folder1/folder2/', 
    u'Magazines/testfolder3', 
    u'Magazines/testfolder3/folder1/folder2/folder3', 
    u'Magazines/testfolder3/folder1/', 
    u'Magazines/testfolder3/folder1/folder2/', ] 

mincount = min(s.count('/') for s in l) 
[d for d in sorted(l) if d.count('/') <= mincount] 
#=> [u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3'] 

It is not very clever, but it works where there is a common root.

+0

You assume the parent folder contains only one '/' or no '/'. What if the parent folder is 'a/b/c/d'? – falsetru

+0

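@falsetru The parent folder here is actually 'Magazines'. In any case, this is not a coherent task. If it really bothers you, you can find the minimum number of separators and use that as the count. – Marcin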

+0

If 'l' is ['a', 'a/b', 'c/d'], you will get ['a']. But it should be ['a', 'c/d']. – falsetru
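The case raised in the last comment can be handled without counting separators by checking, for each path, whether any of its proper ancestors is also in the list. This is only a sketch under that assumption and not code from the thread; the name without_descendants is invented for illustration.

```python
def without_descendants(paths):
    """Drop every path that has a proper ancestor in paths."""
    normalized = [p.rstrip('/') for p in paths]
    known = set(normalized)
    kept = []
    for path in normalized:
        parts = path.split('/')
        # Every proper ancestor of 'a/b/c' is 'a' and 'a/b'.
        ancestors = ('/'.join(parts[:i]) for i in range(1, len(parts)))
        if not any(a in known for a in ancestors) and path not in kept:
            kept.append(path)
    return kept
```

For the input ['a', 'a/b', 'c/d'] this keeps 'a' and 'c/d', because 'c' itself is not in the list, and it does not depend on the paths sharing a common root.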