2012-12-28 21 views
10

Python: removing partial duplicates from a list of dictionaries

my_list = [
{'oranges':'big','apples':'green'}, 
{'oranges':'big','apples':'green','bananas':'fresh'}, 
{'oranges':'big','apples':'red'}, 
{'oranges':'big','apples':'green','bananas':'rotten'} 
] 

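I want to create a new list with the partial duplicates removed.

In my case, this dictionary has to be eliminated:

{'oranges':'big','apples':'green'}

because it is duplicated within these dictionaries: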

{'oranges':'big','apples':'green','bananas':'fresh'} 
{'oranges':'big','apples':'green','bananas':'rotten'} 

So the desired result is:

[ 
{'oranges':'big','apples':'green','bananas':'fresh'}, 
{'oranges':'big','apples':'red'}, 
{'oranges':'big','apples':'green','bananas':'rotten'} 
] 

How can I do this? Thanks a lot!

+1

Do you mean that if a shorter dict is a subset of a longer dict, it should be filtered out? –

+0

The first step is to decide what marks something as a partial duplicate. Is it just a key/value pair occurring more than once? –

+0

@Shawn. Yes, sir. Exactly right! –
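The criterion agreed on in the comments above (a dict counts as a partial duplicate when all of its key/value pairs also appear in another dict) can be sketched as follows. This is only a minimal sketch, assuming Python 3, where `dict.items()` returns a set-like view; the name `is_partial_duplicate` is made up for illustration and does not come from the thread.

```python
def is_partial_duplicate(small, big):
    """Return True if every key/value pair of `small` also appears in `big`."""
    # In Python 3, dict.items() is a set-like view, so <= is a subset test.
    return small.items() <= big.items()


short = {'oranges': 'big', 'apples': 'green'}
fresh = {'oranges': 'big', 'apples': 'green', 'bananas': 'fresh'}
red = {'oranges': 'big', 'apples': 'red'}

print(is_partial_duplicate(short, fresh))  # True: both pairs are in `fresh`
print(is_partial_duplicate(short, red))    # False: the 'apples' value differs
```

Note that under this definition every dict is a subset of itself, which matters for how the answers handle exact duplicates.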

Answers

3

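Try the following implementation.

Note that in my implementation I pre-sort the list and take only the 2-element combinations, to reduce the number of iterations. This ensures that key is always less than or equal in size to hay.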

>>> my_list =[ 
{'oranges':'big','apples':'green'}, 
{'oranges':'big','apples':'green','bananas':'fresh'}, 
{'oranges':'big','apples':'red'}, 
{'oranges':'big','apples':'green','bananas':'rotten'} 
] 

# Create a function remove_dup (name it anything you want)
def remove_dup(lst):
    # Import combinations from itertools, mainly to avoid multiple nested loops
    from itertools import combinations
    # Create a generator function dup_gen (name it anything you want)
    def dup_gen(lst):
        # Read the pairs; because lst is sorted by length, key is never longer than hay
        for key, hay in combinations(lst, 2):
            # If key is contained in hay, then set(key) - set(hay) is the empty set
            if not set(key) - set(hay):
                # key is contained in hay, so yield it as a duplicate
                yield key
    # Convert each dict to a tuple of item pairs and sort the tuples by length.
    # The set() drops exact duplicates first (thanks to DSM for pointing out
    # this boundary case); without it, remove_dup([{1:2}, {1:2}]) == []
    lst = sorted(set(tuple(e.items()) for e in lst), key=len)
    # Recreate the dictionaries from the set difference of the original list
    # and the elements generated by dup_gen, which are the duplicates to remove
    return [dict(e) for e in set(lst) - set(dup_gen(lst))]

remove_dup(my_list) 
[{'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'}, {'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}, {'apples': 'red', 'oranges': 'big'}] 

remove_dup([{1:2}, {1:2}]) 
[{1: 2}] 

remove_dup([{1:2}]) 
[{1: 2}] 

remove_dup([]) 
[] 

remove_dup([{1:2}, {1:3}]) 
[{1: 2}, {1: 3}] 
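The note about pre-sorting in this answer relies on the fact that `itertools.combinations` emits pairs in the order of its input. The following is a small illustrative sketch of that property only; the sample tuples and the names `by_length` and `pairs` are made up here and are not part of the answer.

```python
from itertools import combinations

items = [('a', 'b', 'c'), ('a',), ('a', 'b')]

# Sort by length first, then build every 2-element combination.
by_length = sorted(items, key=len)
pairs = list(combinations(by_length, 2))

# combinations() keeps input order, so the first element of each pair
# comes earlier in the sorted list and is never longer than the second.
for key, hay in pairs:
    print(len(key), len(hay))
```

Because of this ordering, each pair only needs to be tested in one direction (is key contained in hay?), which is the saving the answer refers to.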

A faster implementation

from itertools import combinations

def remove_dup(lst):
    # Convert each dict to a tuple of item pairs, drop exact duplicates with
    # set() (thanks to DSM for pointing out this boundary case; without it,
    # remove_dup([{1:2}, {1:2}]) == []), and sort the tuples by length
    lst = sorted(set(tuple(e.items()) for e in lst), key=len)
    # Generate all the duplicates
    dups = (key for key, hay in combinations(lst, 2) if not set(key).difference(hay))
    # Recreate the dictionaries from the set difference of
    # the original list and the duplicate elements
    return [dict(e) for e in set(lst).difference(dups)]
+1

@MostafaR: {'a':'b','a':'b'} is actually {'a':'b'}, and by set theory a set is a subset of itself – Abhijit

+1

@MostafaR: `{'a':'b','a':'b'} == {'a':'b'}`. – Blender

+0

Thanks a lot, it works great! –

2

Here's an implementation you can use:

>>> my_list = [ 
{'oranges':'big','apples':'green'}, 
{'oranges':'big','apples':'green','bananas':'fresh'}, 
{'oranges':'big','apples':'red'}, 
{'oranges':'big','apples':'green','bananas':'rotten'} 
] 

>>> def is_subset(d1, d2): 
     return all(item in d2.items() for item in d1.items()) 
     # or 
     # return set(d1.items()).issubset(set(d2.items())) 

>>> [d for d in my_list if not any(is_subset(d, d1) for d1 in my_list if d1 != d)] 
[{'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'}, 
{'apples': 'red', 'oranges': 'big'}, 
{'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}] 

For each dict d in my_list:

any(is_subset(d, d1) for d1 in my_list if d1 != d) 

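checks whether d is a subset of any other dict in my_list. If it returns True, there is at least one dict that d is a subset of, so we negate it with not to exclude d from the list.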
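To make the filtering step concrete, here is a small sketch that evaluates the `any(...)` expression separately for each dict in the question's list. The names `flags` and `kept` are introduced here purely for illustration.

```python
my_list = [
    {'oranges': 'big', 'apples': 'green'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'fresh'},
    {'oranges': 'big', 'apples': 'red'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'rotten'},
]


def is_subset(d1, d2):
    # True if every key/value pair of d1 is also present in d2
    return all(item in d2.items() for item in d1.items())


# One flag per dict: is it a subset of some other (unequal) dict in the list?
flags = [any(is_subset(d, d1) for d1 in my_list if d1 != d) for d in my_list]
print(flags)  # [True, False, False, False]

# Keep only the dicts whose flag is False
kept = [d for d, flag in zip(my_list, flags) if not flag]
print(kept)
```

Only the first dict is flagged, so it is the only one dropped. Because of the `d1 != d` condition, dicts that are exactly equal to each other are never compared, so exact duplicates would all be kept by this approach.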

+0

Thanks a lot, it works great! –

1

Short answer

def is_subset(d1, d2): 
    # Check if d1 is subset of d2 
    return all(item in d2.items() for item in d1.items()) 

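# Every dict is a subset of itself, so exactly one match means no other dict contains x.
# list() is needed in Python 3, where filter() returns a lazy iterator.
list(filter(lambda x: len(list(filter(lambda y: is_subset(x, y), my_list))) == 1, my_list))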
+0

This is really clever, how in the world did you come up with it? – george

+0

Your answer isn't much different from Rohit's, except that you've obscured it with multiple filters – Abhijit

5

The first [well, second, after some edits..] thing that came to mind was this:

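def get_superdicts(dictlist):
    superdicts = []
    # Visit the longest dicts first, so any superset is seen before its subsets
    for d in sorted(dictlist, key=len, reverse=True):
        fd = set(d.items())
        # Keep fd only if it is not a subset of something already kept
        if not any(fd <= k for k in superdicts):
            superdicts.append(fd)
    # A list comprehension returns a list on both Python 2 and Python 3
    new_dlist = [dict(s) for s in superdicts]
    return new_dlist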

This gives:

>>> a = [{'apples': 'green', 'oranges': 'big'}, {'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'}, {'apples': 'red', 'oranges': 'big'}, {'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}] 
>>> 
>>> get_superdicts(a) 
[{'apples': 'red', 'oranges': 'big'}, 
{'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}, 
{'bananas': 'fresh', 'oranges': 'big', 'apples': 'green'}] 

[Originally I used a frozenset here, thinking I could do some clever set operations, but apparently I couldn't come up with anything.]
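For readers wondering what a frozenset would offer over a plain set here: one thing it does allow is hashing, so item sets can be stored in another set. The sketch below uses that to drop exact duplicates while preserving order. It is not the answerer's abandoned idea, only an illustration; the function name `drop_exact_duplicates` is hypothetical, and it assumes all dict values are hashable.

```python
def drop_exact_duplicates(dictlist):
    """Return the dicts of `dictlist` in order, keeping only the first of equal ones."""
    seen = set()
    unique = []
    for d in dictlist:
        # A frozenset is hashable (a plain set or dict is not), so it can be
        # stored in `seen`. Assumes every value in d is itself hashable.
        fd = frozenset(d.items())
        if fd not in seen:
            seen.add(fd)
            unique.append(d)
    return unique


print(drop_exact_duplicates([{1: 2}, {1: 2}, {1: 3}]))  # [{1: 2}, {1: 3}]
```

Since a frozenset ignores ordering, two dicts with the same pairs inserted in a different order are treated as equal.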

+0

You could replace 'fd.issubset(k)' with 'fd <= k'. – Blender

+0

@Blender: good point, edited. It still feels like there should be some slick set-based trick. – DSM

1

I think this one has a better time complexity:

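def is_subset(a, b):
    return not set(a) - set(b)

def remove_extra(my_list):
    # list(...) is needed in Python 3, where items() returns a view that cannot be sorted
    my_list = [list(d.items()) for d in my_list]
    my_list.sort()

    result = []
    # Only neighbors in sorted order are compared, so a subset is caught only
    # when it sorts directly before one of its supersets
    for i in range(len(my_list) - 1):
        if not is_subset(my_list[i], my_list[i + 1]):
            result.append(dict(my_list[i]))
    # The last entry has no successor to compare against; guard the empty list
    if my_list:
        result.append(dict(my_list[-1]))

    return result

print(remove_extra([
    {'oranges':'big','apples':'green'},
    {'oranges':'big','apples':'green','bananas':'fresh'},
    {'oranges':'big','apples':'red'},
    {'oranges':'big','apples':'green','bananas':'rotten'}
]))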