2017-09-09 47 views
1

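Remove duplicates from a nested list (without removing duplicate elements within the sublists)

I am trying to remove duplicate sublists from a nested list that looks like this: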

result_set = [ 
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'], 
    ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], 
    ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], 
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'], 
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'] 
    ] 

I would like the output to be:

result_set = [ 
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'], 
    ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], 
    ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], 
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'] 
    ] 

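Note that the only change is that the last element, ['MEMS', 'MEMS', 'MEMS', 'MEMS'], is no longer present. Similar questions have been asked before, and I adapted the following code from them: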

result_set = set(frozenset(x) for x in result) 
lst = [list(x) for x in result_set] 

My problem is that I get the following output:

result_set = [['MEMS'], ['Microfluidics'], ['Microfabrication', 'Clean-Room Microfabrication'], ['Photolithography', 'Lithography']] 

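Notice that it also removes the duplicate elements inside each sublist. I don't want that, because my goal afterwards is to plot a histogram (for example, MEMS occurs 4 times), so I need to keep track of how many elements each sublist originally had.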

+1

If your question has been answered, you should [accept](https://stackoverflow.com/help/someone-answers) the answer that helped the most. –

Answers

3

If order doesn't matter, you can use a set:

final_data = list(map(list, set(map(tuple, result_set)))) 

Output:

[['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'], ['MEMS', 'MEMS', 'MEMS', 'MEMS']] 
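
As a side note on why this differs from the frozenset attempt in the question, here is a minimal sketch using a shortened, made-up version of the data. A frozenset only remembers which strings occur, while a tuple keeps every element in order, so only sublists that are identical as a whole collapse together.

```python
# Shortened sample data (an assumption for illustration, not the full list)
result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
]

# frozenset keeps only the distinct strings, so per-sublist counts are lost
as_frozensets = {frozenset(sub) for sub in result_set}

# tuple keeps every element and its position, so only whole-sublist
# duplicates are merged
as_tuples = {tuple(sub) for sub in result_set}

# Each surviving sublist still has its original length
lengths = sorted(len(t) for t in as_tuples)
print(lengths)
```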

If order does matter, you can try this:

final_data = [] 
for result in result_set: 
    if result not in final_data: 
        final_data.append(result)

Output:

[['MEMS', 'MEMS', 'MEMS', 'MEMS'], ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography']] 
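
The loop above checks membership in a list, which rescans final_data for every sublist. For longer inputs, one possible variant is to track already-seen sublists in a set of tuples. This is only a sketch; the function name dedupe_sublists is made up for illustration, and it assumes the sublist elements are hashable (strings are).

```python
def dedupe_sublists(nested):
    """Return the sublists of `nested` without repeats, keeping first occurrences in order."""
    seen = set()
    unique = []
    for sub in nested:
        key = tuple(sub)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            unique.append(sub)
    return unique


# Shortened sample data (an assumption for illustration)
result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
]
final_data = dedupe_sublists(result_set)
print(final_data)
```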
0

Use collections.OrderedDict to retain the order of the unique items.

from collections import OrderedDict 

out = list(
    map(list, OrderedDict.fromkeys(map(tuple, result_set)).keys())
)
print(out) 

[['MEMS', 'MEMS', 'MEMS', 'MEMS'], 
['Microfluidics', 
    'Microfluidics', 
    'Microfluidics', 
    'Microfluidics', 
    'Microfluidics', 
    'Microfluidics', 
    'Microfluidics'], 
['Microfabrication', 
    'Microfabrication', 
    'Microfabrication', 
    'Clean-Room Microfabrication', 
    'Microfabrication', 
    'Microfabrication'], 
['Photolithography', 'Photolithography', 'Lithography', 'Photolithography']] 
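
On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea should work without the import. A minimal sketch, assuming that Python version and a shortened, made-up sample list:

```python
# Shortened sample data (an assumption for illustration)
result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed by the language from Python 3.7 onward)
out = [list(t) for t in dict.fromkeys(map(tuple, result_set))]
print(out)
```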
0

Sort the list, then build a new list from the keys produced by itertools.groupby(). Note that sort() reorders result_set in place.

import itertools 
result_set.sort() 
new_set = [k for k,g in itertools.groupby(result_set)]
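
If the original order of result_set needs to stay untouched, one option is to group a sorted copy instead. This is a sketch with made-up sample data; the result comes out in sorted order, not in the original order.

```python
import itertools

# Shortened sample data (an assumption for illustration)
result_set = [
    ['Photolithography', 'Lithography'],
    ['MEMS', 'MEMS'],
    ['Photolithography', 'Lithography'],
    ['MEMS', 'MEMS'],
]

# sorted() returns a new list, so result_set itself is not reordered.
# groupby only merges adjacent equal items, which is why sorting comes first.
new_set = [key for key, _group in itertools.groupby(sorted(result_set))]
print(new_set)
```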