2014-09-04 30 views

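De-duplicate a list of tuples, preferring certain tuples

I have a list of 3-tuples. The first two items (GPS coordinates) are often repeated, while the last item is a score (signal strength):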

[(62.45807, -114.41026, 8), 
(62.45807, -114.41026, 11), 
(62.45807, -114.41026, 18), 
(62.45807, -114.41026, 16), 
(62.45807, -114.41026, 9), 
(62.45785, -114.41003, 23), 
(62.45785, -114.41003, 19), 
(62.45785, -114.41003, 11), 
(62.45785, -114.41003, 17), 
(62.45785, -114.41003, 14), 
(62.45785, -114.41003, 11), 
(62.45785, -114.41003, 15), 
(62.45765, -114.40978, 28), 
(62.45765, -114.40978, 16), 
(62.45765, -114.40978, 10), 
(62.45765, -114.40978, 15), 
(62.45765, -114.40978, 25)] 

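I would like to know how to remove the duplicate GPS coordinates, keeping the tuple with the highest score for each, so that I end up with: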

[(62.45807, -114.41026, 18), 
(62.45785, -114.41003, 23), 
(62.45765, -114.40978, 28)] 

How would I do the same thing, but with the average score instead, giving something like this:

[(62.45807, -114.41026, 12), 
(62.45785, -114.41003, 16), 
(62.45765, -114.40978, 19)] 

How have you tried to solve this problem? – APerson 2014-09-04 13:15:47


pandas has the functionality you want. There is a similar question here: http://stackoverflow.com/questions/12497402/python-pandas-remove-duplicates-by-columns-a-keeping-the-row-with-the-highest – Vicky 2014-09-04 13:22:55


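How is this "too broad", please? I provided sample input and the expected output, and described the conditions for getting from one to the other. I also got timely answers. I would like to understand how the question could have been better, for future reference. Thanks. – user3481267 2014-09-04 16:12:50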

Answers

2

Sounds like a job for itertools.groupby:

>>> from itertools import groupby 

Maximum:

>>> [max(g, key=lambda x:x[-1]) for k, g in groupby(data, key= lambda x:x[:2])] 
[(62.45807, -114.41026, 18), 
(62.45785, -114.41003, 23), 
(62.45765, -114.40978, 28)] 

Average:

>>> [a + (round(sum(c for _, _, c in b)/float(len(b))),) 
         for a, b in ((k, list(g)) for k, g in 
              groupby(data, key= lambda x:x[:2]))] 
[(62.45807, -114.41026, 12.0), 
(62.45785, -114.41003, 16.0), 
(62.45765, -114.40978, 19.0)] 
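
One caveat worth noting: groupby only merges adjacent items with equal keys, so the one-liners above rely on the readings for each coordinate pair arriving consecutively, as they do in the sample data. If that is not guaranteed, sorting on the coordinates first should cover it. Here is a minimal sketch, assuming Python 3; the helper names best_per_location and mean_per_location and the shuffled sample are made up for illustration and are not part of the answer above.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Hypothetical sample: readings for the same location are NOT adjacent.
data = [
    (62.45765, -114.40978, 28),
    (62.45807, -114.41026, 8),
    (62.45765, -114.40978, 16),
    (62.45807, -114.41026, 18),
]

coords = itemgetter(0, 1)  # grouping key: (latitude, longitude)
score = itemgetter(2)      # ranking key: signal strength


def best_per_location(rows):
    """Keep the highest-scoring tuple for each coordinate pair."""
    ordered = sorted(rows, key=coords)  # make equal keys adjacent for groupby
    return [max(group, key=score) for _, group in groupby(ordered, key=coords)]


def mean_per_location(rows):
    """Replace each coordinate pair's scores with their rounded mean."""
    ordered = sorted(rows, key=coords)
    return [
        key + (round(mean(s for _, _, s in group)),)
        for key, group in groupby(ordered, key=coords)
    ]


print(best_per_location(data))
print(mean_per_location(data))
```

The results come back ordered by coordinate rather than in input order, and in Python 3 round returns an int, so the averages print without the trailing .0 seen in the transcript above.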

Thanks! This is concise and does the trick. – user3481267 2014-09-04 16:09:52

0

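You can write a function that maps the values into a dictionary whose keys are the GPS coordinates and whose values are the lists of scores recorded at those coordinates: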

def create_gps_score_dict(gps_score_list): 
    gps_score_dict = {} 
    for gps_score in gps_score_list: 
        if (gps_score[0], gps_score[1]) in gps_score_dict: 
            gps_score_dict[(gps_score[0], gps_score[1])].append(gps_score[2]) 
        else: 
            gps_score_dict[(gps_score[0], gps_score[1])] = [gps_score[2]] 
    return gps_score_dict 

Now you can generate the result just by looking at this simple dictionary:

def max_gps_scores(gps_score_dict): 
    gps_score_list = [] 
    for gps, scores in gps_score_dict.items(): 
        # keep only the highest score recorded for this coordinate pair 
        gps_score_list.append((gps[0], gps[1], max(scores))) 
    return gps_score_list 

>>> gps_score_list=[(62.45807, -114.41026, 8), 
    (62.45807, -114.41026, 11), 
    (62.45807, -114.41026, 18), 
    (62.45807, -114.41026, 16), 
    (62.45807, -114.41026, 9), 
    (62.45785, -114.41003, 23), 
    (62.45785, -114.41003, 19), 
    (62.45785, -114.41003, 11), 
    (62.45785, -114.41003, 17), 
    (62.45785, -114.41003, 14), 
    (62.45785, -114.41003, 11), 
    (62.45785, -114.41003, 15), 
    (62.45765, -114.40978, 28), 
    (62.45765, -114.40978, 16), 
    (62.45765, -114.40978, 10), 
    (62.45765, -114.40978, 15), 
    (62.45765, -114.40978, 25)] 

>>> max_gps_scores(create_gps_score_dict(gps_score_list)) 
[(62.45807, -114.41026, 18), (62.45765, -114.40978, 28), (62.45785, -114.41003, 23)] 

I'll leave the average up to you!
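
For anyone who wants a starting point, the average could follow the same dictionary pattern. This is only a sketch, assuming Python 3 (where / is true division and dicts keep insertion order); average_gps_scores is a hypothetical name, the three-group sample is shortened for illustration, and the grouping step is rewritten with collections.defaultdict purely to keep the example self-contained.

```python
from collections import defaultdict


def create_gps_score_dict(gps_score_list):
    """Map each (lat, lon) pair to the list of scores seen at that point."""
    gps_score_dict = defaultdict(list)
    for lat, lon, score in gps_score_list:
        gps_score_dict[(lat, lon)].append(score)
    return gps_score_dict


def average_gps_scores(gps_score_dict):
    """Hypothetical helper: one tuple per location with the rounded mean score."""
    return [
        (lat, lon, round(sum(scores) / len(scores)))
        for (lat, lon), scores in gps_score_dict.items()
    ]


# Shortened, illustrative sample (not the full list from the question).
sample = [
    (62.45807, -114.41026, 8),
    (62.45807, -114.41026, 11),
    (62.45807, -114.41026, 18),
    (62.45785, -114.41003, 23),
    (62.45785, -114.41003, 19),
]

print(average_gps_scores(create_gps_score_dict(sample)))
```

Unlike groupby, this does not need the input to be sorted or grouped, at the cost of holding every score in memory until the end.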