总数符合标准的唯一

我有一个列表，rods这是由length和position元组的类似项目。 position始终是一个给定的length独特。我想找到最频繁的棒长度，然后找出所有唯一（由position）相邻棒（包括最频繁）出现的总数。细分：总数符合标准的唯一

首先我想找到最频繁的杆length。
然后我想包括所有其他的杆有相邻的length通过一些标准（在这个例子中为+ -1），但只有他们有一个唯一的位置 - 不会被认为是（通过'最常见'杆最初的小组，或者通过满足相邻标准添加到该小组中的“新杆”）。
并找到这个新的总频率。

我能够通过以下方式来实现这一点，通过排序和使用套，但也许有一个更好的解决方案：

import itertools 
#tuples of (length, position) 
rods = [(18, 21), (17, 2), (15, 3), (14, 21), (14, 5), (13, 6), (13, 7), 
     (13, 8), (13, 9), (13, 10), (13, 11), (13, 12), (13, 13), (13, 14), 
     (13, 15), (13, 16), (13, 17), (13, 18), (13, 19), (13, 20), (13, 21), 
     (13, 22), (13, 23), (13, 24), (13, 25), (13, 26), (12, 5), (12, 21), 
     (12, 2)] 

lengths = [length for length, position in rods] 

#gives tuples of lengths and their frequencies: 
length_freq = (sorted([(k,len(list(j))) for k,j in itertools.groupby(sorted(lengths))], 
       key=lambda x: x[1],reverse=1)) 
best_length = length_freq[0][0] 

#cumulative frequency of rods near best_length, with unique position: 
tally = (len(set((best_length,v) for j,v in rods 
     if best_length - 1 <= j <=best_length + 1))) 

print length_freq 
#output: 
#[(13, 21), (12, 3), (14, 2), (15, 1), (17, 1), (18, 1)] 
print tally 
#output: 
#23

注23为这个测试数据的正确答案。如同length= 14两个杆被定位在点也通过杆与length=15（位置21，和5）占用。并且在position=21处也存在lengths 13 and 12的重叠。

来源

2012-04-19 fraxel

我觉得你是一个全面合理的解决方案，如果有点过度压缩。我的主要建议是将其分解一点。此外，而不是在这里使用groupby，最好使用Counter如果可能的话，或defaultdict如果不是。 groupby适用于预分类材料的惰性操作;如果它没有预先排序并且你不需要它是懒惰的，你可能不应该使用它。

由于Nolen Royalty提供了基于defaultdict的解决方案，因此我将在此处使用Counter，但请参阅下面的替代方案。结果是O（n）算法;由于你的排序，你的是O（n日志n），所以这是一个小的改进。

import collections 

#tuples of (length, position) 
rods = [(18, 21), (17, 2), (15, 3), (14, 21), (14, 5), (13, 6), (13, 7), 
     (13, 8), (13, 9), (13, 10), (13, 11), (13, 12), (13, 13), (13, 14), 
     (13, 15), (13, 16), (13, 17), (13, 18), (13, 19), (13, 20), (13, 21), 
     (13, 22), (13, 23), (13, 24), (13, 25), (13, 26), (12, 5), (12, 21), 
     (12, 2)] 

lengths = (length for length, position in rods) 
length_freq = collections.Counter(lengths) 
((best_length, _),) = length_freq.most_common(1) 
print best_length 

#cumulative frequency of rods near best_length, with unique position: 
rod_filter = ((l, p) for l, p in rods if best_length - 1 <= l <= best_length + 1) 
tally = len(set((best_length, p) for l, p in rod_filter)) 

print length_freq 
print tally

既然你不能使用Counter，为了完整起见，下面是一个替代方案。它是这两条线一个简易替换：

length_freq = collections.Counter(lengths) 
((best_length, _),) = length_freq.most_common(1)

只需更换他们这样的：

length_freq = collections.defaultdict(int) 
for l in lengths: 
    length_freq[l] += 1 
best_length = max(length_freq, key=length_freq.get)

另外请注意，我前面的代码有错误;它现在已经修复了。

来源

2012-04-19 14:25:59 senderle

谢谢，我在2.6这么不幸我不能使用Counter。但是这看起来不错。 – fraxel 2012-04-19 14:38:41

这里是一个似乎是相当合理的，我一个非常简单的方法：

>>> from collections import defaultdict 
>>> rods = [(18, 21), (17, 2), (15, 3), (14, 21), (14, 5), (13, 6), (13, 7), 
...   (13, 8), (13, 9), (13, 10), (13, 11), (13, 12), (13, 13), (13, 14), 
...   (13, 15), (13, 16), (13, 17), (13, 18), (13, 19), (13, 20), (13, 21), 
...   (13, 22), (13, 23), (13, 24), (13, 25), (13, 26), (12, 5), (12, 21), 
...   (12, 2)] 
>>> neighbor_cutoff = 1 
>>> length_to_count = defaultdict(int) 
>>> neighbors_for_length = defaultdict(set) 
>>> for rod in rods: 
...  length_to_count[rod[0]] += 1 
...  neighbors_for_length[rod[0]].add(rod[1]) 
...  for i in range(1, neighbor_cutoff+1): 
...   neighbors_for_length[rod[0]-i].add(rod[1]) 
...   neighbors_for_length[rod[0]+i].add(rod[1]) 
... 
>>> sorted([(length, length_to_count[length]) for length in length_to_count], key=lambda x: x[1], reverse=True) 
[(13, 21), (12, 3), (14, 2), (15, 1), (17, 1), (18, 1)] 
>>> [(length, len(neighbors_for_length[length])) for length in neighbors_for_length] 
[(11, 3), (12, 23), (13, 23), (14, 23), (15, 3), (16, 2), (17, 2), (18, 2), (19, 1)] 
>>> sorted(_, key=lambda x: x[1], reverse=True) 
[(12, 23), (13, 23), (14, 23), (11, 3), (15, 3), (16, 2), (17, 2), (18, 2), (19, 1)] 
>>> neighbors_for_length 
defaultdict(<type 'set'>, {11: set([2, 5, 21]), 12: set([2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]), 
13: set([2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]), 
14: set([3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]), 
15: set([3, 21, 5]), 16: set([2, 3]), 17: set([2, 21]), 18: set([2, 21]), 19: set([21])})

来源

2012-04-19 14:20:37

谢谢，我正在努力遵循这一点。最后一行中的所有元素是否与（13,23）虚假不同？元素（11,3）和（19,1）是什么？没有长度为11或19的杆。 – fraxel 2012-04-19 14:51:48

@fraxel'neighbors_for_length'将长度映射到一组具有唯一位置的所有杆，这些位置是该长度的“邻居”（即具有小于1的长度，等于，或比长度大1）。这意味着，由于长度为12的3根不同的定位杆，即使杆不存在，长度为11的杆也有3个“邻居”。我已经添加了neighbors_to_length看起来像什么，也许这会使它更清晰。 – 2012-04-19 15:06:48

总数符合标准的唯一

回答

相关问题