Here is a pure Python solution that uniqueifies each string, joins the sets, and then counts the result (using Divakar's example list):
>>> from collections import Counter
>>> li = ['er', 'IS', 'you', 'Is', 'is', 'er', 'IS']
>>> Counter(e for sl in map(list, map(set, li)) for e in sl)
Counter({'I': 3, 'e': 2, 's': 2, 'S': 2, 'r': 2, 'o': 1, 'i': 1, 'u': 1, 'y': 1})
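The one-liner above converts each set to a list before flattening. That step is not needed, because a set can be iterated directly. Here is a minimal sketch of the same idea using `itertools.chain.from_iterable`. The function name `count_unique_chars` is hypothetical and does not come from the original answer:

```python
from collections import Counter
from itertools import chain

def count_unique_chars(strings):
    """Count, for each character, how many strings contain it at least once."""
    # set(s) drops characters repeated within one string;
    # chain.from_iterable flattens the per-string sets into a single stream.
    return Counter(chain.from_iterable(set(s) for s in strings))

li = ['er', 'IS', 'you', 'Is', 'is', 'er', 'IS']
print(count_unique_chars(li)['I'])  # 3
```

A string such as `'aab'` therefore contributes only 1 to the count for `'a'`.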
If you want upper and lower case to count as the same letter:
>>> Counter(e for sl in map(list, map(set, [s.lower() for s in li])) for e in sl)
Counter({'i': 4, 's': 4, 'e': 2, 'r': 2, 'o': 1, 'u': 1, 'y': 1})
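If the goal is the most common character rather than the full table, ties need some care. In the case-insensitive count above, `'i'` and `'s'` both reach 4, but `Counter.most_common(1)` would return only one of them. This sketch returns every character that shares the top count. The function names are hypothetical. Lowercasing with `str.lower` mirrors the original; `str.casefold` is an alternative for non-ASCII text.

```python
from collections import Counter

def count_unique_chars_ci(strings):
    """Case-insensitive variant: lowercase each string, then deduplicate and count."""
    return Counter(c for s in strings for c in set(s.lower()))

def most_common_unique(strings):
    """Return all characters tied for the highest count, sorted."""
    counts = count_unique_chars_ci(strings)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(c for c, n in counts.items() if n == top)

li = ['er', 'IS', 'you', 'Is', 'is', 'er', 'IS']
print(most_common_unique(li))  # ['i', 's']
```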
Now, let's time them:
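from __future__ import print_function
from collections import Counter
import numpy as np
import pandas as pd

def dawg(li):
    # Deduplicate the characters of each string, then count across strings.
    return Counter(e for sl in map(list, map(set, li)) for e in sl)

def nump(a):
    # View the fixed-width byte strings as one array of single characters;
    # dtype='S' keeps this working on Python 3, where str arrays are Unicode.
    chars = np.asarray(a, dtype='S').view('S1')
    # Padding after shorter strings shows up as empty entries; drop it.
    valid_chars = chars[chars != b'']
    # Unlike dawg, this counts every occurrence (no per-string dedup).
    unqchars, count = np.unique(valid_chars, return_counts=True)
    return pd.DataFrame({'char': unqchars, 'count': count})

if __name__ == '__main__':
    import timeit
    li = ['er', 'IS', 'you', 'Is', 'is', 'er', 'IS']
    for f in (dawg, nump):
        print(" ", f.__name__, timeit.timeit("f(li)", setup="from __main__ import f, li", number=100))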
Results:
dawg 0.00134205818176
nump 0.0347728729248
In this case the pure Python solution is significantly faster, roughly 25 times faster than the NumPy/pandas version.
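These timings come from a 7-element list. At that size, fixed overhead such as building a NumPy array and a pandas DataFrame likely dominates, so the result may not carry over to larger inputs. Here is a minimal sketch for re-timing the pure Python version at a few sizes. It assumes the input is scaled by simply repeating the example list. The helper name `time_at_scale` is hypothetical, and `dawg` is simplified to an equivalent generator. You can time `nump` the same way to compare. No timings are quoted here because they depend on the machine and library versions.

```python
import timeit
from collections import Counter

def dawg(li):
    # Equivalent to the original dawg: dedupe each string, then count.
    return Counter(c for s in li for c in set(s))

base = ['er', 'IS', 'you', 'Is', 'is', 'er', 'IS']

def time_at_scale(k, number=5):
    """Time dawg on the base list repeated k times, over `number` runs."""
    data = base * k
    return timeit.timeit(lambda: dawg(data), number=number)

for k in (1, 100, 10000):
    print(k * len(base), time_at_scale(k))
```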
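Can you explain what you mean by "most common unique character"? And include some sample input and output data. –
@Chris_Rands Edited with an example; lmk if you need more. – ZtoYi
So do you want only the most frequent character, or the frequencies of all characters? –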