2017-02-13 33 views
4

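I am trying to implement the Apriori algorithm and have reached the point where I can extract the subsets that occur together across all transactions. How can I count the occurrences of sets contained in a list in Python?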

This is what I have:

subsets = [set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['American (Traditional)', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Restaurants'])] 

For example, set(['Breakfast & Brunch', 'Restaurants']) occurs twice, and I need to keep track of each subset together with its number of occurrences.

I tried using:

from collections import Counter 

support_set = Counter() 
# some code that generated the list above 

support_set.update(subsets) 

But it produces this error:

supported = itemsets_support(transactions, candidates) 
    File "apriori.py", line 77, in itemsets_support 
    support_set.update(subsets) 
    File"/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 567, in update 
    self[elem] = self_get(elem, 0) + 1 
TypeError: unhashable type: 'set' 

Any ideas?


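What you are implementing is probably not Apriori, but a naive and inefficient approximation of the "frequent itemsets" idea. Benchmark it on some larger data sets against ELKI or R's 'arules' package. Putting everything into a "Counter" will not scale. Try a supermarket data set. –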


It is part of Apriori. Whether it scales or not is not the question here; it isn't being built for production! – flamenco


No, it isn't. Apriori is about doing this efficiently, not inefficiently. If you ignore the efficiency aspect, it is no longer Apriori. –

Answer

4

You can convert the sets to frozenset instances, which are hashable:

>>> from collections import Counter 
>>> subsets = [set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['American (Traditional)', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Restaurants'])] 
>>> c = Counter(frozenset(s) for s in subsets) 
>>> c 
Counter({frozenset(['American (Traditional)', 'Restaurants']): 2, frozenset(['Breakfast & Brunch', 'Restaurants']): 2, frozenset(['American (Traditional)', 'Breakfast & Brunch']): 2}) 
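
The same idea also fits the incremental support_set.update(...) pattern from the question. A minimal sketch, assuming the subsets list from the question; the min_count threshold and the frequent dict are hypothetical names added here for illustration and are not part of the original code:

```python
from collections import Counter

# The list of candidate subsets from the question.
subsets = [
    set(['Breakfast & Brunch', 'Restaurants']),
    set(['American (Traditional)', 'Breakfast & Brunch']),
    set(['American (Traditional)', 'Restaurants']),
    set(['American (Traditional)', 'Breakfast & Brunch']),
    set(['Breakfast & Brunch', 'Restaurants']),
    set(['American (Traditional)', 'Restaurants']),
]

support_set = Counter()

# A plain set is mutable and therefore unhashable, which is what raises
# the TypeError. frozenset is immutable and hashable, so it can be a key.
support_set.update(frozenset(s) for s in subsets)

# Look up the count of one subset; element order does not matter.
pair = frozenset(['Restaurants', 'Breakfast & Brunch'])
print(support_set[pair])

# Hypothetical minimum support count, chosen only for this example.
min_count = 2
frequent = {s: n for s, n in support_set.items() if n >= min_count}
print(len(frequent))
```

Because two frozensets with the same elements compare equal and hash identically, duplicates collapse onto one key no matter what order their items were listed in.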

That solved my problem. Cheers! – flamenco
