2015-06-07 82 views
1

I was running a small experiment in Spark and ran into trouble with reduce and map.

wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]


# TODO: Replace <FILL IN> with appropriate code 
from operator import add 
totalCount = (wordCounts 
       .map(lambda x: (x,1))  # <==== something wrong with this line, maybe
       .reduce(sum))  # <==== something wrong with this line, maybe
average = totalCount/float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count()) 
print totalCount 
print round(average, 2) 

# TEST Mean using reduce (3b) 
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average') 

Answers

2

I figured it out. My solution:

from operator import add 
totalCount = (wordCounts 
       .map(lambda x: x[1]) 
       .reduce(add)) 
average = totalCount/float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count()) 
print totalCount 
print round(average, 2) 
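
To make the difference between the two versions concrete, here is a minimal sketch in plain Python. It assumes an ordinary list as a stand-in for the wordCounts RDD and functools.reduce as a stand-in for the RDD's reduce; the names word_counts, total_count, and failed are hypothetical and only used for illustration. Mapping each pair to its count and reducing with add gives the total, while reducing ((word, count), 1) pairs with the built-in sum raises a TypeError, because sum(a, b) treats its second argument as a start value rather than as a second operand.

```python
from functools import reduce
from operator import add

# Assumption: a plain list standing in for the wordCounts RDD.
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Working version: keep only the count from each pair, then add them up.
counts = [pair[1] for pair in word_counts]
total_count = reduce(add, counts)
average = total_count / float(len(word_counts))

# Failing version: each element becomes ((word, count), 1), and sum(a, b)
# tries to add the items of tuple a onto tuple b, which ends in a TypeError.
try:
    reduce(sum, [(pair, 1) for pair in word_counts])
    failed = False
except TypeError:
    failed = True

print(total_count)
print(round(average, 2))
```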
+2

If you solved your own problem, please mark your answer as accepted. Don't put this information in the question. – Anthon

1

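I'm not sure myself, but from looking at your code I can see some problems. The 'map' function can't be called on a list as 'list_name.map(some stuff)'; you need to call it like this: 'variable = map(function, arguments)', and if you are using Python 3, you need to do 'variable = list(map(function, arguments))'. Hope that helps somewhat :)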
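
As a small illustration of the built-in map described above, here is a sketch on an ordinary Python list. The names pairs, counts, and list_has_map are hypothetical. Note that this covers plain lists only: the question calls reduceByKey on wordsRDD, which suggests wordCounts is a Spark RDD, and an RDD provides its own map method.

```python
# Assumption: an ordinary Python list, not a Spark RDD.
pairs = [('rat', 2), ('elephant', 1), ('cat', 2)]

# The built-in map takes the function first and the iterable second.
# In Python 3 it returns a lazy iterator, so wrap it in list().
counts = list(map(lambda pair: pair[1], pairs))

# A plain list has no .map method, so list_name.map(...) raises AttributeError.
try:
    pairs.map(lambda pair: pair[1])
    list_has_map = True
except AttributeError:
    list_has_map = False

print(counts)
```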

0

Another similar approach: you can also treat the list as (key, value) pairs and use distinct()

from operator import add 
totalCount = (wordCounts 
      .map(lambda (k,v) : v) 
      .reduce(add)) 
average = totalCount/float(wordCounts.distinct().count()) 
print totalCount 
print round(average, 2)
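
The same idea can be sketched in plain Python, assuming a list in place of the RDD and set() as a rough stand-in for distinct(); the names word_counts, distinct_pairs, total_count, and average are hypothetical. The tuple-unpacking form lambda (k,v): v is only valid in Python 2, so this sketch indexes into the pair instead, which works in both Python 2 and Python 3.

```python
from functools import reduce
from operator import add

# Assumption: a plain list standing in for the wordCounts RDD.
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Index into each (key, value) pair instead of unpacking it in the lambda.
total_count = reduce(add, map(lambda kv: kv[1], word_counts))

# set() removes duplicate pairs, roughly what distinct() does on an RDD.
distinct_pairs = set(word_counts)
average = total_count / float(len(distinct_pairs))

print(total_count)
print(round(average, 2))
```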