Spark reduce and map issue

I ran a small experiment in Spark and ran into trouble.

wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]


# TODO: Replace <FILL IN> with appropriate code 
from operator import add 
totalCount = (wordCounts 
       .map(lambda x: (x,1))  # <==== something wrong with this line maybe
       .reduce(sum))          # <==== something wrong with this line maybe
average = totalCount/float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count()) 
print totalCount 
print round(average, 2) 

# TEST Mean using reduce (3b) 
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')


2015-06-07 BufBills

I figured it out. My solution:

from operator import add 
totalCount = (wordCounts 
       .map(lambda x: x[1]) 
       .reduce(add)) 
average = totalCount/float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count()) 
print totalCount 
print round(average, 2)
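
For readers without a Spark session, the difference between the two versions can be reproduced in plain Python. This is a minimal sketch, assuming Python 3 and using `functools.reduce` as a local stand-in for `RDD.reduce`; the list `word_counts` is a hypothetical local copy of the data shown in the question, not an RDD.

```python
from functools import reduce
from operator import add

# Assumption: a local list holding the same pairs as the wordCounts RDD.
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Working version: keep only the count from each (word, count) pair, then add them up.
total_count = reduce(add, map(lambda pair: pair[1], word_counts))
average = total_count / float(len(word_counts))

# Failing version: reduce calls its function with two arguments, so
# sum(a, b) is read as sum(iterable=a, start=b) and raises TypeError on this data.
try:
    reduce(sum, map(lambda pair: (pair, 1), word_counts))
    failed = False
except TypeError:
    failed = True

print(total_count, round(average, 2), failed)
```

The key point is that `reduce` expects a function of two arguments that combines two values, which is what `add` does; `sum` instead expects a single iterable.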


2015-06-07 18:47:52 BufBills

If you solved your own problem, please mark your answer as accepted. Don't put this information in the question. – Anthon

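I'm not sure myself, but from looking at your code I can see some problems. The 'map' function can't be used on a list as 'list_name.map(some stuff)'; you need to call it like this: 'variable = map(function, arguments)', and if you are using Python 3 you need 'variable = list(map(function, arguments))'. Hope that helps somewhat :)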
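
The built-in form described in this answer can be sketched as follows, assuming Python 3 and a plain list named `word_counts` (a hypothetical local copy of the question's data). Note that this applies to ordinary Python lists; the `wordCounts` object in the question accepts `.map` and `.reduce` as methods, so it is presumably a Spark RDD rather than a list.

```python
# Assumption: a plain Python list, not the RDD used in the question.
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Built-in map takes the function first and the iterable second;
# in Python 3 it returns a lazy iterator, so wrap it in list().
counts = list(map(lambda pair: pair[1], word_counts))

# A plain list has no .map method, which is the situation the answer describes.
has_map_method = hasattr(word_counts, 'map')

print(counts, has_map_method)
```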


2015-06-07 18:47:16 R21

Another similar approach: you can also read the list as key-value pairs and use distinct()

from operator import add 
totalCount = (wordCounts 
      .map(lambda (k,v) : v) 
      .reduce(add)) 
average = totalCount/float(wordCounts.distinct().count()) 
print totalCount 
print round(average, 2)
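
The same idea can be checked locally without Spark. This is a rough sketch, assuming Python 3, where the tuple-unpacking form `lambda (k,v): v` is no longer valid syntax; `set` stands in for `distinct()`, and `word_counts` is a hypothetical local copy of the question's data.

```python
from functools import reduce
from operator import add

# Assumption: a local list holding the same pairs as the wordCounts RDD.
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Unpack each (word, count) pair and sum the counts.
total_count = reduce(add, (count for _word, count in word_counts))

# Each word appears in exactly one pair here, so the number of
# distinct pairs equals the number of unique words.
unique_pairs = len(set(word_counts))
average = total_count / float(unique_pairs)

print(total_count, round(average, 2))
```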


2016-07-27 07:04:06
