I have the following data, and what I want to do is a PySpark reduceByKey over these key/tuple pairs:
[(13, 'D'), (14, 'T'), (32, '6'), (45, 'T'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'T'), (53, '2'), (54, '0'), (13, 'A'), (14, 'T'), (32, '6'), (45, 'A'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'X')]
so that I count the occurrences of each value (a single character) per key. So, I first did a map:
.map(lambda x: (x[0], [x[1], 1]))
so that the key/tuple pairs are now:
[(13, ['D', 1]), (14, ['T', 1]), (32, ['6', 1]), (45, ['T', 1]), (47, ['2', 1]), (48, ['0', 1]), (49, ['2', 1]), (50, ['0', 1]), (51, ['T', 1]), (53, ['2', 1]), (54, ['0', 1]), (13, ['A', 1]), (14, ['T', 1]), (32, ['6', 1]), (45, ['A', 1]), (47, ['2', 1]), (48, ['0', 1]), (49, ['2', 1]), (50, ['0', 1]), (51, ['X', 1])]
I just can't figure out the last part: how to count the letters for each key. For example, key 13 would have one 'D' and one 'A', while 14 would have two 'T's, and so on.
You want to groupByKey first, then perform the count on the grouped characters. – ohruunuruus
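One way to keep the reduceByKey approach from the question is to map each value to a collections.Counter instead of a [char, 1] list, since Counters can be merged with +. This is a sketch, not the asker's code; the Counter substitution is my assumption, and the reduceByKey semantics are simulated locally here so the example runs without Spark:

```python
from collections import Counter

data = [(13, 'D'), (14, 'T'), (32, '6'), (45, 'T'), (47, '2'), (48, '0'),
        (49, '2'), (50, '0'), (51, 'T'), (53, '2'), (54, '0'),
        (13, 'A'), (14, 'T'), (32, '6'), (45, 'A'), (47, '2'), (48, '0'),
        (49, '2'), (50, '0'), (51, 'X')]

# Map step: (key, char) -> (key, Counter({char: 1})),
# mirroring .map(lambda x: (x[0], Counter([x[1]])))
mapped = [(k, Counter([v])) for k, v in data]

# The associative, commutative merge function you would pass to reduceByKey
def merge(a, b):
    return a + b  # Counter addition sums the per-character counts

# Simulate reduceByKey locally by folding values into a dict
reduced = {}
for k, c in mapped:
    reduced[k] = merge(reduced[k], c) if k in reduced else c
```

With an actual RDD (assuming it is named rdd), the equivalent would be rdd.map(lambda x: (x[0], Counter([x[1]]))).reduceByKey(lambda a, b: a + b); reduceByKey only requires the merge function to be associative and commutative, which Counter addition is, so this avoids the explicit groupByKey shuffle of full value lists.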