This may be a very clumsy question. I have two files, and I want to run a map/reduce job over each of them, then compare the overlap between the two results (say I have some measure for doing that) for further processing after the reducers.
So here is my idea:
1) Run the normal wordcount job on one document (https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-count-number-of-times-a-word-appeared-in-a-file-using-map-reduce-framework)
2) But rather than saving the output to a file, save everything in a HashMap(word, true)
3) Pass that HashMap along to the second wordcount MapReduce program, and then, as I process the second document, check each word against the HashMap to find out whether it is present or not.
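Stripped of the Hadoop wiring, steps 1 and 2 amount to the following (a plain-Java sketch; the class and method names are just illustrative, and the wordcount is collapsed into a presence map since only the `true` flag is kept):

```java
import java.util.HashMap;

public class StepOneSketch {
    // Steps 1 + 2: tokenize the first document like wordcount would,
    // but keep only presence (word -> true) instead of writing counts to a file.
    static HashMap<String, Boolean> buildPresenceMap(String document) {
        HashMap<String, Boolean> hm = new HashMap<>();
        for (String token : document.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                hm.put(token, Boolean.TRUE); // value is always true; a HashSet would also do
            }
        }
        return hm;
    }

    public static void main(String[] args) {
        HashMap<String, Boolean> hm = buildPresenceMap("the quick brown fox the fox");
        System.out.println(hm.containsKey("fox")); // true
        System.out.println(hm.containsKey("dog")); // false
    }
}
```

Since the counts are never used, a `HashSet<String>` would express the same thing more directly, but I kept the `HashMap<String, Boolean>` shape from the idea above.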
So, something like this:
1) HashMap<String, Boolean> hm = runStepOne(); <-- map reduce job
2) runStepTwo(hm);
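As far as I understand, the mappers of a second job can't receive a live HashMap from the first job; the usual pattern is to write the first job's output to HDFS and load it into a map inside the second mapper's setup() (e.g. via the distributed cache). The membership check itself, without the Hadoop wiring, would look like this (plain Java; names are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class StepTwoSketch {
    // Step 3: while processing the second document, keep only the words
    // that also appear in the first document's presence map.
    static List<String> overlap(String secondDocument, HashMap<String, Boolean> firstDocWords) {
        List<String> shared = new ArrayList<>();
        for (String token : secondDocument.toLowerCase().split("\\W+")) {
            if (firstDocWords.containsKey(token) && !shared.contains(token)) {
                shared.add(token); // word occurs in both documents
            }
        }
        return shared;
    }
}
```

In the real second job, this containsKey check would sit inside the map() method, with firstDocWords populated once per mapper in setup().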
How can I do this in Hadoop?