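MapReduce: filter out key-value pairs if the value is not above a threshold

Using MapReduce, how do I modify the following word count code so that it only outputs words whose count is above a certain threshold? (For example, I want to add some kind of filtering of key-value pairs.)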

Input:

ant bee cat 
bee cat dog 
cat dog

Output (say the count threshold is 2 or more):

bee 2 
cat 3 
dog 2

The code is from: http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Source+Code

public static class Map1 extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (word, 1) for every whitespace-separated token in the input line.
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

public static class Reduce1 extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Sums the counts for one word and emits (word, total).
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

Edit: RE: about the input/test case

The input file ("example.dat") and a simple test case ("testcase") can be found here: https://github.com/csiu/tokens/tree/master/other/SO-26695749

Edit:

The problem was not the code. It was caused by some strange behavior in the org.apache.hadoop.mapred package (see "Is it better to use the mapred or the mapreduce package to create a Hadoop Job?").

Point: keep the if statement, but use `org.apache.hadoop.mapreduce` instead of `org.apache.hadoop.mapred`.

2014-11-02 csiu

Try adding an if statement around the output collection in the reducer:

if(sum >= 2) 
    output.collect(key, new IntWritable(sum));
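
To check what such a filter should produce on the sample input, here is a minimal sketch in plain Java that mimics the map, sum, and filter steps locally without Hadoop. The class name `ThresholdWordCount` and the method `countAboveThreshold` are hypothetical names chosen for this illustration; they are not part of the Hadoop API. Note that "bee" also occurs twice in the sample input, so it passes a threshold of 2 along with "cat" and "dog".

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the job; not Hadoop code.
public class ThresholdWordCount {
    // Counts tokens per word, then keeps only words whose count is >= threshold.
    static Map<String, Integer> countAboveThreshold(List<String> lines, int threshold) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: one (word, 1) pair per token, summed per key.
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                sums.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            // "Reduce" step with the filter: emit only if the total reaches the threshold.
            if (e.getValue() >= threshold) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("ant bee cat ", "bee cat dog ", "cat dog");
        System.out.println(countAboveThreshold(lines, 2)); // prints {bee=2, cat=3, dog=2}
    }
}
```

If the real job's output differs from this kind of local check, the difference is more likely to come from how the job is configured than from the if statement itself.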

2014-11-02 03:34:23 irrelephant

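When I do something like this, I am missing about half of my expected output. Is it reasonable for the Reducer to not collect/emit a key-value pair? – csiu 2014-11-02 03:43:04

No, that should not happen. Could you post more details about the problem? – irrelephant 2014-11-02 03:45:03

When I tried your suggestion (on the actual input 'example.dat', see the link above), I expected a count of 594 for the word "0". However, when I set the threshold to 590, no count was returned for it. – csiu 2014-11-02 04:23:30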
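
The thread does not establish why the count for "0" went missing. One possibility worth ruling out, offered here purely as an assumption, is that the thresholded reducer was also registered as the combiner (the linked tutorial's driver calls `setCombinerClass` with its reducer class). A combiner only sees the partial counts of one map task, so a threshold applied there can discard a word whose global total would have passed. The sketch below simulates this in plain Java; `CombinerThresholdDemo`, `sumAndFilter`, `runJob`, and the 300/294 split of the 594 occurrences are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation; not Hadoop code.
public class CombinerThresholdDemo {
    // Sums per-key partial counts; keeps a key only if its sum reaches the threshold.
    static Map<String, Integer> sumAndFilter(List<Map<String, Integer>> partials, int threshold) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> sums.merge(word, count, Integer::sum));
        }
        sums.values().removeIf(sum -> sum < threshold);
        return sums;
    }

    // Simulates a job: a combiner pass per map task, then the final reduce.
    static Map<String, Integer> runJob(List<Map<String, Integer>> mapOutputs, int threshold,
                                       boolean thresholdInCombiner) {
        List<Map<String, Integer>> combined = new ArrayList<>();
        for (Map<String, Integer> mapOutput : mapOutputs) {
            // A combiner sees only one map task's output, never the global total.
            int combinerThreshold = thresholdInCombiner ? threshold : Integer.MIN_VALUE;
            combined.add(sumAndFilter(List.of(mapOutput), combinerThreshold));
        }
        return sumAndFilter(combined, threshold);
    }

    public static void main(String[] args) {
        // Assumed split of the 594 occurrences of "0" across two map tasks.
        List<Map<String, Integer>> mapOutputs = List.of(Map.of("0", 300), Map.of("0", 294));
        System.out.println(runJob(mapOutputs, 590, true));  // prints {}
        System.out.println(runJob(mapOutputs, 590, false)); // prints {0=594}
    }
}
```

If this turns out to be the cause, the filter belongs only in the final reducer, with either no combiner or a separate combiner class that sums without filtering.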

You can do the filtering in the Reduce1 class:

if (sum >= 2) {
    output.collect(key, new IntWritable(sum));
}

2014-11-02 03:34:55

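When I do something like this, I am missing about half of my expected output. Is it reasonable for the Reducer to not collect/emit a key-value pair? – csiu 2014-11-02 03:43:45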

Can you show some of the input lines that cause this problem? – 2014-11-02 03:55:34

I found the problem while spot-checking, e.g. the word "0": I expected a count of 594, but no count was returned when I set a threshold of 590. – csiu 2014-11-02 04:21:52
