Hadoop MapReduce: Return a sorted list of the words in a text file

So my task is to return a sorted list of all the words contained in a text file, keeping duplicates. For example:

{to be or not to be} -> {be be not or to to}

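My idea was to emit each word as the key and also as the value. Since Hadoop sorts the keys, the words will then automatically come out in alphabetical order. In the reduce phase, I simply append all values that share the same key (so basically the same word) to a single Text value.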
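To make the idea concrete without a cluster, here is a minimal plain-Java sketch that uses no Hadoop classes (the class `WordSortModel` and its methods are hypothetical names). A `TreeMap` stands in for the shuffle/sort phase, which groups the map output by key and hands the keys to reduce in sorted order. Each output line is assumed to be laid out as key, tab, value, like Hadoop's `TextOutputFormat` does by default, which is why the key shows up in front of the joined values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local model of the job; no Hadoop classes are used.
// The TreeMap plays the role of shuffle/sort: values grouped per key, keys sorted.
class WordSortModel {

    // "map": emit (word, word) for every token, lower-cased as in the question;
    // "shuffle/sort": collect the values of each key under that key.
    static TreeMap<String, List<String>> mapAndShuffle(String text) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            String lower = tokenizer.nextToken().toLowerCase(Locale.ROOT);
            grouped.computeIfAbsent(lower, k -> new ArrayList<>()).add(lower);
        }
        return grouped;
    }

    // "reduce": join all values of one key with spaces, one line per key,
    // written as key + TAB + value (assumed TextOutputFormat default layout).
    static List<String> reduce(TreeMap<String, List<String>> grouped) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            lines.add(e.getKey() + "\t" + String.join(" ", e.getValue()));
        }
        return lines;
    }
}
```

Calling `WordSortModel.reduce(WordSortModel.mapAndShuffle("To be or not to be"))` gives one line per distinct word, each starting with the word itself as the key.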

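import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordSort {

    // Emits every word, lower-cased, as both key and value;
    // Hadoop sorts the map output by key before the reduce phase.
    public static class Map extends Mapper<LongWritable, Text, Text, Text> {

        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                // transform to lower case
                word.set(tokenizer.nextToken().toLowerCase());
                // the pair is serialized on write, so reusing one Text object is safe
                context.write(word, word);
            }
        }
    }

    // Receives one word together with all of its occurrences and joins them.
    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            StringBuilder result = new StringBuilder();
            for (Text value : values) {
                result.append(value.toString()).append(' ');
            }
            context.write(key, new Text(result.toString()));
        }
    }
}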

But my problem is: how do I return only the values in my output file? Currently I get this:

be be be 
not not 
or or 
to to to

So on each line I have the key followed by the values, but I only want the values to be returned, like this:

be be 
not 
or 
to to

Is this even possible, or do I have to remove one entry from each word's values?


2012-11-03 gaussd

Disclaimer: I'm not a Hadoop user, but I've done a lot of Map/Reduce with CouchDB.

If you only need the keys, why not emit an empty value?

Also, it sounds like you don't actually want to reduce them, since you want to get every occurrence of the key.
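A rough plain-Java sketch of this suggestion, assuming the word is emitted with an empty value and the reducer writes the key once for every value it receives (`KeyOnlyModel` and `sortedOccurrences` are hypothetical names; no Hadoop classes are used). One caveat on skipping the reduce step: in Hadoop the sort by key only happens when the job has a reduce phase, and with zero reduce tasks the map output is written unsorted, so a pass-through reducer like the one modeled here is still needed to get sorted output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local model of "emit the word with an empty value".
// Since every value is empty, only the number of values per key matters,
// so a sorted map of counts stands in for the shuffle/sort phase.
class KeyOnlyModel {

    static List<String> sortedOccurrences(String text) {
        // map: one (word, <empty>) pair per token, counted per key
        TreeMap<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            counts.merge(tokenizer.nextToken().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        // reduce: write the key once per value received, so duplicates are kept;
        // with an empty value, each output line holds just the key
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                lines.add(e.getKey());
            }
        }
        return lines;
    }
}
```

Note that this produces one word per line rather than the grouped lines shown in the question.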


2012-11-03 10:31:17

Oh, I guess just emitting a null value is the obvious solution :D! And yes, solving this task with MapReduce seems odd to me too... but I didn't come up with it, my teacher did. – gaussd

There are indeed plenty of cases where you only use the "map" part of Map/Reduce... –

I just tried it with the MaxTemperature example from Hadoop: The Definitive Guide, and the following code works:

context.write(null, new Text(result));
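Below is a plain-Java model of the job with this change (`ValuesOnlyModel` is a hypothetical name; a `TreeMap` again stands in for Hadoop's shuffle/sort, and no Hadoop classes are used). Because the reducer writes a null key, only the joined values end up on each line, which is the output the question asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java model of the job after the fix: the reducer
// writes a null key, so each output line holds only the joined values.
class ValuesOnlyModel {

    static List<String> run(String text) {
        // map + shuffle/sort: group (word, word) pairs by key, keys sorted
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            String lower = tokenizer.nextToken().toLowerCase(Locale.ROOT);
            grouped.computeIfAbsent(lower, k -> new ArrayList<>()).add(lower);
        }
        // reduce: context.write(null, new Text(result)) writes only "result"
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            lines.add(String.join(" ", e.getValue()));
        }
        return lines;
    }
}
```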


2012-11-03 11:18:53

So what type would that be? NullWritable? – gaussd

I had job.setOutputKeyClass(Text.class); in the code, so a null key should work whichever Writable key class is configured. –
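The reason a null key works regardless of the configured key class is that `TextOutputFormat`'s record writer treats a key that is null or a `NullWritable` as absent and writes only the value (and vice versa for the value). The sketch below is a simplified, hypothetical model of that rule in plain Java: `TextLineModel` is an illustrative name, and Java `null` stands in for both `null` and `NullWritable`.

```java
// Simplified, hypothetical model of how TextOutputFormat lays out one record.
// In Hadoop, null and NullWritable are handled the same way; here plain
// Java null stands in for both.
class TextLineModel {

    static final String SEPARATOR = "\t"; // TextOutputFormat's default separator

    // Returns the line that would be written, or null if the record is skipped.
    static String format(Object key, Object value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return null; // neither key nor value: nothing is written
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            line.append(SEPARATOR); // separator only between two present parts
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.toString();
    }
}
```

So passing either `null` or `NullWritable.get()` as the key should give the same values-only lines.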
