Hadoop MapReduce: Custom Input Format

I have a file whose text is separated by the "^" character:

SOME TEXT^GOES HERE^
AND A FEW^MORE
GOES HERE

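I am writing a custom input format that delimits records with the "^" character instead of newlines. That is, the mapper should see these records: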

SOME TEXT
GOES HERE
AND A FEW
MORE GOES HERE
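To make the expected records concrete, here is a plain-Java sketch with no Hadoop involved (CaretSplitDemo and records are hypothetical names) that splits the sample text on "^". For this sample it yields the same records a "^"-delimited reader would. Because newlines are no longer record separators, they stay inside the third and fourth records; the mapper has to strip them if they are unwanted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CaretSplitDemo {

    // Splits the whole input on '^'. Pattern.quote is needed because '^' is a
    // regex anchor. Embedded '\n' characters are kept inside the records.
    // String.split drops a trailing empty string, so a final '^' adds no record.
    public static List<String> records(String input) {
        return Arrays.asList(input.split(Pattern.quote("^")));
    }

    public static void main(String[] args) {
        String input = "SOME TEXT^GOES HERE^\nAND A FEW^MORE\nGOES HERE";
        for (String r : records(input)) {
            // Show newlines as a visible \n so record boundaries are clear.
            System.out.println("[" + r.replace("\n", "\\n") + "]");
        }
    }
}
```

Running main prints four bracketed records: [SOME TEXT], [GOES HERE], [\nAND A FEW] and [MORE\nGOES HERE].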

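I wrote a custom input format that extends FileInputFormat, and a custom record reader that extends RecordReader. The code of my record reader is below. I don't know how to proceed from here: the while loop in the nextKeyValue() method is the problem. How should I read data from the split and generate my custom key-value pairs? I am using the new mapreduce package throughout, not the old mapred package.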

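import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class MyRecordReader extends RecordReader<LongWritable, Text> {
    long start, current, end;
    Text value;
    LongWritable key;
    LineReader reader;
    FileSplit split;
    Path path;
    FileSystem fs;
    FSDataInputStream in;
    Configuration conf;

    @Override
    public void initialize(InputSplit inputSplit, TaskAttemptContext cont) throws IOException, InterruptedException {
        conf = cont.getConfiguration();
        split = (FileSplit) inputSplit;
        path = split.getPath();
        fs = path.getFileSystem(conf);
        in = fs.open(path);
        start = split.getStart();
        in.seek(start); // begin at this split's offset, not at byte 0 of the file
        reader = new LineReader(in, conf);
        current = start;
        end = split.getLength() + start;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (key == null)
            key = new LongWritable();
        key.set(current);
        if (value == null)
            value = new Text();

        long readSize = 0;
        while (current < end) {
            Text tmpText = new Text();
            // TODO: how should I read data from the split here and generate the key-value pair?
            readSize = 0;

            if (readSize == 0)
                break;

            current += readSize;
        }

        if (readSize == 0) {
            key = null;
            value = null;
            return false;
        }
        return true;
    }

    @Override
    public float getProgress() throws IOException {
        // Fraction of this split consumed so far.
        return end == start ? 0.0f : Math.min(1.0f, (current - start) / (float) (end - start));
    }

    @Override
    public LongWritable getCurrentKey() throws IOException {
        return key;
    }

    @Override
    public Text getCurrentValue() throws IOException {
        return value;
    }

    @Override
    public void close() throws IOException {
        if (reader != null)
            reader.close(); // also closes the underlying FSDataInputStream
    }
}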
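If a custom reader is still wanted, one way to fill in the loop is to let Hadoop's LineReader do the scanning: its LineReader(InputStream, Configuration, byte[]) constructor stops at the given delimiter bytes instead of at a newline, so each readLine call returns exactly one "^"-terminated record and no manual while loop is needed. The sketch mirrors the split-boundary rules of Hadoop's own LineRecordReader: seek to the split start, skip the first (partial) record unless the split begins at byte 0, and keep reading while the position is at or before the split end. CaretInputFormat and CaretRecordReader are hypothetical names; the sketch assumes uncompressed input and the new org.apache.hadoop.mapreduce API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

// Hypothetical input format; assumes uncompressed files (no codec handling).
public class CaretInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
        return new CaretRecordReader();
    }

    public static class CaretRecordReader extends RecordReader<LongWritable, Text> {
        private static final byte[] DELIMITER = "^".getBytes(StandardCharsets.UTF_8);

        private final LongWritable key = new LongWritable();
        private final Text value = new Text();
        private LineReader reader;
        private long start, end, pos;

        @Override
        public void initialize(InputSplit inputSplit, TaskAttemptContext context) throws IOException {
            FileSplit split = (FileSplit) inputSplit;
            Configuration conf = context.getConfiguration();
            Path path = split.getPath();
            FileSystem fs = path.getFileSystem(conf);
            start = split.getStart();
            end = start + split.getLength();

            FSDataInputStream in = fs.open(path);
            in.seek(start); // start at this split's offset
            reader = new LineReader(in, conf, DELIMITER); // stops at '^', not '\n'

            // A split not starting at byte 0 may begin mid-record: skip through the
            // first '^'. The previous split's reader reads that record in full.
            if (start != 0) {
                start += reader.readLine(new Text(), 0, Integer.MAX_VALUE);
            }
            pos = start;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            // "<=" rather than "<": a record starting exactly at `end` belongs to
            // this split, because the next split always skips its first record.
            if (pos > end) {
                return false;
            }
            key.set(pos); // key = byte offset of the record, as in TextInputFormat
            int bytesRead = reader.readLine(value); // excludes the '^'; keeps '\n'
            if (bytesRead == 0) {
                return false; // end of file
            }
            pos += bytesRead;
            return true;
        }

        @Override
        public LongWritable getCurrentKey() { return key; }

        @Override
        public Text getCurrentValue() { return value; }

        @Override
        public float getProgress() {
            if (end <= start) return 0.0f;
            return Math.min(1.0f, (pos - start) / (float) (end - start));
        }

        @Override
        public void close() throws IOException {
            if (reader != null) {
                reader.close(); // also closes the FSDataInputStream
            }
        }
    }
}
```

The job would then use it with job.setInputFormatClass(CaretInputFormat.class). The skip-first-record and read-one-past-the-end rules are what keep a record that straddles two splits from being lost or read twice.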

2014-09-20 aiman

There is no need to implement this yourself. You can simply set the configuration property textinputformat.record.delimiter to the caret character:

conf.set("textinputformat.record.delimiter", "^");

With that setting, the standard TextInputFormat should work as-is.
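A minimal map-only driver sketch of this approach, assuming Hadoop 2.x or later (the class name CaretDelimitedJob and the helper normalize are hypothetical). The property has to be set on the Configuration before Job.getInstance is called, because the Job works on its own copy; after that point it would have to go through job.getConfiguration().set(...). The mapper replaces newlines with spaces and trims, an assumption made so the output matches the records listed in the question.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CaretDelimitedJob {

    // Hypothetical clean-up: newlines are ordinary data inside a '^' record,
    // so join the lines with spaces and drop leading/trailing whitespace.
    public static String normalize(String record) {
        return record.replace('\n', ' ').trim();
    }

    // Each map() call receives one '^'-terminated record as its value.
    public static class RecordMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text record, Context context)
                throws IOException, InterruptedException {
            context.write(offset, new Text(normalize(record.toString())));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set before Job.getInstance: the Job copies this Configuration.
        conf.set("textinputformat.record.delimiter", "^");

        Job job = Job.getInstance(conf, "caret-delimited records");
        job.setJarByClass(CaretDelimitedJob.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapperClass(RecordMapper.class);
        job.setNumReduceTasks(0); // map-only: mapper output is the job output
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```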

2014-09-20 11:12:41