2014-09-20 40 views

Hadoop MapReduce: custom input format

I have a file with text in which the data fields are separated by "^":

SOME TEXT^GOES HERE^
AND A FEW^MORE
GOES HERE

I am writing a custom input format that delimits records using the "^" character, i.e. the output of the mapper should look like this:

SOME TEXT
GOES HERE
AND A FEW
MORE GOES HERE

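I wrote a custom input format that extends FileInputFormat, and also a custom record reader that extends RecordReader. The code for my custom record reader is given below. I don't know how to proceed with this code: I am having trouble with the while loop in the nextKeyValue() method. How should I read data from the split and generate my custom key-value pairs? I am using the new mapreduce package throughout, not the old mapred package.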

public class MyRecordReader extends RecordReader<LongWritable, Text>
{
    long start, current, end;
    Text value;
    LongWritable key;
    LineReader reader;
    FileSplit split;
    Path path;
    FileSystem fs;
    FSDataInputStream in;
    Configuration conf;

    @Override
    public void initialize(InputSplit inputSplit, TaskAttemptContext cont) throws IOException, InterruptedException
    {
        conf = cont.getConfiguration();
        split = (FileSplit)inputSplit;
        path = split.getPath();
        fs = path.getFileSystem(conf);
        in = fs.open(path);
        reader = new LineReader(in, conf);
        start = split.getStart();
        current = start;
        end = split.getLength() + start;
    }

    @Override
    public boolean nextKeyValue() throws IOException
    {
        if(key==null)
            key = new LongWritable();

        key.set(current);
        if(value==null)
            value = new Text();

        long readSize = 0;
        while(current<end)
        {
            Text tmpText = new Text();
            readSize = read // here: how should I read data from the split and generate the key-value pair?

            if(readSize==0)
                break;

            current+=readSize;
        }

        if(readSize==0)
        {
            key = null;
            value = null;
            return false;
        }

        return true;
    }

    @Override
    public float getProgress() throws IOException
    {

    }

    @Override
    public LongWritable getCurrentKey() throws IOException
    {

    }

    @Override
    public Text getCurrentValue() throws IOException
    {

    }

    @Override
    public void close() throws IOException
    {

    }
}

Answer


There is no need to implement this yourself. You can simply set the configuration value textinputformat.record.delimiter to the caret character.

conf.set("textinputformat.record.delimiter", "^"); 

This should work as intended with the normal TextInputFormat.
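
To get a feel for what the mapper would then receive, here is a minimal plain-Java sketch that mimics delimiter-based record splitting without any Hadoop classes. The class DelimitedRecordDemo and the methods readRecords and normalize are hypothetical names made up for this illustration, not Hadoop APIs. The key of each record is the byte offset where it starts, similar to the LongWritable key that TextInputFormat hands to the mapper. One thing to be aware of: as far as I know, once a custom delimiter is set, newlines are no longer treated as record boundaries, so line breaks stay inside the values. The normalize helper shows one assumed way to clean them up in the mapper so that the output matches the one in the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: simulates splitting on a single-byte delimiter.
// It does not use Hadoop; it only illustrates the resulting key-value pairs.
class DelimitedRecordDemo {

    // Returns (byte offset of record start, raw record text) pairs.
    static List<Map.Entry<Long, String>> readRecords(byte[] data, byte delimiter) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int recordStart = 0;
        for (int pos = 0; pos < data.length; pos++) {
            if (data[pos] == delimiter) {
                // The delimiter itself is not part of the record.
                records.add(entry(data, recordStart, pos));
                recordStart = pos + 1;
            }
        }
        // Emit trailing data that is not terminated by a delimiter.
        if (recordStart < data.length) {
            records.add(entry(data, recordStart, data.length));
        }
        return records;
    }

    private static Map.Entry<Long, String> entry(byte[] data, int from, int to) {
        String text = new String(data, from, to - from, StandardCharsets.UTF_8);
        return new SimpleEntry<>((long) from, text);
    }

    // Assumed clean-up step: replace line breaks inside a record with a
    // single space and trim leading/trailing whitespace.
    static String normalize(String record) {
        return record.replaceAll("\\s*\\R\\s*", " ").trim();
    }

    public static void main(String[] args) {
        String input = "SOME TEXT^GOES HERE^\nAND A FEW^MORE\nGOES HERE\n";
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        for (Map.Entry<Long, String> record : readRecords(bytes, (byte) '^')) {
            System.out.println(record.getKey() + "\t" + normalize(record.getValue()));
        }
    }
}
```

In a real job the splitting is done by TextInputFormat itself; only something like the normalize step would have to live in your mapper, and only if the embedded line breaks bother you.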
