
The output folder contains a part-00000 file with no content! My MapReduce program produces zero output.

Here is the command trace, in which I don't see any exception either:

[cloudera@localhost ~]$ hadoop jar testmr.jar TestMR /tmp/example.csv /user/cloudera/output 
14/02/06 11:45:24 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id 
14/02/06 11:45:24 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 
14/02/06 11:45:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
14/02/06 11:45:25 INFO mapred.FileInputFormat: Total input paths to process : 1 
14/02/06 11:45:25 INFO mapred.JobClient: Running job: job_local1238439569_0001 
14/02/06 11:45:25 INFO mapred.LocalJobRunner: OutputCommitter set in config null 
14/02/06 11:45:25 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter 
14/02/06 11:45:25 INFO mapred.LocalJobRunner: Waiting for map tasks 
14/02/06 11:45:25 INFO mapred.LocalJobRunner: Starting task: attempt_local1238439569_0001_m_000000_0 
14/02/06 11:45:26 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 
14/02/06 11:45:26 INFO util.ProcessTree: setsid exited with exit code 0 
14/02/06 11:45:26 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@... 
14/02/06 11:45:26 INFO mapred.MapTask: Processing split: hdfs://localhost.localdomain:8020/tmp/example.csv:0+2963382 
14/02/06 11:45:26 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead 
14/02/06 11:45:26 INFO mapred.MapTask: numReduceTasks: 1 
14/02/06 11:45:26 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
14/02/06 11:45:26 INFO mapred.MapTask: io.sort.mb = 50 
14/02/06 11:45:26 INFO mapred.MapTask: data buffer = 39845888/49807360 
14/02/06 11:45:26 INFO mapred.MapTask: record buffer = 131072/163840 
14/02/06 11:45:26 INFO mapred.JobClient: map 0% reduce 0% 
14/02/06 11:45:28 INFO mapred.MapTask: Starting flush of map output 
14/02/06 11:45:28 INFO compress.CodecPool: Got brand-new compressor [.snappy] 
14/02/06 11:45:28 INFO mapred.Task: Task:attempt_local1238439569_0001_m_000000_0 is done. And is in the process of commiting 
14/02/06 11:45:28 INFO mapred.LocalJobRunner: hdfs://localhost.localdomain:8020/tmp/example.csv:0+2963382 
14/02/06 11:45:28 INFO mapred.Task: Task 'attempt_local1238439569_0001_m_000000_0' done. 
14/02/06 11:45:28 INFO mapred.LocalJobRunner: Finishing task: attempt_local1238439569_0001_m_000000_0 
14/02/06 11:45:28 INFO mapred.LocalJobRunner: Map task executor complete. 
14/02/06 11:45:28 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 
14/02/06 11:45:28 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@... 
14/02/06 11:45:28 INFO mapred.LocalJobRunner: 
14/02/06 11:45:28 INFO mapred.Merger: Merging 1 sorted segments 
14/02/06 11:45:28 INFO compress.CodecPool: Got brand-new decompressor [.snappy] 
14/02/06 11:45:28 INFO mapred.Merger: Down to the last merge-pass, with 0 segments left of total size: 0 bytes 
14/02/06 11:45:28 INFO mapred.LocalJobRunner: 
14/02/06 11:45:28 INFO mapred.Task: Task:attempt_local1238439569_0001_r_000000_0 is done. And is in the process of commiting 
14/02/06 11:45:28 INFO mapred.LocalJobRunner: 
14/02/06 11:45:28 INFO mapred.Task: Task attempt_local1238439569_0001_r_000000_0 is allowed to commit now 
14/02/06 11:45:28 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local1238439569_0001_r_000000_0' to hdfs://localhost.localdomain:8020/user/cloudera/output 
14/02/06 11:45:28 INFO mapred.LocalJobRunner: reduce > reduce 
14/02/06 11:45:28 INFO mapred.Task: Task 'attempt_local1238439569_0001_r_000000_0' done. 
14/02/06 11:45:28 INFO mapred.JobClient: map 100% reduce 100% 
14/02/06 11:45:28 INFO mapred.JobClient: Job complete: job_local1238439569_0001 
14/02/06 11:45:28 INFO mapred.JobClient: Counters: 26 
14/02/06 11:45:28 INFO mapred.JobClient: File System Counters 
14/02/06 11:45:28 INFO mapred.JobClient:  FILE: Number of bytes read=7436 
14/02/06 11:45:28 INFO mapred.JobClient:  FILE: Number of bytes written=199328 
14/02/06 11:45:28 INFO mapred.JobClient:  FILE: Number of read operations=0 
14/02/06 11:45:28 INFO mapred.JobClient:  FILE: Number of large read operations=0 
14/02/06 11:45:28 INFO mapred.JobClient:  FILE: Number of write operations=0 
14/02/06 11:45:28 INFO mapred.JobClient:  HDFS: Number of bytes read=5926764 
14/02/06 11:45:28 INFO mapred.JobClient:  HDFS: Number of bytes written=0 
14/02/06 11:45:28 INFO mapred.JobClient:  HDFS: Number of read operations=10 
14/02/06 11:45:28 INFO mapred.JobClient:  HDFS: Number of large read operations=0 
14/02/06 11:45:28 INFO mapred.JobClient:  HDFS: Number of write operations=4 
14/02/06 11:45:28 INFO mapred.JobClient: Map-Reduce Framework 
14/02/06 11:45:28 INFO mapred.JobClient:  Map input records=24518 
14/02/06 11:45:28 INFO mapred.JobClient:  Map output records=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Map output bytes=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Input split bytes=129 
14/02/06 11:45:28 INFO mapred.JobClient:  Combine input records=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Combine output records=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Reduce input groups=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Reduce shuffle bytes=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Reduce input records=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Reduce output records=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Spilled Records=0 
14/02/06 11:45:28 INFO mapred.JobClient:  CPU time spent (ms)=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Physical memory (bytes) snapshot=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Virtual memory (bytes) snapshot=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Total committed heap usage (bytes)=221126656 
14/02/06 11:45:28 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter 
14/02/06 11:45:28 INFO mapred.JobClient:  BYTES_READ=2963382 
[cloudera@localhost ~]$ 

Here is my MR code:

import java.io.IOException; 
import java.util.*; 
import java.text.SimpleDateFormat; 

import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.conf.*; 
import org.apache.hadoop.io.*; 
import org.apache.hadoop.mapred.*; 
import org.apache.hadoop.mapred.Reducer; 
import org.apache.hadoop.util.*; 

public class TestMR 
{ 
    public static class Map extends MapReduceBase implements Mapper<LongWritable,Text,Text,Text> 
    { 
     public void map(LongWritable key, Text line, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
     { 
      final String [] split = line.toString().split(","); 

      if(split[2].equals("Test")) 
      { 
       output.collect(new Text(split[0]), new Text(split[4] + "|" + split[7])); 
      } 
     } 
    } 

    public static class Reduce extends MapReduceBase implements Reducer<Text,Text,Text,DoubleWritable> 
    { 
     public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, DoubleWritable> output, Reporter reporter) throws IOException 
     { 
      while(values.hasNext()) 
      { 
       long t1=0, t2=0; 
       SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); 

       String [] tmpBuf_1 = values.next().toString().split("\\|"); // split() takes a regex, so "|" must be escaped 
       String v1 = tmpBuf_1[0]; 
       try 
       { 
        t1 = df.parse(tmpBuf_1[1]).getTime(); 
       } 
       catch (java.text.ParseException e) 
       { 
        System.out.println("Unable to parse date string: "+ tmpBuf_1[1]); 
        continue; 
       } 

       if(!values.hasNext()) 
        break; 

       String [] tmpBuf_2 = values.next().toString().split("\\|"); // split() takes a regex, so "|" must be escaped 
       String v2 = tmpBuf_2[0]; 
       try 
       { 
        t2 = df.parse(tmpBuf_2[1]).getTime(); 
       } 
       catch (java.text.ParseException e) 
       { 
        System.out.println("Unable to parse date string: "+ tmpBuf_2[1]); 
        continue; 
       }  

       int vDiff = Integer.parseInt(v2) - Integer.parseInt(v1);  
       long tDiff = (t2 - t1)/1000; 
       if(tDiff > 600) 
        break; 

       double declineV = (double) vDiff / tDiff; // cast so the division is floating-point, not truncating integer division 

       output.collect(key, new DoubleWritable(declineV)); 
      } 
     } 
    } 

    public static void main(String[] args) throws Exception 
    { 
     JobConf conf = new JobConf(TestMR.class); 
     conf.setJobName("TestMapReduce"); 
     conf.set("mapred.job.tracker", "local"); 

     conf.setOutputKeyClass(Text.class); 
     conf.setOutputValueClass(DoubleWritable.class); 

     conf.setMapperClass(Map.class); 
     // Reduce cannot be reused as a combiner here: a combiner's output types must match the
     // reducer's input types, but Reduce emits <Text, DoubleWritable> while reduce() expects <Text, Text>.
     // conf.setCombinerClass(Reduce.class); 
     conf.setReducerClass(Reduce.class); 

     conf.setInputFormat(TextInputFormat.class); 
     conf.setOutputFormat(TextOutputFormat.class); 

     FileInputFormat.setInputPaths(conf, new Path(args[0])); 
     FileOutputFormat.setOutputPath(conf, new Path(args[1])); 

     JobClient.runJob(conf); 
    } 
} 

This is my first MapReduce program and I cannot figure out why it produces no output! Please let me know if there is any problem in my code, or if there is a better way of running a MapReduce job to get output.

FYI, the testmr.jar file is on the local file system, while the CSV file and the output folder are in HDFS.


Just to add: please avoid variable names that clash with keywords or existing class names.
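
For instance, the nested mapper class named Map in the code above shadows java.util.Map brought in by the wildcard import. A minimal sketch of that clash (ShadowDemo is a made-up name for illustration):

import java.util.*; 

public class ShadowDemo 
{ 
    // A nested class named Map shadows java.util.Map pulled in by the wildcard import 
    public static class Map { } 

    public static void main(String[] args) 
    { 
     // Map<String, String> m = new HashMap<String, String>(); // does not compile: Map now means ShadowDemo.Map 
     java.util.Map<String, String> m = new HashMap<String, String>(); // the fully qualified name is required 
     m.put("k", "v"); 
     System.out.println(m); 
    } 
} 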

Answer


If you take a look at the logs, you can see that the map method is not producing any output:

14/02/06 11:45:28 INFO mapred.JobClient:  Map input records=24518 
14/02/06 11:45:28 INFO mapred.JobClient:  Map output records=0 
14/02/06 11:45:28 INFO mapred.JobClient:  Map output bytes=0 

As you can see, the map method gets input records but produces 0 output records, so there must be a problem somewhere in the logic of your map method:

final String [] split = line.toString().split(","); 

if(split[2].equals("Test")) 
{ 
    output.collect(new Text(split[0]), new Text(split[4] + "|" + split[7])); 
} 

I suggest you test this logic in a simple Java program with some sample input data, make sure it works, and only then go back to your MapReduce code and try running the job again.
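
For example, a minimal standalone test of the filter could look like the sketch below (MapLogicTest and the sample line are made up for illustration; substitute real lines from your example.csv):

public class MapLogicTest 
{ 
    public static void main(String[] args) 
    { 
     // Hypothetical sample line -- replace with a real line from example.csv 
     String line = "id1,foo,Test,bar,42,x,y,2014-02-06 11:45:00"; 

     // Same logic as in the map method 
     String [] split = line.split(","); 

     if (split.length > 7 && split[2].equals("Test")) 
     { 
      System.out.println(split[0] + "\t" + split[4] + "|" + split[7]); 
     } 
     else 
     { 
      System.out.println("Filtered out: " + line); 
     } 
    } 
} 

If the condition never matches real lines, common culprits are stray whitespace or quotes around the field (so split[2].trim().equals("Test") would be needed), or the "Test" marker living in a different column than index 2.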