2016-08-30

I am running a sample MapReduce job that takes a .csv file from HDFS as input and inserts the rows into HBase through a Java mapper. To avoid the Java heap space issue on the Hadoop cluster while running the MapReduce program from the command prompt, I tried the following options:

configuration.set("mapreduce.map.java.opts", "-Xmx5g"); 
configuration.set("mapreduce.map.memory.mb", "-1"); 

However, I still hit the Java heap issue when running the MapReduce program:

2016-08-30 12:47:26,764 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2016-08-30 12:50:57,663 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at com.google.protobuf.ByteString.copyFrom(ByteString.java:194)
    at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:324)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$QualifierValue.<init>(ClientProtos.java:9144)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$QualifierValue.<init>(ClientProtos.java:9089)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$QualifierValue$1.parsePartialFrom(ClientProtos.java:9198)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$QualifierValue$1.parsePartialFrom(ClientProtos.java:9193)

The driver program is configured as below:
Job job = new Job(configuration);  
job.setJarByClass(HbaseTest.class);  
job.setJobName("Data loading to HBase Table::"+TABLE_NAME);  
job.setInputFormatClass(TextInputFormat.class);  
job.setMapOutputKeyClass(ImmutableBytesWritable.class);  
job.setMapperClass(HbaseTestMapper.class);  
job.setNumReduceTasks(0); 
FileInputFormat.addInputPaths(job, args[0]);   
FileSystem.getLocal(getConf()).delete(new Path(outputPath), true);  
FileOutputFormat.setOutputPath(job, new Path(outputPath));  
job.setMapOutputValueClass(Put.class); 

I am using Hadoop 2.x on a three-node cluster, and each node has 32 GB of RAM. My input file size is 831 MB. Please help me understand the problem and how to resolve it.

Answer


You can increase the configuration values, for example:

configuration.set("mapreduce.child.java.opts", "-Xmx6553m"); 
configuration.set("mapreduce.map.memory.mb", "8192"); 
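As a rough rule of thumb (my own convention, not something mandated by the Hadoop docs), the -Xmx heap passed via the java.opts property is sized to about 80% of the mapreduce.map.memory.mb container size, so the JVM's non-heap memory (thread stacks, direct buffers, metaspace) still fits inside the YARN container. A minimal sketch of that sizing rule, with a made-up HeapSizing class name for illustration:

```java
// Minimal sketch: derive the map task's -Xmx from the YARN container size,
// leaving ~20% headroom for the JVM's non-heap memory.
public class HeapSizing {

    // Returns a java.opts heap flag sized to ~80% of the container, in MB.
    static String javaOptsFor(int containerMb) {
        int heapMb = (int) (containerMb * 0.8);
        return "-Xmx" + heapMb + "m";
    }

    public static void main(String[] args) {
        // An 8192 MB container yields the -Xmx6553m value shown above.
        System.out.println(javaOptsFor(8192)); // -Xmx6553m
        // These strings would then go into the Hadoop Configuration, e.g.:
        // configuration.set("mapreduce.map.memory.mb", "8192");
        // configuration.set("mapreduce.map.java.opts", javaOptsFor(8192));
    }
}
```

Whatever values you pick, mapreduce.map.memory.mb should be a positive number; as far as I can tell, the "-1" used in the question is not a valid container size and leaves the task with the default allocation, which is why the larger -Xmx alone did not help.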