2011-07-15

I am working with an 8-node Hadoop cluster (Cloudera distribution CDH3u0, Hadoop 0.20.2) and trying to execute a simple streaming job with the configuration below, but the job keeps failing on the multi-node cluster with a "Child Error":

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
    -D mapred.map.max.tracker.failures=10 \
    -D mapred.map.max.attempts=8 \
    -D mapred.skip.attempts.to.start.skipping=8 \
    -D mapred.skip.map.max.skip.records=8 \
    -D mapred.skip.mode.enabled=true \
    -D mapred.max.map.failures.percent=5 \
    -input /user/hdfs/ABC/ \
    -output /user/hdfs/output1/ \
    -mapper "perl -e 'while (<>) { chomp; print; } exit;'" \
    -reducer "perl -e 'while (<>) { s/LR>/LR>\n/g; print; } exit;'"

I am using Cloudera's CDH3u0 distribution with Hadoop 0.20.2. The problem is that the job fails every time it is executed, with the following error:

java.lang.Throwable: Child Error 
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242) 
Caused by: java.io.IOException: Task process exit with nonzero status of 1. 
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229) 


STDERR on the datanodes: 
    Exception in thread "main" java.io.IOException: Exception reading file:/mnt/hdfs/06/local/taskTracker/hdfs/jobcache/job_201107141446_0001/jobToken 
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:146) 
    at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:159) 
    at org.apache.hadoop.mapred.Child.main(Child.java:107) 
Caused by: java.io.FileNotFoundException: File file:/mnt/hdfs/06/local/taskTracker/hdfs/jobcache/job_201107141446_0001/jobToken does not exist. 

I have already checked the following as possible causes of the error, but the job still crashes and I cannot understand why:

1. All the temp directories are in place.
2. Memory is far more than the job could require (it is a small job).
3. Permissions have been verified.
4. Nothing fancy was done in the configuration, just the usual settings.

The strangest thing is that the job sometimes runs successfully, but most of the time it fails. Any guidance/help with this issue would be very much appreciated. I have been working on this error for the past 4 days and cannot figure anything out. Please help!!!

Thanks & Regards, Atul


Check whether you are running out of disk space when running the MapReduce job, in particular on the disks holding the log and local task directories. – Infinity
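A minimal sketch of that check, assuming the TaskTracker local directories follow the /mnt/hdfs/NN/local layout that appears in the stderr above (substitute your own mapred.local.dir paths):

    # Free space on each disk backing mapred.local.dir
    # (paths assumed from the jobToken error above)
    df -h /mnt/hdfs/*/local

    # The jobcache directory must exist and be writable by the mapred
    # user; a vanished jobToken file often points to a full or
    # mis-permissioned local disk
    ls -ld /mnt/hdfs/*/local/taskTracker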

Answer


I faced the same problem: it happens when the TaskTracker is unable to allocate the specified memory to the child JVM for the task.
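If memory allocation is the culprit, one thing to try is lowering the heap handed to each child JVM via the standard mapred.child.java.opts property of Hadoop 0.20. A sketch only; the -Xmx value is illustrative, and output2 is a hypothetical fresh output directory (streaming refuses to overwrite an existing one):

    # -Xmx256m is an example value, not a recommendation from the answer
    hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
        -D mapred.child.java.opts=-Xmx256m \
        -input /user/hdfs/ABC/ \
        -output /user/hdfs/output2/ \
        -mapper "perl -e 'while (<>) { chomp; print; } exit;'" \
        -reducer "perl -e 'while (<>) { s/LR>/LR>\n/g; print; } exit;'"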

Try executing the same job again when the cluster is not busy and it may pass, or set speculative execution to true; in that case Hadoop will execute the same task on another TaskTracker.
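For the speculative-execution route, these are the standard per-job Hadoop 0.20 properties, added as -D flags to the streaming command line above:

    # Let Hadoop launch duplicate attempts of slow or failing tasks
    # on another TaskTracker (per-job override)
    -D mapred.map.tasks.speculative.execution=true \
    -D mapred.reduce.tasks.speculative.execution=true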