2013-07-03 27 views
0

I have an R script that works perfectly fine in the R console, but when I run it as a Hadoop Streaming job it fails in the map phase with the error below. How do I find the task-attempt logs for a Hadoop Streaming job that runs an R script and fails with "subprocess failed with code 1"?

My Hadoop streaming command:

/home/Bibhu/hadoop-0.20.2/bin/hadoop jar \ 
    /home/Bibhu/hadoop-0.20.2/contrib/streaming/*.jar \ 
    -input hdfs://localhost:54310/user/Bibhu/BookTE1.csv \ 
    -output outsid -mapper `pwd`/code1.sh 

stderr log

Loading required package: class 
Error in read.table(file = file, header = header, sep = sep, quote = quote, : 
    no lines available in input 
Calls: read.csv -> read.table 
Execution halted 
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) 
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) 
    at org.apache.hadoop.mapred.Child.main(Child.java:170) 

syslog log

2013-07-03 19:32:36,080 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 
2013-07-03 19:32:36,654 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1 
2013-07-03 19:32:36,675 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100 
2013-07-03 19:32:36,835 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720 
2013-07-03 19:32:36,835 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680 
2013-07-03 19:32:36,899 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/home/Bibhu/Downloads/SentimentAnalysis/Sid/smallFile/code1.sh] 
2013-07-03 19:32:37,256 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=0/1 
2013-07-03 19:32:38,509 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done 
2013-07-03 19:32:38,509 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed! 
2013-07-03 19:32:38,557 WARN org.apache.hadoop.mapred.TaskTracker: Error running child 
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) 
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) 
    at org.apache.hadoop.mapred.Child.main(Child.java:170) 
2013-07-03 19:32:38,631 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task 
+1

Your script **code1.sh** exited abnormally. I think there may be some unexpected input that makes your R script crash. – zsxwing
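The "no lines available in input" message in the stderr log is what read.csv raises when it is handed an empty stream, which fits this comment: a map task can receive an empty split. A minimal defensive wrapper, assuming the mapper is a shell script that pipes its input into the R code (the script name code1.R is hypothetical), might look like:

```shell
#!/bin/bash
# Sketch of a defensive code1.sh: skip empty map splits so that
# read.csv inside the R script never sees zero input lines.
run_mapper() {
    local input
    input=$(cat)     # read the whole split from stdin
    if [ -z "$input" ]; then
        return 0     # empty split: emit nothing and exit cleanly
    fi
    # code1.R is a hypothetical name for the R script being wrapped.
    printf '%s\n' "$input" | Rscript code1.R
}
# Hadoop invokes this file as the mapper, which would call: run_mapper
```

This only guards the empty-split case; malformed records inside a non-empty split would still need handling in the R script itself.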

Answers

0
  1. Write the Hadoop streaming jar with its full version name, e.g. hadoop-streaming-1.0.4.jar
  2. Specify the file paths of your mapper & reducer separately with the -file option
  3. Tell Hadoop which code is your mapper & reducer with the -mapper & -reducer options

For more details, see Running WordCount on Hadoop using R script.
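Putting those three points together, the invocation might look like the sketch below. The jar filename and local paths are assumptions based on the question (in 0.20.x the streaming jar carries the version in its name; check your contrib/streaming directory for the exact file):

```shell
# Sketch only: the jar name and the local path to code1.sh are assumptions.
/home/Bibhu/hadoop-0.20.2/bin/hadoop jar \
    /home/Bibhu/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar \
    -input  hdfs://localhost:54310/user/Bibhu/BookTE1.csv \
    -output outsid \
    -file   /home/Bibhu/code1.sh \
    -mapper code1.sh
```

The -file option ships the script to every task node, so -mapper can then refer to it by its bare name instead of a local `pwd` path that does not exist on the workers.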

0

You need to find the logs from your mapper and reducer, since that is where the job is failing (as indicated by java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1). That message says your R script crashed.

If you are using the Hortonworks distribution of Hadoop, the easiest way is to open your job history. It should be at http://127.0.0.1:19888/jobhistory. It should also be possible to find the logs in the filesystem using the command line, but I haven't found where they are.

  1. Open http://127.0.0.1:19888/jobhistory in a web browser
  2. Click the Job ID of the failed job
  3. Click the count indicating the failed tasks
  4. Click an attempt link
  5. Click the logs link

You will see a page that looks like this:

Log Type: stderr 
Log Length: 418 
Traceback (most recent call last): 
    File "/hadoop/yarn/local/usercache/root/appcache/application_1404203309115_0003/container_1404203309115_0003_01_000002/./mapper.py", line 45, in <module> 
    mapper() 
    File "/hadoop/yarn/local/usercache/root/appcache/application_1404203309115_0003/container_1404203309115_0003_01_000002/./mapper.py", line 37, in mapper 
    for record in reader: 
_csv.Error: newline inside string 

This is the error from my Python script; errors from R look a little different.

来源:http://hortonworks.com/community/forums/topic/map-reduce-job-log-files/
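On YARN-based clusters (which the jobhistory server above implies), the command-line route mentioned in this answer is the `yarn logs` tool. A sketch, reusing the application id from the traceback above; log aggregation must be enabled on the cluster for it to return anything:

```shell
# Fetch the aggregated container logs (stdout/stderr/syslog of every
# task attempt) for a finished application.
yarn logs -applicationId application_1404203309115_0003
```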

-2

I got this same error tonight while developing a Map Reduce streaming job with R.

I was working on a 10-node cluster, each with 12 cores, and tried to supply at submission time:

-D mapred.map.tasks=200\ 
-D mapred.reduce.tasks=200 

However, when I changed these to

-D mapred.map.tasks=10\ 
-D mapred.reduce.tasks=10 

the job completed successfully. It is a mysterious fix, and perhaps more context will emerge this evening. But if any reader can clarify it, please do!