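Original question (long version below). Short version: running Hadoop streaming with a Ruby script as the mapper does not work, even though RVM is installed on all cluster nodes, because ruby is not recognized by the shell that Hadoop launches (and RVM is not loaded correctly). Why?
Hadoop streaming with RVM fails to find gem
I want to use the wukong gem to create map/reduce jobs for Hadoop. The problem is that the wukong gem cannot be loaded when the script is run through Hadoop (i.e. it is not found). The Hadoop job gives me the following error: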
/usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- wukong (LoadError)
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /tmp/mapr-hadoop/mapred/local/taskTracker/admin/jobcache/job_201207061102_0068/attempt_201207061102_0068_m_000000_0/work/./test.rb:6:in `<main>'
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
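However, running this on every cluster machine:
cat somefile | ./test.rb --map
works as expected. In addition, I put some debugging output into my test file, which I can retrieve from the Hadoop logs. When running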
$stderr.puts `gem list`
it lists all installed gems, including wukong. Also,
$stderr.puts $LOAD_PATH.inspect
yields exactly the same paths as when $LOAD_PATH is printed from a Ruby script run locally (rather than launched by Hadoop).
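To compare the two environments beyond the gem list and $LOAD_PATH, a small diagnostic along the following lines could be added near the top of test.rb. This is only a sketch: the helper names gem_environment_report and format_report are made up for illustration, and it assumes that any difference between the local run and the Hadoop-launched run would show up in GEM_HOME, GEM_PATH, PATH, or the interpreter path, which I have not confirmed.

```ruby
require 'rbconfig'
require 'rubygems'

# Hypothetical helper: collect the settings RubyGems and the shell use to
# locate the interpreter and installed gems, so that a local run and a
# Hadoop-launched run can be compared side by side.
def gem_environment_report(gem_name)
  {
    'ruby'      => RbConfig.ruby,          # full path of the running interpreter
    'version'   => RUBY_VERSION,
    'GEM_HOME'  => ENV['GEM_HOME'],        # nil if the variable is not set
    'GEM_PATH'  => ENV['GEM_PATH'],        # nil if the variable is not set
    'PATH'      => ENV['PATH'],
    'Gem.path'  => Gem.path,               # directories RubyGems actually searches
    'installed' => !Gem::Specification.find_all_by_name(gem_name).empty?
  }
end

# Hypothetical helper: one "key: value" line per entry, using inspect so
# that nil and empty strings remain distinguishable in the logs.
def format_report(report)
  report.map { |key, value| "#{key}: #{value.inspect}" }.join("\n")
end

# Write to stderr so the output ends up in the Hadoop task logs
# instead of the mapper's output stream.
$stderr.puts format_report(gem_environment_report('wukong'))
```

Running the script once locally and once through Hadoop, then diffing the two stderr outputs, should show whether the Hadoop child process sees the same interpreter and gem directories.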
Why does a Ruby script launched by Hadoop not find a gem that is clearly installed and working?
Hadoop is launched as:
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-dev-streaming.jar \
-libjars /opt/hypertable/current/lib/java/hypertable-0.9.5.6.jar,/opt/hypertable/current/lib/java/libthrift-0.8.0.jar \
-Dmapred.child.env="PATH=$PATH:/usr/local/rvm/bin/rvm" \
-mapper '/home/admin/wukong/test.rb --map' \
-file /home/admin/wukong/test.rb \
-reducer /bin/cat \
-input /test/test.rb \
-output /test/something2