2013-12-14
1

MRJob error

I am trying to run a Python job on a Hadoop cluster using MRJob. My wrapper script is as follows:

#!/bin/bash 

. /etc/profile 
module load use.own 
module load python/python2.7 
module load python/mrjob 

python path_to_python-script/mr_word_freq_count.py path_to_input_file/input.txt -r hadoop > path_to_output_file/output.txt  # note: the output file already exists before I submit the job

I submit this script to the cluster with qsub myscript.sh.

I get two files back: an output file and an error file.

The error file has the following content:

no configs found; falling back on auto-configuration 
no configs found; falling back on auto-configuration 
Traceback (most recent call last): 
    File "homefolder/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module> 
    MRWordFreqCount.run() 
    File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run 
    mr_job.execute() 
    File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute 
    super(MRJob, self).execute() 
    File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute 
    self.run_job() 
    File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 206, in run_job 
    with self.make_runner() as runner: 
    File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 541, in make_runner 
    return super(MRJob, self).make_runner() 
    File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 164, in make_runner 
    return HadoopJobRunner(**self.hadoop_job_runner_kwargs()) 
    File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 179, in __init__ 
    super(HadoopJobRunner, self).__init__(**kwargs) 
    File "/homefolder/.local/lib/python2.7/site-packages/mrjob/runner.py", line 352, in __init__ 
    self._opts = self.OPTION_STORE_CLASS(self.alias, opts, conf_paths) 
    File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 132, in __init__ 
    'you must set $HADOOP_HOME, or pass in hadoop_home explicitly') 
Exception: you must set $HADOOP_HOME, or pass in hadoop_home explicitly 

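First question: how do I find $HADOOP_HOME? When I run echo $HADOOP_HOME, nothing is printed, which means it is not set. So if I have to set it myself, what path should I set it to? Should it be the path to the Hadoop name node in the cluster?

Second question: what does the "no configs found" message mean? Is it related to $HADOOP_HOME not being set, or does mrjob expect some other config file to be passed in explicitly?

Any help would be much appreciated.

Thanks in advance!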

+0

Why did you create two different questions? – aa8y

Answers

3

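First, $HADOOP_HOME should be set to the local Hadoop installation path on the machine where you launch the job. Almost all Hadoop applications assume that $HADOOP_HOME/bin/hadoop is the Hadoop executable. So if Hadoop is installed in the system default path (that is, the executable is /usr/bin/hadoop), you should export HADOOP_HOME=/usr/; otherwise you should export HADOOP_HOME=/path/to/hadoop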
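
If you don't know where Hadoop is installed, one way to get a candidate value is to find the hadoop executable on your PATH and strip the trailing bin/hadoop, following the $HADOOP_HOME/bin/hadoop layout described above. This is only a rough sketch: guess_hadoop_home is a hypothetical helper name, it assumes Python 3.3+ (for shutil.which, so not the python2.7 module from the question), and it assumes hadoop is actually on the PATH of the node where you run it.

```python
import os
import shutil


def guess_hadoop_home(search_path=None):
    """Return a candidate HADOOP_HOME, or None if `hadoop` is not found.

    search_path is an os.pathsep-separated list of directories; when it is
    None, the PATH environment variable is searched.
    """
    hadoop_bin = shutil.which("hadoop", path=search_path)
    if hadoop_bin is None:
        return None
    # Follow symlinks so that a link to the executable resolves to the
    # real installation directory.
    real_bin = os.path.realpath(hadoop_bin)
    # <home>/bin/hadoop -> <home>
    return os.path.dirname(os.path.dirname(real_bin))


if __name__ == "__main__":
    home = guess_hadoop_home()
    if home is None:
        print("hadoop not found on PATH; ask your cluster admin where it is installed")
    else:
        print("export HADOOP_HOME=" + home)
```

If this prints an export line, you can paste it into the wrapper script before the python call. If hadoop is not on the PATH at all, the module system on your cluster may provide it, and the admins are the ones to ask.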

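Second, you can give mrjob a specific configuration; if you don't, mrjob falls back on auto-configuration, which is what the "no configs found" message is telling you. In most cases, setting HADOOP_HOME and relying on auto-configuration is fine. For advanced usage, see http://pythonhosted.org/mrjob/guides/configs-basics.html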
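
If you would rather put the setting in a config file than export the variable, here is a minimal sketch that writes one. The hadoop_home option name is taken from the error message in the question ("or pass in hadoop_home explicitly"), so this assumes an mrjob version that still accepts it. write_mrjob_conf is a hypothetical helper, and /path/to/hadoop is a placeholder for your real install path. The file is written as JSON, which mrjob's config loader should accept alongside YAML.

```python
import json


def write_mrjob_conf(hadoop_home, conf_path):
    """Write a minimal mrjob config that sets hadoop_home for the hadoop runner."""
    conf = {"runners": {"hadoop": {"hadoop_home": hadoop_home}}}
    with open(conf_path, "w") as f:
        json.dump(conf, f, indent=2)
        f.write("\n")
    return conf_path
```

Writing the file to ~/.mrjob.conf, for example write_mrjob_conf("/path/to/hadoop", os.path.expanduser("~/.mrjob.conf")) after import os, puts it in one of the places mrjob looks by default. Once a config file is found, the "no configs found" message should go away as well.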

+0

Doesn't work. Nothing works. 'you must set $HADOOP_HOME, or pass in hadoop_home explicitly') Exception: you must set $HADOOP_HOME, or pass in hadoop_home explicitly. – nottinhill