2013-06-20 89 views

Hadoop streaming error: MapReduce with Python

I am new to the Hadoop environment. Do you have any idea how to resolve this error, or what might be causing it?

[email protected]:~/hduser/hadoop$ sudo ./bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -file /home/hduser/map.py -mapper /home/hduser/map.py -file /home/hduser/red.py -reducer /home/hduser/red.py -input /home/hduser/tmp/cddb.txt -output /home/hduser/op1 
packageJobJar: [/home/hduser/map.py, /home/hduser/red.py] [] /tmp/streamjob7455767556382290755.jar tmpDir=null 
13/06/20 12:43:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library 
13/06/20 12:43:55 WARN snappy.LoadSnappy: Snappy native library not loaded 
13/06/20 12:43:55 INFO mapred.FileInputFormat: Total input paths to process : 1 
13/06/20 12:43:55 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir. 
13/06/20 12:43:56 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local] 
13/06/20 12:43:56 INFO streaming.StreamJob: Running job: job_local_0001 
13/06/20 12:43:56 INFO streaming.StreamJob: Job running in-process (local Hadoop) 
13/06/20 12:43:56 INFO util.ProcessTree: setsid exited with exit code 0 
13/06/20 12:43:56 INFO mapred.Task: Using ResourceCalculatorPlugin : [email protected] 
13/06/20 12:43:56 INFO mapred.MapTask: numReduceTasks: 1 
13/06/20 12:43:56 INFO mapred.MapTask: io.sort.mb = 100 
13/06/20 12:43:56 INFO mapred.MapTask: data buffer = 79691776/99614720 
13/06/20 12:43:56 INFO mapred.MapTask: record buffer = 262144/327680 
13/06/20 12:43:56 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./map.py] 
13/06/20 12:43:56 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s] 
13/06/20 12:43:57 INFO streaming.StreamJob: map 0% reduce 0% 
13/06/20 12:44:02 INFO mapred.LocalJobRunner: file:/home/hduser/tmp/cddb.txt:0+1205 
13/06/20 12:44:03 INFO streaming.StreamJob: map 100% reduce 0% 
13/06/20 12:48:11 INFO streaming.PipeMapRed: Records R/W=9/1 
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done 
13/06/20 12:48:11 INFO streaming.PipeMapRed: mapRedFinished 
13/06/20 12:48:11 INFO mapred.MapTask: Starting flush of map output 
13/06/20 12:48:11 INFO mapred.MapTask: Finished spill 0 
13/06/20 12:48:11 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting 
13/06/20 12:48:11 INFO mapred.LocalJobRunner: Records R/W=9/1 
13/06/20 12:48:11 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done. 
13/06/20 12:48:11 INFO mapred.Task: Using ResourceCalculatorPlugin : [email protected] 
13/06/20 12:48:11 INFO mapred.LocalJobRunner: 
13/06/20 12:48:11 INFO mapred.Merger: Merging 1 sorted segments 
13/06/20 12:48:11 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1356 bytes 
13/06/20 12:48:11 INFO mapred.LocalJobRunner: 
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./red.py] 
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s] 
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s] 
Traceback (most recent call last): 
    File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module> 
    main() 
    File "/home/hduser/hduser/hadoop/./red.py", line 19, in main 
    for similarity, group in groupby(data, itemgetter(0), reverse=True): 
TypeError: groupby() takes at most 2 arguments (3 given) 
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done 
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed failed! 
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576) 
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137) 
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529) 
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) 
13/06/20 12:48:11 WARN mapred.LocalJobRunner: job_local_0001 
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576) 
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137) 
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529) 
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) 
13/06/20 12:48:12 INFO streaming.StreamJob: Job running in-process (local Hadoop) 
13/06/20 12:48:12 ERROR streaming.StreamJob: Job not successful. Error: NA 
13/06/20 12:48:12 INFO streaming.StreamJob: killJob... 
Streaming Command Failed! 

I am using Hadoop 1.0.4 and wrote the map and reduce scripts in Python (run through Hadoop Streaming).

Please post the code in a code block in the body of your question (no pastebin).

Answer

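The error is clear from the Python traceback in the log; the Java RuntimeException that follows is only Hadoop reporting that the reducer script exited with code 1: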

Traceback (most recent call last): 
    File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module> 
    main() 
    File "/home/hduser/hduser/hadoop/./red.py", line 19, in main 
    for similarity, group in groupby(data, itemgetter(0), reverse=True): 
TypeError: groupby() takes at most 2 arguments (3 given) 

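groupby() accepts at most two arguments: the iterable and an optional key function. It has no reverse parameter, so the call on line 19 of red.py fails. See the documentation for itertools.groupby.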
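
Since red.py was not posted, the following is only a sketch of one possible fix, assuming the reducer holds its parsed input as (similarity, value) pairs and wants the groups in descending order of similarity. The function name group_descending and the sample data are hypothetical. The idea is to pass reverse=True to sorted(), which does accept it, and then call groupby() with just the iterable and the key function:

```python
from itertools import groupby
from operator import itemgetter


def group_descending(records):
    """Group (similarity, value) pairs by similarity, highest similarity first."""
    # groupby() only merges adjacent items with equal keys, so sort first.
    # reverse=True is an argument of sorted(), not of groupby().
    ordered = sorted(records, key=itemgetter(0), reverse=True)
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


# Hypothetical sample data standing in for the reducer's parsed input.
sample = [(0.5, "a"), (0.9, "b"), (0.5, "c"), (0.9, "d")]
for similarity, values in group_descending(sample):
    print(similarity, values)
```

If the reducer only needs the groups in the order Hadoop delivers them, dropping reverse=True from the original call may be enough, because streaming hands the reducer its input already sorted by key.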