2014-07-07 34 views

This is the console output of my MR job. The job completed successfully, but I have two concerns. Why is the number of reducers greater than the number I specified in Hadoop?

1) I specified -D mapred.reduce.slowstart.completed.maps=0.75. However, the reducers did not start when the maps were 75% complete, as shown below.

2) I specified -D mapred.reduce.tasks=2, but the number of reduce tasks launched was 3 (as shown below).

Why was neither of these two parameters honored?

hadoop jar hadoop-examples-1.2.1.jar wordcount -D mapred.reduce.slowstart.completed.maps=0.75 -D mapred.reduce.tasks=2 /data /output/result1 
2014-07-06 22:25:11.733 java[3236:1903] Unable to load realm info from SCDynamicStore 
14/07/06 22:25:13 INFO input.FileInputFormat: Total input paths to process : 4 
14/07/06 22:25:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
14/07/06 22:25:13 WARN snappy.LoadSnappy: Snappy native library not loaded 
14/07/06 22:25:13 INFO mapred.JobClient: Running job: job_201407061919_0015 
14/07/06 22:25:14 INFO mapred.JobClient: map 0% reduce 0% 
14/07/06 22:25:49 INFO mapred.JobClient: map 25% reduce 0% 
14/07/06 22:25:50 INFO mapred.JobClient: map 50% reduce 0% 
14/07/06 22:26:08 INFO mapred.JobClient: map 75% reduce 0% 
14/07/06 22:26:14 INFO mapred.JobClient: map 100% reduce 0% 
14/07/06 22:26:23 INFO mapred.JobClient: map 100% reduce 8% 
14/07/06 22:26:26 INFO mapred.JobClient: map 100% reduce 33% 
14/07/06 22:26:29 INFO mapred.JobClient: map 100% reduce 37% 
14/07/06 22:26:30 INFO mapred.JobClient: map 100% reduce 54% 
14/07/06 22:26:33 INFO mapred.JobClient: map 100% reduce 66% 
14/07/06 22:26:37 INFO mapred.JobClient: map 100% reduce 86% 
14/07/06 22:26:39 INFO mapred.JobClient: map 100% reduce 100% 
14/07/06 22:26:50 INFO mapred.JobClient: Job complete: job_201407061919_0015 
14/07/06 22:26:50 INFO mapred.JobClient: Counters: 29 
14/07/06 22:26:50 INFO mapred.JobClient: Job Counters 
14/07/06 22:26:50 INFO mapred.JobClient:  Launched reduce tasks=3 
14/07/06 22:26:50 INFO mapred.JobClient:  SLOTS_MILLIS_MAPS=107522 
14/07/06 22:26:50 INFO mapred.JobClient:  Total time spent by all reduces waiting after reserving slots (ms)=0 
14/07/06 22:26:50 INFO mapred.JobClient:  Total time spent by all maps waiting after reserving slots (ms)=0 
14/07/06 22:26:50 INFO mapred.JobClient:  Launched map tasks=4 
14/07/06 22:26:50 INFO mapred.JobClient:  Data-local map tasks=4 
14/07/06 22:26:50 INFO mapred.JobClient:  SLOTS_MILLIS_REDUCES=51153 
14/07/06 22:26:50 INFO mapred.JobClient: File Output Format Counters 
14/07/06 22:26:50 INFO mapred.JobClient:  Bytes Written=880862 
14/07/06 22:26:50 INFO mapred.JobClient: FileSystemCounters 
14/07/06 22:26:50 INFO mapred.JobClient:  FILE_BYTES_READ=2217446 
14/07/06 22:26:50 INFO mapred.JobClient:  HDFS_BYTES_READ=3672001 
14/07/06 22:26:50 INFO mapred.JobClient:  FILE_BYTES_WRITTEN=4030974 
14/07/06 22:26:50 INFO mapred.JobClient:  HDFS_BYTES_WRITTEN=880862 
14/07/06 22:26:50 INFO mapred.JobClient: File Input Format Counters 
14/07/06 22:26:50 INFO mapred.JobClient:  Bytes Read=3671571 
14/07/06 22:26:50 INFO mapred.JobClient: Map-Reduce Framework 
14/07/06 22:26:50 INFO mapred.JobClient:  Map output materialized bytes=1474437 
14/07/06 22:26:50 INFO mapred.JobClient:  Map input records=77934 
14/07/06 22:26:50 INFO mapred.JobClient:  Reduce shuffle bytes=1474437 
14/07/06 22:26:50 INFO mapred.JobClient:  Spilled Records=255974 
14/07/06 22:26:50 INFO mapred.JobClient:  Map output bytes=6076197 
14/07/06 22:26:50 INFO mapred.JobClient:  Total committed heap usage (bytes)=589447168 
14/07/06 22:26:50 INFO mapred.JobClient:  CPU time spent (ms)=19030 
14/07/06 22:26:50 INFO mapred.JobClient:  Combine input records=629184 
14/07/06 22:26:50 INFO mapred.JobClient:  SPLIT_RAW_BYTES=430 
14/07/06 22:26:50 INFO mapred.JobClient:  Reduce input records=102328 
14/07/06 22:26:50 INFO mapred.JobClient:  Reduce input groups=82339 
14/07/06 22:26:50 INFO mapred.JobClient:  Combine output records=102328 
14/07/06 22:26:50 INFO mapred.JobClient:  Physical memory (bytes) snapshot=888221696 
14/07/06 22:26:50 INFO mapred.JobClient:  Reduce output records=82339 
14/07/06 22:26:50 INFO mapred.JobClient:  Virtual memory (bytes) snapshot=6509461504 
14/07/06 22:26:50 INFO mapred.JobClient:  Map output records=629184 

EDIT: And without any parameters on the command line:

hadoop jar hadoop-examples-1.2.1.jar wordcount /data/ /output/results2 
2014-07-06 20:05:29.428 java[2869:1903] Unable to load realm info from SCDynamicStore 
14/07/06 20:05:29 INFO input.FileInputFormat: Total input paths to process : 4 
14/07/06 20:05:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
14/07/06 20:05:29 WARN snappy.LoadSnappy: Snappy native library not loaded 
14/07/06 20:05:30 INFO mapred.JobClient: Running job: job_201407061919_0009 
14/07/06 20:05:31 INFO mapred.JobClient: map 0% reduce 0% 
14/07/06 20:05:43 INFO mapred.JobClient: map 25% reduce 0% 
14/07/06 20:05:47 INFO mapred.JobClient: map 50% reduce 0% 
14/07/06 20:05:48 INFO mapred.JobClient: map 100% reduce 0% 
14/07/06 20:05:55 INFO mapred.JobClient: map 100% reduce 33% 
14/07/06 20:05:57 INFO mapred.JobClient: map 100% reduce 100% 
14/07/06 20:06:00 INFO mapred.JobClient: Job complete: job_201407061919_0009 
14/07/06 20:06:00 INFO mapred.JobClient: Counters: 29 
14/07/06 20:06:00 INFO mapred.JobClient: Job Counters 
14/07/06 20:06:00 INFO mapred.JobClient:  Launched reduce tasks=1 
14/07/06 20:06:00 INFO mapred.JobClient:  SLOTS_MILLIS_MAPS=53468 
14/07/06 20:06:00 INFO mapred.JobClient:  Total time spent by all reduces waiting after reserving slots (ms)=0 
14/07/06 20:06:00 INFO mapred.JobClient:  Total time spent by all maps waiting after reserving slots (ms)=0 
14/07/06 20:06:00 INFO mapred.JobClient:  Launched map tasks=4 
14/07/06 20:06:00 INFO mapred.JobClient:  Data-local map tasks=4 
14/07/06 20:06:00 INFO mapred.JobClient:  SLOTS_MILLIS_REDUCES=14440 
14/07/06 20:06:00 INFO mapred.JobClient: File Output Format Counters 
14/07/06 20:06:00 INFO mapred.JobClient:  Bytes Written=880862 
14/07/06 20:06:00 INFO mapred.JobClient: FileSystemCounters 
14/07/06 20:06:00 INFO mapred.JobClient:  FILE_BYTES_READ=2214915 
14/07/06 20:06:00 INFO mapred.JobClient:  HDFS_BYTES_READ=3672001 
14/07/06 20:06:00 INFO mapred.JobClient:  FILE_BYTES_WRITTEN=3974001 
14/07/06 20:06:00 INFO mapred.JobClient:  HDFS_BYTES_WRITTEN=880862 
14/07/06 20:06:00 INFO mapred.JobClient: File Input Format Counters 
14/07/06 20:06:00 INFO mapred.JobClient:  Bytes Read=3671571 
14/07/06 20:06:00 INFO mapred.JobClient: Map-Reduce Framework 
14/07/06 20:06:00 INFO mapred.JobClient:  Map output materialized bytes=1474413 
14/07/06 20:06:00 INFO mapred.JobClient:  Map input records=77934 
14/07/06 20:06:00 INFO mapred.JobClient:  Reduce shuffle bytes=1474413 
14/07/06 20:06:00 INFO mapred.JobClient:  Spilled Records=255974 
14/07/06 20:06:00 INFO mapred.JobClient:  Map output bytes=6076197 
14/07/06 20:06:00 INFO mapred.JobClient:  Total committed heap usage (bytes)=557662208 
14/07/06 20:06:00 INFO mapred.JobClient:  CPU time spent (ms)=10370 
14/07/06 20:06:00 INFO mapred.JobClient:  Combine input records=629184 
14/07/06 20:06:00 INFO mapred.JobClient:  SPLIT_RAW_BYTES=430 
14/07/06 20:06:00 INFO mapred.JobClient:  Reduce input records=102328 
14/07/06 20:06:00 INFO mapred.JobClient:  Reduce input groups=82339 
14/07/06 20:06:00 INFO mapred.JobClient:  Combine output records=102328 
14/07/06 20:06:00 INFO mapred.JobClient:  Physical memory (bytes) snapshot=802287616 
14/07/06 20:06:00 INFO mapred.JobClient:  Reduce output records=82339 
14/07/06 20:06:00 INFO mapred.JobClient:  Virtual memory (bytes) snapshot=5418221568 
14/07/06 20:06:00 INFO mapred.JobClient:  Map output records=629184 

Make sure you spell it either -Dproperty=value (no space) or -D property=value (with a space), otherwise it may be parsed incorrectly – rVr


@rVr: Without specifying it, I see that my reducer count is 1, so setting the value did change it –


It looks like you are also using a combiner, which may or may not be invoked by the framework. Could you paste the full printout with mapred.reduce.tasks=2 set? – rVr

Answer


1) I specified -D mapred.reduce.slowstart.completed.maps=0.75. However, the reducers did not start when the maps were 75% complete, as shown below

75% map progress does not necessarily mean that 75% of the map tasks have completed. This setting means that the reducers will start the shuffle phase once 75% of the map tasks (in your case, 3 of the 4 map tasks) have fully completed. See this post for more details on how progress is defined.
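The distinction can be sketched numerically (a hypothetical illustration, not Hadoop's actual bookkeeping): the console percentage averages per-task progress, while slowstart looks only at the fraction of tasks that have fully completed.

```python
# Hypothetical illustration of the two different "75%" metrics.
# Per-task progress of 4 map tasks (1.0 = task fully complete).
task_progress = [1.0, 1.0, 1.0, 0.0]

# What the console reports: average progress across all map tasks.
console_pct = sum(task_progress) / len(task_progress) * 100
print(f"console shows: map {console_pct:.0f}%")          # map 75%

# What slowstart checks: the fraction of tasks that have *completed*.
completed_fraction = sum(p >= 1.0 for p in task_progress) / len(task_progress)
print(f"completed map task fraction: {completed_fraction:.2f}")  # 0.75

# With mapred.reduce.slowstart.completed.maps=0.75, the shuffle may begin
# only once completed_fraction >= 0.75. If instead all four tasks were each
# 75% done, the console would also show "map 75%" but zero tasks would be
# complete, so the reducers would not start yet.
slowstart = 0.75
print("reducers may start:", completed_fraction >= slowstart)
```

So "map 75%" on the console and "75% of maps completed" can occur at different moments, which explains the apparent delay.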

2) I specified -D mapred.reduce.tasks=2, but the number of reduce tasks launched was 3 (as shown below).

The same reduce task can be launched on several nodes (this is called "speculative execution"). When one of the attempts finishes first, the others receive a "kill" signal for that task.

Another possibility is that one of the reduce tasks failed on one node and was then executed successfully on another.
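In either case, the "Launched reduce tasks" counter tallies task *attempts*, not distinct tasks. A toy sketch (names and numbers here are illustrative, not Hadoop internals):

```python
# Toy model: every attempt of a reduce task counts as a launch, whether it
# is the original, a speculative duplicate, or a retry after a failure.
# Each entry is (task_id, outcome); "killed" = a speculative copy that
# lost the race to the original attempt.
attempts = [
    ("reduce_0", "succeeded"),
    ("reduce_1", "succeeded"),
    ("reduce_1", "killed"),      # speculative duplicate of reduce_1
]

launched = len(attempts)                                  # every attempt is a launch
completed_tasks = {t for t, outcome in attempts if outcome == "succeeded"}

print("configured reduce tasks: 2")
print("launched reduce tasks:  ", launched)               # 3, matching the counter
print("tasks that wrote output:", len(completed_tasks))   # still 2
```

So with mapred.reduce.tasks=2, the job still produces output from exactly 2 reduce tasks, even though 3 attempts were launched.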
