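Hadoop 2.7.1 pseudo-distributed mode: only seeing 1 reducer
We recently installed Hadoop 2.7.1 in pseudo-distributed mode with YARN on a virtual machine with 8 cores and 28 GB of RAM, running Ubuntu 14.04 LTS.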
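Our files are typically 20-40 GB, so we are trying to find the best configuration for a single VM. We have set the configuration in mapred-site.xml (below) to allow multiple mappers and reducers to run (with slowstart = 1 so that reducers only start after all maps have finished). I see multiple mappers, but only 1 reducer.
Our previous Hadoop (2.2.0) cluster ran on 2-4 nodes, and many of the settings below were carried over from that setup.
mapred-site.xml: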
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>48</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4096m</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>3072</value>
<description>upper memory limit (MB) that Hadoop allows allocated to a mapper</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx2048m</value>
<description>maximum JVM heap size for map tasks</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>5120</value>
<description>upper memory limit (MB) that Hadoop allows allocated to a reducer</description>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx4096m</value>
<description>maximum JVM heap size for reduce tasks</description>
</property>
<property>
<name>mapreduce.tasktracker.map.tasks.maximum</name>
<value>8</value>
<description>maximum MAP tasks that can be run in parallel on this node </description>
</property>
<property>
<name>mapreduce.tasktracker.reduce.tasks.maximum</name>
<value>4</value>
<description>maximum REDUCE tasks that can be run in parallel on this node </description>
</property>
<property>
<name>mapred.reduce.slowstart.completed.maps</name>
<value>1</value>
<description>Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.</description>
</property>
core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/mnt/drive1/cluster/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/mnt/drive1/cluster/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/mnt/drive1/cluster/hadoop/hdfs/datanode</value>
</property>
yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>24576</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
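Where is your mapper/reducer code? What task are you trying to accomplish? Why do you think you should have multiple reducers? – SMA
The mapper/reducer code is too large to post, and it does not modify any of the configuration. It runs fine here, and we saw multiple reducers on the multi-node cluster. – Vishal
Have you tried specifying the number of reducers in the job? I believe the default is one reducer. –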
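To make the last comment concrete: in Hadoop 2.x the number of reduce tasks for a job is taken from mapreduce.job.reduces, and the default shipped in mapred-default.xml is 1. None of the properties posted in the question set it, which would be consistent with seeing a single reducer. A minimal sketch of the extra entry for mapred-site.xml follows. The value 4 is only an assumption, picked to match the reduce limit already in the question; it is not a tuned recommendation for 20-40 GB inputs.

```xml
<!-- Sketch only: sets the default number of reduce tasks per job. -->
<!-- The value 4 is an illustrative assumption, not a measured optimum. -->
<property>
<name>mapreduce.job.reduces</name>
<value>4</value>
<description>default number of reduce tasks per job</description>
</property>
```

The same setting can be applied per job instead, either by calling Job.setNumReduceTasks(4) in the driver or by passing -D mapreduce.job.reduces=4 on the command line when the driver runs through ToolRunner. It may also be worth checking whether the mapred-site.xml on the old 2.2.0 cluster set this property, since that would explain why multiple reducers appeared there.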