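Hadoop - Cloudera MRv1 cluster planning - what is the minimum number of nodes for an ideal cluster, and what does it look like?
I have manually installed a three-node cluster with the following configuration: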
Master/Slave Node 0 - NameNode, Secondary NameNode, JobTracker, HMaster,
DataNode, TaskTracker, HRegionServer,
Hive MetaStore, Database for Hive/Sqoop, HiveServer2, HCatalog,
Oozie Server,
Zookeeper,
Oozie-client, Hive-client, pig-client, M/R client tools, Sqoop
Slave Node 1 - DataNode, TaskTracker, HRegionServer,
Oozie-client, Hive-client, pig-client, M/R client tools, Sqoop
Slave Node 2 - DataNode, TaskTracker, HRegionServer,
Oozie-client, Hive-client, pig-client, M/R client tools, Sqoop
I would like to have a more realistic cluster. I am thinking of using 12-14 nodes, laid out as follows:
Master 0: NameNode
Master 1: Secondary NameNode
Master 2: JobTracker
Master 3: HMaster
Slave 0: DataNode, TaskTracker, HRegionServer
Slave 1: DataNode, TaskTracker, HRegionServer
Slave 2: DataNode, TaskTracker, HRegionServer
Hive/Catalog Node: Hive MetaStore,
Sqoop MetaStore
MySQL/PostgreSQL Database for Hive/Sqoop,
HCatalog,
HiveServer (Or is it better to break HiveServer into its own node?)
Oozie-Server (Or is it better to break Oozie-server into its own node?)
Zookeeper Ensemble: 3 Nodes with Zookeeper installed
Client Node: Oozie-client, Hive-client, pig-client, M/R client tools, Sqoop
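As a quick sanity check on the 12-14 figure, the layout above can be tallied with a minimal Python sketch. The dictionary keys and the `total_hosts` helper are illustrative labels only, not part of any Hadoop or Cloudera tooling, and the sketch assumes one host per listed role group, with HiveServer and Oozie-server optionally split onto their own hosts.

```python
# Hosts per role group in the proposed layout.
# The labels are illustrative only; they are not Hadoop/Cloudera identifiers.
base_layout = {
    "Master 0 (NameNode)": 1,
    "Master 1 (Secondary NameNode)": 1,
    "Master 2 (JobTracker)": 1,
    "Master 3 (HMaster)": 1,
    "Slaves (DataNode, TaskTracker, HRegionServer)": 3,
    "Hive/Catalog node (metastores, database, HCatalog, HiveServer, Oozie-server)": 1,
    "Zookeeper ensemble": 3,
    "Client node": 1,
}

# Assumption: HiveServer and Oozie-server may each be moved to a dedicated host.
optional_hosts = {"HiveServer": 1, "Oozie-server": 1}


def total_hosts(layout, extras=None):
    """Sum the host counts of the base layout plus any optional dedicated hosts."""
    return sum(layout.values()) + sum((extras or {}).values())


min_nodes = total_hosts(base_layout)
max_nodes = total_hosts(base_layout, optional_hosts)
print(f"{min_nodes}-{max_nodes} nodes")
```

With HiveServer and Oozie-server co-located on the Hive/Catalog node this comes to 12 hosts; giving each its own host brings it to 14.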
Or, in diagram form:
I understand that Cloudera recommends having:
A separate Master Node for each Master Process (NameNode, Secondary NameNode, JobTracker, HMaster)
3 Slave nodes with DataNode, TaskTracker, and HRegionServer
3 Zookeeper Nodes
"The database, the HiveServer process, and the metastore service can all
be on the same host, but running the HiveServer process on a separate host
provides better availability and scalability."
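I have used the same MySQL instance for both my Hive database and my Oozie database, and I figure I can do that again. I am also assuming that HiveServer and Oozie-server can run on the same host as the Hive/Oozie MetaStore, along with HCatalog.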
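Right now, on my three-node cluster, I have installed all of the client software on every node so that I can make M/R, Hive, Oozie, HBase, Pig, etc. client calls from any node. Should these client tools be run from a node that is separate from the master and slave nodes? On that note, in my three-node cluster I have been keeping all of my java/python/pig code on the master node. Would this code be better placed on a separate client node?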
Am I on the right track? What is the right way to build a minimal but ideal cluster?