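How do I read Hive partitions into an Apache Crunch pipeline?

I am able to read text files from HDFS into an Apache Crunch pipeline. Now I need to read Hive partitions. The problem is that, by our design, I am not supposed to access the files directly, so I need some way to reach the partitions through something like HCatalog.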


2014-10-20 Jijo Mathew

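You can use the org.apache.hadoop.hive.metastore API or the HCat API. Here is a simple example using hive.metastore. Unless you want to join against some Hive partitions inside a Mapper/Reducer, you have to make this call before your Pipeline starts, or right at its start.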

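import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;
HiveConf hiveConf = new HiveConf(); // assumes hive-site.xml can be found via the Hive conf directory
HiveMetaStoreClient hiveClient = new HiveMetaStoreClient(hiveConf);
// List at most 1000 partitions of default.my_hive_table (the limit is a short)
List<Partition> partitions = hiveClient.listPartitions("default", "my_hive_table", (short) 1000);
for (Partition partition : partitions) {
    System.out.println("HDFS data location of the partition: " + partition.getSd().getLocation());
}
hiveClient.close();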
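
Once the partition locations are known, you will usually want only some of them in the pipeline. The following is a minimal sketch of that selection step, not part of Hive or Crunch: the class PartitionPaths and its methods are hypothetical names, and it assumes the table uses Hive's default key=value directory layout with unescaped values. When a partition has a custom location, partition.getValues() from the metastore is the reliable source instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not a Hive or Crunch class.
final class PartitionPaths {

    // Extracts the key=value directory segments from a partition location,
    // e.g. ".../my_hive_table/dt=2014-10-20/country=US".
    // Assumes the default Hive directory layout.
    static Map<String, String> parseSpec(String location) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String segment : location.split("/")) {
            int eq = segment.indexOf('=');
            // Accept only segments with a non-empty key and a non-empty value.
            if (eq > 0 && eq < segment.length() - 1) {
                spec.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return spec;
    }

    // Keeps only the locations whose partition key carries the wanted value.
    static List<String> select(List<String> locations, String key, String value) {
        List<String> selected = new ArrayList<>();
        for (String location : locations) {
            if (value.equals(parseSpec(location).get(key))) {
                selected.add(location);
            }
        }
        return selected;
    }
}
```

Each selected location could then be handed to the pipeline the same way as any other HDFS path, for example with pipeline.readTextFile(location) if the partition is stored as text.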

The only other thing you need is to export the Hive conf directory:

export HIVE_CONF_DIR=/home/mmichalski/hive/conf
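
A missing or wrong HIVE_CONF_DIR tends to surface only as a failed metastore connection, so it can help to check it before creating the client. This is a small sketch under assumptions: HiveConfDirCheck and requireHiveSite are hypothetical names, and it assumes the metastore settings live in a file called hive-site.xml inside that directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check; not part of Hive or Crunch.
final class HiveConfDirCheck {

    // Returns the path to hive-site.xml under confDir,
    // or throws if the directory is unset or the file is missing.
    static Path requireHiveSite(String confDir) {
        if (confDir == null || confDir.isEmpty()) {
            throw new IllegalStateException("HIVE_CONF_DIR is not set");
        }
        Path site = Paths.get(confDir, "hive-site.xml");
        if (!Files.isRegularFile(site)) {
            throw new IllegalStateException("No hive-site.xml in " + confDir);
        }
        return site;
    }
}
```

It would be called once at startup, for example as HiveConfDirCheck.requireHiveSite(System.getenv("HIVE_CONF_DIR")).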


2014-11-21 23:02:29 Marcin
