I export my DynamoDB table to S3 as a backup (via EMR). When I export, I store the data as LZO-compressed files. My Hive query is below; essentially I followed "Exporting an Amazon DynamoDB Table to an Amazon S3 Bucket Using Data Compression" (http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/EMR_Hive_Commands.html) to get the compressed (LZO) data from DynamoDB into S3 with Hive.
I now want to do the reverse: take my LZO files and get them back into a Hive table. How do you do that? I expected to need some Hive configuration property for the input, but there isn't one. I've searched around and found some hints, but nothing definitive and nothing that works.
The files in S3 have the form: s3://[mybucket]/backup/year=2012/month=08/day=01/000000.lzo
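That layout follows Hive's key=value partition-directory convention, so the directories can be registered as partitions on an external table pointed at the backup location. A minimal sketch (the table name s3_import is hypothetical; it assumes an external table partitioned by year, month, day already exists over s3://[mybucket]/backup):

```sql
-- Register one existing S3 directory as a partition of the table:
ALTER TABLE s3_import ADD PARTITION (year='2012', month='08', day='01');
```

EMR's Hive also ships a bulk variant, `ALTER TABLE s3_import RECOVER PARTITIONS;`, which scans the table's location and adds every year=/month=/day= directory it finds.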
Here is my HQL that does the export:
SET dynamodb.throughput.read.percent=1.0;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
CREATE EXTERNAL TABLE hiveSBackup (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "${DYNAMOTABLENAME}",
"dynamodb.column.mapping" = "id:id,periodStart:periodStart,allotted:allotted,remaining:remaining,created:created,seconds:seconds,served:served,modified:modified");
CREATE EXTERNAL TABLE s3_export (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<mybucket>/backup';
INSERT OVERWRITE TABLE s3_export
PARTITION (year="${PARTITIONYEAR}", month="${PARTITIONMONTH}", day="${PARTITIONDAY}")
SELECT * FROM hiveSBackup;
Any ideas how to get the data from S3, decompressed, into a Hive table?
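For what it's worth, a sketch of one commonly suggested approach. No input-side compression property should be needed: Hive's default TextInputFormat selects a codec by file extension, so .lzo files are decompressed transparently on read, assuming com.hadoop.compression.lzo.LzopCodec is installed and registered on the cluster (it must be, since the export wrote .lzo files). The table name s3_import is hypothetical; the columns mirror the export table above:

```sql
-- External table over the existing LZO backup; Hive decompresses .lzo
-- on read via the codec registered in io.compression.codecs.
CREATE EXTERNAL TABLE s3_import (id bigint, periodStart string, allotted bigint,
    remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://[mybucket]/backup';

-- Make the existing directories visible as partitions:
ALTER TABLE s3_import ADD PARTITION (year='2012', month='08', day='01');

-- Now query directly, or INSERT into a managed Hive table / a
-- DynamoDB-backed table to complete the restore:
SELECT * FROM s3_import WHERE year='2012' AND month='08' AND day='01';
```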
Do you have a complete example HQL script that works? I tried what you mentioned without success. Again, my data is partitioned. I just want to import it into Hive, not into DynamoDB. – rynop 2012-09-18 15:44:10
Edited my answer to add an example. – Tim 2012-09-18 16:47:42