2015-01-08

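I am using Flume to write binary objects to HDFS, and I want to read the resulting binary Avro in Pig. My Flume agent's sink is configured like this: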

a1.sinks.k1.type = hdfs 
a1.sinks.k1.channel = c1 
a1.sinks.k1.hdfs.path = /user/%y-%m-%d/%H%M/%S 
a1.sinks.k1.hdfs.filePrefix = events- 
a1.sinks.k1.hdfs.round = true 
a1.sinks.k1.hdfs.roundValue = 10 
a1.sinks.k1.hdfs.roundUnit = minute 

a1.sinks.k1.hdfs.fileType = DataStream 
a1.sinks.k1.hdfs.serializer = avro_event 
a1.sinks.k1.hdfs.serializer.syncIntervalBytes = 4096000 
a1.sinks.k1.hdfs.serializer.compressionCodec = snappy 
a1.sinks.k1.hdfs.serializer.appendNewline = false 
a1.sinks.k1.hdfs.fileSuffix=.avro 
a1.sinks.k1.hdfs.writeFormat=TEXT 

Now I want to read the HDFS file (something.avro) with this:

data = LOAD 'something.avro' 
     USING org.apache.pig.piggybank.storage.avro.AvroStorage(); 
dump data; 

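I keep getting the exception below. Any idea why I am getting it? Or is there another way to read binary Avro objects in a Pig script without providing the Avro schema?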

Caused by: java.io.IOException: Not a data file. 
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105) 
at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84) 
at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.getSchema(AvroStorageUtils.java:718) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:349) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:277) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:248) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.setInputAvroSchema(AvroStorage.java:226) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:434) 
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175) 

Answer

Same problem here. I think it happens because we are reading raw Avro binary data, which is not the same thing as an Avro data file.
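One way to test that guess: per the Avro specification, an Avro data file (object container file) starts with the four bytes 'O', 'b', 'j', 1, and as far as I can tell "Not a data file" is what DataFileStream reports when that header is missing. Below is a minimal Python sketch, assuming you have first copied the file out of HDFS to local disk (for example with hdfs dfs -get). The function name is_avro_container is my own, not part of any library.

```python
import sys

# Magic bytes that open every Avro object container file (Avro spec).
AVRO_MAGIC = b"Obj\x01"


def is_avro_container(path):
    """Return True if the local file at `path` starts with the Avro magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_avro.py something.avro
    for path in sys.argv[1:]:
        kind = "Avro data file" if is_avro_container(path) else "not an Avro data file"
        print(f"{path}: {kind}")
```

If this reports that the file is not an Avro data file, then AvroStorage has no embedded schema to read, and the file probably contains only the raw event bodies.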

You can try the fragtojson command from avro-tools:

java -jar avro-tools-1.7.7.jar fragtojson part0.avro --schema-file schema.avsc

See whether that manages to read the file. If you do get it working in Pig, please post what you find.