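I am using Flume to write binary objects to HDFS, and I want to read the resulting binary Avro files with Pig. My Flume agent's sink settings are as follows: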
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /user/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.serializer = avro_event
a1.sinks.k1.hdfs.serializer.syncIntervalBytes = 4096000
a1.sinks.k1.hdfs.serializer.compressionCodec = snappy
a1.sinks.k1.hdfs.serializer.appendNewline = false
a1.sinks.k1.hdfs.fileSuffix=.avro
a1.sinks.k1.hdfs.writeFormat=TEXT
Now I want to read the HDFS file (something.avro) using this:
data = LOAD 'something.avro'
USING org.apache.pig.piggybank.storage.avro.AvroStorage();
dump data;
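I keep getting the exception below. Any idea why I am getting it, or is there another way to read binary Avro objects in a Pig script without providing the Avro schema?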
Caused by: java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.getSchema(AvroStorageUtils.java:718)
at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:349)
at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:277)
at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:248)
at org.apache.pig.piggybank.storage.avro.AvroStorage.setInputAvroSchema(AvroStorage.java:226)
at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:434)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
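One way to narrow this down: the "Not a data file." message is thrown from DataFileStream.initialize, which checks the file header. An Avro container file starts with the four magic bytes `Obj` followed by 0x01, so a file that does not begin with them was not written as an Avro data file at all. Below is a minimal Python sketch of that check. It assumes the file has first been copied out of HDFS to the local disk, and the function name `looks_like_avro_container` is only an illustrative name, not part of any library.

```python
# Magic bytes that open every Avro object container file: 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(path):
    """Return True if the local file at `path` starts with the Avro magic bytes.

    Hypothetical helper for diagnosis only; it inspects the header and does
    not validate the rest of the file.
    """
    with open(path, "rb") as f:
        header = f.read(len(AVRO_MAGIC))
    return header == AVRO_MAGIC
```

If this returns False for a local copy of something.avro, the sink is probably not writing Avro container files, and the problem is on the Flume side rather than in the Pig script.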