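Hadoop streaming sort error

I want to sort a file using Hadoop streaming. The file format is as follows:
<ID> <TextID> <Offset> <Text> - where ID is alphanumeric, TextID is alphanumeric and Offset is numeric
I want to sort by ID, then TextID, then Offset, in ascending order.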
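To make the intended ordering concrete, here is a minimal local sketch using the standard `sort` utility, outside Hadoop. The sample records and the `sort_records` helper name are hypothetical and only for illustration; it assumes fields are separated by blanks.

```shell
# Hypothetical helper: order records by ID (text), TextID (text),
# then Offset (numeric), all ascending. LC_ALL=C gives byte-wise comparison.
sort_records() {
  LC_ALL=C sort -k1,1 -k2,2 -k3,3n
}

# Made-up sample records in the <ID> <TextID> <Offset> <Text> layout.
printf '%s\n' \
  'b1 t1 10 foo' \
  'a1 t2 2 bar' \
  'a1 t1 100 baz' \
  'a1 t1 9 qux' | sort_records
# prints:
# a1 t1 9 qux
# a1 t1 100 baz
# a1 t2 2 bar
# b1 t1 10 foo
```

Note the `n` on the third key: without it, Offset 100 would sort before Offset 9 because the comparison would be textual.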
I am using the following Hadoop streaming command:
hadoop jar /apollo/env/SEOHadoopClient/lib/hadoop-streaming-0.20.205.0.jar \
-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
-D stream.num.map.output.key.fields=3 \
-D mapred.text.key.comparator.options="-k1 -k2 -k3n" \
-D mapred.text.key.partitioner.options=-k1,1 \
-input /user/sakul/hadoop-streaming-sort/output \
-output /user/sakul/hadoop-streaming-sort/sort-output \
-mapper org.apache.hadoop.mapred.lib.IdentityMapper \
-reducer org.apache.hadoop.mapred.lib.IdentityReducer \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
But I get the following exception in the mapper:
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
What am I doing wrong here?
Thanks. I got it working by using -mapper "cat" -reducer "sort -k1" etc.
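A likely explanation, though not confirmed in this thread: with the default text input format, a Java mapper such as IdentityMapper passes through the LongWritable byte-offset key it receives, while the streaming job expects Text map output keys. A command mapper such as `cat` goes through streaming's text pipe instead, which would avoid the mismatch. The mapper and reducer from the comment above can be dry-run locally as an ordinary pipeline. This is only a rough sketch: `run_local` is a hypothetical helper, the middle `sort` is a stand-in for Hadoop's shuffle, and a single reducer is assumed.

```shell
# Hypothetical local dry run of the streaming job: mapper | shuffle | reducer.
# $1 is an input file supplied by the caller.
run_local() {
  cat < "$1" |          # mapper: cat (passes each line through unchanged)
    LC_ALL=C sort |     # stand-in for the shuffle sort between map and reduce
    LC_ALL=C sort -k1   # reducer from the comment: textual key from field 1 to end of line
}

# Made-up sample input.
tmp=$(mktemp)
printf '%s\n' 'a1 t1 9 qux' 'a1 t1 100 baz' > "$tmp"
run_local "$tmp"
rm -f "$tmp"
# prints:
# a1 t1 100 baz
# a1 t1 9 qux
```

In this dry run `sort -k1` compares the whole line as text, so Offset 100 comes before Offset 9. If numeric ascending offsets are required, the reducer would need a numeric key on the third field, for example `sort -k1,1 -k2,2 -k3,3n`.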