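I'm trying to write an EMR job that runs Pig and writes to the DSE cluster we will use for serving. Unfortunately, I couldn't get Pig to write to DSE, so I reduced the problem to connecting to a single DSE node and attempting a write. Here is what I'm doing to write to DSE from an external Pig job (Pig -> DSE connector).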

On the Cassandra node:

cqlsh> CREATE KEYSPACE cql3ks WITH replication =
    {'class': 'SimpleStrategy', 'replication_factor': 1 };
cqlsh> USE cql3ks;
cqlsh:cql3ks> CREATE TABLE test (a int PRIMARY KEY, b int);

From the local machine:

export PIG_INITIAL_ADDRESS=<cassandra node IP> 
export PIG_RPC_PORT=9160 
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner 
pig -x local 

grunt> REGISTER /var/lib/cassandra/resources/cassandra/lib/libthrift-0.7.0.jar; 
grunt> REGISTER /var/lib/cassandra/resources/cassandra/lib/cassandra-thrift-1.2.13.2.jar; 
grunt> REGISTER /var/lib/cassandra/resources/cassandra/lib/cassandra-all-1.2.13.2.jar; 
grunt> DEFINE CqlStorage org.apache.cassandra.hadoop.pig.CqlStorage(); 
grunt> moretestvalues= LOAD 'cql://cql3ks/test/' USING CqlStorage; 
grunt> insertformat= FOREACH moretestvalues GENERATE TOTUPLE(TOTUPLE('a',a)),TOTUPLE(b); 
grunt> STORE insertformat INTO 'cql://cql3ks/test?output_query=UPDATE+cql3ks.test+set+b+%3D+%3F' USING CqlStorage(); 

When I do this, I get the following error:

2014-02-25 18:50:27,952 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN 
2014-02-25 18:50:28,506 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected 
2014-02-25 18:50:28,506 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2014-02-25 18:50:28,506 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected 
at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.checkOutputSpecs(AbstractColumnFamilyOutputFormat.java:75) 
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:80) 
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66) 
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64) 
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66) 
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66) 
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53) 
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) 
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45) 
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:288) 
at org.apache.pig.PigServer.compilePp(PigServer.java:1322) 
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1247) 
at org.apache.pig.PigServer.execute(PigServer.java:1239) 
at org.apache.pig.PigServer.access$400(PigServer.java:121) 
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1553) 
at org.apache.pig.PigServer.registerQuery(PigServer.java:516) 
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991) 
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412) 
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194) 
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170) 
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69) 
at org.apache.pig.Main.run(Main.java:538) 
at org.apache.pig.Main.main(Main.java:157) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:622) 
at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 

Answer

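This is a version problem. You are probably running Hadoop 2.x, while the Cassandra libraries are built against the Hadoop 1.x API. In Hadoop 1.x, org.apache.hadoop.mapreduce.JobContext is a class; in Hadoop 2.x it is an interface, which is exactly what the IncompatibleClassChangeError is complaining about. If that is not your setup, check that you are using the correct jars.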

The next Cassandra bugfix release (2.0.6) will include compatibility APIs, or at least that is what this issue says.
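
If you want to confirm which Hadoop API your classpath actually carries, a small reflection check can help. This is only a rough sketch: the class name JobContextCheck and the helper kindOf are made up for illustration, and it assumes you compile and run it with the same jars that Pig sees.

```java
public class JobContextCheck {

    // Returns "interface", "class", or "not found" for a fully qualified type name.
    public static String kindOf(String className) {
        try {
            // initialize=false: we only inspect the type, we do not run its static initializers.
            Class<?> c = Class.forName(className, false, JobContextCheck.class.getClassLoader());
            return c.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            // The type (or something it depends on) is not on the classpath.
            return "not found";
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.hadoop.mapreduce.JobContext";
        String kind = kindOf(name);
        System.out.println(name + " is: " + kind);

        if (kind.equals("interface")) {
            // Hadoop 2.x style API: jars compiled against Hadoop 1.x will fail at this type.
            System.out.println("Looks like the Hadoop 2.x API.");
        } else if (kind.equals("class")) {
            // Hadoop 1.x style API, which is what the Cassandra 1.2 jars expect.
            System.out.println("Looks like the Hadoop 1.x API.");
        } else {
            System.out.println("Hadoop MapReduce classes are not on this classpath.");
        }
    }
}
```

If it reports an interface, the classpath has the Hadoop 2.x API, which matches the error in the question; the Cassandra 1.2.13 jars registered there would then need a Hadoop 1.x runtime or a Cassandra build with the compatibility layer.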