2015-04-03 59 views

Copying a Cassandra table to Hive

I have been stuck on this problem for several days, so any help would be greatly appreciated.

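I am trying to copy a Cassandra table into Hive (so that I can register it in the Hive Metastore and then access it from Tableau). The Hive -> Tableau part works, but the Cassandra-to-Hive part does not: the data is not being copied into the Hive Metastore.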

Here are the steps I took:

I followed the instructions in this project's README: https://github.com/tuplejump/cash/tree/master/cassandra-handler

I built hive-cassandra-*.jar and copied it, together with cassandra-all-*.jar and cassandra-thrift-*.jar, into Hive's lib folder.

Then I started Hive and tried the following:

hive> add jar /usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar; 
Added [/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar] to class path 
Added resources: [/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar] 
hive> list jars; 
/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar 
hive> create temporary function tmp as 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler' 
    > ; 
FAILED: Class org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler not found 

I don't know why Hive cannot see CqlStorageHandler...

Thanks!

Answers


Another approach to consider is to write a simple Java program that dumps the table to a file, and then load that file into Hive.

package com.company.cassandra; 

import com.datastax.driver.core.Cluster; 
import com.datastax.driver.core.Cluster.Builder; 
import com.datastax.driver.core.ResultSet; 
import com.datastax.driver.core.ResultSetFuture; 
import com.datastax.driver.core.Row; 
import com.datastax.driver.core.Session; 

public class CassandraExport {

    public static Session session;

    // Opens a session on the given keyspace; credentials are applied only if both are set.
    public static void connect(String username, String password, String host, int port, String keyspace) {
        Builder builder = Cluster.builder().addContactPoint(host);
        builder.withPort(port);
        if (username != null && password != null) {
            builder.withCredentials(username, password);
        }

        Cluster cluster = builder.build();
        session = cluster.connect(keyspace);
    }

    public static void main(String[] args) {
        // Prod (placeholder connection details; replace with your own)
        connect("user", "password", "server", 9042, "keyspace");

        // "table" and col1..col4 are placeholders for the real table and column names
        ResultSetFuture future = session.executeAsync("SELECT * FROM table;");
        ResultSet results = future.getUninterruptibly();
        for (Row row : results) {
            // Print the columns tab-separated, in the same order as the Hive table's columns
            String out = row.getString("col1") + "\t" +
                    String.valueOf(row.getInt("col2")) + "\t" +
                    String.valueOf(row.getLong("col3")) + "\t" +
                    String.valueOf(row.getLong("col4"));
            System.out.println(out);
        }

        // Close the session first, then the cluster that owns it
        session.close();
        session.getCluster().close();
    }
}

Redirect the program's output to a file (for example /tmp/cassandra-table), then load that file into Hive:

hive -e "use schema; load data local inpath '/tmp/cassandra-table' overwrite into table mytable;"
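
If you would rather write the file from Java than redirect standard output, here is a minimal sketch of the row formatting. HiveTsv, formatRow and writeRows are hypothetical names, not part of the Cassandra driver or of Hive, and the sample rows stand in for the Cassandra result set. It assumes the target Hive table was declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t', because Hive's default field delimiter for text tables is Ctrl-A rather than tab. Nulls are written as \N, which Hive text tables read as NULL by default, and tabs or line breaks inside a value are replaced by spaces so they cannot shift columns or split rows.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class HiveTsv {

    // Formats one row as a single tab-separated line.
    static String formatRow(Object... values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                line.append('\t');
            }
            Object value = values[i];
            if (value == null) {
                line.append("\\N"); // Hive's default text marker for NULL
            } else {
                // Replace embedded tabs and line breaks with spaces.
                line.append(value.toString().replaceAll("[\\t\\r\\n]", " "));
            }
        }
        return line.toString();
    }

    // Writes the rows to the file as UTF-8, one row per line.
    static void writeRows(Path file, List<Object[]> rows) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (Object[] row : rows) {
                writer.write(formatRow(row));
                writer.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample rows; in the export program these would come from the Row getters.
        List<Object[]> rows = Arrays.asList(
                new Object[] {"alice", 1, 10L, 20L},
                new Object[] {"bob\tsmith", 2, null, 40L});
        // Same path as in the load command above.
        writeRows(Paths.get("/tmp/cassandra-table"), rows);
    }
}
```

In the export loop you would call formatRow(row.getString("col1"), row.getInt("col2"), ...) for each row instead of building the string by hand. Note that the driver's getInt and getLong return 0 for a null column, so check row.isNull("col2") first if you need to tell NULL apart from 0.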