DataStax Cassandra File System - Fixed-Width Text File - Hive Integration Issue

2012-09-25

I am trying to use Hive to read a fixed-width text file stored in the Cassandra File System (CFS). I can query the file when I run from the Hive client, but when I try to run from the Hadoop Hive JDBC driver, it says the table is not available or the connection is bad. Below are the steps I followed.

Input file (employees.dat):

21736Ambalavanar    Thirugnanam    BOY-EAG  2005-05-091992-11-18 
21737Anand     Jeyamani     BOY-AST  2005-05-091985-02-12 
31123Muthukumar    Rajendran    BOY-EES  2009-08-121983-02-23 

Start the Hive client:

bash-3.2# dse hive; 
Logging initialized using configuration in file:/etc/dse/hive/hive-log4j.properties 
Hive history file=/tmp/root/hive_job_log_root_201209250900_157600446.txt 
hive> use HiveDB; 
OK 
Time taken: 1.149 seconds 

Create a Hive external table pointing to the fixed-width-format text file:

hive> CREATE EXTERNAL TABLE employees (empid STRING, firstname STRING, lastname STRING, dept STRING, dateofjoining STRING, dateofbirth STRING) 
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
    > WITH SERDEPROPERTIES ("input.regex" = "(.{5})(.{25})(.{25})(.{15})(.{10})(.{10}).*") 
    > LOCATION 'cfs://hostname:9160/folder/'; 
OK 
Time taken: 0.524 seconds 
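The `input.regex` above carves each record into fixed-width slices by character count (5, 25, 25, 15, 10, 10). A quick way to sanity-check the pattern outside Hive is to run it through `sed` (a sketch, not part of the original post; the padded record is rebuilt with `printf` rather than copied verbatim from employees.dat):

```shell
# Build a record with the exact field widths the SerDe regex expects,
# then use the same pattern to pull out empid and dateofbirth.
line=$(printf '%-5s%-25s%-25s%-15s%-10s%-10s' \
  21736 Ambalavanar Thirugnanam BOY-EAG 2005-05-09 1992-11-18)
echo "$line" | sed -E 's/^(.{5})(.{25})(.{25})(.{15})(.{10})(.{10}).*/\1|\6/'
# prints: 21736|1992-11-18
```

If a line is shorter than 90 characters, none of the groups match and the SerDe emits NULLs, so trailing padding matters in the data file.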

Select * from the table:

hive> select * from employees; 
OK 
21736 Ambalavanar      Thirugnanam      BOY-EAG  2005-05-09  1992-11-18 
21737 Anand       Jeyamani      BOY-AST  2005-05-09  1985-02-12 
31123 Muthukumar      Rajendran      BOY-EES  2009-08-12  1983-02-23 
Time taken: 0.698 seconds 

Selecting specific fields from the Hive table throws a permissions error (first issue):

hive> select empid, firstname from employees; 
Total MapReduce jobs = 1 
Launching Job 1 out of 1 
Number of reduce tasks is set to 0 since there's no reduce operator 
java.io.IOException: The ownership/permissions on the staging directory cfs:/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxrwxrwx. The directory must be owned by the submitter root or by root and permissions must be rwx------ 
     at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:108) 
     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856) 
     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:416) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093) 
     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850) 
     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824) 
     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452) 
     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136) 
     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133) 
     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) 
     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332) 
     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123) 
     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) 
     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) 
     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) 
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) 
     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) 
     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:616) 
     at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
Job Submission failed with exception 'java.io.IOException(The ownership/permissions on the staging directory cfs:/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxrwxrwx. The directory must be owned by the submitter root or by root and permissions must be rwx------)' 
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask 
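The exception spells out what it wants: the staging directory must be owned by the submitting user with mode rwx------. One possible workaround (my assumption, not something verified in the original post) is to tighten that directory's permissions from the DSE node before resubmitting the query:

```shell
# Hypothetical fix: restrict the CFS staging directory to mode 700
# (rwx------), which is what JobSubmissionFiles.getStagingDir checks for.
dse hadoop fs -chmod 700 cfs:/tmp/hadoop-root/mapred/staging/root/.staging
```

If the directory keeps reverting to 777, removing it entirely so the next job recreates it with the expected mode is another option.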

The second issue is that when I try to run a select * query from the Hive JDBC driver (outside the dse/cassandra node), it says the table employees is not available. The external table I created behaves like a temporary table and does not persist. When I run 'hive> show tables', the employees table is not listed. Can anyone please help me figure out the problem?

Answer

I don't have a direct answer for the first issue, but the second one looks like it is due to a known problem.

There is a bug in DSE 2.1 that drops external tables created from CFS files from the metastore when you run show tables. Only the table metadata is removed; the data stays in CFS, so if you recreate the table definition you don't have to reload it. Tables backed by Cassandra column families are not affected by this bug. It has been fixed for the 2.2 release of DSE, which is due out very soon.
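Since only the metadata is dropped, recovery should just be a matter of re-issuing the DDL from the question; the LOCATION still points at the untouched data files in CFS. A sketch, assuming the same hostname and path as above:

```shell
# Hypothetical recovery: recreate the table definition; no data reload
# is needed because the files in CFS were never removed.
dse hive -e "USE HiveDB;
CREATE EXTERNAL TABLE employees (empid STRING, firstname STRING, lastname STRING, dept STRING, dateofjoining STRING, dateofbirth STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '(.{5})(.{25})(.{25})(.{15})(.{10})(.{10}).*')
LOCATION 'cfs://hostname:9160/folder/';"
```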

I'm not familiar with the Hive JDBC driver, but if it issues a show tables command at any point, that could be what triggers this bug.


Thanks Beobal. I will raise this issue in the DataStax forums. – Ambal