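DataStax Cassandra File System - Fixed-Width Text File - Hive Integration Issue
I am trying to use Hive to read a fixed-width text file stored in the Cassandra File System (CFS). I can query the file when I run from the Hive client. However, when I try to run the query through Hadoop Hive JDBC, it reports that the table is not available or that the connection is bad. Below are the steps I followed.
Input file (employees.dat):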
21736Ambalavanar Thirugnanam BOY-EAG 2005-05-091992-11-18
21737Anand Jeyamani BOY-AST 2005-05-091985-02-12
31123Muthukumar Rajendran BOY-EES 2009-08-121983-02-23
Start the Hive client:
bash-3.2# dse hive;
Logging initialized using configuration in file:/etc/dse/hive/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201209250900_157600446.txt
hive> use HiveDB;
OK
Time taken: 1.149 seconds
Create a Hive external table pointing to the fixed-width text file:
hive> CREATE EXTERNAL TABLE employees (empid STRING, firstname STRING, lastname STRING, dept STRING, dateofjoining STRING, dateofbirth STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES ("input.regex" = "(.{5})(.{25})(.{25})(.{15})(.{10})(.{10}).*")
> LOCATION 'cfs://hostname:9160/folder/';
OK
Time taken: 0.524 seconds
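To make the column layout explicit, here is a minimal Python sketch of how the "input.regex" pattern in the CREATE TABLE statement above carves one fixed-width line into six fields. It is an illustration only, not what Hive runs: it assumes that RegexSerDe needs the whole line to match (approximated here with re.fullmatch), that Python's re module treats this simple pattern the same way Java's regex engine does, and that the fields in employees.dat are padded with trailing spaces. The names build_line and parse_line are hypothetical helpers.

```python
import re

# Same pattern as the "input.regex" SerDe property in the CREATE TABLE statement.
INPUT_REGEX = re.compile(r"(.{5})(.{25})(.{25})(.{15})(.{10})(.{10}).*")

# Column names from the table definition, with the widths implied by the pattern.
COLUMNS = [
    ("empid", 5),
    ("firstname", 25),
    ("lastname", 25),
    ("dept", 15),
    ("dateofjoining", 10),
    ("dateofbirth", 10),
]


def build_line(values):
    """Pad each value with trailing spaces to its column width (assumed padding style)."""
    return "".join(value.ljust(width) for value, (_, width) in zip(values, COLUMNS))


def parse_line(line):
    """Return a dict of column name -> value, or None when the line does not match."""
    match = INPUT_REGEX.fullmatch(line)
    if match is None:
        return None
    # Trailing spaces are stripped here for readability only; the raw captured
    # groups still contain the padding.
    return {name: group.rstrip() for (name, _), group in zip(COLUMNS, match.groups())}


line = build_line(
    ["21736", "Ambalavanar", "Thirugnanam", "BOY-EAG", "2005-05-09", "1992-11-18"]
)
row = parse_line(line)
print(row)
```

A line shorter than 90 characters (5 + 25 + 25 + 15 + 10 + 10) cannot satisfy the pattern, so parse_line returns None for it. That makes the sketch a quick way to check whether a given line of the file has the expected padding.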
Run select * on the table:
hive> select * from employees;
OK
21736 Ambalavanar Thirugnanam BOY-EAG 2005-05-09 1992-11-18
21737 Anand Jeyamani BOY-AST 2005-05-09 1985-02-12
31123 Muthukumar Rajendran BOY-EES 2009-08-12 1983-02-23
Time taken: 0.698 seconds
Selecting specific fields from the Hive table throws a permissions error (first issue):
hive> select empid, firstname from employees;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: The ownership/permissions on the staging directory cfs:/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxrwxrwx. The directory must be owned by the submitter root or by root and permissions must be rwx------
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:108)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Job Submission failed with exception 'java.io.IOException(The ownership/permissions on the staging directory cfs:/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxrwxrwx. The directory must be owned by the submitter root or by root and permissions must be rwx------)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
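The exception message states the rule being enforced: the staging directory must be owned by the submitting user and its permissions must be exactly rwx------. The Python sketch below restates that rule so the mismatch is easy to see. It is based only on the wording of the error, not on the Hadoop source, and the function names are hypothetical.

```python
# Permissions the error message says the staging directory must have.
REQUIRED_PERMISSIONS = "rwx------"


def symbolic_to_octal(perms):
    """Convert a 9-character string such as 'rwxr-x---' to an octal string such as '750'."""
    if len(perms) != 9:
        raise ValueError("expected 9 permission characters, got %r" % perms)
    bits = {"r": 4, "w": 2, "x": 1, "-": 0}
    return "".join(
        str(sum(bits[ch] for ch in perms[i:i + 3])) for i in (0, 3, 6)
    )


def staging_dir_ok(owner, perms, submitter):
    """Apply the rule from the error message: owned by the submitter and exactly rwx------."""
    return owner == submitter and perms == REQUIRED_PERMISSIONS


# Values taken from the error message above: owner root, permissions rwxrwxrwx.
print(symbolic_to_octal("rwxrwxrwx"), staging_dir_ok("root", "rwxrwxrwx", "root"))
print(symbolic_to_octal("rwx------"), staging_dir_ok("root", "rwx------", "root"))
```

Read this way, the owner is already correct and only the mode is too open (777 instead of 700). Tightening the directory to 700, for example with hadoop fs -chmod, is one thing to try, although whether that is sufficient on CFS is an assumption that has not been verified here.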
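The second issue is that when I run the select * query through the Hive JDBC driver (from outside the dse/cassandra nodes), it says the table employees is not available. The external table I created behaves like a temporary table and does not persist. When I run "hive> show tables", the employees table is not listed. Can anyone please help me figure out the problem?
Thanks, Beobal. I will raise this question in the DataStax forum. – Ambal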