2017-02-09 23 views

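Using --proxy-user, --keytab and --principal with spark-submit on Kerberized Hadoop

I just want to clarify whether the spark-submit parameters --keytab, --principal and --proxy-user can be used together.

We have a requirement to submit jobs as the real business user, but those users do not have principals in the Hadoop KDC.

Whenever I combine --proxy-user with the Kerberos parameters, I get the following exception: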

17/02/09 13:51:43 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 379 for atlas on 10.12.118.92:8020 
Exception in thread "main" java.io.IOException: java.lang.reflect.UndeclaredThrowableException 
     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:888) 
     at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:8 
     at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2243) 
     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) 
     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) 
     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) 
     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206) 
     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) 
     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
     at scala.Option.getOrElse(Option.scala:120) 
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
     at scala.Option.getOrElse(Option.scala:120) 
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
     at scala.Option.getOrElse(Option.scala:120) 
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
     at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1293) 
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) 
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) 
     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) 
     at org.apache.spark.rdd.RDD.take(RDD.scala:1288) 
     at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1328) 
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) 
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) 
     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) 
     at org.apache.spark.rdd.RDD.first(RDD.scala:1327) 
     at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:269) 
     at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:265) 
     at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:242) 
     at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:74) 
     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:171) 
     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:44) 
     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) 
     at org.sandbox.Main$.main(Main.scala:39) 
     at org.sandbox.Main.main(Main.scala) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:497) 
     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) 
     at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163) 
     at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:161) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:422) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:161) 
     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.reflect.UndeclaredThrowableException 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672) 
     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:870) 
     ... 57 more 
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 403, message: Forbidde 
     at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:274) 
     at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77) 
     at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128 
     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:214) 
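  1. If --proxy-user cannot be combined with the --principal parameter, is there any documentation that says so?
  2. What is the real use case for the --proxy-user parameter in a Kerberized Hadoop environment?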

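Typical examples of Hadoop "proxy users" are 'oozie' (a job scheduler) and 'hue' (a gateway UI): they can launch jobs on your behalf without needing your password and, in Oozie's case, without you being connected.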

Answers


Even if I run kinit first and drop --principal and --keytab from the spark-submit command, I still get the same exception. Any ideas? – Adelave


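I was able to use --proxy-user, --principal and --keytab together with spark-submit. The problem above was caused by missing permission for the delegation token request to Ranger KMS.

So I added the following entries under "Custom kms-site" to make it work.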

hadoop.kms.proxyuser.xxx.users=* 
hadoop.kms.proxyuser.xxx.hosts=* 
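
For reference, the combined invocation this answer describes might look roughly like the sketch below. Only the three flags and the class name org.sandbox.Main (taken from the stack trace) come from the thread; the principal, keytab path, proxy user name, jar name and --master value are hypothetical placeholders. Some Spark versions may reject --proxy-user together with --principal during argument validation, so treat this as an illustration of the setup reported here rather than something guaranteed to work everywhere.

```shell
#!/bin/sh
# All four values below are hypothetical; replace them with your own.
PRINCIPAL="svc_account@EXAMPLE.COM"    # assumed: service principal that exists in the KDC
KEYTAB="/path/to/svc_account.keytab"   # assumed: keytab for that principal
PROXY_USER="business_user"             # assumed: end user without a KDC principal
APP_JAR="sandbox.jar"                  # assumed: application jar name

# Build the command as positional parameters so quoting survives intact.
# "--master yarn" is an assumption; the thread does not state the master.
set -- spark-submit \
  --master yarn \
  --principal "$PRINCIPAL" \
  --keytab "$KEYTAB" \
  --proxy-user "$PROXY_USER" \
  --class org.sandbox.Main \
  "$APP_JAR"

# Dry run by default: print the command for review.
printf '%s ' "$@"
echo

# Set RUN=1 in the environment to actually submit the job.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

In this setup the service principal authenticates to Kerberos and then impersonates the business user, which is why the xxx in the hadoop.kms.proxyuser entries above would need to match the short name of the authenticating principal.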