
Bluemix Apache Spark service - Scala - reading a file

This is a basic question, but I am trying to retrieve the contents of a file using Scala code in a Bluemix notebook on the Apache Spark service on Analytics, and errors about authentication keep popping up. Does anyone have a Scala authentication example for accessing a file? Thank you in advance!

I tried the following simple script:

val file = sc.textFile("swift://notebooks.keystone/kdd99.data") 
file.take(1) 

I also tried:

def setConfig(name:String) : Unit = { 
    val pfx = "fs.swift.service." + name 
    val conf = sc.getConf 
    conf.set(pfx + "auth.url", "hardcoded") 
    conf.set(pfx + "tenant", "hardcoded") 
    conf.set(pfx + "username", "hardcoded") 
    conf.set(pfx + "password", "hardcoded") 
    conf.set(pfx + "apikey", "hardcoded") 
    conf.set(pfx + "auth.endpoint.prefix", "endpoints") 
} 
setConfig("keystone") 

I also tried this script from a previous question:

import scala.collection.breakOut 
val name= "keystone" 
val YOUR_DATASOURCE = """auth_url:https://identity.open.softlayer.com 
project: hardcoded 
project_id: hardcoded 
region: hardcoded 
user_id: hardcoded 
domain_id: hardcoded 
domain_name: hardcoded 
username: hardcoded 
password: hardcoded 
filename: hardcoded 
container: hardcoded 
tenantId: hardcoded 
""" 

val settings:Map[String,String] = YOUR_DATASOURCE.split("\\n"). 
    map(l=>(l.split(":",2)(0).trim(), l.split(":",2)(1).trim()))(breakOut) 

val conf = sc.getConf
conf.set("fs.swift.service.keystone.auth.url", settings.getOrElse("auth_url", "")) 
conf.set("fs.swift.service.keystone.tenant", settings.getOrElse("tenantId", "")) 
conf.set("fs.swift.service.keystone.username", settings.getOrElse("username", "")) 
conf.set("fs.swift.service.keystone.password", settings.getOrElse("password", "")) 
conf.set("fs.swift.service.keystone.apikey", settings.getOrElse("password", "")) 
conf.set("fs.swift.service.keystone.auth.endpoint.prefix", "endpoints") 
println("sett: "+ settings.getOrElse("auth_url","")) 
val file = sc.textFile("swift://notebooks.keystone/kdd99.data") 

/* The following line gives errors */ 
file.take(1) 

The error is below:

Name: org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException
Message: Missing mandatory configuration option: fs.swift.service.keystone.auth.url
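
Note: a likely cause, judging from the answers below, is that the snippets above set the options on the SparkConf returned by sc.getConf, while the Hadoop Swift driver reads them from sc.hadoopConfiguration. A minimal sketch of the switch, with placeholder values:

// Sketch only: the fs.swift.service.* options must live in the Hadoop
// configuration that the Swift filesystem actually consults, not in SparkConf.
val pfx = "fs.swift.service.keystone"
sc.hadoopConfiguration.set(pfx + ".auth.url", "https://identity.open.softlayer.com/v3/auth/tokens")
sc.hadoopConfiguration.set(pfx + ".tenant", "<project_id>")   // placeholder
sc.hadoopConfiguration.set(pfx + ".username", "<user_id>")    // placeholder
sc.hadoopConfiguration.set(pfx + ".password", "<password>")   // placeholder
sc.hadoopConfiguration.set(pfx + ".auth.endpoint.prefix", "endpoints")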

EDIT

A Python alternative would also be a good option. I tried the following, with "spark" as the configuration name, for two different files:

def set_hadoop_config(credentials): 
    prefix = "fs.swift.service." + credentials['name'] 
    hconf = sc._jsc.hadoopConfiguration() 
    hconf.set(prefix + ".auth.url", credentials['auth_url']+'/v3/auth/tokens') 
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints") 
    hconf.set(prefix + ".tenant", credentials['project_id']) 
    hconf.set(prefix + ".username", credentials['user_id']) 
    hconf.set(prefix + ".password", credentials['password']) 
    hconf.setInt(prefix + ".http.port", 8080) 
    hconf.set(prefix + ".region", credentials['region']) 
    hconf.setBoolean(prefix + ".public", True) 

Answers


To access a file from Object Storage in Scala, the following sequence of commands works in a Scala notebook (the credentials are populated into a notebook cell when you use the "Insert to code" link for a file shown under your data sources):

IN [1]:

var credentials = scala.collection.mutable.HashMap[String, String](
    "auth_url"->"https://identity.open.softlayer.com", 
    "project"->"object_storage_b3c0834b_0936_4bbe_9f29_ef45e018cec9", 
    "project_id"->"68d053dff02e42b1a947457c6e2e3290", 
    "region"->"dallas", 
    "user_id"->"e7639268215e4830a3662f708e8c4a5c", 
    "domain_id"->"2df6373c549e49f8973fb6d22ab18c1a", 
    "domain_name"->"639347", 
    "username"->"Admin_XXXXXXXXXXXX”, 
    "password”->”””XXXXXXXXXX”””, 
    "filename"->"2015_small.csv", 
    "container"->"notebooks", 
    "tenantId"->"sefe-f831d4ccd6da1f-42a9cf195d79" 
) 

IN [2]:

credentials("name")="keystone" 

IN [3]:

def setHadoopConfig(name: String, tenant: String, url: String, username: String, password: String, region: String) = { 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.url",url+"/v3/auth/tokens") 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.endpoint.prefix","endpoints") 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.tenant",tenant) 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.username",username) 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.password",password) 
    sc.hadoopConfiguration.setInt(f"fs.swift.service.$name.http.port",8080) 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.region",region) 
    sc.hadoopConfiguration.setBoolean(f"fs.swift.service.$name.public",true) 
} 

IN [4]:

setHadoopConfig(credentials("name"), credentials("project_id"), credentials("auth_url"), credentials("user_id"), credentials("password"), credentials("region")) 

IN [5]:

var testcount = sc.textFile("swift://notebooks.keystone/2015_small.csv") 
testcount.count() 

IN [6]:

testcount.take(1) 

Thank you NSHUKLA – tbuda


I have edited the question with a Python version. Could you take a look? – tbuda


For Python, the code seems correct (you can refer to the sample "Analytics Notebooks and Apache Spark", which has the Python code for def set_hadoop_config(credentials)). I have tried .csv and .txt files with the keystone name. Are you facing the issue only with the .data file, given that, as you said, it works with the .txt file? – NSHUKLA


I think you need to use "spark" as the configuration name instead of "keystone", since you are trying to access Object Storage from the IBM Bluemix notebook UI.

sc.textFile("swift://notebooks.spark/2015_small.csv")

Now, here is a working example:

https://console.ng.bluemix.net/data/notebooks/4dda9ee7-bf26-4ebc-bccf-dcb1b7ef63c8/view?access_token=37bff7ab682ee255b753fca485d49de50fed69d2a25217a7c748dd1463222c3b

Note: remember to change the container name; the URL format is containername.configname.

Also, replace your credentials in the YOUR_DATASOURCE variable in the example above.

"notebooks" is the default container.
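
To make that URL anatomy explicit, here is a minimal sketch (the names are placeholders consistent with the example above):

// swift://<container>.<configname>/<object>
val container  = "notebooks"       // the Object Storage container ("notebooks" by default)
val configName = "spark"           // must match the fs.swift.service.<configname>.* prefix
val fileName   = "2015_small.csv"  // the object to read
val rdd = sc.textFile(s"swift://$container.$configName/$fileName")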

Thanks, Charles.


That's it! Thank you very much. So "keystone" is simply not a good configuration name. Why does "spark" work now? Is this a new rule? keystone used to work fine before. – tbuda


keystone might still work, I think... The IBM Bluemix Object Storage API seems to have been upgraded to V3, so it may need the v3 API URL /v3/auth/tokens. I haven't tested it, but as @NSHUKLA noted, you may need the updated URL for keystone to work... –
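
Consistent with that comment, both working snippets above append the v3 token path to the identity base URL; a minimal sketch (assuming the same auth_url as in the credentials above):

// Keystone V3 expects the token endpoint, not just the identity base URL.
val authUrlBase = "https://identity.open.softlayer.com"  // from the credentials above
sc.hadoopConfiguration.set("fs.swift.service.keystone.auth.url",
    authUrlBase + "/v3/auth/tokens")                     // v3 path, as in both examples above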


Thank you Charles – tbuda
