2017-03-29 32 views
0

我在Windows 10上运行Spark 2.1.0。我连接到MySQL数据库以使用JDBC将数据导入spark。如下所示,每当我执行一个操作时,我会得到以下警告,这让我怀疑数据是从数据库中为每个操作检索的。Spark每次执行转换/操作时都会连接到数据库?

scala> val jdbcDF2 = spark.read.jdbc("jdbc:mysql:dbserver", "schema.tablename", connectionProperties) 
Wed Mar 29 15:05:23 IST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. 
jdbcDF2: org.apache.spark.sql.DataFrame = [id: bigint, site: bigint ... 15 more fields] 

scala> jdbcDF2.count 
Wed Mar 29 15:09:09 IST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. 

如果是这样的话,有没有办法让我可以将数据保存到火花本地对象就像一个数据帧,这样它没有连接到数据库,所有的时间?

我试图cache the table并成功运行,但我无法使用星火-SQL放在桌子上

scala> jdbcDF2.cache() 
res6: jdbcDF2.type = [id: bigint, site: bigint ... 15 more fields] 
scala> val unique = sql("SELECT DISTINCT site FROM jdbcDF2") 
org.apache.spark.sql.AnalysisException: Table or view not found: jdbcDF2; 

回答

0

你是对缓存的数据帧以备后用,为了在不查询数据库每个星火行动(收集,统计,第一,...)

但使用SQL查询您的数据帧,首先你要做的:

jdbcDF2.createOrReplaceTempView("my_table") 

和N:

sql("SELECT DISTINCT site FROM my_table") 
1

您可以直接使用

val unique = jdbcDF2.selectExpr("count(distinct site)")

val unique = jdbcDF2.select("site").distinct.count

或 创建从您的数据帧的临时观点和你的数据帧缓存后执行您查询通过sqlContext访问它

jdbcDF2.createOrReplaceTempView("jdbcDF2") 
val unique = sql("SELECT DISTINCT site FROM jdbcDF2") 
相关问题