I am running Spark 2.1.0 on Windows 10. I connect to a MySQL database to import data into Spark via JDBC. As shown below, whenever I execute an action I get the following warning, which makes me suspect that the data is being fetched from the database on every action. Does Spark connect to the database each time it executes a transformation/action?
scala> val jdbcDF2 = spark.read.jdbc("jdbc:mysql:dbserver", "schema.tablename", connectionProperties)
Wed Mar 29 15:05:23 IST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
jdbcDF2: org.apache.spark.sql.DataFrame = [id: bigint, site: bigint ... 15 more fields]
scala> jdbcDF2.count
Wed Mar 29 15:09:09 IST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
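As an aside, the warning itself states the remedy: set `useSSL` explicitly in the JDBC connection properties. A minimal sketch, assuming `connectionProperties` is the same `java.util.Properties` object passed to `spark.read.jdbc` above (the user/password values are placeholders):

```scala
import java.util.Properties

val connectionProperties = new Properties()
connectionProperties.put("user", "myuser")         // placeholder
connectionProperties.put("password", "mypassword") // placeholder
// Explicitly disable SSL to silence the warning, or set useSSL=true
// and provide a truststore for server certificate verification.
connectionProperties.put("useSSL", "false")
```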
If so, is there a way to save the data into a local Spark object, such as a DataFrame, so that it does not connect to the database every time?
I tried to cache the table and the call succeeded, but I cannot query the table with Spark SQL:
scala> jdbcDF2.cache()
res6: jdbcDF2.type = [id: bigint, site: bigint ... 15 more fields]
scala> val unique = sql("SELECT DISTINCT site FROM jdbcDF2")
org.apache.spark.sql.AnalysisException: Table or view not found: jdbcDF2;
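The `AnalysisException` is unrelated to caching: `sql(...)` only resolves names registered in the session's catalog, and a plain DataFrame variable is not one of them. A minimal sketch of a likely fix, assuming the `jdbcDF2` DataFrame from above and a `spark` session (the view name `jdbcDF2` is arbitrary):

```scala
// Mark the DataFrame for in-memory storage. The cache is populated
// lazily, on the first action that scans the data.
jdbcDF2.cache()

// Register the DataFrame under a name Spark SQL can resolve.
jdbcDF2.createOrReplaceTempView("jdbcDF2")

// The temp view is now visible to Spark SQL; once the cache has been
// materialized, subsequent queries read from memory rather than MySQL.
val unique = spark.sql("SELECT DISTINCT site FROM jdbcDF2")
unique.show()
```

Note that `cache()` alone does not avoid the database connection on the very first action; the warning printed by `jdbcDF2.count` above is expected, because that action is what materializes the cache.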