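Loading external jars into Spark Notebook fails
I am trying to connect from Spark Notebook to Redshift. So far I have done the following.
Configured the notebook metadata: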
"customDeps": [
"com.databricks:spark-redshift_2.10:3.0.0-preview1",
"com.databricks:spark-avro_2.11:3.2.0",
"com.databricks:spark-csv_2.11:1.5.0"
]
Went through the browser console to make sure the libraries were loaded after restarting the kernel:
ui-logs-1422> [Tue Aug 22 2017 09:46:26 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.m2/repository/com/databricks/spark-avro_2.10/3.0.0/spark-avro_2.10-3.0.0.jar
kernel.js:978 ui-logs-1452> [Tue Aug 22 2017 09:46:26 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.coursier/cache/v1/http/repo1.maven.org/maven2/com/databricks/spark-redshift_2.10/3.0.0-preview1/spark-redshift_2.10-3.0.0-preview1.jar
kernel.js:978 ui-logs-1509> [Tue Aug 22 2017 09:46:26 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.coursier/cache/v1/http/repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.5.0/spark-csv_2.11-1.5.0.jar
kernel.js:978 ui-logs-1526> [Tue Aug 22 2017 09:46:26 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.coursier/cache/v1/http/repo1.maven.org/maven2/com/databricks/spark-avro_2.11/3.2.0/spark-avro_2.11-3.2.0.jar
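One thing that may be worth checking at this point: the customDeps list mixes artifacts built for Scala 2.10 (spark-redshift_2.10) and Scala 2.11 (spark-avro_2.11, spark-csv_2.11), and the log shows spark-avro_2.10 3.0.0 being fetched alongside spark-avro_2.11 3.2.0. Whether that mix is the cause of the failure is an assumption, not something the logs confirm. A minimal sketch for spotting it is below; `ScalaSuffixCheck` and its methods are hypothetical helpers, not part of Spark Notebook or Coursier.

```scala
// Hypothetical helper: groups Maven coordinates by the Scala binary
// version encoded in the artifact name (e.g. "spark-avro_2.11").
object ScalaSuffixCheck {
  // Matches an artifact name ending in "_<major>.<minor>".
  private val Suffix = """.*_(\d+\.\d+)""".r

  /** Scala binary version of a "group:artifact:version" coordinate, if it has one. */
  def scalaVersion(coordinate: String): Option[String] =
    coordinate.split(":") match {
      case Array(_, Suffix(v), _) => Some(v)
      case _                      => None
    }

  /** Coordinates grouped by Scala binary version; more than one key means a mixed list. */
  def groupByScalaVersion(coords: Seq[String]): Map[String, Seq[String]] =
    coords
      .flatMap(c => scalaVersion(c).map(v => v -> c))
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2) }
}
```

Passing the three coordinates from the metadata above to `ScalaSuffixCheck.groupByScalaVersion` yields two groups, one for 2.10 and one for 2.11. Coordinates without a Scala suffix are ignored.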
When I try to load a table, I run into a ClassNotFoundException:
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.redshift. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:594)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
... 63 elided
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.redshift.DefaultSource
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:579)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:579)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:579)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:579)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:579)
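The trace shows the lookup of `com.databricks.spark.redshift.DefaultSource` failing inside `AbstractFileClassLoader`, so a fetched jar is not necessarily visible to the class loader that resolves the data source. A rough way to probe this from a notebook cell is sketched below. `ClasspathProbe` is a hypothetical helper, and it is an assumption that the thread context class loader in a cell is the same loader Spark consults.

```scala
import scala.util.Try

// Hypothetical helper: reports whether a class can be loaded by name.
object ClasspathProbe {
  /** True if `className` can be loaded (without initializing it) by `loader`. */
  def isVisible(
      className: String,
      loader: ClassLoader = Thread.currentThread().getContextClassLoader
  ): Boolean =
    // A ClassNotFoundException becomes a Failure, so the result is false.
    Try(Class.forName(className, false, loader)).isSuccess
}
```

Calling `ClasspathProbe.isVisible("com.databricks.spark.redshift.DefaultSource")` in the notebook would then indicate whether the class is reachable at all, separately from Spark's data source lookup.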
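Has anyone else run into this problem or solved it?
I noticed a similar problem with another dependency. Is anything missing from the configuration?
While trying out the time series sample notebook (notebooks/timeseries/Spark-Timeseries.snb.ipynb), I noticed an existing entry in the metadata for custom dependencies: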
"customDeps": [
"com.cloudera.sparkts % sparkts % 0.3.0"
]
I quickly verified that the package is available at https://spark-packages.org/package/sryza/spark-timeseries and updated the metadata to include this line:
"com.cloudera.sparkts:sparkts:0.4.1"
After restarting the kernel, I verified that the library was loaded:
ui-logs-337> [Wed Aug 23 2017 09:29:25 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Will fetch these customDeps artifacts:Set(Dependency(com.cloudera.sparkts:sparkts,0.3.0,,Set(),Attributes(,),false,true), Dependency(com.cloudera.sparkts:sparkts,0.4.1,,Set(),Attributes(,),false,true))
kernel.js:978 ui-logs-347> [Wed Aug 23 2017 09:29:37 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.coursier/cache/v1/http/repo1.maven.org/maven2/com/cloudera/sparkts/sparkts/0.4.1/sparkts-0.4.1.jar
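Note that the log lists both sparkts 0.3.0 and 0.4.1 in the set of customDeps to fetch, while only the 0.4.1 jar appears as a fetched artifact. Whether the two entries interfere with each other is an assumption. A small sketch for flagging an artifact that is listed with more than one version is below; `DuplicateDeps` is a hypothetical helper and accepts both the `:` and the `%` separators seen in the metadata.

```scala
// Hypothetical helper: finds artifacts declared with several versions.
object DuplicateDeps {
  /** Splits "group:artifact:version" or "group % artifact % version" into its parts. */
  def parse(coordinate: String): Option[(String, String, String)] =
    coordinate.split("[:%]").map(_.trim) match {
      case Array(g, a, v) => Some((g, a, v))
      case _              => None // anything that is not exactly three parts is skipped
    }

  /** Maps (group, artifact) to its versions, keeping only entries with more than one. */
  def conflictingVersions(coords: Seq[String]): Map[(String, String), Set[String]] =
    coords
      .flatMap(parse)
      .groupBy { case (g, a, _) => (g, a) }
      .map { case (key, entries) => key -> entries.map(_._3).toSet }
      .filter { case (_, versions) => versions.size > 1 }
}
```

For the two sparkts entries above, `DuplicateDeps.conflictingVersions` returns a single entry for `com.cloudera.sparkts` / `sparkts` holding both versions.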
Error message -
<console>:69: error: object cloudera is not a member of package com
import com.cloudera.sparkts._
^
<console>:70: error: object cloudera is not a member of package com
import com.cloudera.sparkts.stats.TimeSeriesStatisticalTests