Submitting a Spark application that connects to a Cassandra database from IntelliJ IDEA

I found a similar question here: How to submit code to a remote Spark cluster from IntelliJ IDEA

I want to submit a Spark application to a cluster on which Spark and Cassandra are installed.

My application is on Windows. It was written in IntelliJ using:

Maven
Scala
Spark

Here is the code snippet:

import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector._ // adds cassandraTable to SparkContext

val spark = SparkSession
    .builder().master("spark://...:7077") // the actual code contains the IP of the master node from the cluster
    .appName("Cassandra App")
    .config("spark.cassandra.connection.host", cassandraHost) // is the same as the IP of the master node from the cluster
    .getOrCreate()

val sc = spark.sparkContext

val trainingdata = sc.cassandraTable("sparkdb", "trainingdata").map(a => a.get[String]("attributes"))

The cluster consists of two nodes running Ubuntu. Cassandra and Spark are installed on each node.

When I use local[*] instead of spark://...:7077, everything works fine. However, when I use the version shown in this post, I get an error:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

On the cluster, the error is reported in more detail:

java.lang.ClassNotFoundException: MyApplication$$anonfun$1 
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
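A likely cause of this ClassNotFoundException is that, when the driver runs from IntelliJ against spark://...:7077, the executors on the Ubuntu nodes never receive the compiled application classes (the anonymous function `MyApplication$$anonfun$1` exists only in IntelliJ's output directory). With `local[*]` everything runs inside one JVM, which would explain why that mode works. A minimal sketch, assuming the project has been packaged (for example with `mvn package`) into a jar that also bundles the Cassandra connector, is to list that jar in `spark.jars` via `SparkConf.setJars` so Spark ships it to the executors. The jar path, the `buildConf` helper, and the command-line arguments are hypothetical, and the worker nodes must also be able to reach the driver on the Windows machine over the network.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector._

object MyApplication {
  // Hypothetical helper: builds the configuration for a standalone cluster.
  // appJar is the local path of the packaged jar; Spark uploads it so that
  // executors can load classes such as MyApplication$$anonfun$1.
  def buildConf(masterUrl: String, cassandraHost: String, appJar: String): SparkConf =
    new SparkConf()
      .setMaster(masterUrl)
      .setAppName("Cassandra App")
      .set("spark.cassandra.connection.host", cassandraHost)
      .setJars(Seq(appJar))

  def main(args: Array[String]): Unit = {
    // Hypothetical arguments: master URL, Cassandra host, path to the packaged jar
    val Array(masterUrl, cassandraHost, appJar) = args
    val spark = SparkSession.builder().config(buildConf(masterUrl, cassandraHost, appJar)).getOrCreate()
    val sc = spark.sparkContext

    val trainingdata = sc.cassandraTable("sparkdb", "trainingdata").map(a => a.get[String]("attributes"))
    println(trainingdata.count())

    spark.stop()
  }
}
```

The jar has to be rebuilt whenever the code changes; otherwise the executors would run stale classes.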

I should also point out that the application written on Windows uses Spark as a Maven dependency.

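I would like to know whether it is possible to submit this Spark application from the Windows machine to the Ubuntu cluster and, if not, what alternative I should use. If I have to build a jar from the Scala object, how should I invoke the cluster from IntelliJ?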


2017-04-26 dorinmoldovan

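To launch your application, it has to be present on the cluster: your packaged jar should reside on the cluster, either in HDFS or at the same path on every node. You can then use an SSH client, the REST interface, or anything else capable of triggering the spark-submit command.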
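As one hedged illustration of "anything else capable of triggering spark-submit", Spark ships a programmatic wrapper around spark-submit, `org.apache.spark.launcher.SparkLauncher` (in the spark-launcher module), which could be run from IntelliJ on the Windows machine. This sketch assumes a local Spark distribution at `sparkHome` (SparkLauncher shells out to its spark-submit), that the jar has already been placed in HDFS or at the same path on every node as described above, and that the main class is `MyApplication`, the name seen in the stack trace. `SubmitToCluster`, `buildLauncher`, and the arguments are hypothetical.

```scala
import org.apache.spark.launcher.SparkLauncher

// Hypothetical helper object; not part of the original question or answer.
object SubmitToCluster {
  // Roughly equivalent to:
  //   spark-submit --master <masterUrl> --deploy-mode cluster --class MyApplication <appJar>
  def buildLauncher(sparkHome: String, masterUrl: String, appJar: String): SparkLauncher =
    new SparkLauncher()
      .setSparkHome(sparkHome)         // local Spark install whose spark-submit is invoked
      .setMaster(masterUrl)            // e.g. "spark://<master-ip>:7077"
      .setDeployMode("cluster")        // the driver runs on a cluster node, not on Windows
      .setAppResource(appJar)          // must be readable from the cluster (HDFS or same path on every node)
      .setMainClass("MyApplication")   // class name taken from the stack trace
      .redirectOutput(ProcessBuilder.Redirect.INHERIT) // show spark-submit output in the IDE console
      .redirectError(ProcessBuilder.Redirect.INHERIT)

  def main(args: Array[String]): Unit = {
    // Hypothetical arguments: Spark home, master URL, jar location visible to the cluster
    val Array(sparkHome, masterUrl, appJar) = args
    val process = buildLauncher(sparkHome, masterUrl, appJar).launch()
    val exitCode = process.waitFor()
    println(s"spark-submit exited with code $exitCode")
  }
}
```

Using cluster deploy mode here keeps the driver off the Windows machine, so the workers do not need to connect back to it; only the spark-submit call itself originates from IntelliJ.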


2017-04-26 13:25:32 FaigB

Thank you very much for your answer! – dorinmoldovan
