4

Why do I need to add "fork in run := true" when running a Spark application with sbt? I have built a simple Spark application using sbt. Here is my code:

import org.apache.spark.sql.SparkSession 

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("BigApple").getOrCreate()

    import spark.implicits._

    val ds = Seq(1, 2, 3).toDS()
    ds.map(_ + 1).foreach(x => println(x))
  }
}

Here is my build.sbt:

name := """sbt-sample-app""" 

version := "1.0" 

scalaVersion := "2.11.7" 

libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "test" 
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.1" 

Now, when I try to do sbt run, it gives me the following error:

$ sbt run 
[info] Loading global plugins from /home/user/.sbt/0.13/plugins 
[info] Loading project definition from /home/user/Projects/sample-app/project 
[info] Set current project to sbt-sample-app (in build file:/home/user/Projects/sample-app/) 
[info] Running HelloWorld 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
17/06/01 10:09:10 INFO SparkContext: Running Spark version 2.1.1 
17/06/01 10:09:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
17/06/01 10:09:11 WARN Utils: Your hostname, user-Vostro-15-3568 resolves to a loopback address: 127.0.1.1; using 127.0.0.1 instead (on interface enp3s0) 
17/06/01 10:09:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
17/06/01 10:09:11 INFO SecurityManager: Changing view acls to: user 
17/06/01 10:09:11 INFO SecurityManager: Changing modify acls to: user 
17/06/01 10:09:11 INFO SecurityManager: Changing view acls groups to: 
17/06/01 10:09:11 INFO SecurityManager: Changing modify acls groups to: 
17/06/01 10:09:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); groups with view permissions: Set(); users with modify permissions: Set(user); groups with modify permissions: Set() 
17/06/01 10:09:12 INFO Utils: Successfully started service 'sparkDriver' on port 39662. 
17/06/01 10:09:12 INFO SparkEnv: Registering MapOutputTracker 
17/06/01 10:09:12 INFO SparkEnv: Registering BlockManagerMaster 
17/06/01 10:09:12 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 
17/06/01 10:09:12 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 
17/06/01 10:09:12 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-c6db1535-6a00-4760-93dc-968722e3d596 
17/06/01 10:09:12 INFO MemoryStore: MemoryStore started with capacity 408.9 MB 
17/06/01 10:09:13 INFO SparkEnv: Registering OutputCommitCoordinator 
17/06/01 10:09:13 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
17/06/01 10:09:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://127.0.0.1:4040 
17/06/01 10:09:13 INFO Executor: Starting executor ID driver on host localhost 
17/06/01 10:09:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34488. 
17/06/01 10:09:13 INFO NettyBlockTransferService: Server created on 127.0.0.1:34488 
17/06/01 10:09:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 
17/06/01 10:09:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 127.0.0.1, 34488, None) 
17/06/01 10:09:13 INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:34488 with 408.9 MB RAM, BlockManagerId(driver, 127.0.0.1, 34488, None) 
17/06/01 10:09:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 127.0.0.1, 34488, None) 
17/06/01 10:09:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 127.0.0.1, 34488, None) 
17/06/01 10:09:14 INFO SharedState: Warehouse path is 'file:/home/user/Projects/sample-app/spark-warehouse'. 
[error] (run-main-0) scala.ScalaReflectionException: class scala.Option in JavaMirror with ClasspathFilter(
[error] parent = URLClassLoader with NativeCopyLoader with RawResources(
[error] urls = List(/home/user/Projects/sample-app/target/scala-2.11/classes, ...,/home/user/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.8.1.jar), 
[error] parent = [email protected], 
[error] resourceMap = Set(app.class.path, boot.class.path), 
[error] nativeTemp = /tmp/sbt_c2afce 
[error]) 
[error] root = [email protected] 
[error] cp = Set(/home/user/.ivy2/cache/org.glassfish.jersey.core/jersey-common/jars/jersey-common-2.22.2.jar, ..., /home/user/.ivy2/cache/net.razorvine/pyrolite/jars/pyrolite-4.13.jar) 
[error]) of type class sbt.classpath.ClasspathFilter with classpath [<unknown>] and parent being URLClassLoader with NativeCopyLoader with RawResources(
[error] urls = List(/home/user/Projects/sample-app/target/scala-2.11/classes, ..., /home/user/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.8.1.jar), 
[error] parent = [email protected], 
[error] resourceMap = Set(app.class.path, boot.class.path), 
[error] nativeTemp = /tmp/sbt_c2afce 
[error]) of type class sbt.classpath.ClasspathUtilities$$anon$1 with classpath [file:/home/user/Projects/sample-app/target/scala-2.11/classes/,...openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/classes] not found. 
scala.ScalaReflectionException: class scala.Option in JavaMirror with ClasspathFilter(
    parent = URLClassLoader with NativeCopyLoader with RawResources(
    urls = List(/home/user/Projects/sample-app/target/scala-2.11/classes, ..., /home/user/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.8.1.jar), 
    parent = [email protected], 
    resourceMap = Set(app.class.path, boot.class.path), 
    nativeTemp = /tmp/sbt_c2afce 
) 
    root = [email protected] 
    cp = Set(/home/user/.ivy2/cache/org.glassfish.jersey.core/jersey-common/jars/jersey-common-2.22.2.jar, ..., /home/user/.ivy2/cache/net.razorvine/pyrolite/jars/pyrolite-4.13.jar) 
) of type class sbt.classpath.ClasspathFilter with classpath [<unknown>] and parent being URLClassLoader with NativeCopyLoader with RawResources(
    urls = List(/home/user/Projects/sample-app/target/scala-2.11/classes, ..., /home/user/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.8.1.jar), 
    parent = [email protected], 
    resourceMap = Set(app.class.path, boot.class.path), 
    nativeTemp = /tmp/sbt_c2afce 
) of type class sbt.classpath.ClasspathUtilities$$anon$1 with classpath [file:/home/user/Projects/sample-app/target/scala-2.11/classes/,.../jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/classes] not found. 
    at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:123) 
    at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:22) 
    at org.apache.spark.sql.catalyst.ScalaReflection$$typecreator42$1.apply(ScalaReflection.scala:614) 
    at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232) 
    at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232) 
    at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:782) 
    at org.apache.spark.sql.catalyst.ScalaReflection$.localTypeOf(ScalaReflection.scala:39) 
    at org.apache.spark.sql.catalyst.ScalaReflection$.optionOfProductType(ScalaReflection.scala:614) 
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:51) 
    at org.apache.spark.sql.Encoders$.scalaInt(Encoders.scala:281) 
    at org.apache.spark.sql.SQLImplicits.newIntEncoder(SQLImplicits.scala:54) 
    at HelloWorld$.main(HelloWorld.scala:9) 
    at HelloWorld.main(HelloWorld.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
[trace] Stack trace suppressed: run last compile:run for the full output. 
17/06/01 10:09:15 ERROR ContextCleaner: Error in cleaning thread 
java.lang.InterruptedException 
    at java.lang.Object.wait(Native Method) 
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143) 
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:181) 
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245) 
    at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) 
    at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 
17/06/01 10:09:15 ERROR Utils: uncaught error in thread SparkListenerBus, stopping SparkContext 
java.lang.InterruptedException 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) 
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:312) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:80) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79) 
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78) 
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77) 
17/06/01 10:09:15 ERROR Utils: throw uncaught fatal error in thread SparkListenerBus 
java.lang.InterruptedException 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) 
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:312) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:80) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79) 
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78) 
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245) 
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77) 
17/06/01 10:09:15 INFO SparkUI: Stopped Spark web UI at http://127.0.0.1:4040 
java.lang.RuntimeException: Nonzero exit code: 1 
    at scala.sys.package$.error(package.scala:27) 
[trace] Stack trace suppressed: run last compile:run for the full output. 
[error] (compile:run) Nonzero exit code: 1 
[error] Total time: 7 s, completed 1 Jun, 2017 10:09:15 AM 

But when I add fork in run := true to build.sbt, the application runs fine.

build.sbt

name := """sbt-sample-app""" 

version := "1.0" 

scalaVersion := "2.11.7" 

libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "test" 
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.1" 

fork in run := true 

Below is the output:

$ sbt run 
[info] Loading global plugins from /home/user/.sbt/0.13/plugins 
[info] Loading project definition from /home/user/Projects/sample-app/project 
[info] Set current project to sbt-sample-app (in build file:/home/user/Projects/sample-app/) 
[success] Total time: 0 s, completed 1 Jun, 2017 10:15:43 AM 
[info] Updating {file:/home/user/Projects/sample-app/}sample-app... 
[info] Resolving jline#jline;2.12.1 ... 
[info] Done updating. 
[warn] Scala version was updated by one of library dependencies: 
[warn] * org.scala-lang:scala-library:(2.11.7, 2.11.0) -> 2.11.8 
[warn] To force scalaVersion, add the following: 
[warn] ivyScala := ivyScala.value map { _.copy(overrideScalaVersion = true) } 
[warn] Run 'evicted' to see detailed eviction warnings 
[info] Compiling 1 Scala source to /home/user/Projects/sample-app/target/scala-2.11/classes... 
[info] Running HelloWorld 
[error] Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
[error] 17/06/01 10:16:13 INFO SparkContext: Running Spark version 2.1.1 
[error] 17/06/01 10:16:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
[error] 17/06/01 10:16:14 WARN Utils: Your hostname, user-Vostro-15-3568 resolves to a loopback address: 127.0.1.1; using 127.0.0.1 instead (on interface enp3s0) 
[error] 17/06/01 10:16:14 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
[error] 17/06/01 10:16:14 INFO SecurityManager: Changing view acls to: user 
[error] 17/06/01 10:16:14 INFO SecurityManager: Changing modify acls to: user 
[error] 17/06/01 10:16:14 INFO SecurityManager: Changing view acls groups to: 
[error] 17/06/01 10:16:14 INFO SecurityManager: Changing modify acls groups to: 
[error] 17/06/01 10:16:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); groups with view permissions: Set(); users with modify permissions: Set(user); groups with modify permissions: Set() 
[error] 17/06/01 10:16:14 INFO Utils: Successfully started service 'sparkDriver' on port 37747. 
[error] 17/06/01 10:16:14 INFO SparkEnv: Registering MapOutputTracker 
[error] 17/06/01 10:16:14 INFO SparkEnv: Registering BlockManagerMaster 
[error] 17/06/01 10:16:14 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 
[error] 17/06/01 10:16:14 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 
[error] 17/06/01 10:16:14 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-edf40c39-a13e-4930-8e9a-64135bfa9770 
[error] 17/06/01 10:16:14 INFO MemoryStore: MemoryStore started with capacity 1405.2 MB 
[error] 17/06/01 10:16:14 INFO SparkEnv: Registering OutputCommitCoordinator 
[error] 17/06/01 10:16:14 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
[error] 17/06/01 10:16:15 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://127.0.0.1:4040 
[error] 17/06/01 10:16:15 INFO Executor: Starting executor ID driver on host localhost 
[error] 17/06/01 10:16:15 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39113. 
[error] 17/06/01 10:16:15 INFO NettyBlockTransferService: Server created on 127.0.0.1:39113 
[error] 17/06/01 10:16:15 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 
[error] 17/06/01 10:16:15 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 127.0.0.1, 39113, None) 
[error] 17/06/01 10:16:15 INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:39113 with 1405.2 MB RAM, BlockManagerId(driver, 127.0.0.1, 39113, None) 
[error] 17/06/01 10:16:15 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 127.0.0.1, 39113, None) 
[error] 17/06/01 10:16:15 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 127.0.0.1, 39113, None) 
[error] 17/06/01 10:16:15 INFO SharedState: Warehouse path is 'file:/home/user/Projects/sample-app/spark-warehouse/'. 
[error] 17/06/01 10:16:18 INFO CodeGenerator: Code generated in 395.134683 ms 
[error] 17/06/01 10:16:19 INFO CodeGenerator: Code generated in 9.077969 ms 
[error] 17/06/01 10:16:19 INFO CodeGenerator: Code generated in 23.652705 ms 
[error] 17/06/01 10:16:19 INFO SparkContext: Starting job: foreach at HelloWorld.scala:10 
[error] 17/06/01 10:16:19 INFO DAGScheduler: Got job 0 (foreach at HelloWorld.scala:10) with 1 output partitions 
[error] 17/06/01 10:16:19 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at HelloWorld.scala:10) 
[error] 17/06/01 10:16:19 INFO DAGScheduler: Parents of final stage: List() 
[error] 17/06/01 10:16:19 INFO DAGScheduler: Missing parents: List() 
[error] 17/06/01 10:16:19 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at foreach at HelloWorld.scala:10), which has no missing parents 
[error] 17/06/01 10:16:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.3 KB, free 1405.2 MB) 
[error] 17/06/01 10:16:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.3 KB, free 1405.2 MB) 
[error] 17/06/01 10:16:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 127.0.0.1:39113 (size: 3.3 KB, free: 1405.2 MB) 
[error] 17/06/01 10:16:20 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996 
[error] 17/06/01 10:16:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at foreach at HelloWorld.scala:10) 
[error] 17/06/01 10:16:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks 
[error] 17/06/01 10:16:20 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 6227 bytes) 
[error] 17/06/01 10:16:20 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 
[info] 2 
[info] 3 
[info] 4 
[error] 17/06/01 10:16:20 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1231 bytes result sent to driver 
[error] 17/06/01 10:16:20 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 152 ms on localhost (executor driver) (1/1) 
[error] 17/06/01 10:16:20 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
[error] 17/06/01 10:16:20 INFO DAGScheduler: ResultStage 0 (foreach at HelloWorld.scala:10) finished in 0.181 s 
[error] 17/06/01 10:16:20 INFO DAGScheduler: Job 0 finished: foreach at HelloWorld.scala:10, took 0.596960 s 
[error] 17/06/01 10:16:20 INFO SparkContext: Invoking stop() from shutdown hook 
[error] 17/06/01 10:16:20 INFO SparkUI: Stopped Spark web UI at http://127.0.0.1:4040 
[error] 17/06/01 10:16:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
[error] 17/06/01 10:16:20 INFO MemoryStore: MemoryStore cleared 
[error] 17/06/01 10:16:20 INFO BlockManager: BlockManager stopped 
[error] 17/06/01 10:16:20 INFO BlockManagerMaster: BlockManagerMaster stopped 
[error] 17/06/01 10:16:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
[error] 17/06/01 10:16:20 INFO SparkContext: Successfully stopped SparkContext 
[error] 17/06/01 10:16:20 INFO ShutdownHookManager: Shutdown hook called 
[error] 17/06/01 10:16:20 INFO ShutdownHookManager: Deleting directory /tmp/spark-77d00e78-9f76-4ab2-bc40-0b99940661ac 
[success] Total time: 37 s, completed 1 Jun, 2017 10:16:20 AM 

Can anyone help me understand the reason behind this?

+1

What version of sbt are you using? 'sbt sbtVersion' will print the version. – marios

+0

@marios I am using sbt v0.13.13. – himanshuIIITian

Answers

6

An excerpt from "Getting Started with SBT for Scala" by Shiti Saxena:

Why do we need to fork JVM?

When a user runs code using run or console commands, the code is run on the same virtual machine as SBT. In some cases, running of code may cause SBT to crash, such as a System.exit call or unterminated threads (for example, when running tests on code while simultaneously working on the code).

If a test causes the JVM to shut down, you would need to restart SBT. In order to avoid such scenarios, forking the JVM is important.

You do not need to fork the JVM to run your code if the code follows the constraints listed as follows, else it must be run in a forked JVM:

  • No threads are created or the program ends when user-created threads terminate on their own
  • System.exit is used to end the program and user-created threads terminate when interrupted
  • No deserialization is done or deserialization code ensures that the right class loader is used
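
For illustration, here is a minimal sketch of how forking could be configured in build.sbt (sbt 0.13 syntax). The heap size and output strategy are assumptions added for the example, not something the book or the question prescribes:

fork in run := true                     // run the application in a forked JVM instead of sbt's own JVM
javaOptions in run ++= Seq("-Xmx2G")    // JVM options such as heap size only take effect when the process is forked
outputStrategy := Some(StdoutOutput)    // forward the forked process's output to sbt's standard output
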
+0

This is by far the best explanation for my query. – himanshuIIITian

+0

This worked to solve my problem with a deeply mysterious error: '[error] (run-main-0) scala.ScalaReflectionException: class scala.Option in JavaMirror with ClasspathFilter'. I *think* that, since this solution works, Spark must be doing one of the three bullet points, but I can't tell from the answer which one, or why. Oddly, this need to fork isn't mentioned in the Spark Scala documentation, or maybe I missed it. – FrobberOfBits

+0

Hey @FrobberOfBits, could you provide more context? Maybe a code sample that causes the problem? It's hard to say what the cause might be from the information provided so far. – ZakukaZ

0

From the sbt documentation given here:

By default, the run task runs in the same JVM as sbt. However, forking is required under certain circumstances. Or, you might want to fork Java processes when implementing new tasks.

By default, a forked process uses the same Java and Scala versions being used for the build, and the working directory and JVM options of the current process. This page discusses how to enable and configure forking for both run and test tasks. Each kind of task may be configured separately by scoping the relevant keys, as described below.

To enable forking in run only, use:

fork in run := true 
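
As that page notes, the fork key can be scoped per task rather than set project-wide; a sketch of the variants (sbt 0.13 syntax):

fork in run := true     // fork only the run task
fork in Test := true    // fork test execution separately, if desired
// fork := true         // or fork every task in the project
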
+3

I don't feel this answers the question. –

+1

Thanks for the quick response! But I can't understand what forking has to do with Spark. I mean, forking isn't needed when running an ordinary Scala application. – himanshuIIITian

+0

sbt, Scala, Spark, Java: they are all the same thing, bytecode running on the JVM. Within a single JVM process class loading is shared, and sbt does some tricks to make it possible for different versions to share the same classpath. That trick doesn't always work. A single JVM versus a forked JVM also has other issues, which can cause problems with handling IO, etc. – pedrofurla

0

I couldn't find the exact reason.

But here is their build file with the suggestion:

https://github.com/deanwampler/spark-scala-tutorial/blob/master/project/Build.scala

Hope someone can give a better answer.

Edit: the code:

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("BigApple").getOrCreate()

    import spark.implicits._

    val ds = Seq(1, 2, 3).toDS()
    ds.map(_ + 1).foreach(x => println(x))
  }
}

build.sbt

name := """untitled""" 

version := "1.0" 

scalaVersion := "2.11.7" 

libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "test" 
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.1" 
+0

Thanks for this example! But in the build file mentioned above, the only comment I found was '// Better to run the examples and tests in separate JVMs. fork := true,' – himanshuIIITian

+0

If I had to take a shot at it, I would assume it's because Spark runs in separate threads, which may or may not stop when your code does. So can you try spark.close when you're done? It should work then. –

+0

Working without fork is what I mean. Can you confirm? –
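
For reference, a minimal sketch of what that suggestion would look like applied to the HelloWorld program from the question, using spark.stop() to shut the session down before main returns; whether this actually lets the unforked run succeed is exactly what is being asked above:

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("BigApple").getOrCreate()

    import spark.implicits._

    val ds = Seq(1, 2, 3).toDS()
    ds.map(_ + 1).foreach(x => println(x))

    spark.stop()  // explicitly stop the SparkSession, as the comment above suggests
  }
}
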
