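We are using Apache Spark 1.6, Scala 2.10.5, and sbt 0.13.9, and we get an error when executing an Apache Spark ML pipeline.
The error occurs when executing a simple pipeline: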
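import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

def buildPipeline(): Pipeline = {
  // Split the "Summary" text column into words
  val tokenizer = new Tokenizer()
  tokenizer.setInputCol("Summary")
  tokenizer.setOutputCol("LemmatizedWords")
  // Hash the tokens into a term-frequency feature vector
  val hashingTF = new HashingTF()
  hashingTF.setInputCol(tokenizer.getOutputCol)
  hashingTF.setOutputCol("RawFeatures")
  // Chain both stages into a single pipeline
  val pipeline = new Pipeline()
  pipeline.setStages(Array(tokenizer, hashingTF))
  pipeline
}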
When the ML pipeline's fit method is executed, we get the error below. Any comments on what might be happening would be helpful.
**java.lang.RuntimeException: error reading Scala signature of org.apache.spark.mllib.linalg.Vector: value linalg is not a package**
[error] org.apache.spark.ml.feature.HashingTF$$typecreator1$1.apply(HashingTF.scala:66)
[error] org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:642)
[error] org.apache.spark.sql.catalyst.ScalaReflection$.localTypeOf(ScalaReflection.scala:30)
[error] org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:630)
[error] org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
[error] org.apache.spark.sql.functions$.udf(functions.scala:2576)
[error] org.apache.spark.ml.feature.HashingTF.transform(HashingTF.scala:66)
[error] org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:297)
[error] org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:297)
[error] org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:297)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
build.sbt
scalaVersion in ThisBuild := "2.10.5"
scalacOptions := Seq("-unchecked", "-deprecation", "-encoding", "utf8")
val sparkV = "1.6.0"
val sprayV = "1.3.2"
val specs2V = "2.3.11"
val slf4jV = "1.7.5"
val grizzledslf4jV = "1.0.2"
val akkaV = "2.3.14"
libraryDependencies in ThisBuild ++= { Seq(
("org.apache.spark" %% "spark-mllib" % sparkV) % Provided,
("org.apache.spark" %% "spark-core" % sparkV) % Provided,
"com.typesafe.akka" %% "akka-actor" % akkaV,
"io.spray" %% "spray-can" % sprayV,
"io.spray" %% "spray-routing" % sprayV,
"io.spray" %% "spray-json" % sprayV,
"io.spray" %% "spray-testkit" % "1.3.1" % "test",
"org.specs2" %% "specs2-core" % specs2V % "test",
"org.specs2" %% "specs2-mock" % specs2V % "test",
"org.specs2" %% "specs2-junit" % specs2V % "test",
"org.slf4j" % "slf4j-api" % slf4jV,
"org.clapper" %% "grizzled-slf4j" % grizzledslf4jV
) }
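Thanks for taking the time to look into this. Adding spark-sql had no effect. On the other hand, the problem does not seem to occur if the pipeline is run outside the context of a Future. Any idea why that might be the case? – Krys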
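I don't think `Future` per se is the problem. More likely it is the execution context. Could you add some explanation of how you use it? An MCVE, maybe? – zero323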
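One way to test the execution-context hypothesis, offered as a minimal sketch rather than a confirmed fix: as far as I know, Spark's reflection code resolves types through the current thread's context class loader, and a thread owned by an execution context may see a different loader than the main thread does. The sketch below uses only the Scala standard library. `ContextLoaderCheck` and `runWithLoader` are hypothetical names introduced for illustration, not part of Spark or of the asker's code.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ContextLoaderCheck {
  // Hypothetical helper: runs `body` inside a Future on a dedicated
  // single-thread pool, after setting that thread's context class loader.
  def runWithLoader[A](loader: ClassLoader)(body: => A): A = {
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val result = Future {
        // Set the loader explicitly on the worker thread before running the body
        Thread.currentThread().setContextClassLoader(loader)
        body
      }
      Await.result(result, 10.seconds)
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    // The loader that loaded the application classes
    val appLoader = getClass.getClassLoader
    val seen = runWithLoader(appLoader) {
      Thread.currentThread().getContextClassLoader
    }
    println(s"loader seen inside Future: $seen")
    println(s"same as application loader: ${seen eq appLoader}")
  }
}
```

If the pipeline call succeeds when wrapped in something like `runWithLoader(getClass.getClassLoader) { ... }` but fails in a plain `Future`, that would point at the class loader of the execution context's threads rather than at `Future` itself.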
How do you run this example? Is it perhaps inside `sbt console`? `Provided` libraries are not included. –
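If the example is launched through sbt, two build settings may be worth trying. This is a sketch in sbt 0.13.x syntax, on the assumption that the class loader of sbt's own JVM is involved, and it has not been verified against this project. The first setting runs the application in a forked JVM. The second is a commonly used idiom for putting `Provided` dependencies back on the classpath for `sbt run`.

```scala
// build.sbt additions (sbt 0.13.x syntax); assumptions, not a verified fix

// Run the application in a separate JVM rather than inside sbt's own process,
// so it does not share sbt's class loaders.
fork in run := true

// Make `sbt run` use the full compile classpath, which includes the
// Provided dependencies (spark-core and spark-mllib in this build).
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated
```

Neither setting changes what gets packaged; `Provided` still keeps the Spark jars out of the assembled artifact.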