val conf = new SparkConf()
  .setMaster("local[1]")
  .setAppName("Small")
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import org.apache.spark.sql.functions.count
val df = sc.parallelize(Array((1,30),(2,10),(3,20),(1,10)(2,30))).toDF("books","readers")
val results = df.join(
    df.select($"books" as "r_books", $"readers" as "r_readers"),
    $"readers" === $"r_readers" and $"books" < $"r_books"
  )
  .groupBy($"books", $"r_books")
  .agg($"books", $"r_books", count($"readers"))
Started from the SBT console with the following build.sbt:
name := "Small"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.1"
It returns the error:
scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with java.net.URLClassLoader@13a9a4f9 ...
Any ideas?
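One thing worth noting: a MissingRequirementError for ScalaReflection raised from the SBT console is often a classloader problem rather than a bug in the DataFrame code itself. A commonly suggested workaround (an assumption here, not verified against this exact setup) is to run the snippet as a forked application via `sbt run` instead of the interactive console:

```scala
// build.sbt addition (hedged workaround): fork a separate JVM for `run`
// so Scala reflection sees the full application classpath.
fork := true
```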
Thanks. So the incorrect use of 'count' is the cause of the 'scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection' error? Is that right? – zork
Well, it could be, since that is what the error seems to say, but it could also be because you don't have a valid array: you left out a comma. I just rewrote your code; I can't run it in IDEA, so if it works, please mark the answer as correct. – anquegi
In your code, 'results' is a 'long' number, which is not what I need. I need to get a DataFrame where each record is 'book1, book2, cnt', where cnt is the number of times book1 and book2 were read together. – zork
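For reference, a minimal sketch of the pairing query that yields a (book1, book2, cnt) DataFrame rather than a single long. It assumes the missing comma in the `Array` literal is restored, that `count` is imported from `org.apache.spark.sql.functions`, and that the grouping columns are listed in `agg` as was idiomatic in Spark 1.3; the alias "cnt" is an illustrative name, not from the original post:

```scala
import org.apache.spark.sql.functions.count

// Note the comma between (1,10) and (2,30) that was missing in the question.
val df = sc.parallelize(Array((1,30), (2,10), (3,20), (1,10), (2,30)))
  .toDF("books", "readers")

// Self-join on reader; $"books" < $"r_books" counts each unordered pair once.
val pairs = df.join(
    df.select($"books" as "r_books", $"readers" as "r_readers"),
    $"readers" === $"r_readers" and $"books" < $"r_books"
  )
  .groupBy($"books", $"r_books")
  .agg($"books", $"r_books", count($"readers") as "cnt")

pairs.show()
```

Each output row is then one (book, other book, shared-reader count) record, which matches the shape described in the last comment.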