2017-12-03

I was trying out the simple Spark NGram example:

https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaNGramExample.java

These are my POM dependencies:

<dependencies> 
    <dependency> 
        <groupId>org.apache.spark</groupId> 
        <artifactId>spark-core_2.11</artifactId> 
        <version>2.2.0</version> 
    </dependency> 
    <dependency> 
        <groupId>org.apache.spark</groupId> 
        <artifactId>spark-mllib_2.10</artifactId> 
        <version>2.2.0</version> 
    </dependency> 
</dependencies> 

Sample code below:

import java.util.Arrays; 
import java.util.List; 

import org.apache.spark.ml.feature.NGram; 
import org.apache.spark.sql.Dataset; 
import org.apache.spark.sql.Row; 
import org.apache.spark.sql.RowFactory; 
import org.apache.spark.sql.SparkSession; 
import org.apache.spark.sql.types.DataTypes; 
import org.apache.spark.sql.types.Metadata; 
import org.apache.spark.sql.types.StructField; 
import org.apache.spark.sql.types.StructType; 

public class App { 
    public static void main(String[] args) { 
        System.out.println("Hello World!"); 

        // Point Hadoop at a local winutils.exe installation (Windows only) 
        System.setProperty("hadoop.home.dir", "D:\\del"); 

        SparkSession spark = SparkSession 
            .builder() 
            .appName("JavaNGramExample").config("spark.master", "local") 
            .getOrCreate(); 

        List<Row> data = Arrays.asList(
            RowFactory.create(0, Arrays.asList("car", "killed", "cat")), 
            RowFactory.create(1, Arrays.asList("train", "killed", "cat")), 
            RowFactory.create(2, Arrays.asList("john", "plays", "cricket")), 
            RowFactory.create(3, Arrays.asList("tom", "likes", "mangoes"))); 

        StructType schema = new StructType(new StructField[] { 
            new StructField("id", DataTypes.IntegerType, false, Metadata.empty()), 
            new StructField("words", DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty()) }); 

        Dataset<Row> wordDataFrame = spark.createDataFrame(data, schema); 

        NGram ngramTransformer = new NGram().setN(2).setInputCol("words").setOutputCol("ngrams"); 

        Dataset<Row> ngramDataFrame = ngramTransformer.transform(wordDataFrame); 
        System.out.println(" DISPLAY NGRAMS "); 
        ngramDataFrame.select("ngrams").show(false); 
    } 
} 
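Once the build issue is resolved, the `ngrams` column should contain each row's bigrams. As a rough plain-Java sketch of the sliding-window logic that `NGram` with `setN(2)` applies per row (the `NGramSketch` class and `ngrams` method are illustrative names, not part of Spark's API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in for org.apache.spark.ml.feature.NGram's per-row logic.
public class NGramSketch {

    // Return all n-grams of the token list, joined by a single space,
    // mirroring what NGram emits for setN(n).
    public static List<String> ngrams(List<String> tokens, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            result.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        // First row of the example data
        System.out.println(ngrams(Arrays.asList("car", "killed", "cat"), 2));
    }
}
```

For the row `["car", "killed", "cat"]` this yields `[car killed, killed cat]`, which is what the `show(false)` call above should display per row.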

I get the following error when I run this code:

Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class 
    at org.apache.spark.sql.types.StructType.<init>(StructType.scala:98) 
    at com.mypackage.spark.learnspark.App.main(App.java:61) 
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    ... 2 more 

I checked the Scala dependency and it is scala-library-2.11.8.

Is there an inconsistency between Spark 2.2.0 and my Scala jars?

Answer


TL;DR Change spark-mllib_2.10 to spark-mllib_2.11 so that Scala 2.11.8 is used for the Spark MLlib dependency (and optionally remove the spark-core_2.11 dependency, since spark-mllib pulls in spark-core transitively).


Look at your pom.xml:

<dependencies> 
    <dependency> 
        <groupId>org.apache.spark</groupId> 
        <artifactId>spark-core_2.11</artifactId> 
        <version>2.2.0</version> 
    </dependency> 
    <dependency> 
        <groupId>org.apache.spark</groupId> 
        <artifactId>spark-mllib_2.10</artifactId> 
        <version>2.2.0</version> 
    </dependency> 
</dependencies> 
  1. spark-core_2.11 from Spark 2.2.0 depends on Scala 2.11.8, which is fine.

  2. spark-mllib_2.10 from Spark 2.2.0 depends on two different and incompatible Scala versions, 2.10.x and 2.11.8. This is the root of the problem.

Make sure you use:

  1. The same Scala suffix in the artifactId of every Spark dependency, i.e. spark-core_2.11 and spark-mllib_2.11 (note that I changed the latter to 2.11).

  2. The same version in every Spark dependency.
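With both points applied, the dependencies section would look like this (a sketch of the corrected pom.xml; the essential change is the _2.11 suffix on spark-mllib):

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <!-- changed from spark-mllib_2.10 so both artifacts target Scala 2.11 -->
        <artifactId>spark-mllib_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
</dependencies>
```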
