Problem implementing an Apache Spark decision tree in Java

I want to implement a simple demo of a decision tree classifier using Java and Apache Spark 1.0.0. I am following http://spark.apache.org/docs/1.0.0/mllib-decision-tree.html. So far I have written the code listed below.

The following line of code gives me an error:

org.apache.spark.mllib.tree.impurity.Impurity impurity = new org.apache.spark.mllib.tree.impurity.Entropy();

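Type mismatch: cannot convert from Entropy to Impurity. This is strange to me, because the class Entropy implements the Impurity interface: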

https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/mllib/tree/impurity/Entropy.html

I am looking for an answer to the question: why can't I make this assignment?

package decisionTree; 

import java.util.regex.Pattern; 

import org.apache.spark.api.java.JavaRDD; 
import org.apache.spark.api.java.JavaSparkContext; 
import org.apache.spark.api.java.function.Function; 
import org.apache.spark.mllib.linalg.Vectors; 
import org.apache.spark.mllib.regression.LabeledPoint; 
import org.apache.spark.mllib.tree.DecisionTree; 
import org.apache.spark.mllib.tree.configuration.Algo; 
import org.apache.spark.mllib.tree.configuration.Strategy; 
import org.apache.spark.mllib.tree.impurity.Gini; 
import org.apache.spark.mllib.tree.impurity.Impurity; 

import scala.Enumeration.Value; 

public final class DecisionTreeDemo { 

    static class ParsePoint implements Function<String, LabeledPoint> { 
     private static final Pattern COMMA = Pattern.compile(","); 
     private static final Pattern SPACE = Pattern.compile(" "); 

     @Override 
     public LabeledPoint call(String line) { 
      String[] parts = COMMA.split(line); 
      double y = Double.parseDouble(parts[0]); 
      String[] tok = SPACE.split(parts[1]); 
      double[] x = new double[tok.length]; 
      for (int i = 0; i < tok.length; ++i) { 
       x[i] = Double.parseDouble(tok[i]); 
      } 
      return new LabeledPoint(y, Vectors.dense(x)); 
     } 
    } 

    public static void main(String[] args) throws Exception { 

     if (args.length < 1) { 
      System.err.println("Usage:DecisionTreeDemo <file>"); 
      System.exit(1); 
     } 

     JavaSparkContext ctx = new JavaSparkContext("local[4]", "Log Analizer", 
       System.getenv("SPARK_HOME"), 
       JavaSparkContext.jarOfClass(DecisionTreeDemo.class)); 

     JavaRDD<String> lines = ctx.textFile(args[0]); 
     JavaRDD<LabeledPoint> points = lines.map(new ParsePoint()).cache(); 

     int iterations = 100; 

     int maxBins = 2; 
     int maxMemory = 512; 
     int maxDepth = 1; 

     org.apache.spark.mllib.tree.impurity.Impurity impurity = new org.apache.spark.mllib.tree.impurity.Entropy(); 

     Strategy strategy = new Strategy(Algo.Classification(), impurity, maxDepth, 
       maxBins, null, null, maxMemory); 

     ctx.stop(); 
    } 
}

@samthebest If I remove the impurity variable and inline it, changing the call to the following form:

Strategy strategy = new Strategy(Algo.Classification(), new org.apache.spark.mllib.tree.impurity.Entropy(), maxDepth, maxBins, null, null, maxMemory);

I get the error: The constructor Entropy() is undefined.
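One plausible explanation for both errors is the way Scala singletons look from Java. In Spark 1.0.0, Entropy is a Scala `object`, and the Scala compiler typically emits two JVM classes for an object: a module class named `Entropy$`, which implements the trait and exposes the singleton through a static `MODULE$` field, and a mirror class named `Entropy`, which only carries static forwarder methods. The sketch below imitates that layout with plain-Java stand-ins. None of these classes are the real Spark types, and the class layout is an assumption about the compiler output rather than something taken from the Spark sources.

```java
// Plain-Java stand-ins (NOT the real Spark classes) that mimic how a Scala
// `object Entropy extends Impurity` is assumed to look from the Java side.

interface Impurity {
    double calculate(double c0, double c1);
}

// Module class: the only type that actually implements Impurity.
final class Entropy$ implements Impurity {
    public static final Entropy$ MODULE$ = new Entropy$();

    private Entropy$() { }

    @Override
    public double calculate(double c0, double c1) {
        if (c0 == 0 || c1 == 0) {
            return 0.0; // a node holding a single class has zero entropy
        }
        double total = c0 + c1;
        double f0 = c0 / total;
        double f1 = c1 / total;
        return -(f0 * log2(f0)) - (f1 * log2(f1));
    }

    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }
}

// Mirror class: static forwarders only. It does not implement Impurity and
// offers no usable constructor (modelled here with a private one).
final class Entropy {
    private Entropy() { }

    static double calculate(double c0, double c1) {
        return Entropy$.MODULE$.calculate(c0, c1);
    }
}

class EntropyInteropSketch {
    public static void main(String[] args) {
        // Impurity bad = new Entropy(); // would not compile: no accessible
        //                               // constructor, and Entropy is not an Impurity
        Impurity impurity = Entropy$.MODULE$; // compiles: the singleton is an Impurity
        System.out.println(impurity.calculate(1.0, 1.0));
        System.out.println(Entropy.calculate(4.0, 0.0)); // static forwarder
    }
}
```

If this assumption holds for the real library, passing `Entropy$.MODULE$` where an Impurity is expected would be the Java-side equivalent of writing `Entropy` in Scala.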

[Edit] I found what I believe is the correct way to call the method (https://issues.apache.org/jira/browse/SPARK-2197):

Strategy strategy = new Strategy(Algo.Classification(), new Impurity() { 
@Override 
public double calculate(double arg0, double arg1, double arg2) 
{ return Gini.calculate(arg0, arg1, arg2); } 

@Override 
public double calculate(double arg0, double arg1) 
{ return Gini.calculate(arg0, arg1); } 

}, 5, 100, QuantileStrategy.Sort(), null, 256);

Unfortunately, I ran into the bug :(


2014-06-28 caruso

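Odd. Try inlining it instead of assigning it to a variable; after all, you only use the variable once. I would also really recommend using Scala rather than the Java API: you can do the whole thing in a few lines of code, and it will be easier to read. – samthebest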

A Java solution for bug 2197 is now available through this pull request:

Other improvements to Decision Trees for easy-of-use with Java:
* impurity classes: Added instance() methods to help with Java interface.
* Strategy: Added Java-friendly constructor
  --> Note: I removed quantileCalculationStrategy from the Java-friendly constructor since (a) it is a special class and (b) there is only 1 option currently. I suspect we will redo the API before the other options are included.

You can see a complete example that uses the instance() method of the Gini impurity here:

Strategy strategy = new Strategy(Algo.Classification(), Gini.instance(), maxDepth, numClasses, maxBins, categoricalFeaturesInfo);
DecisionTreeModel model = DecisionTree$.MODULE$.train(rdd.rdd(), strategy);
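To illustrate why an instance() method helps on the Java side, here is a minimal plain-Java sketch. The pull request text only states that instance() methods were added to the impurity classes; how they are implemented is an assumption here, and the classes below are hypothetical stand-ins, not the Spark types. The idea is that a static accessor can hand Java callers the singleton typed as something that implements Impurity, so no `new` and no anonymous wrapper class is needed.

```java
// Plain-Java stand-ins (NOT the real Spark classes) showing what an
// instance() accessor on a singleton impurity could look like.

interface Impurity {
    double calculate(double c0, double c1);
}

// Singleton that implements the interface (stand-in for a Scala object).
final class Gini$ implements Impurity {
    public static final Gini$ MODULE$ = new Gini$();

    private Gini$() { }

    @Override
    public double calculate(double c0, double c1) {
        if (c0 == 0 || c1 == 0) {
            return 0.0; // a node holding a single class is pure
        }
        double total = c0 + c1;
        double f0 = c0 / total;
        double f1 = c1 / total;
        return 1 - f0 * f0 - f1 * f1;
    }

    // Assumed shape of the helper: it simply returns the singleton.
    public Gini$ instance() {
        return this;
    }
}

// Java-facing class with a static accessor for the singleton.
final class Gini {
    private Gini() { }

    static Gini$ instance() {
        return Gini$.MODULE$.instance();
    }
}

class GiniInstanceSketch {
    // Hypothetical consumer that, like a strategy object, accepts any Impurity.
    static double score(Impurity impurity, double c0, double c1) {
        return impurity.calculate(c0, c1);
    }

    public static void main(String[] args) {
        // The accessor yields a value usable wherever an Impurity is expected.
        System.out.println(score(Gini.instance(), 1.0, 1.0));
        System.out.println(score(Gini.instance(), 3.0, 1.0));
    }
}
```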


2014-08-14 12:17:56 emecas
