2017-08-02

Answers

0

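There is no way to do this directly: Catboost does not currently support model serialization.

However, Catboost can already convert a model to CoreML, and there is a CoreML tool that can serialize the model to JSON-like text. Consider the following example: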

import catboost
import coremltools
from sklearn import datasets

iris = datasets.load_iris()

# the shortest possible model specification
cls = catboost.CatBoostClassifier(loss_function='MultiClass', iterations=1, depth=1)
cls.fit(iris.data, iris.target)

# save model to CoreML format
cls.save_model(
    "iris.mlmodel",
    format="coreml",
    export_parameters={
        'prediction_type': 'probability'
    }
)

# there is a CoreML tool for model serialization
model = coremltools.models.model.MLModel("iris.mlmodel")
# print() is needed outside an interactive session to see the spec
print(model.get_spec())

You may need to read the coremltools documentation to fully understand what this code prints, but you can read the output like this: "There is an ensemble of a single tree with 2 leaves - in the leaf 0, class 0 dominates, in the leaf 1 - classes 1 and 2. Go to the leaf 1, if feature 3 is larger than 0.8, otherwise go to leaf 0"

specificationVersion: 1
description {
  input {
    name: "feature_3"
    type {
      doubleType {
      }
    }
  }
  output {
    name: "prediction"
    type {
      multiArrayType {
        shape: 3
        dataType: DOUBLE
      }
    }
  }
  predictedFeatureName: "prediction"
  predictedProbabilitiesName: "prediction"
  metadata {
    shortDescription: "Catboost model"
    versionString: "1.0.0"
    author: "Mr. Catboost Dumper"
  }
}
treeEnsembleRegressor {
  treeEnsemble {
    nodes {
      nodeBehavior: LeafNode
      evaluationInfo {
        evaluationValue: 0.05084745649058943
      }
      evaluationInfo {
        evaluationIndex: 1
        evaluationValue: -0.025423728245294732
      }
      evaluationInfo {
        evaluationIndex: 2
        evaluationValue: -0.025423728245294732
      }
    }
    nodes {
      nodeId: 1
      nodeBehavior: LeafNode
      evaluationInfo {
        evaluationValue: -0.02752293516463098
      }
      evaluationInfo {
        evaluationIndex: 1
        evaluationValue: 0.01376146758231549
      }
      evaluationInfo {
        evaluationIndex: 2
        evaluationValue: 0.013761467582315471
      }
    }
    nodes {
      nodeId: 2
      nodeBehavior: BranchOnValueGreaterThan
      branchFeatureIndex: 3
      branchFeatureValue: 0.800000011920929
      trueChildNodeId: 1
    }
    numPredictionDimensions: 3
    basePredictionValue: 0.0
    basePredictionValue: 0.0
    basePredictionValue: 0.0
  }
  postEvaluationTransform: Classification_SoftMax
}
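
To make that reading concrete, here is a minimal sketch in plain Python that evaluates the dumped tree by hand. The values are transcribed from the output above; `LEAVES`, `BRANCHES`, `ROOT_ID` and `predict_proba` are hypothetical names for illustration, not part of catboost or coremltools. It assumes that fields missing from the dump (`nodeId`, `evaluationIndex`, `falseChildNodeId`) are 0, since protobuf text output leaves out default values, and that node 2, the only branch node, is the root.

```python
import math

# Leaf values transcribed from the dump; list position = evaluationIndex (class).
LEAVES = {
    0: [0.05084745649058943, -0.025423728245294732, -0.025423728245294732],
    1: [-0.02752293516463098, 0.01376146758231549, 0.013761467582315471],
}

# The single branch node (nodeId 2). falseChildNodeId is absent from the dump,
# so it is ASSUMED to be the protobuf default, 0.
BRANCHES = {
    2: {"feature": 3, "threshold": 0.800000011920929,
        "true_child": 1, "false_child": 0},
}

ROOT_ID = 2             # assumption: the only branch node is the root
BASE = [0.0, 0.0, 0.0]  # the three basePredictionValue entries


def predict_proba(features):
    """Walk the tree for one sample, then apply softmax to the leaf values."""
    node_id = ROOT_ID
    while node_id in BRANCHES:
        node = BRANCHES[node_id]
        # BranchOnValueGreaterThan: take the true child when value > threshold.
        if features[node["feature"]] > node["threshold"]:
            node_id = node["true_child"]
        else:
            node_id = node["false_child"]
    scores = [b + v for b, v in zip(BASE, LEAVES[node_id])]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two iris-like samples whose feature 3 (petal width) is 0.2 and 1.8.
print(predict_proba([5.1, 3.5, 1.4, 0.2]))  # lands in leaf 0, class 0 leads
print(predict_proba([6.3, 2.9, 5.6, 1.8]))  # lands in leaf 1, classes 1 and 2 lead
```

With a single iteration the leaf values are tiny, so all three probabilities stay close to 1/3; only their ordering reflects the split.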

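This approach has one drawback: CoreML does not support the way Catboost handles categorical features. So if you want to serialize a model that has categorical features, you need to one-hot encode them before training.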
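
As a rough illustration of that workaround, the sketch below one-hot encodes a categorical column in plain Python before the data would be handed to training. The `one_hot_encode` helper and the toy `color` column are hypothetical and only meant to show the idea; a library routine would normally be used for this instead.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with a 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one indicator column per category, in sorted order.
        for cat in categories:
            new_row[f"{column}={cat}"] = 1.0 if row[column] == cat else 0.0
        encoded.append(new_row)
    return encoded


# Toy data (hypothetical): one numeric and one categorical feature.
rows = [
    {"size": 1.0, "color": "red"},
    {"size": 2.5, "color": "blue"},
    {"size": 0.7, "color": "red"},
]
encoded = one_hot_encode(rows, "color")
print(encoded[0])

# Every value is now numeric, so the rows can be used as a plain feature matrix.
features = [list(r.values()) for r in encoded]
```

After this step the model sees only numeric columns, so nothing Catboost-specific about categorical handling has to survive the CoreML export.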

0

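If you switch to the command-line program, you can use the --print-trees option. It only shows the trees of the model being trained, so you cannot get the trees of an existing model this way.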