ValueError: Invalid parameter model for estimator CountVectorizer when using GridSearch parameters

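I have an sklearn pipeline for text classification that uses two types of features: standard tf-idf features generated by CountVectorizer() and TfidfTransformer() (instead of TfidfVectorizer()), plus some linguistic features. I am trying to pass different n-gram ranges to CountVectorizer() and use GridSearch to find the best n.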

Here is my code:

text_clf = Pipeline([('union', FeatureUnion([ 
           ('tfidf', Pipeline([ 
             ('sents', GetItem(key='sent')), 
             ('vect', CountVectorizer()), 
             ('transform', TfidfTransformer()) 
           ])), 
           ('LF', Pipeline([ 
            ('features', GetItem(key='features')), 
            ('dict_vect', DictVectorizer()) 
           ]))], 
           transformer_weights={'LF': 0.6, 'tfidf': 0.8} 
           )), 
           ('clf', SGDClassifier()) 
        ]) 

parameters = [{'union__tfidf__vect__model__ngram_range': ((1, 1), (1, 2), (1, 3), (1, 4)), 
      'clf__alpha': (1e-2, 1e-3, 1e-4, 1e-5), 
      'clf__loss': ('hinge', 'log', 'modified_huber', 'squared_hinge', 'perceptron'), 
      'clf__penalty': ('none', 'l2', 'l1', 'elasticnet'), 
      'clf__n_iter': (3, 4, 5, 6, 7, 8, 9, 10)}] 

gs_clf = GridSearchCV(text_clf, parameters, cv=5, n_jobs=-1) 
gs_clf = gs_clf.fit(all_data, labels)

(I have omitted some lines that seemed irrelevant to the question.)

But it throws an error:

ValueError: Invalid parameter model for estimator CountVectorizer(analyzer=u'word', binary=False, charset=None, 
    charset_error=None, decode_error=u'strict', 
    dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content', 
    lowercase=True, max_df=1.0, max_features=None, min_df=1, 
    ngram_range=(1, 1), preprocessor=None, stop_words=None, 
    strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b', 
    tokenizer=None, vocabulary=None)

The same happens with TfidfVectorizer(). And everything works fine if I pass ngram_range to the vectorizer directly in the pipeline:

('vect', CountVectorizer(ngram_range=(1,2)))

Thanks!

Source

2015-10-06 Katya Stolpovskaya

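The error occurs because you have union__tfidf__vect__model__ngram_range, when it should be union__tfidf__vect__ngram_range. Each double-underscore segment must name either a pipeline step or a parameter of that step, and CountVectorizer has no parameter called "model". Note that the error complains about "model" being the invalid parameter: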

ValueError: Invalid parameter model
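One way to check a grid key before running the search is to compare it against the names returned by the pipeline's get_params(). The following is a minimal sketch, assuming a reduced version of the pipeline from the question: the custom GetItem steps and the 'LF' branch are left out, so only the 'tfidf' branch remains inside the FeatureUnion.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, Pipeline

# Reduced stand-in for the question's pipeline (assumption: GetItem and the
# 'LF' branch are omitted because they are custom code not shown in full).
pipe = Pipeline([
    ('union', FeatureUnion([
        ('tfidf', Pipeline([
            ('vect', CountVectorizer()),
            ('transform', TfidfTransformer()),
        ])),
    ])),
    ('clf', SGDClassifier()),
])

# Every name that set_params / GridSearchCV will accept for this estimator.
valid = set(pipe.get_params().keys())

good_key = 'union__tfidf__vect__ngram_range'
bad_key = 'union__tfidf__vect__model__ngram_range'
print(good_key in valid)
print(bad_key in valid)

# The valid key can be set directly; the invalid one raises the same
# ValueError that GridSearchCV reports.
pipe.set_params(**{good_key: (1, 2)})
error_message = None
try:
    pipe.set_params(**{bad_key: (1, 2)})
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Printing sorted(valid) is also a quick way to see the exact spelling of every nested parameter name.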

Also, as a side note, I would use TfidfVectorizer to simplify things.
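The TfidfVectorizer suggestion could look roughly like the sketch below, which replaces the CountVectorizer and TfidfTransformer pair with a single step. The toy texts and labels are made up for illustration, the FeatureUnion with the custom GetItem and DictVectorizer branch is left out, and the grid is cut down to ngram_range only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data, only here to make the sketch runnable.
texts = [
    "good movie", "great film", "excellent plot", "wonderful acting",
    "bad movie", "terrible film", "awful plot", "boring acting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TfidfVectorizer combines CountVectorizer and TfidfTransformer in one step.
text_clf = Pipeline([
    ('vect', TfidfVectorizer()),
    ('clf', SGDClassifier(random_state=0)),
])

# The key is <step name>__<parameter name>, with nothing in between.
parameters = {'vect__ngram_range': [(1, 1), (1, 2)]}

gs_clf = GridSearchCV(text_clf, parameters, cv=2)
gs_clf.fit(texts, labels)
print(gs_clf.best_params_)
```

With the original FeatureUnion kept in place, the same key would be prefixed with the union and branch names, as in union__tfidf__vect__ngram_range.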

Source

2015-10-06 15:53:29 David

Thank you for your help, David! I was just confused by this post: http://stackoverflow.com/questions/27810855/python-sklearn-how-to-pass-parameters-to-the-customize-modeltransformer-clas
