sklearn: Using a CountVectorizer object to get the feature vector of a new string

I create a CountVectorizer object by running the following lines:

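from sklearn.feature_extraction.text import CountVectorizer

# binary expects a boolean; the string 'true' only worked because it is truthy
count_vectorizer = CountVectorizer(binary=True)
data = count_vectorizer.fit_transform(data)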

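Now I have a new string, and I want to map it onto the term-document matrix (TDM) that I got from the CountVectorizer. In other words, for the string I pass in, I expect to get back the corresponding document-term vector.

I tried: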

count_vectorizer.transform([string])

This gave an error: AttributeError: transform not found. The stack trace is very long, so I am adding only the relevant part, which is the last few lines of the trace:

File "/Users/ankit/Desktop/geny/APIServer/RUNTIME/src/controller/sentiment/Sentiment.py", line 29, in computeSentiment 
    vec = self.models[model_name]["vectorizer"].transform([string]) 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/sparse/base.py", line 440, in __getattr__ 
    raise AttributeError(attr + " not found")

Please advise.

Thanks,

Ankit S

Source

2015-06-07 Ankit Solanki

The example you show is not reproducible. What is the string variable here? The following code, however, seems to work perfectly:

from sklearn.feature_extraction.text import CountVectorizer

data = ["aa bb cc", "cc dd ee"]
count_vectorizer = CountVectorizer(binary=True)
data = count_vectorizer.fit_transform(data)

# Check that the vocabulary was built as expected
print(count_vectorizer.vocabulary_)

# Transform a couple of new strings; the unseen word "mm" should be ignored
newData = count_vectorizer.transform(["aa dd mm", "aa bb"])
print(newData)

# Convert the sparse result to a dense array
print(newData.toarray())


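Note that count_vectorizer.transform() accepts a list of strings, not a single string. If the vectorizer has not been fitted, it should raise "ValueError: Vocabulary wasn't fitted or is empty!". When you hit an error like this, please paste the entire traceback. Without it, no one can tell where the AttributeError comes from: your code or some internal bug in sklearn.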
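To make the two cases above concrete, here is a minimal sketch. It assumes a recent scikit-learn, and `new_string` and `unfitted` are hypothetical names used only for illustration. It wraps a single string in a list before calling transform(), and shows that an unfitted vectorizer fails with an error that can be caught as ValueError (the exact message differs between versions).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Fit on a tiny corpus; the vocabulary is aa, bb, cc, dd, ee
vectorizer = CountVectorizer(binary=True)
vectorizer.fit(["aa bb cc", "cc dd ee"])

# Hypothetical new input: wrap the single string in a list
new_string = "aa dd mm"
row = vectorizer.transform([new_string])
print(row.toarray())  # [[1 0 0 1 0]] because "mm" is not in the vocabulary

# An unfitted vectorizer refuses to transform
unfitted = CountVectorizer(binary=True)
try:
    unfitted.transform([new_string])
    raised = False
except ValueError as exc:
    raised = True
    print("not fitted:", exc)
```

Neither case produces "AttributeError: transform not found", which is why the full traceback matters here.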

Source

2015-06-08 10:41:31 Aditya

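Aditya, thanks for your comment. I have added the stack trace as you suggested. The vectorizer here is written to a file with pickle, and then I load it again. I wonder whether that is what causes the error. I will check that and keep you posted. It is strange how such a simple function call can fail!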
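One possible reading of the traceback, offered only as a guess: the AttributeError is raised from scipy/sparse/base.py, which suggests the object stored under "vectorizer" may be a scipy sparse matrix (for example the return value of fit_transform()) rather than the CountVectorizer itself. The sketch below assumes that scenario. It round-trips the fitted vectorizer through pickle in memory and checks which object actually has a transform() method. The names `matrix`, `blob`, and `restored` are illustrative and not from the original code.

```python
import pickle

from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

docs = ["aa bb cc", "cc dd ee"]
vectorizer = CountVectorizer(binary=True)
matrix = vectorizer.fit_transform(docs)  # sparse matrix, NOT the vectorizer

# Pickle the fitted vectorizer object itself (assumption: this is what
# should be saved and reloaded, not the matrix returned by fit_transform)
blob = pickle.dumps(vectorizer)
restored = pickle.loads(blob)

# The restored vectorizer can transform new strings
new_row = restored.transform(["aa dd mm"])
print(new_row.toarray())

# The sparse matrix has no transform() method at all
print(issparse(matrix), hasattr(matrix, "transform"))
```

If the saved file turns out to hold the matrix, re-saving the vectorizer object should make transform() available again after loading. Pickles are also tied to the library version that wrote them, so loading under a different scikit-learn version is another thing worth checking.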
