2015-04-12 11 views
-1

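Is it possible to use nltk to change words like running, helping, cooks, finds and happily into run, help, cook, find and happy? Is there a way to properly strip tense or plural endings from words?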

+0

I've used the filters that come with Whoosh (see https://pypi.python.org/pypi/Whoosh/2.6.0) – mpez0

+0

Possible duplicate of [Porter Stemming of fried](http://stackoverflow.com/questions/27659179/porter-stemming-of-fried) – alvas

Answers

1

nltk implements several stemming algorithms. It looks like the Lancaster stemming algorithm will work for you.

>>> from nltk.stem.lancaster import LancasterStemmer 
>>> st = LancasterStemmer() 
>>> st.stem('happily') 
'happy' 
>>> st.stem('cooks') 
'cook' 
>>> st.stem('helping') 
'help' 
>>> st.stem('running') 
'run' 
>>> st.stem('finds') 
'find' 

2

>>> from nltk.stem import WordNetLemmatizer 
>>> wnl = WordNetLemmatizer() 
>>> ls = ['running', 'helping', 'cooks', 'finds'] 
>>> [wnl.lemmatize(i) for i in ls] 
['running', 'helping', u'cook', u'find'] 
>>> ls = [('running', 'v'), ('helping', 'v'), ('cooks', 'v'), ('finds','v')] 
>>> [wnl.lemmatize(word, pos) for word, pos in ls] 
[u'run', u'help', u'cook', u'find'] 
>>> ls = [('running', 'n'), ('helping', 'n'), ('cooks', 'n'), ('finds','n')] 
>>> [wnl.lemmatize(word, pos) for word, pos in ls] 
['running', 'helping', u'cook', u'find'] 
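
The output above shows that `WordNetLemmatizer` only reduces 'running' and 'helping' when it is told they are verbs. In practice the part of speech usually comes from a tagger. This is a minimal sketch, not from the original answer: `penn_to_wordnet` and `lemmatize_tokens` are hypothetical helper names, and it assumes the nltk tagger and WordNet data have already been fetched with `nltk.download`.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, which is also the lemmatizer's default


def lemmatize_tokens(tokens):
    """Tag the tokens, then lemmatize each one with its mapped POS."""
    # Imported lazily: these calls need nltk plus its tagger and WordNet data.
    from nltk import pos_tag
    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    return [wnl.lemmatize(word, penn_to_wordnet(tag))
            for word, tag in pos_tag(tokens)]
```

Taggers tend to be less reliable on isolated words than on words in context, so it is probably better to pass a whole tokenized sentence to `lemmatize_tokens` than a bare word list.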
