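Is it possible to use nltk to change words like running, helping, cooks, finds and happily into run, help, cook, find and happy? Is there a way to properly remove tenses or plurals from words?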
There are several stemming algorithms implemented in nltk. It looks like the Lancaster stemming algorithm will work for you:
>>> from nltk.stem.lancaster import LancasterStemmer
>>> st = LancasterStemmer()
>>> st.stem('happily')
'happy'
>>> st.stem('cooks')
'cook'
>>> st.stem('helping')
'help'
>>> st.stem('running')
'run'
>>> st.stem('finds')
'find'
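Stemmers such as Lancaster work by stripping suffixes according to fixed rules, without consulting a dictionary. The toy stemmer below is a hypothetical illustration of that idea only: `toy_stem` and its four rules are made up for this example and are far simpler than the Lancaster or Porter rule sets. It handles the words from the question, but it also shows how rule-based stripping can misfire.

```python
# Toy suffix-stripping stemmer. Hypothetical illustration only:
# this is NOT the Lancaster or Porter algorithm.

# (suffix, replacement) pairs, tried in order; the first match wins.
RULES = [
    ("ily", "y"),  # happily -> happy
    ("ing", ""),   # helping -> help, running -> runn (fixed below)
    ("ly", ""),    # quickly -> quick
    ("s", ""),     # cooks -> cook
]

VOWELS = set("aeiou")


def toy_stem(word: str) -> str:
    word = word.lower()
    for suffix, replacement in RULES:
        # Only strip when at least 3 letters would remain ("is" stays "is").
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: len(word) - len(suffix)] + replacement
            # Undo consonant doubling after -ing (runn -> run), but keep
            # a doubled l, s or z, as Porter's step 1b does (falling -> fall).
            if (suffix == "ing" and len(stem) >= 2 and stem[-1] == stem[-2]
                    and stem[-1] not in VOWELS and stem[-1] not in "lsz"):
                stem = stem[:-1]
            return stem
    return word


for w in ["running", "helping", "cooks", "finds", "happily"]:
    print(w, "->", toy_stem(w))
```

Note that `toy_stem('news')` returns `'new'`: this is the kind of over-stemming that dictionary-free rules produce. Real stemmers have many more rules and exceptions, but they make the same trade-off.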
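Alternatively, use NLTK's WordNetLemmatizer. It returns dictionary forms, but it treats every word as a noun unless you pass a part of speech. As a result, verbs like running and helping are only reduced when you pass pos='v':
>>> from nltk.stem import WordNetLemmatizer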
>>> wnl = WordNetLemmatizer()
>>> ls = ['running', 'helping', 'cooks', 'finds']
>>> [wnl.lemmatize(i) for i in ls]
['running', 'helping', u'cook', u'find']
>>> ls = [('running', 'v'), ('helping', 'v'), ('cooks', 'v'), ('finds','v')]
>>> [wnl.lemmatize(word, pos) for word, pos in ls]
[u'run', u'help', u'cook', u'find']
>>> ls = [('running', 'n'), ('helping', 'n'), ('cooks', 'n'), ('finds','n')]
>>> [wnl.lemmatize(word, pos) for word, pos in ls]
['running', 'helping', u'cook', u'find']
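Because `lemmatize()` needs the part of speech to reduce verbs, in practice the POS usually comes from a tagger. The sketch below assumes `nltk.pos_tag`, which returns Penn Treebank tags such as `VBG` or `NNS`. `penn_to_wordnet` and `lemmatize_tokens` are hypothetical helpers written for this example; they are not part of NLTK. The tag-to-POS mapping shown is one common convention.

```python
# Map Penn Treebank tags (as produced by nltk.pos_tag) to the single-letter
# part-of-speech codes accepted by WordNetLemmatizer.lemmatize().
def penn_to_wordnet(tag: str) -> str:
    if tag.startswith("J"):
        return "a"  # adjective (JJ, JJR, JJS)
    if tag.startswith("V"):
        return "v"  # verb (VB, VBD, VBG, ...)
    if tag.startswith("R"):
        return "r"  # adverb (RB, RBR, RBS)
    return "n"      # nouns and everything else: lemmatize()'s own default


def lemmatize_tokens(tokens):
    # Imported lazily so penn_to_wordnet() works without NLTK installed.
    # Requires the NLTK tagger and WordNet data (fetch them with nltk.download()).
    import nltk
    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    return [wnl.lemmatize(word, penn_to_wordnet(tag))
            for word, tag in nltk.pos_tag(tokens)]
```

For example, `lemmatize_tokens(['He', 'was', 'running'])` should reduce running to run if the tagger labels it as a verb. The result depends on the tagger's accuracy, which is the main cost of this approach compared with a stemmer.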
I've used the filters provided by Whoosh (mostly found at https://pypi.python.org/pypi/Whoosh/2.6.0). – mpez0
Possible duplicate of [Porter Stemming of fried](http://stackoverflow.com/questions/27659179/porter-stemming-of-fried) – alvas