2017-08-07

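Fuzzy search with Python Whoosh

I want to implement a fuzzy search with Python's Whoosh library, but I can't get it to work. I tried to do the fuzzy search with the help of NGRAMWORDS.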

Here is my schema:

from whoosh.fields import Schema, ID, NGRAMWORDS

schema = Schema(id=ID(stored=True),
                name=NGRAMWORDS(minsize=2, maxsize=4, stored=True, queryor=True),
                street=NGRAMWORDS(minsize=2, maxsize=4, stored=True, queryor=True),
                city=NGRAMWORDS(minsize=2, maxsize=4, stored=True, queryor=False))
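To see which tokens an NGRAMWORDS field (and the query-side NgramWordAnalyzer) produces, here is a minimal pure-Python sketch of word n-gram splitting. It assumes the analyzer lowercases each word and emits every substring whose length lies between minsize and maxsize, ordered by start position and then by length. The helper name `word_ngrams` is hypothetical; this is an illustration, not Whoosh's own code.

```python
import re


def word_ngrams(text: str, minsize: int = 2, maxsize: int = 4) -> list[str]:
    """Hypothetical helper: lowercase each word and emit its n-grams,
    ordered by start position, then by length."""
    grams = []
    for word in re.findall(r"\w+", text.lower()):
        # Words shorter than minsize yield no grams at all.
        for start in range(len(word) - minsize + 1):
            for size in range(minsize, maxsize + 1):
                end = start + size
                if end > len(word):
                    break
                grams.append(word[start:end])
    return grams


print(word_ngrams("Ali"))        # ['al', 'ali', 'li']
print(word_ngrams("Ali", 3, 6))  # ['ali']
```

The first call reproduces the three grams (al, ali, li) that appear in the FuzzyTerm query for "Ali" further down; with minsize=3 only "ali" is left.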

The index is then populated as follows:

writer.add_document(id=unicode(row["id"]), name=unicode(row["name"]), street=unicode(row["street"]), city=unicode(row["city"])) 

Unfortunately, the search does not retrieve any results from the index:

from whoosh.query import Term, Or, FuzzyTerm
from whoosh.analysis import NgramWordAnalyzer

with self.index.searcher() as searcher:
    ngramAnalyzer = NgramWordAnalyzer(minsize=2, maxsize=4)
    tokens = [token.text for token in ngramAnalyzer(unicode(name))]
    fetig = list()
    for t in tokens:
        tt = FuzzyTerm("name", unicode(t))
        fetig.append(tt)

    myQuery = Or(fetig)
    res = searcher.search(myQuery, limit=10)

When searching for "Ali", I get zero hits:

<Top 0 Results for Or([FuzzyTerm('name', u'al', boost=1.000000, maxdist=1, prefixlength=1), FuzzyTerm('name', u'ali', boost=1.000000, maxdist=1, prefixlength=1), FuzzyTerm('name', u'li', boost=1.000000, maxdist=1, prefixlength=1)]) runtime=0.000411987304688> 
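The repr shows what each FuzzyTerm accepts: index terms within edit distance maxdist=1 of the query text that also share its first prefixlength=1 characters. The sketch below models that rule with plain Levenshtein distance. This is an assumption for illustration (Whoosh's internal matching may differ in details such as how it treats transpositions), and the helper names are hypothetical. It shows why fuzzy matching on 2-character n-grams is very broad: "al" matches any term starting with "a" that is one edit away.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_term_matches(text, terms, maxdist=1, prefixlength=1):
    """Hypothetical model of FuzzyTerm: keep terms that share the first
    `prefixlength` characters and are within `maxdist` edits of `text`."""
    prefix = text[:prefixlength]
    return [t for t in terms
            if t.startswith(prefix) and levenshtein(text, t) <= maxdist]


indexed_grams = ["al", "ab", "a", "alx", "li", "bl"]
print(fuzzy_term_matches("al", indexed_grams))  # ['al', 'ab', 'a', 'alx']
```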

Answer

Solved now. The problem was that instead of opening the existing index with

index = open_dir("index", schema=self.schema)

I created a new one.

In addition, for the query it is crucial to use Term instead of FuzzyTerm to get sensible results:

from whoosh.query import Term, Or
from whoosh.analysis import NgramWordAnalyzer

ngramAnalyzer = NgramWordAnalyzer(minsize=3, maxsize=6)
tokens = [token.text for token in ngramAnalyzer(unicode(name))]
fetig = list()
for t in tokens:
    tt = Term("name", unicode(t))
    fetig.append(tt)

myQuery = Or(fetig)
res = searcher.search(myQuery, limit=10)
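The reason an OR of exact Term queries over n-grams still behaves like a fuzzy match is that documents sharing more grams with the query rank higher. The toy model below, with hypothetical names and sample data, indexes names as n-grams (roughly mirroring NgramWordAnalyzer(minsize=3, maxsize=6)) in an in-memory inverted index and ranks documents by how many distinct query grams they contain. This counting score is a simplification; Whoosh ranks hits with its own weighting model.

```python
import re
from collections import Counter, defaultdict


def word_ngrams(text, minsize=3, maxsize=6):
    """Hypothetical helper: lowercased word n-grams between minsize and maxsize."""
    grams = []
    for word in re.findall(r"\w+", text.lower()):
        for start in range(len(word) - minsize + 1):
            for size in range(minsize, maxsize + 1):
                if start + size > len(word):
                    break
                grams.append(word[start:start + size])
    return grams


def build_index(docs):
    """Inverted index: n-gram -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, name in docs.items():
        for gram in word_ngrams(name):
            index[gram].add(doc_id)
    return index


def or_of_terms(index, query):
    """Score each doc by the number of distinct query grams it contains."""
    scores = Counter()
    for gram in set(word_ngrams(query)):
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {1: "Ali", 2: "Alina", 3: "Bob"}
print(or_of_terms(build_index(docs), "Alin"))  # [(2, 3), (1, 1)]
```

In this model a query shorter than minsize produces no grams at all, so with minsize=3 a two-letter query matches nothing.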

As you can see, I raised the NGRAMWORDS minsize from 2 to 3 (and the maxsize from 4 to 6).

Thanks to Matt Chaput for his valuable work.