Word boundary RegEx search using PyMongo

I want to do a word boundary search. For example, suppose you have the following entries:

1. "the cooks."
2. "cooks"
3. " cook."
4. "the cook is"
5. "cook."

and you search for entries that contain "cook" as a whole word. That is, only the 3rd, 4th and 5th entries should be returned.

In this case, when I use the \b word boundary assertion, it somehow gets distorted by automatic escaping.

import re, pymongo 
# prepare pymongo 
collection.find({"entry": re.compile('\bcook\b').pattern})

When I print the query dictionary, \b has become \\b.

My question is: how can I do a word boundary search using PyMongo? I am able to do this in the MongoDB shell, but it fails in PyMongo.

Source

2015-11-23 Muatik

I think it needs to be '\\bcook\\b' –

Yes, '\bcook\b' becomes '\\bcook\\b' – Muatik

Try [r'\bcook\b'](http://stackoverflow.com/questions/2241600/python-regex-r-prefix). – Sam
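To illustrate what the r prefix suggested in the comment above changes, here is a minimal sketch that uses only Python's standard re module on the five sample strings from this thread. It does not touch MongoDB; the variable names are just for illustration.

```python
import re

# Without the r prefix, Python turns '\b' into the backspace character (0x08)
# before the regex engine ever sees it.
plain = '\bcook\b'
# With the r prefix, the backslash survives and the regex engine reads \b
# as a word boundary.
raw = r'\bcook\b'

entries = ["the cooks.", "cooks", " cook.", "the cook is", "cook."]

plain_hits = [e for e in entries if re.search(plain, e)]
raw_hits = [e for e in entries if re.search(raw, e)]

print(len(plain), len(raw))  # the two literals are different strings
print(plain_hits)
print(raw_hits)
```

Only the raw-string pattern picks out the 3rd, 4th and 5th entries; the plain literal looks for backspace characters that none of the entries contain.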

Instead of using the pattern attribute, which produces a str object, pass the compiled regular expression pattern object itself.

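import re

# Pass the compiled pattern object (note the raw string), not its .pattern string
cursor = db.your_collection.find({"field": re.compile(r'\bcook\b')})

for doc in cursor:
    print(doc)  # your code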
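The difference between the two query values can be seen without a database. The sketch below is only a local stand-in: simulate_find is a hypothetical helper, not a PyMongo function, and it assumes that a plain string in a query means an equality test while a compiled pattern means a regular expression match.

```python
import re

entries = ["the cooks.", "cooks", " cook.", "the cook is", "cook."]
pattern = re.compile(r'\bcook\b')


def simulate_find(values, condition):
    """Hypothetical local stand-in for querying one field (not a PyMongo API)."""
    if isinstance(condition, str):
        # A plain string is compared for equality with the stored value.
        return [v for v in values if v == condition]
    # A compiled pattern is applied as a regular expression search.
    return [v for v in values if condition.search(v)]


# pattern.pattern is just the source text of the regex, i.e. a str
string_result = simulate_find(entries, pattern.pattern)
# the compiled object keeps its regex meaning
regex_result = simulate_find(entries, pattern)

print(string_result)
print(regex_result)
```

With the string, nothing is returned because no entry is literally equal to the text \bcook\b; with the compiled object, the three expected entries come back.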

Source

2015-11-24 12:46:55

Thanks, it works for me. You are right. The reason was that the str object was being escaped. – Muatik

This requires a "full text search" index to match all of your cases. No simple RegEx is enough.

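For example, you need English stemming to find both "cook" and "cooks". Your RegEx matches the whole string "cook" between spaces or word boundaries, but not "cooks" or "cooking".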

There are many "full text search" indexing engines. Research them to decide which one to use:
- ElasticSearch
- Lucene
- Sphinx

PyMongo, I assume, connects to MongoDB. The latest versions have full-text indexing built in. See below.

MongoDB 3.0 has these indexes: https://docs.mongodb.org/manual/core/index-text/
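A rough sketch of how such a text index might be used from PyMongo, assuming collection is a pymongo Collection and that the field is called "entry" as in the question. The function name is made up for illustration; create_index and the $text / $search query operator are the only MongoDB features relied on.

```python
def search_with_text_index(collection, field, term):
    """Sketch: build a text index on `field`, then run a $text search for `term`.

    `collection` is assumed to be a pymongo Collection; this helper is
    illustrative and not part of PyMongo.
    """
    # "text" is the index type for MongoDB text indexes
    collection.create_index([(field, "text")])
    # $text searches the collection's text index for the given term
    return list(collection.find({"$text": {"$search": term}}))
```

It could be called as search_with_text_index(db.your_collection, "entry", "cook"). Because text search applies stemming, it would probably also return entries containing "cooks", which is broader than the whole-word match the question asks for.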

Source

2015-11-24 17:12:08 Andrew

All of these test cases are handled by a simple re expression in Python. For example:

>>> import re
>>> a = "the cooks."
>>> b = "cooks"
>>> c = " cook."
>>> d = "the cook is"
>>> e = "cook."
>>> tests = [a, b, c, d, e]
>>> for test in tests:
...     # accept "cook" only when the character after it is not "s"
...     rc = re.match("[^c]*(cook)[^s]", test)
...     if rc:
...         print(' Found: "%s" in "%s"' % (rc.group(1), test))
...     else:
...         print(' Search word NOT found in "%s"' % test)
...
 Search word NOT found in "the cooks."
 Search word NOT found in "cooks"
 Found: "cook" in " cook."
 Found: "cook" in "the cook is"
 Found: "cook" in "cook."
>>>
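For comparison, here is a small sketch that runs the pattern above next to the \b pattern from the earlier answer on a few extra strings. The extra strings are my own and are not from the question; they are only there to show where the two approaches diverge.

```python
import re

# Extra sample strings (not from the question), chosen to probe edge cases
samples = ["cook", "cooking", "a cat can cook.", "the cook is"]


def anchored_match(text):
    # Pattern from the session above; re.match starts at the beginning of the
    # string, and [^s] requires one more character after "cook".
    return bool(re.match("[^c]*(cook)[^s]", text))


def boundary_search(text):
    # Word-boundary pattern, searched anywhere in the string
    return bool(re.search(r"\bcook\b", text))


results = [(s, anchored_match(s), boundary_search(s)) for s in samples]
for row in results:
    print(row)
```

On these strings the first pattern misses a bare "cook" and a "cook" that comes after another "c", and it accepts "cooking", while the \b version treats all four as a whole-word test would.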

Source

2015-11-28 19:16:48 user4851165
