2011-12-14

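Why is MultiFieldQueryParser so much slower than building the query by hand?

I'm using Lucene.net to index a product catalog. I profiled my search with ANTS Profiler and noticed that creating a MultiFieldQueryParser and parsing the query takes almost as long as the actual search (about 100 ms). I then tried building the query by hand, which is very fast (about 1 ms). I would rather not parse the input manually: although it gives me the same result set, I'm worried that it may fail to handle some use cases or inputs (even though the input comes from a text search box on a website, where users will know nothing about Lucene's query syntax). My code (both approaches) is below: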

    IApplicationSettings settings = new ApplicationSettingService();
    FSDirectory directory = FSDirectory.Open(new DirectoryInfo(settings.GetSetting<string>("LuceneMainSearchDirectory")));
    RAMDirectory ramDir = new RAMDirectory(directory);
    _Searcher = new IndexSearcher(ramDir, true);
    string[] searchFields = new string[] { "ProductName", "ProductLongDescription", "BrandName", "CategoryName" };

    //Add a wildcard character to end of search to give broader results
    if (!searchTerm.EndsWith(" ")) { searchTerm = searchTerm + "*"; }

    //Use query parser...this block typically takes about 100ms on my machine, roughly 40% on the constructor and 60% on the call to Parse
    MultiFieldQueryParser multiParser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, searchFields, _analyzer);
    multiParser.SetDefaultOperator(QueryParser.AND_OPERATOR);
    Query query = multiParser.Parse(searchTerm);

    //Manually create query....this block doesn't even take 1ms on my machine
    BooleanQuery booleanQuery = new BooleanQuery(true);
    var terms = searchTerm.Split(' ');
    foreach (string s in terms)
    {
        BooleanQuery subQuery = new BooleanQuery(true);
        if (!s.EndsWith("*"))
        {
            Query query1 = new TermQuery(new Term("ProductName", s));
            Query query2 = new TermQuery(new Term("ProductLongDescription", s));
            Query query3 = new TermQuery(new Term("BrandName", s));
            Query query4 = new TermQuery(new Term("CategoryName", s));
            subQuery.Add(query1, BooleanClause.Occur.SHOULD);
            subQuery.Add(query2, BooleanClause.Occur.SHOULD);
            subQuery.Add(query3, BooleanClause.Occur.SHOULD);
            subQuery.Add(query4, BooleanClause.Occur.SHOULD);
        }
        else
        {
            Query query1 = new WildcardQuery(new Term("ProductName", s));
            Query query2 = new WildcardQuery(new Term("ProductLongDescription", s));
            Query query3 = new WildcardQuery(new Term("BrandName", s));
            Query query4 = new WildcardQuery(new Term("CategoryName", s));
            subQuery.Add(query1, BooleanClause.Occur.SHOULD);
            subQuery.Add(query2, BooleanClause.Occur.SHOULD);
            subQuery.Add(query3, BooleanClause.Occur.SHOULD);
            subQuery.Add(query4, BooleanClause.Occur.SHOULD);
        }
        booleanQuery.Add(subQuery, BooleanClause.Occur.MUST);
    }

    //Run the search....results are the same for simple multiword text queries
    var result2 = _Searcher.Search(booleanQuery, null, maxResults);
    var result = _Searcher.Search(query, null, maxResults);

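One alternative to building the query by hand would be to share a single MultiFieldQueryParser instance, but I assume its Parse method is not thread-safe (although I have only read that about the Java version... please correct me if this assumption is wrong).

Am I doing something wrong, or is this just the nature of the beast?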

Answer

MultiFieldQueryParser simply uses multiple regular QueryParsers behind the scenes, creating one for each field you want to query.

It is normal for creating a QueryParser to cost more than creating a Query by hand.

It can handle the complex query syntax documented here: Apache Lucene - Query Parser Syntax
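That syntax is also why raw website input can be risky to hand straight to the parser: characters such as `:`, `(` or `"` are operators, and unbalanced ones make Parse throw a ParseException. Below is a minimal sketch of one way to guard against that, assuming Lucene.Net 2.9.x and its static `QueryParser.Escape` method. The class and method names (`SafeQueryBuilder`, `ParseUserInput`) are hypothetical and not part of Lucene.Net.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class SafeQueryBuilder
{
    public static Query ParseUserInput(string userInput, string[] fields, Analyzer analyzer)
    {
        // Escape query-syntax characters so they are treated as literal text.
        // Note: this also escapes any '*' or '?' the user typed.
        string escaped = QueryParser.Escape((userInput ?? string.Empty).Trim());

        // The parser rejects an empty string, so return an empty query instead.
        if (escaped.Length == 0)
        {
            return new BooleanQuery();
        }

        // Append the trailing wildcard after escaping so it stays an operator.
        escaped += "*";

        // A new parser per call avoids sharing one instance across threads.
        var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, fields, analyzer);
        parser.SetDefaultOperator(QueryParser.AND_OPERATOR);
        return parser.Parse(escaped);
    }
}
```

This keeps the parser's analysis and field expansion but still pays the construction cost on every call, so it trades speed for safety rather than addressing the timing measured in the question.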

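It also processes the search query with the Analyzer you specify. If you use an Analyzer at indexing time, you must use the same Analyzer (or equivalent logic) in your search code. If you don't, you will end up missing results.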

If you indexed with a whitespace analyzer, then your code for building the boolean query by hand is fine.
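If the index was built with a different analyzer, the hand-built query can still work as long as each term goes through that same analyzer first. Here is a minimal sketch, assuming the Lucene.Net 2.9.x token stream API (`TermAttribute` with a `Term()` method; I believe later releases changed this to a generic `AddAttribute<ITermAttribute>()` and a `Term` property). `QueryTermHelper` and `AnalyzeTerms` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical helper, not part of Lucene.Net.
public static class QueryTermHelper
{
    // Runs the text through the analyzer and returns the resulting terms,
    // i.e. the form in which they would have been written to the index.
    public static IList<string> AnalyzeTerms(Analyzer analyzer, string field, string text)
    {
        var terms = new List<string>();
        TokenStream stream = analyzer.TokenStream(field, new StringReader(text));
        // Assumption: 2.9.x signature, which takes a Type and needs a cast.
        var termAttr = (TermAttribute)stream.AddAttribute(typeof(TermAttribute));
        while (stream.IncrementToken())
        {
            terms.Add(termAttr.Term());
        }
        stream.Close();
        return terms;
    }
}
```

Each returned string could then be passed to `new TermQuery(new Term(field, term))` in place of the raw `Split(' ')` output. Wildcard terms such as `foo*` are better kept out of the analyzer, since many analyzers would strip the `*`; the QueryParser itself skips analysis for such terms and, by default, only lowercases them.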
