2014-01-09 25 views

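Solr - Skip suggestions that return the same documents as the original search

My search suggestions work well, and I like getting them even when the original keywords return results (in case our documents contain misspellings). However, I often get suggestions that return exactly the same results. For example, I search for yellow mints tin and get "Did you mean yellow mint tin?"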

Is there a way to drop suggestions that return the same results as the original terms?

I am using Solr 4.6.0. Here is the relevant part of solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent"> 
     <str name="queryAnalyzerFieldType">text_general</str> 
     <!-- a spellchecker built from a field of the main index --> 
     <lst name="spellchecker"> 
      <str name="name">default</str> 
      <str name="field">spell2</str> 
      <str name="classname">solr.DirectSolrSpellChecker</str> 
      <!-- the spellcheck distance measure used, the default is the internal levenshtein --> 
      <str name="distanceMeasure">internal</str> 
      <!-- minimum accuracy needed to be considered a valid spellcheck suggestion --> 
      <float name="accuracy">0.1</float> 
      <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 --> 
      <int name="maxEdits">2</int> 
      <!-- the minimum shared prefix when enumerating terms --> 
      <int name="minPrefix">0</int> <!-- if set to 1, must start with same letter --> 
      <!-- maximum number of inspections per result. --> 
      <int name="maxInspections">5</int> 
      <!-- minimum length of a query term to be considered for correction --> 
      <int name="minQueryLength">4</int> 
      <!-- maximum threshold of documents a query term can appear to be considered for correction --> 
      <float name="maxQueryFrequency">0.01</float> 
     </lst> 
     <!-- a spellchecker that can break or combine words. See "/spell" handler below for usage --> 
     <lst name="spellchecker"> 
      <str name="name">wordbreak</str> 
      <str name="classname">solr.WordBreakSolrSpellChecker</str> 
      <str name="field">spell2</str> 
      <str name="combineWords">true</str> 
      <str name="breakWords">true</str> 
      <int name="maxChanges">10</int> 
      <str name="buildOnCommit">true</str> 
      <int name="minBreakLength">3</int> 
     </lst> 
     </searchComponent> 

    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy"> 
     <lst name="defaults"> 
      <str name="echoParams">none</str> 
      <int name="rows">10</int> 
      <str name="df">contents</str> 
      <str name="defType">edismax</str> 
      <str name="spellcheck.dictionary">default</str> 
      <str name="spellcheck.dictionary">wordbreak</str> 
      <str name="spellcheck">on</str> 
      <str name="spellcheck.extendedResults">false</str>  
      <str name="spellcheck.count">10</str> 
      <str name="spellcheck.alternativeTermCount">25</str> 
      <str name="spellcheck.maxResultsForSuggest">25</str> 
      <str name="spellcheck.collate">true</str> 
      <str name="spellcheck.maxCollationTries">10</str> 
      <str name="spellcheck.maxCollations">5</str>   
      <str name="spellcheck.onlyMorePopular">false</str> 
      <str name="spellcheck.collateParam.defType">dismax</str> 
     </lst> 
     <arr name="last-components"> 
      <str>spellcheck</str> 
     </arr> 
     </requestHandler> 

Here is the relevant part of schema.xml:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> 
     <analyzer type="index"> 
     <tokenizer class="solr.StandardTokenizerFactory"/> 
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> 
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
     <filter class="solr.LowerCaseFilterFactory"/> 
     </analyzer> 
     <analyzer type="query"> 
     <tokenizer class="solr.StandardTokenizerFactory"/> 
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> 
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
     <filter class="solr.LowerCaseFilterFactory"/> 
     </analyzer> 
    </fieldType> 

<field name="spell2" type="text_general" indexed="true" stored="false" required="false" multiValued="true" /> 

Example query: http://localhost:8985/solr/(collection)/spell?q=yellow%20buttermints returns

<str name="collation">yellow (butter mints)</str> 
    <str name="collation">yellow buttermint</str> 

"yellow buttermints" and "yellow buttermint" return the same results.

Answer

I don't think there is a sure way to guarantee this. However, the following should definitely help:

  1. Add this filter at both query time and index time: EnglishMinimalStemFilterFactory

    https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-EnglishMinimalStemFilter

  2. I am not sure how SynonymFilterFactory behaves in this case. You could also try it without that filter.
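
Point 1 could look roughly like the sketch below, assuming the text_general type from the question is the one to change. The answer does not say where in the chain the stemmer belongs; placing it after LowerCaseFilterFactory is an assumption here.

```xml
<!-- Sketch only: text_general from the question with a minimal English stemmer added. -->
<!-- The position of the stemmer (after lowercasing) is an assumption, not from the answer. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Reduces simple plurals, e.g. mints -> mint -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same stemmer at query time so query terms match the stemmed index terms -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index-time analyzer changes, existing documents would need to be reindexed before this takes effect. Since spell2 uses this type and the spellchecker reads its terms from that field, singular and plural forms should then collapse to one term, which is likely why plural-only suggestions stop appearing.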

I'm confused. Which filter do you mean? Was some code stripped from your answer, perhaps? I appreciate the help! – jessieloo

Yes, my mistake. I meant the EnglishMinimalStemmer, which will stem mints -> mint, etc. – varunthacker

Adding the EnglishMinimalStemmer helped a lot. Thanks! – jessieloo