2012-02-13 106 views

Searching multi-word indexed terms with dismax

My Solr schema is as follows (important parts only):

<fieldType name="bagofwords_expertfinding" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <!-- remove letters repeated more than two times --> 
    <charFilter class="solr.HTMLStripCharFilterFactory"/> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,\.]+$" replacement="" replace="all"/> 
    <filter class="solr.PatternReplaceFilterFactory" pattern="^.*(([aA-zZ])\\2)\\2+.*$" replacement=""/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    <filter class="solr.LengthFilterFactory" min="3" max="100"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,\.]+$" replacement="" replace="all"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    <filter class="solr.LengthFilterFactory" min="3" max="100"/> 
    </analyzer> 
</fieldType> 
<fieldType name="namedentities_expertfinding" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <!-- strip whitespace around commas, then split on commas so each entity is one token --> 
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s," replacement=","/> 
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern=",\s" replacement=","/> 
    <tokenizer class="solr.PatternTokenizerFactory" pattern="," /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,\.]+$" replacement="" replace="all"/> 
    <filter class="solr.LengthFilterFactory" min="3" max="100"/> 
    </analyzer> 
</fieldType> 

In namedentities I index multi-word terms such as "diego alberto milito" and "diego armando maradona". I am trying to search across both fields with a dismax query so that I can boost them.
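To make the difference between the two namedentities_expertfinding analyzers easier to see, here is a rough Python sketch that mimics only their tokenizer and lowercase steps. It is not Solr code: the function names `index_tokens` and `query_tokens` are made up for illustration, and the stop-word, possessive, pattern-replace and length filters of the query analyzer are left out.

```python
import re

def index_tokens(value):
    """Approximate the index analyzer: strip whitespace around commas,
    split on commas (PatternTokenizerFactory), then lowercase."""
    value = re.sub(r"\s,", ",", value)   # first charFilter
    value = re.sub(r",\s", ",", value)   # second charFilter
    return [tok.lower() for tok in value.split(",") if tok]

def query_tokens(text):
    """Approximate the query analyzer: split on whitespace
    (WhitespaceTokenizerFactory), then lowercase."""
    return [tok.lower() for tok in text.split()]

indexed = index_tokens("Diego Alberto Milito, Diego Armando Maradona")
queried = query_tokens("maradona")

print(indexed)                       # each whole name is a single token
print(queried)                       # the query yields single-word tokens
print(set(indexed) & set(queried))   # tokens the two sides have in common
```

Under these assumptions the indexed tokens are whole names, so a single-word query token has nothing to match against in this field.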

But when I try this query: localhost:8080/solr/select/?q="maradona"&defType=dismax&qf=namedentities^100 bagofwords^1&fl=*,score&debugQuery=true&mm=0

Solr finds nothing. Maybe I don't understand how to use the " character correctly.

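I also don't understand this passage from the Solr wiki:

"In Solr 1.4 and prior, you should basically set mm=0 if you want the equivalent of q.op=OR, and mm=100% if you want the equivalent of q.op=AND. In 3.x and trunk the default value of mm is dictated by the q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the default operator is affected by your schema.xml entry. In older versions of Solr the default value is 100% (all clauses must match)."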

Given that defaultOperator is OR in my schema, why do I get a default mm of 100% when I don't set mm=0?
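As a rough illustration of the mm values mentioned in the wiki quote, the sketch below computes how many optional clauses would have to match for a plain integer or percentage. It is a simplified model, not Solr's implementation: the function name `min_should_match` is hypothetical, and negative or conditional mm expressions are deliberately left out.

```python
def min_should_match(mm, optional_clauses):
    """Return how many optional clauses must match for a simple mm value.

    Handles only a non-negative integer ("2") or a non-negative
    percentage ("75%"); negative and conditional specs are not covered.
    """
    mm = mm.strip()
    if mm.endswith("%"):
        # Percentages are rounded down to a whole number of clauses.
        required = optional_clauses * int(mm[:-1]) // 100
    else:
        required = int(mm)
    # The requirement can never exceed the number of clauses.
    return min(required, optional_clauses)

# A three-word query under the OR-like and AND-like settings.
for mm in ("0", "100%"):
    print(mm, min_should_match(mm, 3))
```

In this model mm=0 places no minimum on the optional clauses (the OR-like case), while mm=100% requires all of them (the AND-like case).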

Thanks in advance!

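The output of the debug version of the parsed query would also be useful. I suspect that, because of the way you tokenize the field, your exact search will not match, since neither of the entries is exactly the string you are searching for in quotes. – MatsLindh 2012-02-13 21:46:17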

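Thanks. I finally found out that quotes do not mean an exact match but a phrase search, that is, a sequence of consecutive terms, so I changed my schema analyzers. But there is no way to handle multi-word tokens... so I search for phrases in a single-word index. – Tywnil 2012-02-13 21:56:15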

Answer

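The quotes around the query string above force a phrase query, which means only exact matches are considered. Remove them, use parentheses instead, and experiment with the pf, pf2 and pf3 parameters to boost longer matching phrases.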
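A minimal sketch of what such a request could look like, built with Python's standard library so that the parameters are URL-encoded properly. The host, port, path and qf value are taken from the question; the `http://` scheme, the multi-word query text and the `pf` value are assumptions chosen for illustration. pf2 and pf3 belong to the extended dismax parser (defType=edismax), so this sketch sticks to pf.

```python
from urllib.parse import urlencode

# Host, port and path follow the query in the question; the scheme is assumed.
base_url = "http://localhost:8080/solr/select"

params = {
    "q": "diego armando maradona",   # no surrounding quotes, so no forced phrase query
    "defType": "dismax",
    "qf": "namedentities^100 bagofwords^1",
    "pf": "bagofwords^10",           # assumed phrase boost; field and weight are placeholders
    "mm": "0",
    "fl": "*,score",
    "debugQuery": "true",
}

# urlencode escapes spaces, "^", "*" and "," so the URL is safe to send.
url = base_url + "?" + urlencode(params)
print(url)
```

Keeping debugQuery=true in the request makes it possible to check the parsed query and see which clauses were actually generated for each field.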