2013-05-08

I have a PHP application that uses a Solr database. The problem appears when I make a /terms request (terms doc: Solr TermsComponent).

The relevant part of the document is:

poi: "Bistriţa", 
... 
text: [ 
"ddt", 
"Numeric", 
"/14/Gagaga 2/11/Economics/17/datenow", 
"/20/Daniel_same/11/Economics/17/datenow", 
"0/Gagaga 2", 
"1/Gagaga 2/Economics", 
"2/Gagaga 2/Economics/datenow", 
"0/Daniel_same", 
"1/Daniel_same/Economics", 
"2/Daniel_same/Economics/datenow", 
"ppla", 
"seat of a first-order administrative division", 
"/19/Daniel_same/1071/Plurinational State of Bolivia/2269/Cuba/2272/Bistriţa", 
"0/Daniel_same", 
"1/Daniel_same/Plurinational State of Bolivia", 
"2/Daniel_same/Plurinational State of Bolivia/Cuba", 
"3/Daniel_same/Plurinational State of Bolivia/Cuba/Bistriţa", 
"0/Undefined_activity", 
"Year", 
"0/1999", 
"0/1999", 
"Measured", 
"", 
"utf8" 
], 

When I send this request:

http://localhost:8080/solr/terms 
?wt=json 
&indent=true 
&terms.sort=count 
&terms.mincount=1 
&terms.limit=10 
&terms.regex.flag=case_insensitive 
&terms.regex=.*bi.* 
&terms.fl=text 

I get this response:

{ 
    responseHeader: { 
     status: 0, 
     QTime: 4 
    }, 
    terms: { 
     text: [ 
      "bistriå", 
      16 
     ] 
    } 
} 
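For reference, here is a minimal Python sketch of the same /terms call. It assumes the Solr URL from the question (http://localhost:8080/solr/terms) and only builds the query string and decodes the response shown above; the HTTP call itself is left out. The helper names `build_terms_url` and `parse_terms` are hypothetical. As the response above shows, the terms for each field come back as a flat array that alternates term and count, so the parser pairs them up.

```python
from urllib.parse import urlencode

# Base URL taken from the question; adjust host, port and core for your setup.
SOLR_TERMS_URL = "http://localhost:8080/solr/terms"


def build_terms_url(field, regex, limit=10):
    """Build a /terms request with the same parameters as in the question."""
    params = {
        "wt": "json",
        "indent": "true",
        "terms.sort": "count",
        "terms.mincount": 1,
        "terms.limit": limit,
        "terms.regex.flag": "case_insensitive",
        "terms.regex": regex,
        "terms.fl": field,
    }
    # urlencode percent-escapes characters such as '*' in the regex.
    return SOLR_TERMS_URL + "?" + urlencode(params)


def parse_terms(response, field):
    """Turn the flat [term, count, term, count, ...] list into (term, count) pairs."""
    flat = response["terms"][field]
    return list(zip(flat[0::2], flat[1::2]))


url = build_terms_url("text", ".*bi.*")
print(url)

# The response from the question, written as a Python dict.
response = {
    "responseHeader": {"status": 0, "QTime": 4},
    "terms": {"text": ["bistriå", 16]},
}
print(parse_terms(response, "text"))  # [('bistriå', 16)]
```

Pairing the flat list keeps the sort order Solr returned (here terms.sort=count).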

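The problem is that the returned term is truncated. I was expecting "BistriÅ£a", which is the UTF-8 encoding of the city name Bistrița (as it looks when read as Latin-1), but the result seems to be cut off at the special character.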
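The string "BistriÅ£a" is exactly what the UTF-8 bytes of "Bistriţa" look like when decoded as ISO-8859-1 (Latin-1). The Python sketch below reproduces that and prints the Unicode general category of the characters around the cut-off point. It illustrates one possible cause, text that was double-encoded before indexing, and is an assumption rather than a confirmed diagnosis: "£" is a currency symbol (category Sc), not a letter, so a word tokenizer would plausibly break there.

```python
import unicodedata

original = "Bistriţa"  # t with cedilla (U+0163), as stored in the poi field

# Encode to UTF-8, then (wrongly) decode as Latin-1: classic mojibake.
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)  # BistriÅ£a

# Lowercased, as LowerCaseFilterFactory would do after tokenizing.
lowered = mojibake.lower()
print(lowered)  # bistriå£a

# Unicode general category of the characters near the break.
for ch in lowered[5:]:
    print(repr(ch), unicodedata.category(ch))
```

Reversing the mistake (`mojibake.encode("latin-1").decode("utf-8")`) gives back the original name, which is a quick way to check whether indexed data has been double-encoded.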

Strangely, if I make the same request with the field name "poi" instead of "text", I get a correct response:

http://localhost:8080/solr/terms 
?wt=json 
&indent=true 
&terms.sort=count 
&terms.mincount=1 
&terms.limit=10 
&terms.regex.flag=case_insensitive 
&terms.regex=.*bi.* 
&terms.fl=poi 

{ 
    responseHeader: { 
     status: 0, 
     QTime: 4 
    }, 
    terms: { 
     text: [ 
      "Bistriţa", 
      16 
     ] 
    } 
} 

So with this field the term is not truncated.

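The biggest difference between the two fields is their type: poi is of type string and text is of type text_general. The text_general type is defined in the schema as follows: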

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
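To see why the two field types can behave differently, here is a rough Python approximation of the index-time chain above compared with a string field. It is a simplification, not Lucene's StandardTokenizer (which follows the Unicode UAX #29 word-break rules): the hypothetical `simple_tokenize` just keeps runs of letters and digits, and `STOPWORDS` is a made-up stand-in for stopwords.txt. With a double-encoded value such as "BistriÅ£a", the tokenizer breaks at "£", leaving "bistriå" as its own term that matches .*bi.*, while the string field indexes the value unchanged as one term.

```python
import re
import unicodedata

# Hypothetical stand-in for stopwords.txt; the real file may differ.
STOPWORDS = {"the", "of", "and"}


def simple_tokenize(text):
    """Rough tokenizer: keep runs of letters/digits, break on everything else.

    Lucene's StandardTokenizer (UAX #29) is more subtle; this only mimics
    breaking on symbols such as '£'.
    """
    tokens, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N"):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def analyze_text_general(value):
    """Index chain: tokenizer -> stop filter (ignoreCase) -> lowercase."""
    tokens = simple_tokenize(value)
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return [t.lower() for t in tokens]


def analyze_string(value):
    """A string field is not analyzed: the whole value is a single term."""
    return [value]


def matching_terms(terms, pattern):
    """Approximate terms.regex with terms.regex.flag=case_insensitive."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [t for t in terms if rx.fullmatch(t)]


double_encoded = "BistriÅ£a"
print(analyze_text_general(double_encoded))  # ['bistriå', 'a']
print(analyze_text_general("Bistriţa"))      # ['bistriţa']
print(analyze_string("Bistriţa"))            # ['Bistriţa']
print(matching_terms(analyze_text_general(double_encoded), ".*bi.*"))  # ['bistriå']
```

A correctly encoded "Bistriţa" stays a single token in this sketch, because "ţ" is a letter; only the mis-encoded form gets split.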

I can provide more details if asked. I'm not sure what else to add right now without bloating the question too much.

Answers