Sphinx search: trouble with the charset table
I have been losing my mind over this for two days now...
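I want to use the Slovenian alphabet in Sphinx search: all English letters plus Č, Ž and Š (and Ć, just in case).
I have looked all over the net for the right charset table, but I found squat,
so I decided to build my own, step by step.
This is my index: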
index classifieds
{
source = classifieds_src
path = c:\Sphinx\data\classifieds
docinfo = extern
min_infix_len = 2
infix_fields = title,keywords,summary,text
expand_keywords = 1
enable_star = 1
charset_type = utf-8
charset_table = 0..9, a..z, _, A..Z->a..z,-, U+002C, \
U+010C->U+010D, U+0106->U+0107, U+0160->U+0161, U+017D->U+017E, \
U+010D->c,U+0107->c, U+0161->s, U+017E->z, \
U+010D, U+0107, U+0161, U+017E
}
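Here I map uppercase Č (and Ć, Š) and Ž to their lowercase counterparts, and add mappings from č to c, ć to c, š to s and ž to z. Finally, I add those four lowercase characters to the table themselves.
These are my classifieds titles: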
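A note on the table above, offered as a sketch rather than a verified fix: as far as I understand it, charset_table works as a flat lookup in which each character has one target and mappings are not chained, so listing U+010D both as `U+010D->c` and as a plain `U+010D` gives it two competing definitions. Assuming the goal is accent-insensitive matching (so that "miska" also finds "miška"), one way to avoid the ambiguity is to fold every variant straight to its ASCII letter, so that each code point appears only once. The layout below is my assumption about the intent, not something confirmed in this thread.

```
# Sketch only: every Slovenian variant folds directly to one ASCII letter.
# Assumes accent-insensitive search is wanted; drop these mappings and list
# the lowercase letters on their own if diacritics must stay distinct.
charset_table = 0..9, a..z, _, A..Z->a..z, -, U+002C, \
    U+010C->c, U+010D->c, U+0106->c, U+0107->c, \
    U+0160->s, U+0161->s, U+017D->z, U+017E->z
```

Any change to charset_table only takes effect after the index is rebuilt, and it does not help if the text already reaches the indexer in the wrong encoding.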
T1: HP USB optična miška ZA prenosnik RH304
T2: Čiška PCplus MO-U033 + F2 (optična, brezžična, PS/2)
T3: miška Logitech optična Nano M235 siva
DB encoding: utf8_general_ci
Table encoding: utf8_general_ci
Title field encoding: utf8_general_ci
Test cases:
$testcase = array(
"miška",
"mi*ka",
"Čiška",
"čiška",
"miska",
"usb prenosnik",
"prenosnik miska",
"miška usb"
);
//api settings:
$this->sphinx->SetArrayResult(true);
$this->sphinx->setLimits(0, 100);
$this->sphinx->setMatchMode(SPH_MATCH_EXTENDED2);
$this->sphinx->SetSortMode(SPH_SORT_RELEVANCE, '@weight DESC');
$this->sphinx->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
$this->sphinx->SetFieldWeights(array("title"=>100, "keywords"=>80, "summary"=>60,
"text"=>20, "slug"=>10));
And finally, the test results:
Keyword (total/total_found), followed by the per-word statistics
miška (0/0)
Array
(
[*miška*] => Array
(
[docs] => 0
[hits] => 0
)
[miška] => Array
(
[docs] => 0
[hits] => 0
)
)
mi*ka (0/0)
Array
(
[*mi*] => Array
(
[docs] => 3
[hits] => 4
)
[mi] => Array
(
[docs] => 1
[hits] => 1
)
[*2aka*] => Array
(
[docs] => 0
[hits] => 0
)
[2aka] => Array
(
[docs] => 0
[hits] => 0
)
)
Čiška (0/0)
Array
(
[*čiška*] => Array
(
[docs] => 0
[hits] => 0
)
[čiška] => Array
(
[docs] => 0
[hits] => 0
)
)
čiška (0/0)
Array
(
[*čiška*] => Array
(
[docs] => 0
[hits] => 0
)
[čiška] => Array
(
[docs] => 0
[hits] => 0
)
)
miska (0/0)
Array
(
[*miska*] => Array
(
[docs] => 0
[hits] => 0
)
[miska] => Array
(
[docs] => 0
[hits] => 0
)
)
usb prenosnik (1/1)
Array
(
[*usb*] => Array
(
[docs] => 1
[hits] => 1
)
[usb] => Array
(
[docs] => 1
[hits] => 1
)
[*prenosnik*] => Array
(
[docs] => 1
[hits] => 1
)
[prenosnik] => Array
(
[docs] => 1
[hits] => 1
)
)
prenosnik miska (0/0)
Array
(
[*prenosnik*] => Array
(
[docs] => 1
[hits] => 1
)
[prenosnik] => Array
(
[docs] => 1
[hits] => 1
)
[*miska*] => Array
(
[docs] => 0
[hits] => 0
)
[miska] => Array
(
[docs] => 0
[hits] => 0
)
)
miška usb (0/0)
Array
(
[*miška*] => Array
(
[docs] => 0
[hits] => 0
)
[miška] => Array
(
[docs] => 0
[hits] => 0
)
[*usb*] => Array
(
[docs] => 1
[hits] => 1
)
[usb] => Array
(
[docs] => 1
[hits] => 1
)
)
As you can clearly see, I only get results for queries that contain no Slovenian special characters.
Please, please help. I am losing my mind over this.
Yes, I did... no difference.
OMG! I did it! I found the answer here: http://ryaneby.com/2009/11/21/unicode-and-sphinx.html
I needed to add
sql_query_pre = SET CHARACTER_SET_RESULTS = UTF8
sql_query_pre = SET NAMES UTF8
to my source definition... apparently the DB connection was not going through UTF-8 by default! WOOO HOOOO
I would, but it won't let me :S 100 reputation is needed... Please post it yourself and I'll confirm it.
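For anyone landing here later, this is roughly where the two sql_query_pre lines from the comment above would sit. The source name classifieds_src comes from the index definition in the question. Everything else is an assumption on my part: the source block itself was never posted, and `type = mysql` is a guess based on the utf8_general_ci collation.

```
source classifieds_src
{
    # Assumed: a MySQL source; the real block was not shown in the thread.
    type = mysql

    # Connection settings (sql_host, sql_user, sql_pass, sql_db) and the
    # main sql_query are omitted here because they were not posted.

    # Make the indexer's connection talk UTF-8 before the main query runs.
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET CHARACTER_SET_RESULTS = utf8
}
```

The index has to be rebuilt with the indexer after this change, since documents that were already indexed were fetched over the old connection encoding.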