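PHP: splitting a string into n-grams fails with Unicode characters
I am trying to generate n-grams of a string in PHP. For this I am using the following function, taken from https://gist.github.com/Xeoncross/5366393: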
function Bigrams($word){
    $ngrams = array();
    $len = strlen($word);
    for($i=0;$i+1<$len;$i++){
        $ngrams[$i]=$word[$i].$word[$i+1];
    }
    return $ngrams;
}
$word = "abcdefg";
print_r(Bigrams($word));
That works and returns the expected n-grams:
[0] => ab
[1] => bc
[2] => cd
[3] => de
[4] => ef
[5] => fg
But for strings containing some Unicode characters it does not return the expected result.
For example, $word = "Lòria" returns:
[0] => L�
[1] => ò
[2] => �r
[3] => ri
Or $word = "пожалуйста" returns:
[0] => п
[1] => ��
[2] => о
[3] => ��
[4] => ж
[5] => ��
[6] => а
[7] => ��
[8] => л
Any idea how to solve this?
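One likely explanation, offered as a sketch rather than a definitive fix: strlen() and $word[$i] both work on bytes, while UTF-8 stores "ò" and each Cyrillic letter in two bytes, so the loop pairs up halves of characters. The variant below splits the string into whole characters first. The function name BigramsUtf8 is hypothetical (it is not from the gist), and the code assumes the input is valid UTF-8.

```php
<?php
// Hypothetical multibyte-aware variant of Bigrams(); the name is illustrative.
function BigramsUtf8($word){
    // With the /u modifier the empty pattern matches between code points,
    // so this yields one array element per character rather than per byte.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() returns false when the input is not valid UTF-8.
        return array();
    }
    $ngrams = array();
    $len = count($chars);
    for ($i = 0; $i + 1 < $len; $i++) {
        $ngrams[$i] = $chars[$i] . $chars[$i + 1];
    }
    return $ngrams;
}

print_r(BigramsUtf8("Lòria"));
print_r(BigramsUtf8("пожалуйста"));
```

With this version "Lòria" gives Lò, òr, ri, ia. If the mbstring extension is available, mb_strlen() and mb_substr() with an explicit 'UTF-8' encoding would be another way to get the same result. Note that this works per code point, so a letter written with a separate combining accent would still be split in two.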