Strange error with multibyte strings and lookaround

Why does the following code behave differently for different multibyte strings?

echo preg_replace('@(?=\pL)@u', '*', 'م'); // prints: '*م'  ✓ 
echo preg_replace('@(?=\pL)@u', '*', 'ض'); // prints: '*ض'  ✓ 
echo preg_replace('@(?=\pL)@u', '*', 'غ'); // prints: '*�*�' ✗ 
echo preg_replace('@(?=\pL)@u', '*', 'ص'); // prints: '*�*�' ✗

See: http://3v4l.org/fvab1


2013-02-18 PHPst

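It doesn't recognize the 'غ' character. IMHO it looks like a bug in the PCRE library, but this being PHP, it's hard to say whether you need to enable something... – 2013-02-18 17:09:07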

This works fine: echo preg_replace('/(.+)/', '*$1', 'غ'); – 2013-02-18 18:06:25

Strangely, it seems to work in older versions: http://3v4l.org/0Pq36 – deceze 2013-02-18 20:15:25
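One way to narrow this down is to look at where the zero-width match actually lands. The following is only a diagnostic sketch: lookahead_offsets is a hypothetical helper, not something from the question, and the "\u{...}" escapes assume PHP 7 or later. In UTF-8 mode a match should only start on a character boundary, so for a single two-byte character the only valid offset is 0. An offset of 1 would fall inside the character, which is what the '*�*�' output suggests.

```php
<?php
// Hypothetical helper: byte offsets at which the zero-width lookahead matches.
function lookahead_offsets(string $subject): array
{
    preg_match_all('@(?=\pL)@u', $subject, $m, PREG_OFFSET_CAPTURE);

    // Each entry of $m[0] is [matched text, byte offset]; keep the offsets.
    return array_column($m[0], 1);
}

// The four characters from the question: م ض غ ص
foreach (["\u{0645}", "\u{0636}", "\u{063A}", "\u{0635}"] as $char) {
    printf(
        "%s  bytes=%s  match offsets=[%s]\n",
        $char,
        bin2hex($char),
        implode(',', lookahead_offsets($char))
    );
}
```

On a build where the bug is absent, each line should report the single offset 0. It may also be worth noting that the two failing characters end in the bytes ba and b5, which are letters (º and µ) when read as single-byte Latin-1 code points, whereas 85 and b6 are not. That is only an observation and a guess at the cause; nothing in this thread confirms it.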

You need to include modifier letters as well (Lm). See the script below, which walks through the entire Arabic Unicode block:

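<?php
// Encode a code point in the range U+0080..U+07FF as a two-byte UTF-8 sequence.
function uchar_2($dec)
{
    $utf = chr(0xC0 | ($dec >> 6));    // lead byte: 110xxxxx
    $utf .= chr(0x80 | ($dec & 0x3F)); // continuation byte: 10xxxxxx

    return $utf;
}

$issues = 0;
$count = 0;
// Walk the Arabic block, U+0600 (1536) through U+06FF (1791).
for ($dec = 1536; $dec <= 1791; $dec++) {
    $char = uchar_2($dec);
    // Report every character for which the replacement changes the string.
    if (preg_replace('@^(?=\pLm)$@u', '*', $char) !== $char) {
        printf("Issue with %s (%s)\n", $dec, $char);
        $issues++;
    }
    $count++;
}

printf("Found %d issues in %d rows\n", $issues, $count);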

Without Lm, this fails for about half of the characters.
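A note on the syntax, offered as a sketch rather than a definitive diagnosis: PCRE lets you omit the braces only for one-letter properties, so \pLm is read as \p{L} followed by a literal "m", and the two-letter category has to be written \p{Lm}. The L category also already includes Lm. The example below illustrates both points with U+0640 ARABIC TATWEEL, which has general category Lm; has_match is a hypothetical helper and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: true if the whole subject matches the pattern.
function has_match(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$tatweel = "\u{0640}"; // ARABIC TATWEEL, general category Lm

// Without braces only one letter is read: \pLm is \p{L} then a literal "m".
var_dump(has_match('@^\pLm$@u', 'am'));       // bool(true)
var_dump(has_match('@^\pLm$@u', $tatweel));   // bool(false)

// With braces the two-letter category Lm is selected.
var_dump(has_match('@^\p{Lm}$@u', $tatweel)); // bool(true)

// \pL is the union of Lu, Ll, Lt, Lm and Lo, so it already covers Lm.
var_dump(has_match('@^\pL$@u', $tatweel));    // bool(true)
```

If that reading is right, swapping \pL for \pLm changes what the pattern matches rather than widening it, so it would not explain why plain \pL misbehaves on some characters.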


2013-02-18 20:22:06

In your code there is no issue even with '@^(?=\pL)$@u'. But if you use '@(?=\pL)@u', it reports some issues. The code using '\pLm' shows the desired output, but it has to work with '\pL' as well. – PHPst 2013-02-19 05:28:48
