Check for truncated HTML entities with substr

If I have:

$output = substr($str, 0, 3);

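and $str has a value such as '&agrave;&agrave;bcde', then the value of $output is 'à&ag': the second '&agrave;' is cut off. I would like the output to be 'ààb'. I tried mb_substr($str, 0, 3, 'UTF-8') and got the same problem. Using html_entity_decode on $str gave me a 500 Internal Server Error. EDIT: I noticed that the 500 error only occurs when the truncated portion of the string is part of an HTML entity.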

Source

2013-12-10 Aditya

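If you are dealing with encoded HTML, you have to decode it to plain text, then take your substring, then re-encode. String functions cannot be expected to handle HTML character entities.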
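The decode, cut, re-encode sequence described in that comment could look roughly like the sketch below. The helper name truncate_html_text is hypothetical, and the sketch assumes the mbstring extension is loaded and that the text should be treated as UTF-8.

```php
<?php
// Hypothetical helper: truncate entity-encoded text by characters, not bytes.
// Assumes UTF-8 and the mbstring extension.
function truncate_html_text(string $html, int $length): string
{
    // 1. Decode entities such as &agrave; into real UTF-8 characters.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 2. Cut by characters with the multibyte-aware function.
    $cut = mb_substr($text, 0, $length, 'UTF-8');

    // 3. Re-encode so the result can be embedded in HTML again.
    return htmlentities($cut, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo truncate_html_text('&agrave;&agrave;bcde', 3), "\n";
```

Because the cut happens on decoded text, the result can never end in the middle of an entity.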

You need to use the correct encoding. $str might not be UTF-8. Only you know the encoding. PHP can guess, but not very reliably.
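Rather than letting PHP guess, one option is to validate the string against the encoding you expect and convert only if it fails. A minimal sketch, assuming the mbstring extension; the helper name to_utf8 and the ISO-8859-1 fallback are illustrative assumptions, not something the question states.

```php
<?php
// Hypothetical helper: return $str as UTF-8.
// Assumption: input that is not valid UTF-8 is in $fallback (a guess;
// only the owner of the data knows the real source encoding).
function to_utf8(string $str, string $fallback = 'ISO-8859-1'): string
{
    // Already valid UTF-8: leave it untouched.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Otherwise convert from the assumed legacy encoding.
    return mb_convert_encoding($str, 'UTF-8', $fallback);
}
```

The fallback encoding should be replaced with whatever the data source actually uses.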

Using html_entity_decode() is the way to go.

Or you have to do the math yourself:

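$str = 'Hello &amp; byeye!';

// mb_* should not be necessary here, because all multibyte chars are HTML-encoded
$output = substr($str, 0, 8);
var_dump($output); // string(8) "Hello &a"

// The cut is inside an entity if the last '&' has no ';' after it
$amp = strrpos($output, '&');
$semi = strrpos($output, ';');
$cutoff = $amp !== false && ($semi === false || $semi < $amp);
if ($cutoff) {
    // Extend the cut to the ';' that closes the entity, if there is one
    $end = strpos($str, ';', strlen($output));
    if ($end !== false) {
        $output = substr($str, 0, $end + 1);
    }
    var_dump($output); // string(11) "Hello &amp;"
}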

Something like that. But html_entity_decode() is better, so turn on error_reporting and display_errors and see what's wrong.
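Turning those two settings on at the top of the script could look like the sketch below. It is meant for a development environment only. The sample input '&agrave;&ag' is an assumption, chosen to mimic a string that ends in a cut-off entity.

```php
<?php
// Development only: show every error instead of a bare 500 response.
ini_set('display_errors', '1');
error_reporting(E_ALL);

// Assumed sample input: one complete entity followed by a truncated one.
$decoded = html_entity_decode('&agrave;&ag', ENT_COMPAT, 'UTF-8');
var_dump($decoded);
```

If the 500 error comes from somewhere else in the script, the message and line number should now appear in the output or in the server's error log.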

Source

2013-12-10 18:50:38 Rudie

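If you want it to return an accented character, you have to convert your string to real UTF-8 characters (or whatever encoding you prefer) instead of &agrave; and the like. PHP treats each character of the entity as a separate character, so substr has no way to recognize the whole &agrave; as a single character.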

You can use

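// $str = '&agrave; &agrave;cde'
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8'); // assign the result; the input is not modified in place
// $str = 'à àcde'
$output = mb_substr($str, 0, 3, 'UTF-8'); // count characters, not bytes
// $output = 'à à'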
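To illustrate why byte-oriented string functions and entity-encoded text do not mix, here is a small sketch comparing lengths before and after decoding. It assumes UTF-8 and the mbstring extension.

```php
<?php
$encoded = '&agrave;';
$decoded = html_entity_decode($encoded, ENT_COMPAT, 'UTF-8');

var_dump(strlen($encoded));             // int(8): eight ASCII bytes
var_dump(strlen($decoded));             // int(2): "à" is two bytes in UTF-8
var_dump(mb_strlen($decoded, 'UTF-8')); // int(1): one character
```

A plain substr() can therefore split either the eight-byte entity or the two-byte decoded character, while mb_substr() on the decoded string cuts on character boundaries.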

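I know you have apparently already tried html_entity_decode, but I am pretty sure the function is not broken. Are there characters in the string that have already been converted to some different encoding? Could you echo the string that html_entity_decode has trouble with?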

Source

2013-12-10 18:57:19
