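How do I remove multiple UTF-8 BOM sequences before "<!DOCTYPE>"?

I'm using PHP5 (cgi) to output template files from the filesystem and am having trouble with it spitting out raw HTML. How do I remove multiple UTF-8 BOM sequences before "<!DOCTYPE>"?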

private function fetch($name) { 
    $path = $this->j->config['template_path'] . $name . '.html'; 
    if (!file_exists($path)) { 
     dbgerror('Could not find the template "' . $name . '" in ' . $path); 
    } 
    $f = fopen($path, 'r'); 
    $t = fread($f, filesize($path)); 
    fclose($f); 
    if (substr($t, 0, 3) == b'\xef\xbb\xbf') { 
     $t = substr($t, 3); 
    } 
    return $t; 
}

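Even though I've added the BOM fix, Firefox still won't accept the output. You can see a live copy here: http://ircb.in/jisti/ (and the template file is at http://ircb.in/jisti/home.html if you want to take a look).

Any ideas on how to fix this? o_o

Source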

2012-04-24 sheppardzw

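UTF-8 files should not have a BOM. If your editor is putting those in, there should be a configuration option to omit them; if your editor won't let you leave the BOM out, replace your editor. – 2012-04-24 02:11:14

Yeah. I'm using n++, and I did try saving without a BOM there. – sheppardzw 2012-04-28 02:17:00

You can use the following code to remove a UTF-8 BOM: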

//Remove UTF8 Bom 

function remove_utf8_bom($text) 
{ 
    $bom = pack('H*','EFBBBF'); 
    $text = preg_replace("/^$bom/", '', $text); 
    return $text; 
}

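Source

2013-03-15 02:55:00 jasonhao

This worked for me. – 2013-10-11 15:56:09

Tried a lot of solutions, but this is the one that worked. Thanks! – nijlgier 2014-06-11 10:16:41

For some reason, in the Google+ API this BOM shows up at the end of the content variable, so I had to adapt this to remove it from the end of the string. – 2017-03-02 18:08:13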

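b'\xef\xbb\xbf' stands for the literal string "\xef\xbb\xbf", backslashes and all. If you want to check for a BOM, you need to use double quotes, so that the \x sequences are actually interpreted as bytes:

"\xef\xbb\xbf"

Your output also seems to contain a lot more garbage than just a single leading BOM: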

$ curl http://ircb.in/jisti/ | xxd 

0000000: efbb bfef bbbf efbb bfef bbbf efbb bfef ................ 
0000010: bbbf efbb bf3c 2144 4f43 5459 5045 2068 .....<!DOCTYPE h 
0000020: 746d 6c3e 0a3c 6874 6d6c 3e0a 3c68 6561 tml>.<html>.<hea 
...
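
Putting the two points above together, here is a minimal sketch (not the asker's actual code) that uses a double-quoted BOM and strips every leading copy rather than only the first. The function name strip_leading_boms is made up for illustration.

```php
<?php
/**
 * Remove every UTF-8 BOM found at the start of $text.
 * Hypothetical helper; the name is not from the original post.
 */
function strip_leading_boms($text)
{
    // Double quotes, so each \x escape becomes a single byte.
    $bom = "\xEF\xBB\xBF";
    $offset = 0;

    // Step over consecutive BOMs, three bytes at a time.
    while (substr($text, $offset, 3) === $bom) {
        $offset += 3;
    }

    // The cast guards against substr() returning false on older PHP
    // versions when nothing is left after the BOMs.
    return $offset === 0 ? $text : (string) substr($text, $offset);
}

// The quoting difference, for comparison:
echo strlen('\xef\xbb\xbf'), "\n"; // 12 (literal backslashes and letters)
echo strlen("\xef\xbb\xbf"), "\n"; // 3  (the actual BOM bytes)
```

Calling strip_leading_boms() on the template contents before returning them would cover the five-BOM case in the dump above, provided the BOMs really are part of that string and not emitted somewhere else.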

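Source

2012-04-24 02:07:43 deceze

Why is that happening, though? It's saving it as unix/utf8 -bom – sheppardzw 2012-04-28 02:17:28

Save it as UTF-8 NO BOM (or whatever N++ calls it). – deceze 2012-04-28 02:26:21

I did, and I still get the same result. I curl'd the file directly (curl http://ircb.in/jisti/home.html | xxd) and did not get the leading bytes, but curl'ing the PHP script adds the extra data at the front, and all I use to output the data is print. – sheppardzw 2012-04-28 02:34:50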
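
If the template itself is clean but the script's output is not, one possible explanation (an assumption, not something confirmed in this thread) is that some of the PHP source files were saved with a BOM, since anything before the opening PHP tag is sent to the browser as-is. A rough sketch for checking that follows; both function names are hypothetical, and it only looks at *.php files directly inside one directory.

```php
<?php
/**
 * Return true if the file at $path starts with a UTF-8 BOM.
 */
function file_starts_with_bom($path)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $head = fread($handle, 3);
    fclose($handle);
    return $head === "\xEF\xBB\xBF";
}

/**
 * List the *.php files directly inside $dir that start with a BOM.
 * Subdirectories are not scanned.
 */
function find_php_files_with_bom($dir)
{
    $hits = array();
    $files = glob(rtrim($dir, '/') . '/*.php');
    // glob() can return false on error; treat that as "no files".
    foreach ($files ? $files : array() as $file) {
        if (file_starts_with_bom($file)) {
            $hits[] = $file;
        }
    }
    return $hits;
}
```

Any file this reports would need to be re-saved without a BOM; stripping bytes from the template string cannot help if they come from the scripts themselves.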

Try:

// -------- read the file-content ---- 
$str = file_get_contents($source_file); 

// -------- remove the utf-8 BOM ---- 
$str = str_replace("\xEF\xBB\xBF",'',$str); 

// -------- get the Object from JSON ---- 
$obj = json_decode($str);

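Source

2013-09-18 11:19:03 o1max

This did the trick for me, thanks for posting this solution! – Blaater 2014-06-17 07:19:04

Usually easier. :-) – Bondt 2015-07-24 08:49:44

Another way to remove the BOM, which is the Unicode code point U+FEFF (this relies on the string being UTF-8 encoded):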

$str = preg_replace('/\x{FEFF}/u', '', $file);
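
Note that the pattern above removes U+FEFF anywhere in the string, not only at the start. If only leading BOMs should go, a variant anchored with ^ might look like the sketch below. The function name is made up, and the null check is there because preg_replace() returns null when the /u modifier is used on a subject that is not valid UTF-8.

```php
<?php
/**
 * Strip one or more U+FEFF characters from the start of a UTF-8 string.
 * Hypothetical helper; the name is not from the original post.
 */
function strip_leading_feff($text)
{
    // With /u, PCRE treats pattern and subject as UTF-8, so \x{FEFF}
    // matches the byte sequence EF BB BF.
    $result = preg_replace('/^\x{FEFF}+/u', '', $text);

    // On invalid UTF-8 input preg_replace() gives null; hand back the
    // input untouched instead of losing it.
    return $result === null ? $text : $result;
}
```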

Source

2014-06-19 17:03:45

This global function solved it. Thanks!

function prepareCharset($str) { 

    // set default encode 
    mb_internal_encoding('UTF-8'); 

    // pre filter 
    if (empty($str)) { 
     return $str; 
    } 

    // get charset 
    $charset = mb_detect_encoding($str, array('ISO-8859-1', 'UTF-8', 'ASCII')); 

    if (stristr($charset, 'utf') || stristr($charset, 'iso')) { 
     $str = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', utf8_decode($str)); 
    } else { 
     $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8'); 
    } 

    // remove BOM 
    $str = urldecode(str_replace("%C2%81", '', urlencode($str))); 

    // prepare string 
    return $str; 
}

Source

2016-06-22 15:13:22

One more way to do the same job:

function remove_utf8_bom_head($text) { 
    if(substr(bin2hex($text), 0, 6) === 'efbbbf') { 
     $text = substr($text, 3); 
    } 
    return $text; 
}

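I found that the other methods did not work in my case.

Hope it helps in some special cases.

Source

2016-11-07 04:53:40

If you are reading from some API with file_get_contents and get an inexplicable NULL from json_decode, check the value of json_last_error(): sometimes the value returned by file_get_contents has an extraneous BOM that is almost invisible when you inspect the string, but that makes json_last_error() return JSON_ERROR_SYNTAX (4).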

>>> $json = file_get_contents("http://api-guiaserv.seade.gov.br/v1/orgao/all"); 
=> "\t{"orgao":[{"Nome":"Tribunal de Justi\u00e7a","ID_Orgao":"59","Condicao":"1"}, ...]}" 
>>> json_decode($json); 
=> null 
>>>

In this case, check the first 3 bytes. Echoing them is not very useful, because the BOM is invisible in most setups:

>>> substr($json, 0, 3) 
=> " " 
>>> substr($json, 0, 3) == pack('H*','EFBBBF'); 
=> true 
>>>

If the line above returns true for you, then a simple test may fix the problem:

>>> json_decode($json[0] == "{" ? $json : substr($json, 3)) 
=> {#204 
    +"orgao": [ 
     {#203 
     +"Nome": "Tribunal de Justiça", 
     +"ID_Orgao": "59", 
     +"Condicao": "1", 
     }, 
    ], 
    ... 
    }
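
The `$json[0] == "{"` check assumes the payload is a JSON object and that exactly one BOM precedes it. A slightly more general sketch, with a made-up function name, strips any number of leading BOMs before decoding and leaves everything else to json_decode:

```php
<?php
/**
 * Decode JSON after dropping any UTF-8 BOMs at the start of the payload.
 * Hypothetical helper; behaves like json_decode() otherwise, so callers
 * can still inspect json_last_error() afterwards.
 */
function json_decode_skip_bom($json, $assoc = false)
{
    // Single-quoted on purpose: here PCRE itself interprets the \x escapes
    // as bytes, so the pattern matches one or more leading BOMs.
    $clean = preg_replace('/^(?:\xEF\xBB\xBF)+/', '', $json);
    return json_decode($clean, $assoc);
}

// Usage with a BOM-prefixed payload:
$data = json_decode_skip_bom("\xEF\xBB\xBF" . '{"ID_Orgao":"59"}', true);
echo $data['ID_Orgao'], "\n"; // 59
```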

Source

2017-07-12 17:14:29

This may help. Let me know if you'd like me to expand on my thought process.

<?php 
    // 
    // labeled TESTINGSTRIPZ.php 
    // 

    define('CHARSET', 'UTF-8'); 

    $stringy = "\xef\xbb\xbf\"quoted text\" "; 
    $str_find_array = array("\xef\xbb\xbf"); 
    $str_replace_array = array(   ''); 


    $RESULT = 
     trim(
      mb_convert_encoding(

       str_replace(
        $str_find_array, 
        $str_replace_array, 
        strip_tags($stringy) 
        ), 

       'UTF-8', 

       mb_detect_encoding(
        strip_tags($stringy) 
        ) 

       ) 
      ); 

     print("YOUR RESULT IS: " . $RESULT.PHP_EOL); 

?>

Result:

terminal$ php TESTINGSTRIPZ.php 
     YOUR RESULT IS: "quoted text" // < with no hidden char.

Source

2017-12-19 18:11:48 JayRizzo
