Remove a script using a regular expression

I have the following script, which can occur multiple times in a page, and I want to remove it using a regular expression:

<script type='text/javascript'> 
    if(typeof(jQuery)=="function"){(function($){$.fn.fitVids=function(){}})(jQuery)}; 
    customfunction('customfunction_div').setup(
    {"playlist":"customfunction\/jw6\/eM0MzdZ2.xml"} 
); 
</script>

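I want to remove it with preg_replace or preg_replace_callback and a regular expression. If possible, the pattern should also handle scripts that span multiple lines, and remove a script only when customfunction_div occurs in it at least once. Please help!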

Source

2016-09-23 Sachin

Try https://ideone.com/84eH8f and see the [regex demo](https://regex101.com/r/dS1xR7/2).

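You should parse HTML with a proper tool such as DOMDocument rather than rely on regular expressions.

Here is a snippet showing how to grab the script tags that contain the word customfunction_div and remove them: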

$html = "<html><head><script type='text/javascript'>\n if(typeof(jQuery)==\"function\"){(function(\$){\$.fn.fitVids=function(){}})(jQuery)};\n customfunction('cu').setup(\n {\"playlist\":\"customfunction\/jw6\/eM0MzdZ2.xml\"}\n);\n</script>\n\n<script type='text/javascript'>\n if(typeof(jQuery)==\"function\"){(function(\$){\$.fn.fitVids=function(){}})(jQuery)};\n customfunction('customfunction_div').setup(\n {\"playlist\":\"customfunction\/jw6\/eM0MzdZ2.xml\"}\n);\n</script></head><body>TEXT</body></html>"; 
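$dom = new DOMDocument;
// Parse without adding an implied <html>/<body> wrapper or a default doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
// Select every <script> whose text contains "customfunction_div".
$scripts = $xp->query('//script[contains(.,"customfunction_div")]');
foreach ($scripts as $script) {
    // Detach the matching node from its parent.
    $script->parentNode->removeChild($script);
}
// Print the document with the matching scripts removed.
echo $dom->saveHTML();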

See the PHP demo.

Here, //script[contains(.,"customfunction_div")] is an XPath expression that selects script tags whose content (.) contains customfunction_div.
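Since the question also asks whether the marker occurs at least once, the same DOM approach can be wrapped in a small helper that reports how many scripts were removed. This is only a sketch: the function name removeScriptsContaining and the parameter $needle are hypothetical and not part of the original answer, and it swaps the XPath contains() test for a plain strpos() check on each script's text so that the search string does not have to be quoted inside an XPath literal.

```php
<?php
// Hypothetical helper (not from the original answer): removes every <script>
// whose text contains $needle and returns [cleaned HTML, number removed].
function removeScriptsContaining(string $html, string $needle): array
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally instead of emitting them for malformed markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $removed = 0;
    // getElementsByTagName() returns a live list, so take a snapshot before removing nodes.
    foreach (iterator_to_array($dom->getElementsByTagName('script')) as $script) {
        if (strpos($script->textContent, $needle) !== false) {
            $script->parentNode->removeChild($script);
            $removed++;
        }
    }
    return [$dom->saveHTML(), $removed];
}

// Shortened sample input: one script without the marker, one multi-line script with it.
$html = "<html><head><script>customfunction('cu').setup({});</script>"
      . "<script>\ncustomfunction('customfunction_div').setup(\n{}\n);\n</script>"
      . "</head><body>TEXT</body></html>";

[$clean, $count] = removeScriptsContaining($html, 'customfunction_div');
echo $count, "\n";
echo $clean;
```

A count of 0 means no script contained the marker and the markup was only re-serialized.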

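If you insist on a regular expression, the pattern '~<script\b(?:(?!</?script[\s>]).)*customfunction_div.*?</script>~s' should work for you in most cases. It matches an opening <script tag, then any sequence of characters that does not start a <script or </script tag (see (?:(?!</?script[\s>]).)*), then the value you need, then zero or more characters up to the first </script>. Keep in mind, though, that regular expressions are not the right tool for manipulating HTML. Use this only as a fallback when the HTML is broken.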
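For completeness, here is a minimal sketch of how that pattern might be passed to preg_replace. The sample $html is a shortened, made-up stand-in for the page in the question, and the variable names are illustrative only.

```php
<?php
// Pattern from the answer: "~" delimiters avoid escaping "/", and the "s" flag
// lets "." match newlines so multi-line scripts are covered.
$pattern = '~<script\b(?:(?!</?script[\s>]).)*customfunction_div.*?</script>~s';

// Made-up sample: the first script lacks the marker, the second one contains it.
$html = "<script type='text/javascript'>\n customfunction('cu').setup({});\n</script>\n"
      . "<script type='text/javascript'>\n customfunction('customfunction_div').setup(\n {}\n);\n</script>\n"
      . "<p>TEXT</p>";

// The fifth argument receives the number of replacements performed.
$clean = preg_replace($pattern, '', $html, -1, $count);

if ($clean === null) {
    // preg_replace() returns null on error, for example when the backtracking limit is hit.
    throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
}

echo $count, "\n";
echo $clean;
```

Checking $count afterwards tells you whether any script containing customfunction_div was actually found and removed.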

Source

2016-09-23 10:25:33
