PHP regex and preg_replace question

I'm going through someone else's old code and am having some trouble understanding it.

He has:

explode(' ', strtolower(preg_replace('/[^a-z0-9-]+/i', ' ', preg_replace('/\&#?[a-z0-9]{2,4}\;/', ' ', preg_replace('/<[^>]+>/', ' ', $texts)))));

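I think the first regex excludes a-z and 0-9, but I don't know what the second regex does. The third one matches anything inside '< >' except '>'.

The result is an array of every word in the $texts variable, but I just don't see how the code produces it. I understand what preg_replace and the other functions do; I just don't know how the whole process works.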


2013-03-19 FlyingCat

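That many nested preg_replace calls is just going to cause confusion – Scuzzy 2013-03-19 23:30:51

Break it up into three separate statements, using a temporary variable to make the processing order explicit. Then it becomes easier to follow. – mario 2013-03-19 23:31:15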

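The expression /[^a-z0-9-]+/i will match (and subsequently replace with a space) any run of characters except a-z, 0-9 and the hyphen. The ^ in [^...] negates the character set contained within.

[^a-z0-9-] is any character that is not alphanumeric or a hyphen
+ means one or more of the preceding
/i makes it match case-insensitively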
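To see the negated character class in action, here is a minimal sketch. The sample string is made up for illustration and is not from the original code.

```php
<?php
// Hypothetical sample input, chosen only to illustrate the pattern.
$sample = "Hello, World! It's a well-known fact: 42%.";

// Every run of characters outside a-z, 0-9 and "-" (case-insensitive)
// collapses into a single space. The hyphen in "well-known" survives.
$cleaned = preg_replace('/[^a-z0-9-]+/i', ' ', $sample);

echo $cleaned, "\n"; // "Hello World It s a well-known fact 42 " (note the trailing space)
```

Because of the +, a run such as ", " or "%." becomes one space rather than two.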

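The expression /\&#?[a-z0-9]{2,4}\;/ matches a & optionally followed by #, followed by two to four lowercase letters or digits, ending with a ;. This would match HTML entities like &nbsp; or &#39;

&#? matches either & or &#, because the ? makes the preceding # optional. The & doesn't actually need to be escaped.
[a-z0-9]{2,4} matches between two and four alphanumeric characters (lowercase only, since this pattern has no /i)
; is a literal semicolon. It doesn't actually need to be escaped either.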
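As a rough check of what the entity pattern does and does not catch, the sketch below lists its matches on a made-up string. The sample input is an assumption, not something from the original code.

```php
<?php
// Hypothetical sample input containing several kinds of entities.
$sample = 'Tom&nbsp;&amp;&nbsp;Jerry &#39;quoted&#39; &mdash; &AMP;';

// The pattern as written in the old code, and the same pattern without
// the unnecessary backslashes.
preg_match_all('/\&#?[a-z0-9]{2,4}\;/', $sample, $escaped);
preg_match_all('/&#?[a-z0-9]{2,4};/', $sample, $unescaped);

// $escaped[0] holds the full matches:
// ['&nbsp;', '&amp;', '&nbsp;', '&#39;', '&#39;']
print_r($escaped[0]);

// Both spellings of the pattern find exactly the same matches.
var_dump($escaped[0] === $unescaped[0]); // bool(true)
```

Note that &mdash; (five letters) and &AMP; (uppercase, no /i flag) are left alone, so this pattern only covers short, lowercase or numeric entities.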

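Partly as you suspected, the last one will replace any tag like <tagname> or <tagname attr='value'> or </tagname> with a space. Note that it matches the whole tag, not just the contents inside the <>.

< is a literal character
[^>]+ is every character up to but not including the next >
> is a literal character

I would really recommend rewriting this as three separate calls to preg_replace() rather than nesting them.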
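A small sketch of the tag pattern on a made-up snippet, mainly to show that each match is a whole tag including its angle brackets. The sample markup is an assumption for illustration.

```php
<?php
// Hypothetical sample markup.
$sample = '<p class="intro">Hello <b>world</b></p>';

// Collect the full matches to see exactly what gets replaced.
preg_match_all('/<[^>]+>/', $sample, $matches);
print_r($matches[0]); // ['<p class="intro">', '<b>', '</b>', '</p>']

// Each tag becomes one space, so extra whitespace is left behind.
$replaced = preg_replace('/<[^>]+>/', ' ', $sample);
var_dump($replaced); // string(15) " Hello  world  "

// For comparison, strip_tags() removes the tags without inserting spaces.
var_dump(strip_tags($sample)); // string(11) "Hello world"
```

The leftover spaces do not matter much in the old code, because the final replacement collapses runs of non-alphanumeric characters anyway.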

// Strips tags.
// Would be better done with strip_tags()!!
$texts = preg_replace('/<[^>]+>/', ' ', $texts);
// Removes HTML entities
$texts = preg_replace('/&#?[a-z0-9]{2,4};/', ' ', $texts);
// Removes remaining non-alphanumerics (hyphens are kept)
$texts = preg_replace('/[^a-z0-9-]+/i', ' ', $texts);
// Lowercases and splits on spaces, as the original one-liner does
$array = explode(' ', strtolower($texts));
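To make the processing order concrete, here is a sketch that traces a made-up input through each step. The sample value of $texts and the $step1, $step2, $step3, $words and $nonEmpty names are assumptions for illustration; the original post does not show the real input.

```php
<?php
// Hypothetical sample input.
$texts = '<p>Fish&nbsp;&amp; Chips, to-go!</p>';

// Step 1 (innermost call in the one-liner): tags become spaces.
$step1 = preg_replace('/<[^>]+>/', ' ', $texts);
// " Fish&nbsp;&amp; Chips, to-go! "

// Step 2: short HTML entities become spaces.
$step2 = preg_replace('/&#?[a-z0-9]{2,4};/', ' ', $step1);
// " Fish   Chips, to-go! "

// Step 3 (outermost call): runs of anything that is not a letter,
// digit or hyphen collapse into one space.
$step3 = preg_replace('/[^a-z0-9-]+/i', ' ', $step2);
// " Fish Chips to-go "

// Lowercase and split on single spaces, as the one-liner does.
$words = explode(' ', strtolower($step3));
print_r($words); // ['', 'fish', 'chips', 'to-go', '']

// One possible way to avoid the empty strings at the ends:
$nonEmpty = preg_split('/ +/', strtolower($step3), -1, PREG_SPLIT_NO_EMPTY);
print_r($nonEmpty); // ['fish', 'chips', 'to-go']
```

The empty first and last elements come from the leading and trailing spaces left by the tag replacement. Whether they matter depends on how the array is used afterwards, so the preg_split() variant is only a suggestion.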


2013-03-19 23:30:57

...matches a '&', optionally followed by a '#'? – 2013-03-19 23:32:43

@JanTuroň Already clarified. – 2013-03-19 23:33:16

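This code looks like it...

strips HTML/XML tags (anything between < and >)
then anything starting with & or &#, followed by 2-4 alphanumeric characters and a ;
then strips anything that is not alphanumeric or a dash

In order of nesting (innermost first):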

/<[^>]+>/ 

Match the character “<” literally «<» 
Match any character that is NOT a “>” «[^>]+» 
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
Match the character “>” literally «>» 


/\&#?[a-z0-9]{2,4}\;/ 

Match the character “&” literally «\&» 
Match the character “#” literally «#?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
Match a single character present in the list below «[a-z0-9]{2,4}» 
    Between 2 and 4 times, as many times as possible, giving back as needed (greedy) «{2,4}» 
    A character in the range between “a” and “z” «a-z» 
    A character in the range between “0” and “9” «0-9» 
Match the character “;” literally «\;» 


/[^a-z0-9-]+/i 

Options: case insensitive 

Match a single character NOT present in the list below «[^a-z0-9-]+» 
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
    A character in the range between “a” and “z” «a-z» 
    A character in the range between “0” and “9” «0-9» 
    The character “-” «-»


2013-03-19 23:34:10 Scuzzy
