PHP preg_match_all not matching correctly

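I want to pull some data out of a website's source code. What I'm trying to do is get everything after /collections/(whatever that follows here). My pattern matches "most" of what I'm looking for. The problem occurs when my preg_match_all reaches an "&": it simply reads up to the "&" and stops reading the rest. Here is my script: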

$homepage = file_get_contents('http://www.harrisfarm.com.au/'); 
$pattern = '/collections([\w-&\/]*)/i'; 
preg_match_all($pattern, $processedHomePage, $collections); 
print_r($collections);

Note that when printed this way, everything after the "&" is dropped, which means it gives me:

/collections/seafood/Shellfish-&

However, when I match my pattern against a string like this one:

$subject = 'a href="/collections/organic/Pantry/sickmonster/grandma" <a href="/collections/seafood/Shellfish-&-Crustaceans">Oysters, Shellfish & Crustaceans';

I get everything I want:

/collections/seafood/Shellfish-&-Crustaceans

So I'm wondering... why does this happen? I'm really stumped.

Source

2014-11-24 user2443943

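It seems like it should match (http://regex101.com/r/tY7sE1/1)? Are you sure the '&' isn't being escaped to '&amp;', since you're working with a web page? – 2014-11-24 22:12:43

I think the "&" is somehow being converted to &amp;. But I don't know how I can stop that from happening. Is there some magic PHP function that would prevent it? – user2443943 2014-11-24 22:41:49

What happens between $homepage and $processedHomePage? I think that code is missing. – hellcode 2014-11-24 22:42:58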

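The code you provided works fine when you pass $homepage instead of $processedHomePage to preg_match_all.

BTW: you should escape the minus sign inside the square brackets (or write it at the beginning or the end of the bracket expression), although as it happens it makes no difference in your case:

$pattern = '/collections([-\w&\/]*)/i';

See http://php.net/manual/regexp.reference.meta.php for more information.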
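As a minimal sketch of the hyphen advice above, the following runs the hyphen-first pattern against a shortened version of the test string from the question. The variable names are taken from the question. Placing the hyphen first also sidesteps a separate concern: newer PHP builds based on PCRE2 may reject an unescaped hyphen directly after \w as an invalid range, so this form is the safer one to copy.

```php
<?php
// Hyphen placed first in the class, so it can only be read as a literal.
$pattern = '/collections([-\w&\/]*)/i';

// Shortened version of the test string from the question.
$subject = '<a href="/collections/seafood/Shellfish-&-Crustaceans">Oysters, Shellfish & Crustaceans';

preg_match_all($pattern, $subject, $collections);

// $collections[1] holds what the capture group matched after "collections".
print_r($collections[1]); // one entry: /seafood/Shellfish-&-Crustaceans
```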

Source

2014-11-24 22:22:02 hellcode

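While I agree you're better off not putting a hyphen in the middle of the class, I don't think that's the problem. This shows (http://regex101.com/r/tY7sE1/1) that it works fine, and in the OP, if the hyphen needed to be escaped, it couldn't have matched the '-' before the '&'. – 2014-11-24 22:30:03

No, the hyphen is not the problem. I tried it and still got the same result. – user2443943 2014-11-24 22:37:02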

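I figured out what was wrong. Maybe this will help someone else later.

I had run the page from http://www.harrisfarm.com.au/ through htmlspecialchars() and then read the result as a string. That converts some special characters, such as & and a few others, into multi-character entities.

& is converted to &amp;, which contains a ;, and that character is not in my regex. Since ; is not part of the character class, the regex stops matching at that point.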
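The effect described above can be reproduced with a short sketch. The sample markup in $raw is a made-up stand-in for the real page, and the pattern is the hyphen-first variant suggested in the other answer. Matching against the raw string, or against the escaped string after htmlspecialchars_decode(), returns the full path.

```php
<?php
// Hypothetical one-line stand-in for the downloaded page.
$raw = '<a href="/collections/seafood/Shellfish-&-Crustaceans">Oysters</a>';

// What the question's missing step did: & becomes &amp;, " becomes &quot;, etc.
$escaped = htmlspecialchars($raw);

$pattern = '/collections([-\w&\/]*)/i';

preg_match($pattern, $escaped, $fromEscaped);
preg_match($pattern, $raw, $fromRaw);

// Undoing the escaping before matching restores the original text.
$decoded = htmlspecialchars_decode($escaped);
preg_match($pattern, $decoded, $fromDecoded);

echo $fromEscaped[1], "\n"; // /seafood/Shellfish-&amp   (stops at the ';')
echo $fromRaw[1], "\n";     // /seafood/Shellfish-&-Crustaceans
echo $fromDecoded[1], "\n"; // /seafood/Shellfish-&-Crustaceans
```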

Source

2014-11-24 22:57:50 user2443943

Glad you found the problem. In any future question, please include all the relevant code, since that would have highlighted the issue :) – 2014-12-10 14:30:04

Try this:

$re = "/\\/collections([\\w\\-\\&\\/;]*)/mi"; 
$str = "<a href=\"/collections/seafood/Shellfish-&amp;-Crustaceans\">Oysters, Shellfish & Crustaceans';\n<a href=\"/collections/seafood/Shellfish-&-Crustaceans\">Oysters,collections Shellfish & Crustaceans';"; 

preg_match_all($re, $str, $matches);
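For reference, here is a sketch of what this pattern captures on a similar two-line sample. Allowing ; in the class lets the match run through an &amp; entity instead of stopping at it. The html_entity_decode() and array_unique() steps are an assumption of this sketch, not part of the answer, added to show one way of normalizing the captured paths afterwards.

```php
<?php
// Same pattern as above, written in single quotes to avoid double escaping.
$re = '/\/collections([\w\-&\/;]*)/mi';

// Two links: one with an escaped ampersand, one with a raw ampersand.
$str = "<a href=\"/collections/seafood/Shellfish-&amp;-Crustaceans\">Oysters</a>\n"
     . "<a href=\"/collections/seafood/Shellfish-&-Crustaceans\">Oysters</a>";

preg_match_all($re, $str, $matches);
print_r($matches[1]);
// [0] => /seafood/Shellfish-&amp;-Crustaceans
// [1] => /seafood/Shellfish-&-Crustaceans

// Assumed post-processing: decode entities, then drop duplicate paths.
$paths = array_unique(array_map('html_entity_decode', $matches[1]));
print_r($paths);
// [0] => /seafood/Shellfish-&-Crustaceans
```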


Your updated code:

$homepage = file_get_contents('http://www.harrisfarm.com.au/'); 
$pattern = "/\\/collections([\\w\\-\\&\\/;]*)/mi"; 
preg_match_all($pattern, $homepage, $collections); 
print_r($collections);

Source

2014-11-25 05:26:07
