Regular expression to match links containing "Google"

I want to use a PHP regular expression to match all links that contain the word google. I tried this:

$url = "http://www.google.com"; 
$html = file_get_contents($url); 
preg_match_all('/<a.*(.*?)".*>(.*google.*?)<\/a>/i',$html,$links); 
echo '<pre />'; 
print_r($links); // it should return 2 links 'About Google' & 'Go to Google English'

But it returns nothing. Why?

Source

2011-03-06 yuli chika

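The "problem" here is that you are using a regular expression when perfectly good parsers and XPath are available. – 2011-03-06 10:36:00

You should use a DOM parser, because running regular expressions over an HTML document can go "painfully" wrong. Try something like this: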

// Collect libxml parse warnings internally instead of printing them
libxml_use_internal_errors(true); 

$url = "http://www.google.com"; 
$html = file_get_contents($url); 

$doc = new DOMDocument(); 
$doc->loadHTML($html); 
$n = 0; 
foreach ($doc->getElementsByTagName('a') as $a) { 
    // Print the anchor if its href contains the word 'google'. 
    // strpos() returns 0 for a match at position 0, so compare against false. 
    if ($a->hasAttribute('href') && strpos($a->getAttribute('href'), 'google') !== false) { 
        echo "Anchor" . ++$n . ': ' . $a->getAttribute('href') . '<br>'; 
    } 
}

Source

2011-03-06 10:53:09 Francesco

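Wahoo ~~ DOM can do this. Thanks a lot. I learned something new. – 2011-03-06 11:04:48

This is different from what the OP wanted (at least judging by his code). He seems to want the links whose *text* contains Google, not the URL. But since this is the accepted answer... either he didn't state it correctly, or he doesn't care. – 2011-03-06 11:08:05

Better still, use XPath here: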
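For the case this comment describes, matching on the link text rather than the URL, a minimal sketch with the same DOM approach might look like the following. The inline `$html` string is hypothetical sample markup standing in for the fetched page, and `$matches` is just an illustrative variable name.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body>'
      . '<a href="/intl/en/about.html">About Google</a>'
      . '<a href="/ads/">Advertising Programs</a>'
      . '<a href="http://www.google.com/ncr">Go to <b>Google</b> English</a>'
      . '</body></html>';

// Keep libxml parse warnings out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$matches = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // textContent also includes text inside nested elements such as <b>.
    // stripos() is case-insensitive; compare against false because a match
    // at position 0 would otherwise be treated as "not found".
    if (stripos($a->textContent, 'google') !== false) {
        $matches[] = trim($a->textContent);
    }
}

print_r($matches);
```

With the sample markup, only the first and third anchors are collected, since the second one has "google" in neither its text nor anywhere the loop looks.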

$url="http://www.google.com"; 
$html=file_get_contents($url); 

$doc = new DOMDocument; 
$doc->loadHTML($html); 

$xpath = new DOMXPath($doc); 
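// "." is the anchor's full string value, so text inside nested tags is matched too; 
// translate() lowercases the letters G, O, L, E for a case-insensitive test 
$query = "//a[contains(translate(., 'GOOGLE', 'google'), 'google')]"; 
// or, case-sensitive: 
// $query = "//a[contains(., 'Google')]"; 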
$links = $xpath->query($query);

$links will be a DOMNodeList that you can iterate over.

Source
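As a rough illustration of that iteration, the sketch below runs the query against a small inline document and collects each matching link's href and text. The sample markup and the `$found` array are assumptions made for the example; the query tests `.` (the anchor's full string value).

```php
<?php
// Hypothetical sample markup; in the answer, $html comes from file_get_contents().
$html = '<html><body>'
      . '<a href="/about">About Google</a>'
      . '<a href="/privacy">Privacy</a>'
      . '</body></html>';

// Keep libxml parse warnings out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$links = $xpath->query("//a[contains(translate(., 'GOOGLE', 'google'), 'google')]");

// DOMNodeList is traversable, so foreach yields one DOMElement per match.
$found = [];
foreach ($links as $link) {
    $found[$link->getAttribute('href')] = trim($link->textContent);
}

print_r($found);
```

The number of matches is also available as `$links->length` if you only need a count.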

2011-03-06 10:27:29
