Trying to scrape all Facebook links from a web page

I am trying to scrape the Facebook links from a web page. However, I get a blank page with no error message.

My code is as follows:

<?php 
error_reporting(E_ALL); 

function getFacebook($html) { 

    $matches = array(); 
    if (preg_match('~^https?://(?:www\.)?facebook.com/(.+)/?$~', $html, $matches)) { 
     print_r($matches); 

    } 
} 

$html = file_get_contents('http://curvywriter.info/contact-me/'); 

getFacebook($html);

What is wrong with this?

Source

2013-01-20 Ash Van Wilder

What goes wrong with it?

I get a blank page, with no output at all.

That means your match failed. Try 'preg_match_all', for one, and drop the '^' and '$' from your pattern.

A better (more robust) option is to use DOMDocument and DOMXPath:

<?php 
error_reporting(E_ALL); 

function getFacebook($html) { 

    $dom = new DOMDocument; 
    @$dom->loadHTML($html); 

    $query = new DOMXPath($dom); 

    $result = $query->evaluate("(//a|//A)[contains(@href, 'facebook.com')]"); 

    $return = array(); 

    foreach ($result as $element) { 
     /** @var $element DOMElement */ 
     $return[] = $element->getAttribute('href'); 
    } 

    return $return; 

} 

$html = file_get_contents('http://curvywriter.info/contact-me/'); 

var_dump(getFacebook($html));
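One caveat with `contains(@href, 'facebook.com')` is that it also matches hrefs that merely mention the string, such as a share URL on another site. A minimal sketch of a stricter check on the returned hrefs follows, assuming you only want links whose host is facebook.com or one of its subdomains. The helper name `isFacebookUrl` and the sample hrefs are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: true only when the URL's host is facebook.com
// or a subdomain of it (for example www.facebook.com).
function isFacebookUrl(string $url): bool {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // relative link, mailto:, or malformed URL
    }
    $host = strtolower($host);
    // '.facebook.com' is 13 characters long.
    return $host === 'facebook.com' || substr($host, -13) === '.facebook.com';
}

// Hypothetical hrefs, standing in for the array returned by getFacebook().
$hrefs = array(
    'https://www.facebook.com/example.page',
    'http://notfacebook.com/profile',
    'https://example.com/share?u=facebook.com',
    '/contact-me/',
);

// array_filter keeps the matching entries; array_values reindexes from 0.
$facebookLinks = array_values(array_filter($hrefs, 'isFacebookUrl'));
print_r($facebookLinks);
```

Only the first sample href survives the filter, because the other three either have a different host or no host at all.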

For your specific problem, however, I made the following changes:

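Changed preg_match to preg_match_all, so that it does not stop after the first match.
Removed the ^ (start) and $ (end) anchors from the pattern. Your links will appear in the middle of the document, not at the start or the end (and definitely not both!)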
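The effect of the anchors can be checked in isolation. The following is a small sketch on a hypothetical one-line HTML string (not the real page): with `^` and `$` the whole string would have to be a Facebook URL, so the match fails, while the unanchored pattern finds the URL inside the markup.

```php
<?php
// Hypothetical sample markup standing in for a downloaded page.
$html = '<p>Find me on <a href="https://www.facebook.com/example.page">Facebook</a>.</p>';

// Anchored: ^ and $ require the entire string to be a Facebook URL.
$anchored = preg_match('~^https?://(?:www\.)?facebook\.com/(.+)/?$~', $html);

// Unanchored: the URL may appear anywhere inside the string.
$unanchored = preg_match('~https?://(?:www\.)?facebook\.com/~', $html);

var_dump($anchored);   // int(0)
var_dump($unanchored); // int(1)
```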

So the corrected code is:

<?php 
error_reporting(E_ALL); 

function getFacebook($html) { 

    $matches = array(); 
    if (preg_match_all('~https?://(?:www\.)?facebook\.com/(.+)/?~', $html, $matches)) {
     print_r($matches); 

    } 
} 

$html = file_get_contents('http://curvywriter.info/contact-me/'); 

getFacebook($html);

Source

2013-01-20 07:34:11

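Can you point me in the right direction on how to clean up the output and strip the extra parts, such as 'target=_blank' and the anchor text? I only want the Facebook URLs.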
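One possible way to get only the URLs with a regular expression is to replace the greedy `(.+)` with a character class that stops at a quote, whitespace, or a tag bracket. The sketch below assumes the links sit inside quoted href attributes. The function name `getFacebookUrls` and the sample markup are hypothetical, and the DOMXPath approach above remains the more robust choice for real pages.

```php
<?php
// Hypothetical sample markup; the real page would come from file_get_contents().
$html = '<a href="https://www.facebook.com/example.page" target="_blank">My page</a> '
      . '<a href="http://facebook.com/another-page/">Another</a> '
      . '<a href="https://www.facebook.com/example.page">Duplicate</a>';

function getFacebookUrls($html) {
    // [^"'\s<>]+ stops at a closing quote, whitespace, or a tag bracket,
    // so trailing attributes and anchor text are never captured.
    $pattern = '~https?://(?:www\.)?facebook\.com/[^"\'\s<>]+~i';
    if (!preg_match_all($pattern, $html, $matches)) {
        return array(); // no match, or a regex error
    }
    // $matches[0] holds the full matches; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

print_r(getFacebookUrls($html));
```

For the sample markup this yields the two distinct URLs, without the `target` attribute or the link text.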
