2015-07-01 264 views
1

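Getting data from a website

I have my own external website, and I want to fetch some data from it. I use cURL to get the site's content, but I only need certain parts of it.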

Edit: To be frank, I want to get the timestamps from a Facebook page. If you use Inspect Element on the page, you will see markup like this:

<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:00pm" data-utime="1435663826" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:01pm" data-utime="1435663827" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:02pm" data-utime="1435663828" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:03pm" data-utime="1435663829" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:04pm" data-utime="1435663830" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
</span> 

I just want to display the value of "data-utime", which is 1435663826 here. Below is my code, which fetches the content. What should I use after this?

// CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR expect a file name,
// not the stream resource returned by tmpfile().
$cookie = tempnam(sys_get_temp_dir(), 'cookie');
$userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31';

$ch = curl_init("https://www.mywebsite.com");

$options = array(
    CURLOPT_CONNECTTIMEOUT => 20,
    CURLOPT_USERAGENT      => $userAgent,
    CURLOPT_AUTOREFERER    => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_COOKIEFILE     => $cookie,
    CURLOPT_COOKIEJAR      => $cookie,
    // Note: these two settings disable TLS certificate verification.
    CURLOPT_SSL_VERIFYPEER => 0,
    CURLOPT_SSL_VERIFYHOST => 0
);

curl_setopt_array($ch, $options);
$kl = curl_exec($ch); // the response body as a string, or false on failure
curl_close($ch);

echo $kl; // Final output after fetching
+0

Hi Jeff, can you post the whole PHP code? I can help you solve it. –

+0

This is the complete PHP! – Jeff

Answers

1

You can use PHP's DOM extension to load and parse the HTML document, and then use an instance of DOMXPath to query for specific elements.
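
A minimal sketch of that approach, assuming the fetched HTML contains `abbr` elements like the ones in the question. The helper name `extractUtimes` and the sample string `$html` are hypothetical and only serve the illustration; in practice `$html` would be whatever `curl_exec()` returned.

```php
<?php
// Hypothetical sample markup, modelled on the snippet in the question.
$html = '<span class="fsm fwn fcg"><a class="_5pcq">'
      . '<abbr title="Tuesday, June 30, 2015 at 5:00pm" data-utime="1435663826" '
      . 'class="_5ptz timestamp livetimestamp">5 hrs</abbr></a></span>'
      . '<span class="fsm fwn fcg"><a class="_5pcq">'
      . '<abbr title="Tuesday, June 30, 2015 at 5:01pm" data-utime="1435663827" '
      . 'class="_5ptz timestamp livetimestamp">5 hrs</abbr></a></span>';

// Hypothetical helper: returns the data-utime value of every <abbr> that has one.
function extractUtimes(string $html): array
{
    if ($html === '') {
        return array(); // nothing to parse
    }

    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select only <abbr> elements that carry a data-utime attribute.
    $xpath = new DOMXPath($dom);
    $utimes = array();
    foreach ($xpath->query('//abbr[@data-utime]') as $abbr) {
        $utimes[] = $abbr->getAttribute('data-utime');
    }

    return $utimes;
}

print_r(extractUtimes($html));
```

The XPath expression filters on the attribute itself, so `abbr` elements without `data-utime` are skipped rather than producing empty strings. This only works if the timestamps are present in the HTML that the server actually returns to cURL.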

+0

I tried many of them, but it doesn't work for me. Is that just because I'm using cURL to fetch the page? – Jeff

0

If you already have the HTML markup, you can try this:

<?php 

$curl = curl_init('https://www.facebook.com/Rajnikant.Vs.CIDJokez'); 


curl_setopt($curl, CURLOPT_FAILONERROR, true); 
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); 
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false); 
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); 
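$result = curl_exec($curl); 
if ($result === false) { 
    // curl_exec() returns false on failure; with CURLOPT_FAILONERROR, HTTP codes >= 400 also count. 
    die('cURL error: ' . curl_error($curl)); 
} 
curl_close($curl); 
//echo $result; 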

/* $result = 
'<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:00pm" data-utime="1435663826" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:01pm" data-utime="1435663827" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:02pm" data-utime="1435663828" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:03pm" data-utime="1435663829" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:04pm" data-utime="1435663830" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
</span>'; 
*/ 
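$html = $result; 
$dom = new DOMDocument(); 

// The @ operator suppresses the warnings that malformed real-world HTML triggers. 
@$dom->loadHTML($html); 
$a = $dom->getElementsByTagName('abbr'); 

$data = array(); 

// Collect the data-utime attribute of every <abbr> element that has one. 
foreach ($a as $abbr) { 
    if ($abbr->hasAttribute('data-utime')) { 
        $data[] = $abbr->getAttribute('data-utime'); 
    } 
} 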

echo '<pre>'; 
print_r($data); 
echo '</pre>'; 
+0

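Well, to put it plainly, I want to scrape a Facebook page and get the timestamps from its first page. The code you showed doesn't work. The page is: https://www.facebook.com/Rajnikant.Vs.CIDJokez If you use Inspect Element, you can see the timestamps of the posts. – Jeff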

+0

@Jeff I've updated my answer. –

+0

Still not working! Does it work for you? If you run your code, it shows the error "Update your browser". – Jeff