2017-04-22

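PHP: extracting the first occurrence of a link from page source

I am trying to extract the first occurrence of a string that starts like this: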

https://encrypted-tbn3.gstatic.com/images?... 

from the source code of a page. The link starts and ends with a double quote ("). This is what I have so far:

$search_query = $array[0]['Name']; 
$search_query = urlencode($search_query); 
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible'))); 
$response = file_get_contents("https://www.google.com/search?q=$search_query&tbm=isch", false, $context); 
$html = str_get_html($response); 
$url = explode('"',strstr($html, 'https://encrypted-tbn3.gstatic.com/images?'[0])) 

However, the value of $url is not the link I am trying to extract but something very different. I have attached an image.

Can anyone explain this output to me, and how I can get the desired link? Thanks.

Answers

1

It looks like you are using PHP Simple HTML DOM Parser.
I usually use DOMDocument, which is a built-in PHP class.
Here is a working example of what you need:

$search_query = $array[0]['Name']; 
$search_query = urlencode($search_query); 
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible'))); 
$response = file_get_contents("https://www.google.com/search?q=$search_query&tbm=isch", false, $context); 

libxml_use_internal_errors(true); 
$dom = new DOMDocument(); 
$dom->loadHTML($response); 

foreach ($dom->getElementsByTagName('img') as $item) { 
    $img_src = $item->getAttribute('src'); 
    if (strpos($img_src, 'https://encrypted') !== false) { 
        print $img_src."\n"; 
    } 
} 

Output:

https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSumjp6e37O_86nc36mlktuWpbFuCI4nkkkocoBCYW3qCOicqdu_KEK-MY 
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcR_ttK8NlBgui_JndBj349UxZx0kHn0Z-Essswci-_5UQCmUOruY1PNl3M 
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSydaTpSDw2mvU2JRBGEYUOstTUl4R1VhRevv1Sdinf0fxRvU26l3pTuqo 
... 
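
Since the question asks only for the first occurrence, the loop above can stop at the first match instead of printing every one. A minimal sketch, assuming an inline sample string in place of the fetched page (the markup and the `tbn:AAA` / `tbn:BBB` values are made up for illustration, and `$first_src` is a name introduced here):

```php
<?php
// Hypothetical sample markup standing in for the fetched search page.
$response = '<html><body>'
    . '<img src="/logo.png">'
    . '<img src="https://encrypted-tbn2.gstatic.com/images?q=tbn:AAA">'
    . '<img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:BBB">'
    . '</body></html>';

libxml_use_internal_errors(true); // keep parser warnings about imperfect HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($response);

$first_src = null;
foreach ($dom->getElementsByTagName('img') as $item) {
    $img_src = $item->getAttribute('src');
    // strpos(...) === 0 means the src starts with the prefix
    if (strpos($img_src, 'https://encrypted') === 0) {
        $first_src = $img_src;
        break; // stop at the first matching image
    }
}

print $first_src . "\n";
```

If no image matches, `$first_src` stays `null`, so it is worth checking before use.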
0
$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?'; 
if(preg_match('/\"(https\:\/\/encrypted\-tbn3\.gstatic\.com\/images\?.+?)\"/ui',$html, $matches)) 
    $url = $matches[1]; 
else 
    $url = ''; 

Try using preg_match; it is better suited to this kind of extraction.

Note that this example assumes the URL in your HTML is enclosed in double quotes.
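
The strstr/explode attempt from the question relies on the same quoting assumption, and it may help to see why it returned something unexpected. In that code the `[0]` is applied to the string literal itself, so the needle passed to strstr is just the single character "h", and explode then returns an array of pieces rather than one string. A minimal sketch of a corrected version, assuming an inline sample string in place of the real page (the markup and the `$prefix`, `$needle`, `$rest` names are made up for illustration):

```php
<?php
// Hypothetical sample standing in for the page source.
$html = '<!doctype html><img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:BBB">';
$prefix = 'https://encrypted-tbn3.gstatic.com/images?';

// What the original attempt actually searched for: the first character of the literal.
$needle = 'https://encrypted-tbn3.gstatic.com/images?'[0]; // "h"

// Corrected: search for the full prefix, then cut at the closing double quote.
$rest = strstr($html, $prefix);                        // false if the prefix is absent
$url  = $rest === false ? '' : explode('"', $rest)[0]; // [0] on the array, not the literal

print $url . "\n";
```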

UPD: a slightly adjusted version that works with any URL beginning:

$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?'; 
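// Escape regex metacharacters (and the "/" delimiter) so the prefix matches literally. 
// preg_quote() is used here in place of a hand-written escaping pattern. 
$url_beginning = preg_quote($url_beginning, '/'); 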
if(preg_match('/\"('.$url_beginning.'.+?)\"/ui',$html, $matches)) 
    $url = $matches[1]; 
else 
    $url = '';
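
The prefix-based variant can also be packaged as a small function. This is only a sketch under a few assumptions: PHP 7 or later, a double-quoted URL, and `preg_quote()` for escaping the prefix. The name `first_quoted_url` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: returns the first double-quoted URL in $html
// that starts with $prefix, or '' when there is none.
function first_quoted_url(string $html, string $prefix): string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter;
    // [^"]* takes everything up to the closing double quote.
    $pattern = '/"(' . preg_quote($prefix, '/') . '[^"]*)"/u';
    return preg_match($pattern, $html, $matches) === 1 ? $matches[1] : '';
}

// Made-up sample input for illustration.
$html = '<img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:BBB">';

$found   = first_quoted_url($html, 'https://encrypted-tbn3.gstatic.com/images?');
$missing = first_quoted_url($html, 'https://example.com/');

print $found . "\n";
print ($missing === '' ? 'no match' : $missing) . "\n";
```

Comparing the result of preg_match with `=== 1` also covers the case where it returns `false`, for example on invalid UTF-8 input with the `u` modifier.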