PHP cURL, preg_match: extracting text from XHTML

I'm trying to extract a price from the HTML page/link below using PHP cURL and preg_match. Basically, I expect this code to output 4,550, but for some reason I get:

 Notice: Undefined offset: 1 in C:\wamp\www\test.php on line 22

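I think the pattern is correct, because if I put the HTML itself into a variable and escape the double quotes, it works! Also, if I output the result (echo $result;), it shows the HTML correctly fetched from the Foxtons site, so I can't figure out why the whole thing doesn't work. I need to get this working, and I'd appreciate it if you could tell me why that notice is generated and why my current script doesn't work.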

 
<?php
$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url); 

curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1); 
$result = curl_exec($ch); 
curl_exec($ch); 
curl_close($ch); 
$result2 = str_replace('"', '\"', $result); 

$tagname1= ");</script> 
    "; 
$tagname2= "</noscript> 
    per month</a>"; 

$pattern = "/$tagname1(.*?)$tagname2/"; 
preg_match($pattern, $result, $matches); 
$prices = $matches[1]; 

print_r($prices); 

?>

2010-05-14 Michael

Why do you define $result2 if you don't use it? – Artefacto 2010-05-15 00:02:28
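A hedged reading of the notice, since the thread never pins it down: "Undefined offset: 1" means `$matches` is an array without element `1`, which is what `preg_match()` leaves behind when the pattern compiles but finds nothing (it returns `0`). One way that can happen is whitespace: `$tagname1` and `$tagname2` embed a literal line break plus four spaces that must match the fetched page byte for byte, and a page served with `\r\n` line endings or different indentation would not, while HTML pasted into the script might. (As the snippet appears here, the `/` in `</script>` would also end the `/.../` pattern early, which makes `preg_match()` return `false` with a warning, so escaping with `preg_quote()` is worthwhile either way.) The sketch below runs offline on a hypothetical fragment, not the real Foxtons markup; the `showPrice` call in it is invented for illustration.

```php
<?php
// Hypothetical fragment standing in for the fetched page (not the real
// Foxtons markup), assumed here to use "\r\n" line endings.
$html = "<script>showPrice(4550);</script>\r\n"
      . "    <noscript>4,550</noscript>\r\n"
      . "    per month</a>";

// Literal whitespace in the style of the question: "\n" plus four spaces.
// preg_quote() escapes "/", ")" and other special characters, so the
// pattern compiles, but it cannot match the "\r\n" in $html.
$literal = '/' . preg_quote(");</script>\n    ", '/') . '(.*?)'
         . preg_quote("</noscript>\n    per month</a>", '/') . '/';
var_dump(preg_match($literal, $html, $m)); // int(0)
var_dump($m === []);                       // bool(true): there is no $m[1]

// Tolerant version: \s* accepts any run of whitespace, including "\r\n".
$tolerant = '/' . preg_quote(');</script>', '/')
          . '\s*<noscript>(.*?)' . preg_quote('</noscript>', '/')
          . '\s*per month/';

// Check the return value before reading $matches[1].
if (preg_match($tolerant, $html, $matches) === 1) {
    echo $matches[1], "\n"; // 4,550
} else {
    echo "No match\n";
}
```

Checking the return value turns the notice into an explicit "No match" branch instead of a read from a missing array index.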

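I rewrote the script a bit to account for there being more than one <noscript> on the page. You need to use preg_match_all to find all the matches instead of stopping at the first one.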



$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717"; 
$ch = curl_init($url); 
curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1); 
$result = curl_exec($ch); 
curl_close($ch); 

preg_match_all("/<noscript>(.*)<\/noscript>/", $result, $matches); 
print_r($matches);

Output



Array 
(
    [0] => Array 
     (
      [0] => £1,050 
      [1] => 4,550 
     ) 

    [1] => Array 
     (
      [0] => £1,050 
      [1] => 4,550 
     ) 

)

I tried this on my box and it works. Let me know if it works for you.

2010-05-15 00:11:33
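The point about preg_match versus preg_match_all can be checked without fetching the page. A minimal sketch on hypothetical markup (two `<noscript>` elements, one per line, not the real Foxtons HTML):

```php
<?php
// Hypothetical markup: two <noscript> prices, one per line.
$html = "<noscript>1,050</noscript>\n"
      . "<noscript>4,550</noscript>\n";

$pattern = '/<noscript>(.*?)<\/noscript>/';

// preg_match() stops at the first match.
preg_match($pattern, $html, $first);
echo $first[1], "\n"; // 1,050

// preg_match_all() collects every match. With the default
// PREG_PATTERN_ORDER, $all[0] holds the full matches (tags included)
// and $all[1] holds the captured group of each match.
preg_match_all($pattern, $html, $all);
echo implode(' | ', $all[1]), "\n"; // 1,050 | 4,550
```

Because `$all[0]` contains the `<noscript>` tags, the identical-looking `[0]` and `[1]` arrays in the output above were probably printed in a browser, which hides the tags.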

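Yes, it works for me too. I'll mark your answer as accepted, but I'd appreciate it if you could explain why my script didn't work, so I can understand what was wrong with it. Regards, Michael! – Michael 2010-05-16 03:25:45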

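A few things I see: 1) you shouldn't need to escape the quotes; 2) preg_match_all vs. preg_match: preg_match_all finds all matches, while preg_match stops at the first one (which in this case isn't the result you're looking for); 3) you use * and ? in the pattern string: ? matches zero or one times, * matches zero or more times. – 2010-05-16 22:43:40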
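On point 3: in the question's `(.*?)` the `?` is not a separate "zero or one" quantifier. Placed after `*`, it makes the repetition lazy, so it matches as few characters as possible, whereas the answer's `(.*)` is greedy. The sketch below contrasts the two on hypothetical markup and also shows the `s` modifier. By default `.` does not match a line break, so the answer's greedy `(.*)` presumably works only because each `<noscript>` element sits on its own line of the real page (an assumption).

```php
<?php
// Hypothetical markup: two elements on the same line.
$oneLine = '<noscript>1,050</noscript> <noscript>4,550</noscript>';

// Greedy: .* runs as far as it can, up to the LAST </noscript>.
preg_match('/<noscript>(.*)<\/noscript>/', $oneLine, $g);
echo $g[1], "\n"; // 1,050</noscript> <noscript>4,550

// Lazy: .*? stops at the FIRST </noscript> that lets the match succeed.
preg_match('/<noscript>(.*?)<\/noscript>/', $oneLine, $l);
echo $l[1], "\n"; // 1,050

// Without /s, "." does not match "\n", so content spanning lines is
// missed; the s modifier lets "." cross line breaks.
$multiLine = "<noscript>\n4,550\n</noscript>";
var_dump(preg_match('/<noscript>(.*?)<\/noscript>/', $multiLine));      // int(0)
var_dump(preg_match('/<noscript>(.*?)<\/noscript>/s', $multiLine, $s)); // int(1)
echo trim($s[1]), "\n"; // 4,550
```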

Don't use regular expressions to parse HTML; use an HTML DOM parser instead, such as PHP Simple HTML DOM Parser.

include("simple_html_dom.php") ; 

$html = file_get_html("http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717"); 

foreach($html->find('noscript') as $noscript) 
{ 

    echo $noscript->innertext."<br>"; 
}

This echoes:

2011-08-09 17:03:19
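The answer above relies on the third-party simple_html_dom.php file. As a swapped-in alternative (not the library that answer uses), PHP's bundled DOM extension can do the same `noscript` lookup with `DOMDocument` and `DOMXPath`. A minimal sketch, assuming the dom extension is enabled and using a hypothetical fragment in place of the fetched page:

```php
<?php
// Hypothetical markup; in the real script this string would be the
// value returned by curl_exec().
$html = '<html><body>'
      . '<div><noscript>1,050</noscript> per month</div>'
      . '<div><noscript>4,550</noscript> per month</div>'
      . '</body></html>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them;
// real-world HTML is rarely perfectly well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// //noscript selects every <noscript> element in the document.
$xpath = new DOMXPath($doc);
$prices = [];
foreach ($xpath->query('//noscript') as $node) {
    $prices[] = trim($node->textContent);
}

print_r($prices);
```

The XPath query plays the role of `$html->find('noscript')`, and `textContent` gives the element's text without the surrounding tags.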
