Extract specific parts from an HTML document (php cURL, php, preg_match)

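I am trying to extract some information from a web page using PHP cURL + preg_match or any other function, but for some reason it does not work at all. For example, from this page, I would like to extract the title "4 bedroom house to rent in Caroline Place, Bayswater, W2", the price, which is "2,300", and the description that starts with "This fantastic ..." and ends with "(Circle and District Lines)". I tried PHP cURL + DOM, but I get a lot of errors like "htmlParseEntityRef: expecting ';' in Entity, line: 243" and no results are displayed.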

I also tried preg_match and preg_match_all, but that did not work either.

A very basic example would be highly appreciated!

Source

2010-05-04 Michael

I think the DOM solution does not work because the page is not valid XHTML or XML. – Michael 2010-05-04 18:43:08

Maybe post the regular expressions you tried that did not work. The patterns look pretty simple. – serg 2010-05-04 18:46:23

**Don't use regular expressions to parse HTML**; use an [HTML DOM parser](http://simplehtmldom.sourceforge.net/) instead. It supports invalid HTML. – 2011-08-18 00:25:30
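
As a rough illustration of the DOM route, here is a minimal sketch that uses PHP's built-in DOMDocument rather than the library linked above. It assumes the warnings in the question come from libxml complaining about malformed markup such as unescaped ampersands, and it collects those errors internally instead of printing them. The sample markup in `$html` is made up for the example; in a real script it would be the string returned by `curl_exec()` with `CURLOPT_RETURNTRANSFER` enabled.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
// The unescaped "&" in the link is the kind of input libxml complains about.
$html = '<html><head><title>
      4 bedroom
     house
    to rent in Caroline Place</title></head>
<body><a href="search?type=house&beds=4">More houses</a></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected errors and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Read the <title> text and collapse runs of whitespace into single spaces.
$titleNode = $doc->getElementsByTagName('title')->item(0);
$title = $titleNode !== null
    ? trim(preg_replace('/\s+/', ' ', $titleNode->textContent))
    : null;

echo $title, "\n";
```

The same `$doc` object can then be queried with `DOMXPath` for the price and description, once the element names or classes used by the actual page are known.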

"A very basic example would be highly appreciated"

To answer the regex part:

preg_match('!<title>(.*)</title>!s', '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 
    <head> 
<title> 

      4 bedroom 


     house 


    to rent in Caroline Place, Bayswater, W2 through Foxtons (Property to rent)</title> 
<meta name="keywords" content="Houses" />', $matches); 
print_r($matches); 

/* output: 
Array 
(
    [0] => <title> 

      4 bedroom 


     house 


    to rent in Caroline Place, Bayswater, W2 through Foxtons (Property to rent)</title> 
    [1] => 

      4 bedroom 


     house 


    to rent in Caroline Place, Bayswater, W2 through Foxtons (Property to rent) 
) 
*/

The s at the end of the regex puts the parser into what is (inaptly) called single-line mode, in which the dot also matches newline characters.
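
To make the effect of the modifier concrete, here is a small sketch that runs the same pattern with and without s on a shortened, made-up title, and then tidies the captured text. The lazy `.*?` and the whitespace cleanup are additions for illustration and are not part of the pattern above.

```php
<?php
// Shortened, hypothetical stand-in for the multi-line <title> element.
$html = "<title>\n  4 bedroom\n\n  house\n</title>";

// Without s, "." does not match newlines, so the multi-line title is not found.
$withoutS = preg_match('!<title>(.*)</title>!', $html, $m1);

// With s (PCRE_DOTALL), "." also matches newlines; ".*?" stops at the first </title>.
$withS = preg_match('!<title>(.*?)</title>!s', $html, $m2);

// Collapse the captured whitespace into single spaces and trim the ends.
$title = trim(preg_replace('/\s+/', ' ', $m2[1]));

var_dump($withoutS, $withS);
echo $title, "\n";
```

Here `$withoutS` is 0 and `$withS` is 1, and `$title` ends up as "4 bedroom house".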

Source

2010-05-04 18:57:33 webbiedave

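Thank you very much for your help. I managed to write a script that extracts the information I need, but I still have some problems with the price. I have this: preg_match('!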