Regular expression to parse HTML

See the code: parsing HTML with a regular expression

$result = "<b>Associated Names</b>&nbsp;&nbsp;[<a href='http://www.examples.com/authors.html?act=change&id=6141&item=associated'><u>Edit</u></a>]</td> 
     </tr> 
     <tr> 
      <td class='text' align='left'>G&#12539;R<br />G-R<br />   </td>" 

preg_match_all("/<b>Associated Names.{10,100}<td class='text' align='left'>((.*<br \/>)*).*<\/td>/sU", $result, $assoc); 
var_dump($assoc); 
----------------------------------------------------------- 
RESULT 
array 
    0 => 
    array 
     0 => string '<b>Associated Names</b></td> 
     </tr> 
     <tr> 
      <td class='text' align='left'>G&#12539;R<br />G-R<br />   </td>' (length=135) 
    1 => 
    array 
     0 => string '' (length=0) 
    2 => 
    array 
     0 => string '' (length=0)

I want it to return

array(
    1 => 
    array 
     0 => string 'G&#12539;R' 
    2 => 
    array 
     0 => string 'G-R' 
)

The problem is with the parentheses (( )). I'd like to fix this, can anyone help?
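A likely reading of the problem, offered as a sketch rather than as the thread's answer: the `U` modifier inverts greediness, so the outer `(...)*` is satisfied by zero repetitions and both groups come back empty. Even without `U`, a repeated capturing group only keeps its last iteration, so a single pattern cannot return each name as its own group. One workaround is to capture the whole cell first and then split it on `<br />`. The names `$m` and `$names` are illustrative, and the pattern assumes the markup looks exactly like the sample.

```php
<?php
// Sample input from the question.
$result = "<b>Associated Names</b>&nbsp;&nbsp;[<a href='http://www.examples.com/authors.html?act=change&id=6141&item=associated'><u>Edit</u></a>]</td>
     </tr>
     <tr>
      <td class='text' align='left'>G&#12539;R<br />G-R<br />   </td>";

$names = [];
// Step 1: capture the whole cell body. No /U flag here; the explicit lazy
// quantifiers (.*?) stop at the first match, and /s lets "." cross newlines.
if (preg_match("/<b>Associated Names.*?<td class='text' align='left'>(.*?)<\/td>/s", $result, $m)) {
    // Step 2: split the cell on <br />, <br/> or <br>, trim each piece,
    // drop empty pieces (the trailing whitespace) and reindex the array.
    $parts = preg_split('/<br\s*\/?>/i', $m[1]);
    $names = array_values(array_filter(array_map('trim', $parts), 'strlen'));
}

var_dump($names);
```

With the sample input, `$names` ends up holding `G&#12539;R` and `G-R`, i.e. the names the question asks for, as a flat list rather than one capture group per name.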


2010-07-17 meotimdihia

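What is your regex supposed to match? – quantumSoup 2010-07-17 17:23:40

It's best not to parse HTML with regular expressions. Try an HTML parser instead. – 2010-07-17 17:25:54

Could we tell users on the "Ask Question" page not to try parsing HTML with regular expressions? – 2010-07-17 17:44:17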

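Please don't try to parse HTML with regular expressions; it invokes the wrath of Zalgo.

Try using the DOM and XPath to locate the specific elements and attributes you're trying to extract.

(I'd provide an XPath example, but it's still on my to-learn list... :))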
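A minimal sketch of the DOM + XPath approach using PHP's bundled `DOMDocument` and `DOMXPath` classes. The question only shows a fragment, so the wrapping `<table><tr><td>` below is an assumption, as is the structure the query relies on: the names sit in a `td` with `class='text'` in the row right after the cell holding `<b>Associated Names</b>`. The XPath expression is one possible query, not the only one.

```php
<?php
// The question's fragment, wrapped in an assumed <table><tr><td> so it parses as a table.
$html = "<table><tr><td><b>Associated Names</b>&nbsp;&nbsp;[<a href='http://www.examples.com/authors.html?act=change&id=6141&item=associated'><u>Edit</u></a>]</td>
     </tr>
     <tr>
      <td class='text' align='left'>G&#12539;R<br />G-R<br />   </td></tr></table>";

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// The td whose <b> reads "Associated Names" -> its row -> the next row ->
// the td with class "text" -> that cell's own text nodes (the text between the <br>s).
$query = "//td[b[normalize-space()='Associated Names']]/parent::tr"
       . "/following-sibling::tr[1]/td[@class='text']/text()";

$names = [];
foreach ($xpath->query($query) as $node) {
    $text = trim($node->nodeValue);
    if ($text !== '') { // skip the whitespace after the last <br />
        $names[] = $text;
    }
}
var_dump($names);
```

Unlike the regex route, the parser decodes character references, so the first name should come back as the character U+30FB itself rather than the literal text `&#12539;`.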


2010-07-17 17:25:56 Charles

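Thanks for the advice – meotimdihia 2010-07-17 17:28:20

Unfortunately, sometimes that's the only way, because not every page is well-formed. Many times Zend Dom Query failed to build the DOM correctly and I got wrong results. That's not the framework's fault, of course, but parsing can get messy. I use both approaches, case by case. – johnjohn 2010-07-17 17:31:18

@john, have you tried running the page through [tidy](http://us2.php.net/manual/en/book.tidy.php) first? – Charles 2010-07-17 17:42:31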
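A sketch of the "tidy first, then parse" idea from the comment above. It assumes the optional ext/tidy extension: `tidy_repair_string()` is only called when it is available, and otherwise the code relies on libxml's own error recovery in `loadHTML()`. The broken sample markup is hypothetical, made up for illustration.

```php
<?php
// Hypothetical broken markup: the first <td>/<tr> are never closed and the
// last cell runs to the end of the input, as can happen with scraped pages.
$html = "<table><tr><td><b>Associated Names</b><tr><td class='text' align='left'>G-R<br>X-Y";

if (function_exists('tidy_repair_string')) {
    // ext/tidy is optional: when present, let it close tags and clean up first.
    $repaired = tidy_repair_string($html, [], 'utf8');
    if ($repaired !== false) {
        $html = $repaired;
    }
}

$doc = new DOMDocument();
// libxml also recovers from broken markup; collect its warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$names = [];
foreach ($xpath->query("//td[@class='text']/text()") as $node) {
    $text = trim($node->nodeValue);
    if ($text !== '') {
        $names[] = $text;
    }
}
var_dump($names);
```

Running tidy first mainly helps when a parser builds the wrong tree from badly broken input, which is the failure described in the comment about Zend Dom Query; for mildly broken pages libxml's recovery alone is often enough.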
