2011-03-02 59 views
0

Parsing an HTML document using PHP

I'm building a PHP application that parses HTML content. I need to store a particular table column in a PHP variable.

Here is my code:

$dom = new domDocument; 

@$dom->loadHTML($html); 
$dom->preserveWhiteSpace = false; 
$tables = $dom->getElementsByTagName('table'); 

    $rows = $tables->item(0)->getElementsByTagName('tr');  
    $flag=0; 
    foreach ($rows as $row) 
    { 
      if($flag==0) $flag=1; 
      else 
      { 
        $cols = $row->getElementsByTagName('td'); 
        foreach ($cols as $col) 
        { 
         echo $col->nodeValue; //NEED HELP HERE 
        } 
        echo '<hr />'; 
      } 
    } 

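In each row, the first column is the key and the second is the value. How can I create key-value pairs from the table and store them as an array in PHP?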

I have tried many things, but every time I just get DOMElement Object() as the value.

Any help is greatly appreciated...

The HTML, as requested:

<table align='center' border='0' cellpadding='0' cellspacing='0' style='border-collapse: collapse' width='780' height=100%> 
<tr><td height=96% align=center><BR><BR>  
<html> 
<head> 
</head> 
<body style="background:url(uptu_logo1.gif); background-repeat:no-repeat; background-position:center"> 
<p align="center" style="font-size:18px"><span style='font-size:20px'>this text is unimportant gibberish that is not required by my app</span><br/><span style='font-size:16px'>this text is unimportant gibberish that is not required by my app</span><br/><u>B.Tech. Third Year Result 2009-10. this text is unimportant gibberish that is not required by my app</u></p> 
<br/> 
<table align="center" border="1" cellpadding="0" cellspacing="0" bordercolor="#E3DDD5" width="700" style="border-collapse: collapse; font-size: 11px"> 
<tr> 

<td width="50%"><b>Name:</b></td> 
<td width="50%">John Fernandes   </td> 
</tr> 
<tr> 
<td><b>Fathers Name:</b></td> 
<td>Caith Fernandes     </td> 
</tr> 
<tr> 
<td><b>Roll No:</b></td> 
<td>0702410099</td> 
</tr> 

<tr> 
<td><b>Status:</b></td> 
<td>REGULAR </td> 
</tr> 
<tr> 
<td><b>Course/Branch:</b></td> 
<td>B. Tech. </td> 
</tr> 
<tr> 
<td><b>Institute Name</b></td> 
<td>Imperial College of Science and Technology</td> 

</tr> 
</table> 

Output of my PHP code:

Name:John Fernandes   <hr /> 
Fathers Name:Caith Fernandes     <hr /> 
Roll No:0702410099<hr /> 
Status:REGULAR <hr /> 
Course/Branch:B. Tech. Computer Science and Engineering (10)<hr /> 
Imperial College of Science and Technology<hr /> 

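Also, how do I get rid of the stray characters in the output? I see &nbsp; in the original HTML, so I tried sanitizing the text with PHP's html_entity_decode() function, but they are still there...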

+0

Shouldn't `$dom = new domDocument;` be `$dom = new DOMDocument();`? – 2011-03-02 17:23:57

+1

@Rocket Class names in PHP are case-insensitive, and the parentheses are optional for a constructor called without arguments. – lonesomeday 2011-03-02 17:25:19

+1

Can you include the HTML? Also, will `$cols = $row->getElementsByTagName('td');` always return only 2 columns? – McHerbie 2011-03-02 17:25:40

Answers

2

What is the HTML that you are loading? I'm assuming it's something simple, like this:

<table> 
    <tr> 
     <td>heading</td> 
     <td>heading</td> 
    </tr> 
    <tr> 
     <td>key</td> 
     <td>value</td> 
    </tr> 
</table> 

It looks like the first TR (the headings) is skipped, and after that each row has only 2 columns to pair up as key => value:

$cols = $row->getElementsByTagName('td'); 
$key = $cols->item(0)->nodeValue; // string(3) "key" 
$val = $cols->item(1)->nodeValue; // string(5) "value" 

The code above will return the items you want.
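
To collect the pairs into an array, something along these lines may work. This is only a minimal sketch: it assumes the simple two-column layout shown above with one heading row to skip, and the sample `$html`, the `$data` array, and the `$isHeading` flag are illustrative names that are not from the question.

```php
<?php
// Illustrative input (assumed layout): one heading row, then key/value rows.
$html = '<table>
    <tr><td>heading</td><td>heading</td></tr>
    <tr><td><b>Name:</b></td><td>John Fernandes   </td></tr>
    <tr><td><b>Roll No:</b></td><td>0702410099</td></tr>
</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$rows = $dom->getElementsByTagName('table')->item(0)->getElementsByTagName('tr');

$data = array();
$isHeading = true;
foreach ($rows as $row) {
    if ($isHeading) {
        $isHeading = false; // skip the first (heading) row
        continue;
    }
    $cols = $row->getElementsByTagName('td');
    if ($cols->length < 2) {
        continue; // not a key/value row
    }
    // nodeValue is the cell's text content as a string, not a DOMElement
    $key = rtrim(trim($cols->item(0)->nodeValue), ':');
    $data[$key] = trim($cols->item(1)->nodeValue);
}

print_r($data);
```

Stripping the trailing colon from the key is a design choice for nicer array keys; drop the `rtrim()` call if the keys should match the cell text exactly.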

+0

+1 This is what I had in mind as well. With your example, you could use `$cols = $row->childNodes`. – lonesomeday 2011-03-02 17:35:39
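
On the follow-up about the stray characters in the output: DOMDocument has already decoded `&nbsp;` by the time `nodeValue` is read, so html_entity_decode() has nothing left to decode. The text comes back as UTF-8, where a non-breaking space is the two bytes 0xC2 0xA0; `trim()` does not strip those by default, and a page displayed as ISO-8859-1 shows the first byte as "Â". Assuming that is what is happening here, a rough sketch of one way to normalize a cell follows. `clean_cell` is a hypothetical helper name, and the sample markup is made up for illustration.

```php
<?php
// Hypothetical helper: turn UTF-8 non-breaking spaces into plain spaces, then trim.
function clean_cell($text)
{
    $text = str_replace("\xC2\xA0", ' ', $text);
    return trim($text);
}

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML('<table><tr><td><b>Name:</b></td><td>John Fernandes&nbsp;&nbsp;</td></tr></table>');
libxml_clear_errors();

$cols = $dom->getElementsByTagName('td');
$raw = $cols->item(1)->nodeValue;   // still ends with the non-breaking spaces
$clean = clean_cell($raw);          // non-breaking spaces replaced and trimmed

echo $clean, "\n";
```

Sending the page as UTF-8 (for example with a `Content-Type: text/html; charset=utf-8` header) is another option if the non-breaking spaces should be kept rather than removed.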