2014-09-07 65 views

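Scraping a web page with DOMDocument and DOMXPath

I'm very new to this. I want to use PHP to extract a table from a page and return its HTML after modifying the href values of all the anchors. Here is the table: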

<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1255"> 
    <link rel="stylesheet" type="text/css" href="../CssGraduateE.css"> 
    <title></title> 
</head> 
<body> 
    <div> 
     <br> 
     <table class="main" cellspacing="0" cellpadding="0"> 
      <tbody> 
       <tr> 
        <td> 
         <br><span class="MainHeader">Subjects in Faculty - Electrical Engineering</span><br><br> 
         <table cellpadding="2" cellspacing="0" border="1" width="100%"> 
          <tbody> 
           <tr> 
            <td><span class="SecondHeader"> Subject Number</span></td> 
            <td><span class="SecondHeader">Subject Name</span></td> 
            <td><span class="SecondHeader">Points</span></td> 
            <td><span class="SecondHeader">Semesters</span></td> 
            <td>Subject Site</td> 
           </tr> 
           <tr> 
            <td><a href="../Subjects/?SUB=46001">46001</a>&nbsp;</td> 
            <td nowrap="">Engineering of Distributed Software Sys</td> 
            <td>3</td> 
            <td><br></td> 
            <td><a target="_newtab" href="http://www.thislinkisok.com/courses/046001">www</a></td> 
           </tr> 
           <tr> 
            <td><a href="../Subjects/?SUB=46002">46002</a>&nbsp;</td> 
            <td nowrap="">Design and Analysis of Algorithms</td> 
            <td>3</td> 
            <td>B<br></td> 
            <td>&nbsp;<br></td> 
           </tr> 
          </tbody> 
         </table> 
        </td> 
       </tr> 
      </tbody> 
     </table> 
     <br> 
     <table border="0"> 
      <tbody> 
       <tr> 
        <td>Last Update on :</td> 
        <td>Wednesday ,9 April 2014</td> 
        <td></td> 
       </tr> 
      </tbody> 
     </table> 
    </div> 
</body> 
</html>  

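I know how to grab the table I want:

$query = $xpath->query('//table[@class="main"]//table[1]');

But how do I loop over all the links that start with "../xxx" and change them to something like "www.mynewlink.com/xxx"? Finally, I want to return the extracted table as HTML. How can I do this using the native DOMDocument and DOMXPath?

Thanks, everyone!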

Answer

If $html is the string containing the HTML you fetched from the external site, you can do something like this:

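$dom = new DOMDocument();
// Collect parser warnings from malformed markup instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Rewrite every link inside the main table whose href starts with "../".
foreach ($xpath->query('//table[@class="main"]//a[starts-with(@href, "../")]') as $link) {
    // The dots are escaped so only a literal leading ".." is replaced.
    $link->setAttribute('href', preg_replace('#^\.\.#', 'http://www.mynewlink.com', $link->getAttribute('href')));
}

// Import the table into its own document so that only the table is serialized.
$container = new DOMDocument();
$container->appendChild($container->importNode($xpath->query('//table[@class="main"]')->item(0), true));

echo $container->saveHTML();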
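
The same steps can be wrapped in a reusable function. The following is only a sketch under a few assumptions: the function name `extractTableWithRewrittenLinks`, the `$baseUrl` parameter, and the trimmed-down sample markup are illustrative and not part of the original answer. It targets the nested table from the question rather than the outer one, and uses `substr()` in place of the regular expression.

```php
<?php
// Hypothetical helper: the name and parameters are illustrative only.
function extractTableWithRewrittenLinks(string $html, string $baseUrl): string
{
    $dom = new DOMDocument();
    // Collect parser warnings quietly, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // First table nested anywhere inside the table with class "main".
    $table = $xpath->query('(//table[@class="main"]//table)[1]')->item(0);
    if ($table === null) {
        return '';
    }

    // Query relative to $table (note the leading ".") so only its links change.
    foreach ($xpath->query('.//a[starts-with(@href, "../")]', $table) as $link) {
        // Drop the leading ".." and prepend the new base URL.
        $link->setAttribute('href', $baseUrl . substr($link->getAttribute('href'), 2));
    }

    // Import into a fresh document so only the table is serialized.
    $container = new DOMDocument();
    $container->appendChild($container->importNode($table, true));

    return $container->saveHTML();
}

// Reduced sample markup, assumed for illustration.
$sample = '<html><body><table class="main"><tr><td>'
    . '<table><tr><td><a href="../Subjects/?SUB=46001">46001</a></td>'
    . '<td><a href="http://www.thislinkisok.com/courses/046001">www</a></td></tr></table>'
    . '</td></tr></table></body></html>';

$result = extractTableWithRewrittenLinks($sample, 'http://www.mynewlink.com');
echo $result;
```

Passing the table as the context node to `query()` keeps the rewrite limited to that table, and absolute links are left alone because they do not match `starts-with(@href, "../")`.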

Thanks! It works! – wpdev

Why do I need to create a new $container DOMDocument? Couldn't I just do this: $table = $xpath->query('//table[@class="main"]//table[1]'); return $dom->saveHTML($table->item(0)); – wpdev

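@user3510841 If you don't, you'll only get the inner contents of the table, without the '<table>...</table>', so we need to create a main container that can hold the whole table, including its opening tag. Please click the check mark next to the question to accept my answer if it solved your problem. – silkfire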
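
For comparison, here is a minimal sketch of the variant wpdev asks about in the comments. From PHP 5.3.6 onward, `DOMDocument::saveHTML()` accepts an optional node argument and serializes just that node, so it may be worth checking the output on your own PHP version before settling on the container approach. The sample markup and variable names are assumptions made for illustration.

```php
<?php
// Reduced sample markup, assumed for illustration.
$html = '<html><body><table class="main"><tr><td>'
    . '<table border="1"><tr><td>46001</td></tr></table>'
    . '</td></tr></table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings quietly
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// First table nested inside the table with class "main".
$table = $xpath->query('(//table[@class="main"]//table)[1]')->item(0);

// Passing a node asks saveHTML() to serialize only that node (PHP >= 5.3.6).
$fragment = $dom->saveHTML($table);
echo $fragment;
```

On older PHP versions the node argument is not available, and there the separate container document from the answer remains the portable route.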