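How can I selectively remove nodes from a subtree of a DOM document?
I am using DOMDocument to parse an HTML document and extract some data from it. The relevant DOM subtree has the following structure: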
<div id="tab1">
<div class="some class name"></div>
<div class="some other class name">arbitrary data and nodes</div>
<p> lot of paragraphs to follow </p>
<p> paragraphs </p>
<p> paragraphs </p>
<p> paragraphs </p>
<p> paragraphs </p>
<br />
<br />
<br />
<br />
<br />
<table />
<table />
<table />
<table />
</div>
I do not want the first two children of tab1. I am using the following PHP code:
<?php
// One URL per line; each URL maps to a previously scraped local file.
$urlArray = file('sitemap.txt');
$dataSet = array();
foreach ($urlArray as $url) {
    $scrapedData = file_get_contents('./scraped-site/' . trim($url));
    $doc = new DOMDocument();
    @$doc->loadHTML($scrapedData); // @ silences warnings about malformed HTML
    $domXpathDoc = new DOMXPath($doc);
    $results = '';
    $xpathArray = array(
        // id changed from "tabs1" to "tab1" so that it matches the markup shown above
        'info' => '//*[@id="tab1"]',
    );
    $set = array();
    foreach ($xpathArray as $field => $xpath) {
        $domNodeList = $domXpathDoc->query($xpath);
        foreach ($domNodeList as $node) {
            // childNodes contains every child, including whitespace text nodes
            foreach ($node->childNodes as $child) {
                $set[] = $child->ownerDocument->saveXML($child);
            }
        }
    }
    $dataSet[] = $set;
}
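This code gives me all of the children. How can I selectively skip some of the nodes?
Do you just want to remove the first two children? – BumbleShrimp
@JonathonG, in the structure shown here I only want to remove the first two elements, but in other cases it may be different. – Kumar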
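One possible way to do this, sketched below, is either to let the XPath expression skip the unwanted children or to remove them from the tree with removeChild() before serializing. The markup in `$html` is an abbreviated stand-in for the structure in the question, and the variable names `$kept`, `$unwanted` and `$remaining` are made up for illustration. Treat it as a sketch under those assumptions, not a drop-in fix.

```php
<?php
// Abbreviated stand-in for the #tab1 structure from the question (an assumption).
$html = <<<HTML
<div id="tab1">
<div class="some class name"></div>
<div class="some other class name">arbitrary data and nodes</div>
<p>first paragraph</p>
<p>second paragraph</p>
<br />
<table></table>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of using @
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Option 1: skip nodes in the query itself.
// "/*" selects element children only, and position() > 2 drops the first two.
$kept = array();
foreach ($xpath->query('//*[@id="tab1"]/*[position() > 2]') as $child) {
    $kept[] = $doc->saveXML($child);
}

// Option 2: remove the unwanted nodes from the tree, then read what is left.
// Copy the matches into an array first, so the loop never walks a list
// that is being modified.
$unwanted = iterator_to_array($xpath->query('//*[@id="tab1"]/div'));
foreach ($unwanted as $node) {
    $node->parentNode->removeChild($node);
}

$remaining = array();
foreach ($xpath->query('//*[@id="tab1"]/*') as $child) {
    $remaining[] = $child->nodeName;
}

print_r($kept);
print_r($remaining);
```

In the code from the question, the first option would amount to changing the 'info' expression to something like `//*[@id="tab1"]/*[position() > 2]` and iterating over the query result directly instead of over childNodes. For pages where the nodes to skip are not the first two, the predicate can be swapped for another condition, for example `/*[not(self::div)]` to skip every div child.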