2011-05-21 51 views
0

Simple_DOM problem with finding classes

I'm trying to do a simple extraction, but I keep ending up with unpredictable results.

I have this HTML code:

<div class="thread" style="margin-bottom:25px;"> 

<div class="message"> 

<span class="profile">Suzy Creamcheese</span> 

<span class="time">December 22, 2010 at 11:10 pm</span> 

<div class="msgbody"> 

<div class="subject">New digs</div> 

Hello thank you for trying our soap. <BR> Jim. 

</div> 
</div> 


<div class="message reply"> 

<span class="profile">Lars Jörgenmeier</span> 

<span class="time">December 22, 2010 at 11:45 pm</span> 

<div class="msgbody"> 

I never sold you any soap. 

</div> 

</div> 

</div> 

I'm trying to get the outertext from "msgbody", but only when "profile" equals a particular name. Like this:

$contents = $html->find('.msgbody');
$elements = $html->find('.profile');

$length = sizeof($contents);

while ($x != sizeof($elements)) {

    $var = $elements[$x]->outertext;

    // If profile = the right name
    if ($var = $name) {
        $text = $contents[$x]->outertext;
        echo $text;
    }

    $x++;
}

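I'm getting text from the wrong profile instead of the associated text I need. Is there a way to pull the needed information with just one line of code?

Something like: if span-profile = "right name", then pull its DIV-msgbody.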

Answers

3

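Okay, I'm going to go with DOMXpath for this one. I'm not sure how "outertext" should be interpreted, but I'll work from this requirement:

Something like: if span-profile = "right name", then pull its DIV-msgbody.

First, here is the reduced HTML test case I used: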

<html> 
<body> 
<div class="thread" style="margin-bottom:25px;"> 

<div class="message"> 

<span class="profile">Suzy Creamcheese</span> 

<span class="time">December 22, 2010 at 11:10 pm</span> 

<div class="msgbody"> 

<div class="subject">New digs</div> 

Hello thank you for trying our soap. <BR> Jim. 

</div> 
</div> 


<div class="message reply"> 

<span class="profile">Lars Jörgenmeier</span> 

<span class="time">December 22, 2010 at 11:45 pm</span> 

<div class="msgbody"> 

I never sold you any soap. 

</div> 

</div> 

</div> 
</body> 
</html> 

So, let's build the XPath query for this. Here is the whole thing first, and then we'll break it down:

$messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']"); 

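Breakdown:

//span

Give me the spans

//span[@class='profile']

Give me the spans where the class is profile

//span[@class='profile' and contains(.,'$profile_name')]

Give me the spans where the class is profile and the inside of the span contains $profile_name, which is the name you're after

//span[@class='profile' and contains(.,'$profile_name')]/../

Give me the spans where the class is profile and the inside of the span contains $profile_name, which is the name you're after; now go up one level, which gets us to <div class="message">

//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']

Give me the spans where the class is profile and the inside of the span contains $profile_name, which is the name you're after; now go up one level, which gets us to <div class="message">; finally, give me all divs under <div class="message"> where the class is msgbody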
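
The breakdown can be checked one stage at a time. Here is a minimal sketch, assuming a trimmed inline copy of the test HTML instead of test.html (the meta tag and the $steps/$counts names are my additions). It prints how many nodes each stage selects. The fourth stage is written as /.. without the trailing slash, since an XPath expression cannot end in a slash.

```php
<?php
// Illustrative markup only: a trimmed copy of the thread from the question.
$html = <<<HTML
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>
<div class="thread">
  <div class="message">
    <span class="profile">Suzy Creamcheese</span>
    <span class="time">December 22, 2010 at 11:10 pm</span>
    <div class="msgbody"><div class="subject">New digs</div> Hello thank you for trying our soap.</div>
  </div>
  <div class="message reply">
    <span class="profile">Lars Jörgenmeier</span>
    <span class="time">December 22, 2010 at 11:45 pm</span>
    <div class="msgbody">I never sold you any soap.</div>
  </div>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$profile_name = 'Suzy Creamcheese';
$base = "//span[@class='profile' and contains(.,'$profile_name')]";

// One entry per stage of the breakdown, from broadest to narrowest.
$steps = [
    "//span",
    "//span[@class='profile']",
    $base,
    "$base/..",
    "$base/../div[@class='msgbody']",
];

$counts = [];
foreach ($steps as $step) {
    $n = $xpath->query($step)->length;
    $counts[] = $n;
    echo $step, ' => ', $n, " node(s)\n";
}
```

With this markup the counts should narrow from all four spans down to the single msgbody div that belongs to the matching profile.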

So now, here's the sample PHP code:

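// Parse the test file into a DOM tree.
$doc = new DOMDocument();
$doc->loadHTMLFile("test.html");

$xpath = new DOMXpath($doc);
$profile_name = 'Lars Jörgenmeier';

// Match the profile span, step up to its message div, then select the msgbody div.
$messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']");
foreach ($messages as $message) {
    // nodeValue is the text of the msgbody div, without the tags.
    echo trim($message->nodeValue) . "\n";
}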

XPath is very powerful. I'd suggest looking at a basic tutorial, and if you want to see more advanced usage, you can check the XPath standard.
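
One thing to keep in mind: contains() is a substring test, so a short name can match more profiles than intended. Below is a rough sketch of an exact-match variant using normalize-space(.) = name. The xpath_literal() helper is hypothetical (it is not part of PHP's DOM extension), and the inline markup is made up for illustration; the helper only exists so that names containing quotes don't break the query.

```php
<?php
// Hypothetical helper: quote a PHP string as an XPath 1.0 string literal.
function xpath_literal(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    // Both quote types present: stitch the pieces together with concat().
    return "concat('" . str_replace("'", "', \"'\", '", $s) . "')";
}

// Illustrative markup only.
$html = <<<HTML
<html><body>
<div class="message"><span class="profile">Suzy Creamcheese</span><div class="msgbody">New digs</div></div>
<div class="message"><span class="profile">Suzy Yogurt</span><div class="msgbody">No Creamcheese</div></div>
<div class="message reply"><span class="profile">Suzy Creamcheese</span><div class="msgbody">A reply from Suzy Creamcheese.</div></div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Substring match: 'Suzy' appears in every profile here.
$needle = xpath_literal('Suzy');
$loose = $xpath->query("//span[@class='profile' and contains(., $needle)]/../div[@class='msgbody']");

// Exact match on the whitespace-normalized text of the span.
$name = xpath_literal('Suzy Creamcheese');
$exact = $xpath->query("//span[@class='profile' and normalize-space(.) = $name]/../div[@class='msgbody']");

$bodies = [];
foreach ($exact as $node) {
    $bodies[] = trim($node->textContent);
}

echo "contains() matched {$loose->length} message(s)\n";
echo "exact match found: ", implode(' | ', $bodies), "\n";
```

Whether the substring or the exact form is the right choice depends on how clean the profile names are in the real page.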

That is a lot of concise information. Thanks for the XPath conversion. I love Simple_DOM, but it bleeds memory! – user734063 2011-05-22 03:17:10

Also, I noticed that you have to insert these characters in the header for special characters, like 'Jörgenmeier', to get through XPath. – user734063 2011-05-22 03:57:36

'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">' – user734063 2011-05-22 03:58:12
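
On the special-character point in the comments above: another option that is commonly used is to declare the character set in the markup, so the parser knows how to decode names like 'Jörgenmeier'. The sketch below assumes the HTML and this PHP source file are both UTF-8; the inline $page string and the $found variable are illustrative and stand in for test.html.

```php
<?php
// Illustrative fragment taken from the reply message in the test case.
$fragment = '<div class="message reply"><span class="profile">Lars Jörgenmeier</span>'
          . '<div class="msgbody">I never sold you any soap.</div></div>';

// The meta tag declares the bytes as UTF-8 before the parser reaches the name.
$page = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
      . '<body>' . $fragment . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($page);
$xpath = new DOMXPath($doc);

$profile_name = 'Lars Jörgenmeier';
$messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']");

$found = [];
foreach ($messages as $message) {
    $found[] = trim($message->nodeValue);
}
print_r($found);
```

If the query comes back empty for a non-ASCII name, a missing or mismatched charset declaration is one of the first things worth checking.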

0

Here is a working Simple HTML DOM example.

I changed your example HTML so that there are multiple profiles for Suzy Creamcheese, as follows (file: test_class_class.htm):

<div class="message"> 
    <span class="profile">Suzy Creamcheese</span> 
    <span class="time">December 22, 2010 at 11:10 pm</span> 
    <div class="msgbody"> 
    <div class="subject">New digs</div> 
     Hello thank you for trying our soap. <BR> Jim. 
    </div> 
    </div> 

    <div class="message reply"> 
    <span class="profile">Lars Jörgenmeier</span> 
    <span class="time">December 22, 2010 at 11:45 pm</span> 
    <div class="msgbody"> 
     I never sold you any soap. 
    </div> 
    </div> 
</div> 

<div class="message"> 
    <span class="profile">Suzy Yogurt</span> 
    <span class="time">December 22, 2010 at 11:10 pm</span> 
    <div class="msgbody"> 
    <div class="subject">No Creamcheese</div> 
     This is not Suzy Creamcheese <BR> Jim. 
    </div> 
    </div> 

    <div class="message reply"> 
    <span class="profile">Suzy Creamcheese</span> 
    <span class="time">December 22, 2010 at 11:45 pm</span> 
    <div class="msgbody"> 
     A reply from Suzy Creamcheese. 
    </div> 
    </div> 
</div> 

</div> 

Here is my test using Simple HTML DOM:

include('simple_html_dom.php');

function getMessage_for_profile($iUrl, $iProfile)
{
    // create HTML DOM
    $html = file_get_html($iUrl);

    // get text elements
    $aoProfile = $html->find('span[class=profile]');
    echo "Found ".count($aoProfile)." profiles.<br />";

    foreach ($aoProfile as $key => $oProfile)
    {
        if ($oProfile->plaintext == $iProfile)
        {
            echo "<b>Profile ".$key.": ".$oProfile->plaintext."</b><br />";
            // Using $e->next_sibling()
            $oCurrent = $oProfile;
            while ($oNext = $oCurrent->next_sibling())
            {
                if ($oNext->class == "msgbody")
                {
                    echo "<hr />";
                    echo $oNext->outertext;
                    echo "<hr />";
                }
                $oCurrent = $oNext;
            }
        }
    }

    // clean up memory
    $html->clear();
    unset($html);

    return;
}
// -------------------------------------------- 
// test it! 
// user_agent header... 
ini_set('user_agent', 'My-Application/2.5'); 

getMessage_for_profile('test_class_class.htm','Suzy Creamcheese'); 
echo "<br /><br /><br />"; 
getMessage_for_profile('test_class_class.htm','Suzy Yogurt'); 

My output is:

Found 4 profiles. 
Profile 0: Suzy Creamcheese 
-------------------------------- 
New digs 
Hello thank you for trying our soap. 
Jim. 
--------------------------------- 
Profile 3: Suzy Creamcheese 
--------------------------------- 
A reply from Suzy Creamcheese. 
--------------------------------- 



Found 4 profiles. 
Profile 2: Suzy Yogurt 
--------------------------------- 
No Creamcheese 
This is not Suzy Creamcheese 
Jim. 
--------------------------------- 

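See, it can be done with Simple HTML DOM. And since I already know how the DOM works (or at least enough to get into trouble...), I didn't have to learn any new syntax!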
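
For comparison, the same sibling-walking idea can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. This is a swapped-in technique, not the code from this answer: message_bodies_for_profile() is a hypothetical helper, the inline markup is a simplified stand-in for test_class_class.htm, and the helper returns plain text rather than echoing HTML.

```php
<?php
// Hypothetical helper: collect the text of every msgbody sibling that follows
// a profile span whose text equals $profile.
function message_bodies_for_profile(DOMDocument $doc, string $profile): array
{
    $bodies = [];
    foreach ($doc->getElementsByTagName('span') as $span) {
        if ($span->getAttribute('class') !== 'profile' || trim($span->textContent) !== $profile) {
            continue;
        }
        // Walk the following siblings, skipping text nodes and other elements.
        for ($node = $span->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node instanceof DOMElement && $node->getAttribute('class') === 'msgbody') {
                $bodies[] = trim($node->textContent);
            }
        }
    }
    return $bodies;
}

// Illustrative markup only.
$html = <<<HTML
<html><body>
<div class="message"><span class="profile">Suzy Creamcheese</span><span class="time">11:10 pm</span><div class="msgbody">Hello thank you for trying our soap.</div></div>
<div class="message"><span class="profile">Suzy Yogurt</span><span class="time">11:10 pm</span><div class="msgbody">This is not Suzy Creamcheese.</div></div>
<div class="message reply"><span class="profile">Suzy Creamcheese</span><span class="time">11:45 pm</span><div class="msgbody">A reply from Suzy Creamcheese.</div></div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

foreach (['Suzy Creamcheese', 'Suzy Yogurt'] as $who) {
    echo $who, ":\n";
    foreach (message_bodies_for_profile($doc, $who) as $body) {
        echo "  ", $body, "\n";
    }
}
```

Returning an array instead of echoing keeps the lookup separate from the presentation, which makes it easier to reuse.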