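Get HTML tag content using PHP

Basically, I use PHP's file_get_contents() to fetch the content of a URL.

After getting the page's source code, I have this fragment from the page source: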

<div class="span2 box-product" data-store="kimstore" data-product-id="cpnYKmW6D5" data-product-title="Nokia-900-Lumia"> 
<a href="/Nokia-900-Lumia/p-cpnYKmW6D5" title="Nokia 900 Lumia Php 14,300"> 
    <img src="https://m-md.s3.amazonaws.com/storefront/kimstore/media/46/68/2d/99/68159647b67e5b1a2d124f9-120x90" width="120" height="90" title="Nokia 900 Lumia Php 14,300" alt="Nokia 900 Lumia Php 14,300" /> 
</a> 
<p class="title"> 
    <a href="/Nokia-900-Lumia/p-cpnYKmW6D5" title="Nokia 900 Lumia Php 14,300"> 
     Nokia 900 Lumia 
    </a> 
</p> 
<p class="price">Php 14,300</p> 
<p class="shop"> 
    <a href="/kimstore" title="kimstore">kimstore</a> 
</p> 
</div>

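So I need to get the data between <div class="span2 box-product" and its closing tag.

Then, from that part, I have to extract 3 pieces of data: 1. data-store 2. data-product-title 3. the price

I have been trying with regex, but with no luck. Any suggestions on what to do and which technique to use? Thanks in advance.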

Source

2013-04-30 Jayson Obado

http://php.net/dom – DaveRandom 2013-04-30 08:59:15

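What you are trying to do is probably a) copyright infringement and b) the wrong approach. IANAL. – PointedEars 2013-04-30 09:12:14

Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml) – Quentin 2013-04-30 09:21:16

With SimpleXML, you can access attributes and DOM nodes as PHP objects. Pass the result of file_get_contents to SimpleXML, like this: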

$str = file_get_contents($url);
$xml = simplexml_load_string($str);

http://in1.php.net/manual/en/class.simplexmlelement.php http://in1.php.net/manual/en/simplexml.examples-basic.php

Source

2013-04-30 09:05:43 Adil

I'll try this, but mind you, what I get from the URL is not XML but HTML – 2013-04-30 09:09:12

As long as the HTML fragment is valid, it will work. – Adil 2013-04-30 09:19:38
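To illustrate the condition in the comment above, here is a minimal sketch. It assumes that "valid" means well-formed XML, and both input strings are made-up samples. `simplexml_load_string()` returns `false` when the markup cannot be parsed as XML, so the result is worth checking before use.

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Well-formed fragment (hypothetical sample): every tag is closed.
$wellFormed = simplexml_load_string(
    '<div data-store="kimstore"><p class="price">Php 14,300</p></div>'
);

// Typical HTML that is not well-formed XML: <br> is never closed.
$malformed = simplexml_load_string('<div><p>unclosed<br></div>');

if ($wellFormed !== false) {
    echo (string) $wellFormed['data-store'], "\n"; // attribute access
    echo (string) $wellFormed->p, "\n";            // child element text
}

if ($malformed === false) {
    echo "Could not parse the second fragment as XML\n";
}
```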

Use the DOM extension (or SimpleXML if you are parsing an XHTML document). SimpleXML may fail if your document is not valid XML.

http://php.net/manual/en/book.dom.php

http://php.net/manual/en/book.simplexml.php

Also, you should learn XPath for quick access to any DOM node.

Source
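A minimal sketch of the DOM plus XPath approach, assuming the real page uses the same markup as the fragment in the question. The helper name `extractProducts` is hypothetical, and the `$html` sample stands in for the result of `file_get_contents($url)`.

```php
<?php
// Hypothetical helper: returns one array per product block found in $html.
function extractProducts(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match every <div> whose class list contains the token "box-product".
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " box-product ")]';

    $products = [];
    foreach ($xpath->query($query) as $div) {
        // Look for the price paragraph inside the current block only.
        $priceNode = $xpath->query('.//p[@class="price"]', $div)->item(0);
        $products[] = [
            'store' => $div->getAttribute('data-store'),
            'title' => $div->getAttribute('data-product-title'),
            'price' => $priceNode ? trim($priceNode->textContent) : null,
        ];
    }
    return $products;
}

// Sample input assumed to mirror the question's markup.
$html = '<html><body><div class="span2 box-product" data-store="kimstore" '
      . 'data-product-id="cpnYKmW6D5" data-product-title="Nokia-900-Lumia">'
      . '<p class="title"><a href="/Nokia-900-Lumia/p-cpnYKmW6D5">Nokia 900 Lumia</a></p>'
      . '<p class="price">Php 14,300</p></div></body></html>';

$products = extractProducts($html);
print_r($products);
```

Unlike SimpleXML, `DOMDocument::loadHTML()` tolerates markup that is not well-formed XML, which is why it tends to suit pages fetched from the web.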

2013-04-30 09:11:22 SlyChan

For just the HTML code you provided, this solution will work:

<?php 
$html = <<<HTML
<div class="span2 box-product" data-store="kimstore" data-product-id="cpnYKmW6D5" data-product-title="Nokia-900-Lumia"> 
    <a href="/Nokia-900-Lumia/p-cpnYKmW6D5" title="Nokia 900 Lumia Php 14,300"> 
     <img src="https://m-md.s3.amazonaws.com/storefront/kimstore/media/46/68/2d/99/68159647b67e5b1a2d124f9-120x90" width="120" height="90" title="Nokia 900 Lumia Php 14,300" alt="Nokia 900 Lumia Php 14,300" /> 
    </a> 
    <p class="title"> 
     <a href="/Nokia-900-Lumia/p-cpnYKmW6D5" title="Nokia 900 Lumia Php 14,300"> 
      Nokia 900 Lumia 
     </a> 
    </p> 
    <p class="price">Php 14,300</p> 
    <p class="shop"> 
     <a href="/kimstore" title="kimstore">kimstore</a> 
    </p> 
</div> 
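HTML;

// Parse the fragment; this works only because it is well-formed XML.
$sxe        = new SimpleXMLElement($html);
$attributes = $sxe->attributes();
$data_store = trim((string) $attributes['data-store']); // data-store attribute of the outer <div>
$title      = trim((string) $sxe->p[0]->a);             // link text inside <p class="title">
$price      = trim((string) $sxe->p[1]);                // text of <p class="price">

echo "{$data_store}\n{$title}\n{$price}\n";

Source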

2013-04-30 09:19:53

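But I have to get the data from the whole HTML. – 2013-04-30 09:26:16

This code snippet should be enough to show you how SimpleXML parsing works. Just load the whole HTML and navigate through the object until you reach the block you want. – 2013-04-30 09:44:07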
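One possible way to follow that advice, sketched under the assumption that the full page is ordinary HTML rather than well-formed XML: parse it with `DOMDocument::loadHTML()` first, convert the result with `simplexml_import_dom()`, and then navigate with the same SimpleXML syntax as in the answer. The `$page` string is a made-up stand-in for the downloaded page.

```php
<?php
// Stand-in for file_get_contents($url); note the unclosed <br>, which
// would make new SimpleXMLElement($page) fail.
$page = '<html><body><h1>Shop</h1>'
      . '<div class="span2 box-product" data-store="kimstore" data-product-title="Nokia-900-Lumia">'
      . '<p class="title"><a href="/Nokia-900-Lumia/p-cpnYKmW6D5">Nokia 900 Lumia</a></p>'
      . '<p class="price">Php 14,300</p>'
      . '<p class="shop"><a href="/kimstore">kimstore</a></p>'
      . '</div><br></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep HTML parser warnings quiet
$doc->loadHTML($page);
libxml_clear_errors();

// Convert the repaired DOM tree into a SimpleXMLElement.
$sxe = simplexml_import_dom($doc);

$rows = [];
// Jump straight to every product block instead of walking the tree by hand.
foreach ($sxe->xpath('//div[contains(@class, "box-product")]') as $block) {
    $rows[] = [
        'store' => (string) $block['data-store'],
        'title' => trim((string) $block->p[0]->a),
        'price' => trim((string) $block->p[1]),
    ];
}

print_r($rows);
```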
