How to extract certain tags, and replace others, from XML using SimpleXML

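I'm writing my own blog in PHP. I want to be able to write posts in Markdown and display the result as HTML, and I also need to do some custom things with that HTML.

There is a simple script that converts Markdown to HTML, but I need to do a few things to the HTML once that is done:

1. I need to run everything inside pre tags through the htmlentities() function. (On my blog I post code that contains HTML, but I only want to display this HTML, not have the browser parse it.)
2. I need to extract all the plain text so I can build an excerpt that contains no image tags, no half-open tags, and no code blocks.

I think I have a solution for problem 2 with the following code: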

$xml = new SimpleXMLElement('<xml>' . $html . '</xml>');

$xml now looks like this:

<xml> 
    <p>some random text</p> 
    <img src='image.jpg'> 
    <p>some random text</p> 
</xml>

This extracts all the text:

foreach($xml->{'p'} as $p){ 
echo $p . '<hr>'; 
}

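This works, but I also want it to include all text found in ul and ol elements (in the same order as they appear in the XML). I searched for a way to loop over all children of $xml, but I couldn't find out how to check whether an element is a p, ul or ol.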
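One way to walk every child in document order and tell the tags apart is SimpleXMLElement::children() combined with getName(). The following is only a minimal sketch, assuming the markup is well-formed XML (so img is written as a self-closing tag); the sample $html string and the collectText() helper name are made up for illustration.

```php
<?php
// Hypothetical sample input; it has to be well-formed XML for SimpleXML.
$html = "<p>first paragraph</p><img src='image.jpg' /><ul><li>one</li><li>two</li></ul><p>last paragraph</p>";
$xml = new SimpleXMLElement('<xml>' . $html . '</xml>');

// Hypothetical helper: collect the text of p, ul and ol children in document order.
function collectText(SimpleXMLElement $root) {
    $texts = array();
    foreach ($root->children() as $child) {
        switch ($child->getName()) {
            case 'p':
                $texts[] = trim((string) $child);
                break;
            case 'ul':
            case 'ol':
                // Lists contribute one entry per li.
                foreach ($child->li as $li) {
                    $texts[] = trim((string) $li);
                }
                break;
            // img and any other tag is skipped.
        }
    }
    return $texts;
}

$texts = collectText($xml);
echo implode("\n", $texts), "\n";
```

Casting an element to string only returns its own text, so text inside nested inline tags such as em is not included; strip_tags($child->asXML()) is one possible alternative if that matters.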

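I can't find a way to solve problem 1, because I don't know how to replace content inside the XML object while leaving everything else intact. (Or am I missing something completely obvious?)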
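For problem 1, one possible approach is to do the replacement with the DOM extension rather than SimpleXML (dom_import_simplexml() can bridge from an existing SimpleXMLElement). The sketch below serializes the children of each pre back to markup and stores that markup as a single text node, which the serializer then escapes. It assumes the markup inside pre is itself well-formed XML, and the sample $html is hypothetical.

```php
<?php
// Hypothetical sample: a pre block whose markup should be displayed, not parsed.
$html = "<p>intro</p><pre><b>bold</b></pre>";

$doc = new DOMDocument();
$doc->loadXML('<xml>' . $html . '</xml>');

// Copy the live node list first, because the loop modifies the tree.
foreach (iterator_to_array($doc->getElementsByTagName('pre')) as $pre) {
    // Serialize the current children of the pre back to markup...
    $inner = '';
    foreach ($pre->childNodes as $child) {
        $inner .= $doc->saveXML($child);
    }
    // ...then replace them with one text node holding that markup.
    while ($pre->firstChild) {
        $pre->removeChild($pre->firstChild);
    }
    $pre->appendChild($doc->createTextNode($inner));
}

// Serialize each top-level child so the <xml> wrapper is dropped again.
$result = '';
foreach ($doc->documentElement->childNodes as $node) {
    $result .= $doc->saveXML($node);
}
echo $result, "\n";
```

Everything outside the pre elements is left untouched, because only the children of each pre are replaced.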

Source

2012-01-10 askmike

Possible duplicate (http://stackoverflow.com/questions/8772348/replacement-end-div-tags-using-preg-replace-callback-function) - see also: http://php.net/dom_import_simplexml – hakre 2012-01-10 12:32:45

There are plenty of other questions and answers covering this as well. I suggest you use the search. – hakre 2012-01-10 12:42:22

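After a lot of searching I couldn't find exactly what I was looking for using XML parsing, and I also needed a couple of other features. I solved the problem with regex instead, since all the HTML is generated by me.

So the solution provided here solves my original problem, plus a few more.

This function takes a piece of content (a string) and returns an array of several strings:

md = the same content, with the angle brackets inside pre blocks changed to their HTML entities (I blog about HTML, but on my edit screen I don't want the HTML inside a pre to be parsed).
html = everything outside pre blocks is Markdown'd, and every HTML character inside pre blocks is changed to its HTML entity.
excerpt = the text shrunk to around 300 characters, without any pre tags (or their contents) and without Markdown syntax or HTML tags.
meta = the same as excerpt, but shrunk to around 160 characters.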

// Markdown() and SmartyPants() are assumed to be provided by the Markdown and
// SmartyPants PHP libraries, which must be loaded before calling prepareContent().

// The helpers are declared at the top level: a named function declared inside
// prepareContent() would trigger a "cannot redeclare" fatal error on the second call.

// I use this instead of htmlentities for the plain text, this prevents HTML
// from being parsed inside the edit screen.
// All HTML is served with htmlentities instead.
function removeAngleBrackets($str) {
    return str_replace(array('<', '>'), array('&lt;', '&gt;'), $str);
}

// Cut $str at the first space at or after $limit characters, so words stay whole.
function shrinkText($str, $limit) {
    if (strlen($str) <= $limit) {
        return $str;
    }
    $pos = strpos($str, ' ', $limit);
    // No space after $limit: return the whole string rather than cut mid-word.
    return $pos === false ? $str : substr($str, 0, $pos);
}

function prepareContent($content) {

    // Split on opening and closing pre tags, keeping the tags as their own segments.
    $segments = preg_split('/(<\/?pre.*?>)/', $content, -1, PREG_SPLIT_DELIM_CAPTURE);

    // STATE MACHINE
    // borrowed from: http://stackoverflow.com/questions/1278491/howto-encode-texts-outside-the-pre-pre-tag-with-htmlentities-php#answer-1278575

    // this breaks when I nest pre's in pre's (unless I escape the <pre> myself), could be fixed though

    // $state = 0 if outside of a pre
    // $state = 1 if inside of a pre
    $state = 0;

    $plaintext = '';
    $html = '';
    $preless = '';

    // $html, $plaintext and $preless are all written in here
    foreach ($segments as $segment) {
        if ($state == 0) {
            if (preg_match('#<pre[^>]*>#i', $segment)) {
                // this is the pre opening tag
                $state = 1;
                $html .= $segment;
                $plaintext .= $segment;
            } else {
                // this is outside the pre tag
                $plaintext .= $segment;
                $markdown = Markdown($segment);
                $html .= $markdown;
                $preless .= $markdown;
            }
        } else if ($state == 1) {
            if ($segment == '</pre>') {
                // this is the pre closing tag
                $state = 0;
                $html .= $segment;
                $plaintext .= $segment;
            } else {
                // this is inside the pre tag
                $plaintext .= removeAngleBrackets($segment);
                // first decode &gt; to > so I can re-encode it together with other chars,
                // else we get double encoding like: &amp;gt;
                $enti = html_entity_decode($segment);
                $html .= htmlspecialchars($enti, ENT_QUOTES);
            }
        }
    }

    $arr = array();
    $arr['html'] = SmartyPants($html);
    $arr['md'] = $plaintext;

    // the excerpt & meta

    // remove all html tags (markdown is already converted to HTML)
    $tagless = strip_tags($preless);

    // I need to smartypants the excerpt too
    $excerpt = shrinkText($tagless, 275) . ' (...)';
    $arr['excerpt'] = SmartyPants($excerpt);

    $arr['meta'] = shrinkText($tagless, 160);

    return $arr;
}
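To make the input of the state machine easier to picture, this small sketch runs the same preg_split() call as prepareContent() on a hypothetical sample string. With PREG_SPLIT_DELIM_CAPTURE the pre tags come back as separate segments, which is what lets the loop flip between the "outside" and "inside" states.

```php
<?php
// Hypothetical sample content with a single pre block.
$content = "intro <pre><b>x</b></pre> outro";

// Same pattern and flags as in prepareContent().
$segments = preg_split('/(<\/?pre.*?>)/', $content, -1, PREG_SPLIT_DELIM_CAPTURE);

// Segments alternate between text and pre tags:
// 'intro ', '<pre>', '<b>x</b>', '</pre>', ' outro'
print_r($segments);
```

Only the third segment lies between an opening and a closing pre tag, so it is the only one that gets entity-encoded instead of being passed to Markdown().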

Source

2012-01-13 11:56:40 askmike
