Extract text from HTML?

I have a string like the following:

<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>

I want to extract the text from the HTML above. That is, I want to remove the &nbsp; and end up with just: Hello World, this is StackOverflow's question details page

How can I do this in PHP? I have tried several functions, such as strip_tags and html_entity_decode, but they all fail under certain conditions.

Please help, thanks!

EDIT: Here is my code. I thought it would work, but it doesn't :( It leaves characters like &nbsp; and &#39; behind.

$TMP_DESCR = trim(strip_tags($rs['description']));
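To pin down where this fails, here is a minimal sketch that runs the same strip_tags()/trim() combination on the sample string ($html and $text are illustrative names, not from the question). strip_tags() removes only markup; entities such as &nbsp; and &#39; are ordinary text to it, so they pass through untouched:

```php
<?php
// Sample input from the question.
$html = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

// Same approach as the question's code: remove tags, then trim whitespace.
$text = trim(strip_tags($html));

// Only <p> and </p> are gone. The entities are still encoded, and trim()
// has nothing to strip because the string now starts with "&".
echo $text, "\n";
// &nbsp;Hello World, this is StackOverflow&#39;s question details page
```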


2011-02-02 Prashant

What conditions? Don't leave us guessing! – 2011-02-02 11:45:08

As @jakenoble said, it would help if you posted your sample code, its output, and any errors. – diagonalbatman 2011-02-02 11:46:21

If the string shown is a complete HTML page, or part of a larger fragment containing additional markup, see [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon 2011-02-02 11:47:51

The following worked for me, although I had to do a str_replace() on the non-breaking space.

$string = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>"; 
echo htmlspecialchars_decode(trim(strip_tags(str_replace('&nbsp;', '', $string))), ENT_QUOTES);
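One caveat worth knowing with this approach: htmlspecialchars_decode() only reverses the five characters that htmlspecialchars() encodes (&, ", ', < and >), so any other named entity survives, while html_entity_decode() decodes all of them. A quick comparison sketch; the &eacute; entity is an illustrative addition, not part of the question's string:

```php
<?php
// Illustrative input: &eacute; is added here to show the difference.
$s = "caf&eacute; &amp; StackOverflow&#39;s";

// Decodes only the five special characters: &, ", ' (given ENT_QUOTES), < and >.
$special = htmlspecialchars_decode($s, ENT_QUOTES);

// Decodes every HTML entity it knows, producing UTF-8 here.
$all = html_entity_decode($s, ENT_QUOTES, 'UTF-8');

echo $special, "\n"; // caf&eacute; & StackOverflow's
echo $all, "\n";     // café & StackOverflow's
```

For the exact string in the question both functions give the same result; the difference only shows up once the input contains other entities.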


2011-02-02 12:06:41

strip_tags() will get rid of the tags, and trim() should get rid of the whitespace. I'm not sure whether that works with non-breaking spaces, though.
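On the non-breaking space question: by default trim() strips only " ", \t, \n, \r, \0 and \x0B, so it removes &nbsp; in neither form (the literal entity text, or the U+00A0 character it decodes to). A small check, plus one possible workaround that swaps the UTF-8 NBSP bytes for a regular space before trimming (a sketch that assumes the text is UTF-8):

```php
<?php
// Decoding turns &nbsp; into U+00A0, which is the two bytes C2 A0 in UTF-8.
$text = html_entity_decode("&nbsp;Hello World", ENT_QUOTES, 'UTF-8');

// trim() leaves the string unchanged: neither byte is in its default list.
var_dump(trim($text) === $text); // bool(true)

// Workaround: turn the NBSP into an ordinary space so trim() can remove it.
$clean = trim(str_replace("\xC2\xA0", ' ', $text));
var_dump($clean); // string(11) "Hello World"
```

Replacing the whole two-byte sequence is safer than passing "\xC2\xA0" as trim()'s character list, which works byte by byte and could cut into other multibyte characters.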


2011-02-02 11:45:54 sevenseacat

First, you have to call trim() on the HTML to remove whitespace: http://php.net/manual/en/function.trim.php

Then strip_tags(), then html_entity_decode().

So: html_entity_decode(strip_tags(trim($html)));
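On the sample string this order leaves two problems. First, trim() runs while the non-breaking space is still the text "&nbsp;", so html_entity_decode() later turns it into a U+00A0 that nothing strips. Second, before PHP 8.1 the default flags of html_entity_decode() leave single quotes alone, so &#39; survives unless ENT_QUOTES is passed. A sketch with the steps reordered; extract_text() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: strip tags, decode entities, then trim (NBSP included).
function extract_text(string $html): string
{
    // 1. Remove markup first. Decoding first would turn &lt;b&gt; into a real
    //    tag that strip_tags() would then delete.
    $text = strip_tags($html);

    // 2. Decode all entities. ENT_QUOTES makes &#39; decode on every PHP
    //    version (before 8.1 the default flags skip single quotes).
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // 3. Trim last, once &nbsp; has become U+00A0. The /u regex treats the
    //    string as UTF-8 and strips NBSP as well as ordinary whitespace.
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
}

echo extract_text("<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>"), "\n";
// Hello World, this is StackOverflow's question details page
```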


2011-02-02 11:46:24

Probably the best and most reliable way to do this is with a real (X|HT)ML parser, such as the DOMDocument class:

<?php 

$str = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>"; 

$dom = new DOMDocument; 
$dom->loadXML(str_replace('&nbsp;', ' ', $str)); 

echo trim($dom->firstChild->nodeValue); 
// "Hello World, this is StackOverflow's question details page"
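The str_replace() is needed here because loadXML() uses the XML parser, and XML predefines only five entities (&amp;, &lt;, &gt;, &quot;, &apos;), so &nbsp; is undefined. As an alternative sketch, loadHTML() knows the HTML entities, so the replacement can be dropped. Two things change: the parser wraps a fragment in <html><body> and adds a DOCTYPE, and &nbsp; comes back as U+00A0, which trim() does not remove ($result is an illustrative name):

```php
<?php
$str = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

$dom = new DOMDocument;
// The HTML parser resolves &nbsp; itself. Caveat: without a <meta charset>,
// loadHTML() assumes ISO-8859-1, so raw (non-entity) UTF-8 text would be
// misread; this sample is pure ASCII.
$dom->loadHTML($str);

// firstChild would be the added DOCTYPE, so read the root <html> element's text.
$text = $dom->documentElement->textContent;

// &nbsp; is now U+00A0 (UTF-8 bytes C2 A0); make it a plain space so trim() removes it.
$result = trim(str_replace("\xC2\xA0", ' ', $text));
echo $result, "\n";
// Hello World, this is StackOverflow's question details page
```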

This is probably slightly overkill for this problem, but using proper parsing functions is a good habit.

EDIT: You can reuse the DOMDocument object, so you only need two lines inside the loop:

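$dom = new DOMDocument;
while ($rs = mysql_fetch_assoc($result)) { // or whatever
    $dom->loadHTML(str_replace('&nbsp;', ' ', $rs['description']));
    // loadHTML() wraps the fragment in <html><body> and prepends a DOCTYPE,
    // so firstChild is the DOCTYPE node; read the root element's text instead.
    $TMP_DESCR = trim($dom->documentElement->textContent);

    // do something with $TMP_DESCR
}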


2011-02-02 11:52:07 lonesomeday
