How can I keep Groovy/XmlSlurper from stripping HTML tags from a node?

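I am reading an HTML file from a POST response and parsing it with XmlSlurper. A textarea node on the page has some HTML code put into it (not URL-encoded, which was not my choice), and when I read its value, Groovy strips out all the tags. How can I keep Groovy/XmlSlurper from stripping the HTML tags from the node?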

Example:

<html> 
    <body> 
     <textarea><html><body>This has html code for some reason</body></html></textarea> 
    </body> 
</html>

When I parse the above and then find(...) the "textarea" node, it returns:

This has html code for some reason

with none of the markup. How can I keep the tags?


2012-03-14 Don Rhummy

Can you put the textarea block inside a CDATA section? – traneHead 2012-03-15 08:39:45
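A minimal sketch of what the CDATA suggestion would look like, assuming the response body can be changed (or pre-processed) before parsing, which the question suggests may not be possible. The import assumes Groovy 3 or later; on older versions XmlSlurper is auto-imported from groovy.util.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; drop this line on Groovy 2.x

// Same document as in the question, but the textarea content is wrapped in CDATA,
// so the parser treats the inner markup as plain character data.
def xml = '''<html>
  <body>
    <textarea><![CDATA[<html><body>This has html code for some reason</body></html>]]></textarea>
  </body>
</html>'''

def ta = new XmlSlurper().parseText(xml).body.textarea

// text() now returns the markup verbatim, because it was never parsed as elements
assert ta.text() == '<html><body>This has html code for some reason</body></html>'
```

With CDATA the textarea has no child elements at all, so text() is enough and no re-serialization is needed.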

I think you are getting the right data but printing it out the wrong way. Can you try using StreamingMarkupBuilder to convert the node back into a piece of XML?

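// On Groovy 4+, add: import groovy.xml.XmlSlurper
def xml = '''<html>
      |  <body>
      |    <textarea><html><body>This has html code for some reason</body></html></textarea>
      |  </body>
      |</html>'''.stripMargin()

// The nested markup is parsed as child elements of the textarea node
def ta = new XmlSlurper().parseText(xml).body.textarea

// Serialize those child elements back into markup instead of calling text()
String content = new groovy.xml.StreamingMarkupBuilder().bind {
    mkp.yield ta.children()
}

assert content == '<html><body>This has html code for some reason</body></html>'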
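To illustrate the "right data, wrong printing" point, here is a small sketch showing that the inner tags are not lost during parsing. They become child elements of the textarea node, and text() only concatenates the text of those descendants. The import assumes Groovy 3 or later; on older versions XmlSlurper is auto-imported from groovy.util.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; drop this line on Groovy 2.x

def xml = '<html><body><textarea><html><body>This has html code for some reason</body></html></textarea></body></html>'

def ta = new XmlSlurper().parseText(xml).body.textarea

// text() joins the text of all descendants, so no markup appears in the result
assert ta.text() == 'This has html code for some reason'

// The nested elements are still there as parsed children of the textarea
assert ta.children().size() == 1
assert ta.children()[0].name() == 'html'
assert ta.html.body.text() == 'This has html code for some reason'
```

This is why serializing ta.children() gives the markup back: the parser already turned it into a tree, and the builder writes that tree out again.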


2012-03-15 09:12:30
