2014-01-27

Removing duplicates and changing labels with scala.xml.transform.RuleTransformer

I have the following XML:

<tree> 
    <leaf id="1"/> 
    <leaf id="1"/> 
</tree> 

What I'd like to do is get rid of duplicate <leaf/>s (across the whole XML document) and replace them with a single <new-leaf/>, like this:

<tree> 
    <new-leaf id="1"/> 
</tree> 

I've written the following RewriteRule, which I believe should accomplish this (forgive the statefulness):

import scala.xml._ 
import scala.xml.transform._ 

class UniqueLeaves extends RewriteRule { 

    var leafIds = Set.empty[String] 

    override def transform(node: Node): Seq[Node] = node match { 
    case e: Elem if ((e.label == "leaf") && !leafIds.contains((e \\ "@id").text)) => { 
     leafIds += (e \\ "@id").text 
     <new-leaf id={(e \\ "@id")} /> 
    } 
    case e: Elem if (e.label == "leaf") => Seq.empty 
    case _ => node 
    } 

} 

Unfortunately, using RuleTransformer gives me the following:

scala> val tree = <tree><leaf id="1"/><leaf id="1"/></tree> 
scala> println(new RuleTransformer(new UniqueLeaves).transform(tree)) 
<tree/> 

I assume this is because RuleTransformer calls transform on the RewriteRule multiple times, and on every call after the first, instead of producing the <new-leaf> node, my match returns an empty Seq.
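A possible workaround that keeps RuleTransformer, sketched under an assumption rather than taken from the thread: if the repeated calls really do receive the very same <leaf> instance, the rule can remember which instance first claimed each id and keep returning <new-leaf> for that instance, while any other <leaf> with the same id is dropped. The comparison has to be reference identity (eq), because the two <leaf id="1"/> elements are equal under structural ==. UniqueLeavesByIdentity and UniqueLeavesDemo are hypothetical names, the scala-xml module is assumed to be on the classpath, and the rule is still stateful, so it needs a fresh instance per transformation.

```scala
import scala.xml._
import scala.xml.transform.{RewriteRule, RuleTransformer}

// Hypothetical rule. Assumption: every repeated call for a given <leaf>
// receives the same Node instance, so reference identity (eq) can tell
// "the first <leaf> with this id, seen again" from "a later duplicate".
class UniqueLeavesByIdentity extends RewriteRule {
  // id -> the <leaf> instance that first claimed it
  private var owner = Map.empty[String, Node]

  override def transform(node: Node): Seq[Node] = node match {
    case e: Elem if e.label == "leaf" =>
      val id = (e \ "@id").text
      owner.get(id) match {
        case None =>
          owner += (id -> e)   // first time this id is seen: claim it
          <new-leaf id={id}/>
        case Some(first) if first eq e =>
          <new-leaf id={id}/>  // same instance called again: same answer
        case Some(_) =>
          Seq.empty            // a different <leaf> with a taken id: drop it
      }
    case other => other        // everything else passes through unchanged
  }
}

object UniqueLeavesDemo {
  def main(args: Array[String]): Unit = {
    val tree = <tree><leaf id="1"/><leaf id="1"/></tree>
    println(new RuleTransformer(new UniqueLeavesByIdentity).transform(tree))
  }
}
```

If the identity assumption holds for the scala-xml version in use, running this on the tree from the question leaves a single <new-leaf id="1"/> under <tree>.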

Any hints on making this work (and making it stateless) would be appreciated.

Answer

For anyone with a similar problem, I found the following solution:

import scala.xml._

def removeDuplicates(tree: Node): Node = {
    // ids of the <leaf> elements already seen (local to this call)
    var ids = Set.empty[String]

    def recurse(node: Node): Seq[Node] = node match {
        case e: Elem if e.label == "leaf" =>
            val id = (e \ "@id").text
            if (ids.contains(id)) Seq.empty   // duplicate: drop it
            else {
                ids += id                     // first occurrence: record it...
                <new-leaf id={id}/>           // ...and rename it
            }
        // other elements: rebuild from the processed children, each visited exactly once
        case e: Elem => e.copy(child = e.nonEmptyChildren.flatMap(recurse))
        case _ => node
    }

    recurse(tree).head
}

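This works because it traverses the nodes by hand instead of using RuleTransformer#transform, so the same node is never processed more than once (although, unfortunately, it is still stateful).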
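Since the answer is still stateful, here is a sketch of a stateless variant of the same manual traversal (not from the original thread): instead of mutating a var, each call returns the rewritten nodes together with the updated set of seen ids, and a foldLeft threads that set through the children in document order. StatelessDedup and dedupLeaves are hypothetical names, and the scala-xml module is assumed to be on the classpath.

```scala
import scala.xml._

// Hypothetical stateless variant of the answer's traversal: the set of ids
// already seen is passed in and returned, never mutated.
object StatelessDedup {

  // Rewrites `node` given the ids seen so far; returns the replacement
  // nodes (zero or one) together with the updated set of seen ids.
  private def go(node: Node, seen: Set[String]): (Seq[Node], Set[String]) =
    node match {
      case e: Elem if e.label == "leaf" =>
        val id = (e \ "@id").text
        if (seen.contains(id)) (Seq.empty, seen)       // duplicate: drop it
        else (Seq(<new-leaf id={id}/>), seen + id)     // first occurrence: rename it
      case e: Elem =>
        // Visit children left to right, threading `seen` through the fold.
        val (kids, seenAfter) =
          e.child.foldLeft((Vector.empty[Node], seen)) { case ((acc, s), c) =>
            val (out, s2) = go(c, s)
            (acc ++ out, s2)
          }
        (Seq(e.copy(child = kids)), seenAfter)
      case other =>
        (Seq(other), seen)                             // text, comments, etc.
    }

  def dedupLeaves(tree: Node): Node = go(tree, Set.empty)._1.head
}
```

Because the set is passed explicitly, the first occurrence in document order wins, nothing needs to be reset between calls, and calling dedupLeaves twice on the same input yields equal results.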