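Auto-detect/parse repeating elements ('row objects') in XML

I'm trying to write a generic XML parser to consume feeds of unknown schema. Basically, I want to make a best guess at where the "rows" are in the XML document. Here are two example feeds: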

Feed 1, for example:

<xml>
    <some-container-tag>
        <some-row-tag>
            <attribute-1>value</attribute-1>
            <attribute-2>value</attribute-2>
            <attribute-3>value</attribute-3>
            <attribute-4>value</attribute-4>
        </some-row-tag>
        <some-row-tag>
            <attribute-1>value</attribute-1>
            <attribute-2>value</attribute-2>
            <attribute-3>value</attribute-3>
            <attribute-4>value</attribute-4>
        </some-row-tag>
        ...
    </some-container-tag>
</xml>

Feed 2, for example:

<xml>
    <some-container-tag>
        <some-row-tag>
            <attribute-1>value</attribute-1>
            <attribute-2>value</attribute-2>
            <attribute-3>value</attribute-3>
            <attribute-4>value</attribute-4>
            <optional-nested-attribute-set>
                ...
            </optional-nested-attribute-set>
        </some-row-tag>
        <some-row-tag>
            <attribute-1>value</attribute-1>
            <attribute-2>value</attribute-2>
            <attribute-3>value</attribute-3>
            <attribute-4>value</attribute-4>
            <optional-nested-attribute-set>
                ...
            </optional-nested-attribute-set>
        </some-row-tag>
        ...
    </some-container-tag>
    <some-other-container-tag>
        <some-row-tag>
            <attribute-1>value</attribute-1>
            <attribute-2>value</attribute-2>
            <attribute-3>value</attribute-3>
            <attribute-4>value</attribute-4>
            <optional-nested-attribute-set>
                ...
            </optional-nested-attribute-set>
        </some-row-tag>
    </some-other-container-tag>
</xml>

What I've done so far is traverse the structure and map each XPath to a count. For example, the first feed comes out like this:

xml => 1 
xml/some-container-tag => 1 
xml/some-container-tag/some-row-tag => n 
xml/some-container-tag/some-row-tag/attribute-1 => n 
xml/some-container-tag/some-row-tag/attribute-2 => n 
xml/some-container-tag/some-row-tag/attribute-3 => n 
xml/some-container-tag/some-row-tag/attribute-4 => n
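
For readers who want to reproduce this path-to-count map, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The question does not say which language or parser was actually used, so the function name `path_counts` and the shortened two-row sample are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened, hypothetical version of feed 1 with two rows.
SAMPLE = """<xml>
    <some-container-tag>
        <some-row-tag>
            <attribute-1>value</attribute-1>
            <attribute-2>value</attribute-2>
        </some-row-tag>
        <some-row-tag>
            <attribute-1>value</attribute-1>
            <attribute-2>value</attribute-2>
        </some-row-tag>
    </some-container-tag>
</xml>"""


def path_counts(root):
    """Map each slash-separated tag path to the number of elements found at it."""
    counts = Counter()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


counts = path_counts(ET.fromstring(SAMPLE))
for path, n in counts.items():
    print(f"{path} => {n}")
```

With this sample, the container path maps to 1 while the row path and each attribute path map to 2, which is the `n` in the listing above.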

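Now my thinking is that the "base unit" (the row level) would be the lowest-level non-leaf node, although I'm having trouble vetting this idea (solo dev here).

Of course, feed 2 is more complex: it may have nested attributes (basically sub-arrays), and there may also be two parent lists.

What's a good-enough generic approach here?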
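
To make the concern concrete, here is a rough Python sketch of the "lowest-level non-leaf node" heuristic run against a feed-2-like document. It is only one possible reading of the idea: the function name `lowest_non_leaf_paths` is made up, and the `nested-value` element is a hypothetical stand-in for the content elided as `...` in feed 2.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical feed-2-like sample; <nested-value> is invented for illustration.
FEED = """<xml>
    <some-container-tag>
        <some-row-tag>
            <attribute-1>a</attribute-1>
            <optional-nested-attribute-set>
                <nested-value>x</nested-value>
            </optional-nested-attribute-set>
        </some-row-tag>
        <some-row-tag>
            <attribute-1>b</attribute-1>
        </some-row-tag>
    </some-container-tag>
</xml>"""


def lowest_non_leaf_paths(root):
    """Count, per path, the elements that have children which are all leaves."""
    found = Counter()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        children = list(elem)
        # len(child) is the number of child elements, so 0 means "leaf".
        if children and all(len(child) == 0 for child in children):
            found[path] += 1
        for child in children:
            walk(child, path)

    walk(root, "")
    return found


candidates = lowest_non_leaf_paths(ET.fromstring(FEED))
for path, n in candidates.items():
    print(f"{path} => {n}")
```

On this sample the heuristic reports two different candidate paths: the nested set inside the first row, and the second row itself. The first row is skipped because one of its children is not a leaf, which suggests the heuristic on its own is not enough once rows can contain nested sets.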

Source

2017-02-14 whistler

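Your problem is that you're trying to convert a multi-dimensional tree structure into a two-dimensional table structure. Without a schema, you have no good way to make sure your assumptions are correct, but if you have to do this, you'll have to settle on some assumptions.

You could approach it by depth level rather than by the number of nodes at a particular depth (nothing says that all the leaf nodes will be at the same depth, which is the problem you're running into now):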

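Depth 0 (the root tag) indicates a new collection of data structures
Depth 1 (some-container-tag) indicates a new two-dimensional structure
Depth 2 (some-row-tag) indicates a new row in the two-dimensional structure
Depth 3+ indicates an entry in that row, which may itself have sub-entries. Perhaps these are represented as a CSV string, or as a pointer to another array/table data structure - but once you start adding those, you're no longer really dealing with a two-dimensional structure.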
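
The depth levels listed above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: `parse_by_depth` and `cell_value` are made-up names, the `nested-value` elements are hypothetical, and nested entries are flattened into a plain comma-joined string with no CSV escaping.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed-2-like sample; <nested-value> is invented for illustration.
FEED = """<xml>
    <some-container-tag>
        <some-row-tag>
            <attribute-1>a</attribute-1>
            <attribute-2>b</attribute-2>
        </some-row-tag>
        <some-row-tag>
            <attribute-1>c</attribute-1>
            <optional-nested-attribute-set>
                <nested-value>x</nested-value>
                <nested-value>y</nested-value>
            </optional-nested-attribute-set>
        </some-row-tag>
    </some-container-tag>
    <some-other-container-tag>
        <some-row-tag>
            <attribute-1>d</attribute-1>
        </some-row-tag>
    </some-other-container-tag>
</xml>"""


def cell_value(elem):
    """Depth 3+: leaf text, or a comma-joined string of nested entries (no escaping)."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return ",".join(cell_value(child) for child in children)


def parse_by_depth(root):
    """Depth 0 is the collection, depth 1 a table, depth 2 a row, depth 3 a cell."""
    tables = {}
    for container in root:                      # depth 1: one table per container tag
        rows = tables.setdefault(container.tag, [])
        for row in container:                   # depth 2: one row per child
            # Depth 3: one cell per child; a repeated tag within a row overwrites.
            rows.append({cell.tag: cell_value(cell) for cell in row})
    return tables


tables = parse_by_depth(ET.fromstring(FEED))
for name, rows in tables.items():
    print(name, rows)
```

Keying the tables by container tag means the two parent lists in feed 2 end up as two separate tables, at the cost of assuming every feed puts its rows at exactly depth 2.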

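All of this depends on which assumptions are valid for the data you ultimately need to process and for the language you choose to process it in. Either way, you're probably best off parsing this by depth rather than by count. Also, if this really is schemaless, you may want to think about how to handle any attributes that show up in the XML.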
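
As one possible way to handle XML attributes, the sketch below folds them into the same row record as the child elements. The `@` prefix is an arbitrary convention chosen here to avoid clashes with child tag names, and the `id` and `lang` attributes and the `row_to_dict` name are hypothetical; nothing in the feeds above shows attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical row carrying XML attributes in addition to child elements.
ROW = ET.fromstring(
    '<some-row-tag id="7" lang="en">'
    "<attribute-1>value</attribute-1>"
    "</some-row-tag>"
)


def row_to_dict(row):
    """Merge a row's XML attributes (prefixed with '@') and its child elements."""
    record = {f"@{name}": value for name, value in row.attrib.items()}
    for child in row:
        record[child.tag] = (child.text or "").strip()
    return record


record = row_to_dict(ROW)
print(record)
```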

Source

2017-02-14 14:44:16
