Get all preceding/following sibling text content

Consider the following XML:

<paratext ID="p34"><bold>pass</bold> <bold>pass</bold></paratext> 
<paratext ID="p35"><bold>pass</bold></paratext> 
<paratext ID="p36">foo <bold>pass</bold> bar</paratext> 
<paratext ID="p37">foo<bold> pass </bold>bar</paratext> 
<paratext ID="p38"><bold>fail</bold><bold>fail</bold></paratext> 
<paratext ID="p39">foo<bold>fail</bold>bar</paratext>

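p34 should pass because there is a non-alpha character between the letters of the two bold tags
p35 should pass because there are no alpha characters outside the bold tag
p36 should pass because there is a non-alpha character between the bold text and the other text
p37 should pass because there is a non-alpha character between the bold text and the other text
p38 should fail because there is no non-alpha character between the alpha characters of the two bold tags
p39 should fail because there is no non-alpha character between the bold text and "foo" or "bar"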

My attempt at doing this with Schematron has been the following:

<iso:rule context="//jd:csc|//jd:bold|//jd:ital|//jd:underscore"> 
<iso:assert test=" 
    string-length(preceding-sibling::text()) = 0 
    or  
    matches(substring(preceding-sibling::text(), string-length(preceding-sibling::text())), '[^a-zA-Z]') 
    or 
    matches(substring(.,1,1), '[^a-zA-Z]') 
    "> 
    {WS1046} An .alpha character cannot both immediately precede and follow &lt;<iso:value-of select="name()"/>&gt; tag 
</iso:assert> 
<iso:assert test=" 
    string-length(following-sibling::text()) = 0 
    or 
    matches(substring(following-sibling::text(), 1,1), '[^a-zA-Z]') 
    or 
    matches(substring(., string-length(.)), '[^a-zA-Z]') 
    "> 
    {WS1046} An .alpha character cannot both immediately precede and follow &lt;/<iso:value-of select="name()"/>&gt; tag 
</iso:assert> 
</iso:rule>

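The problem with this is that it only looks at the direct child text nodes of the current context's parent. Thus p38 does not fail, because there are no direct child text nodes. Also, something like b<foo>bar <bold>pass</bold> would fail, because it would only see the "b" in preceding-sibling::text() and would not see the text inside the foo element.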

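I also tried ::*/text() instead of ::text(), but then I run into a similar problem: I only see the text within sibling elements and do not get the direct sibling text nodes. I need to combine these two things. Does anyone know how?

For example, in this XML: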

<paratext ID="p1">hello <foo>bar</foo> <bold>THIS</bold> <foo>bar</foo>goodbye</paratext>

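When the rule's context hits <bold>THIS</bold> and checks what precedes it, I would like it to see "hello bar ", and when it checks what follows, I would like it to see " bargoodbye".

Source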

2013-11-22 smerny

With XPath 2.0 (which you seem to be using, since you call matches), you can use:

string-join(preceding-sibling::node(), '')

to get "hello bar ", and:

string-join(following-sibling::node(), '')

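to get " bargoodbye".

The lines above assume that the siblings are only element and text nodes. If there can be comments and/or processing instructions, and you want these rules to ignore their content, you can use: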

string-join(preceding-sibling::* | preceding-sibling::text(), '')
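
As a rough sketch, the expression above might be folded back into the rule from the question like this. The variable names before and after are hypothetical, and the sketch assumes the schema uses the XSLT 2.0 query binding and already declares the iso and jd prefixes, as the original rule does.

```xml
<iso:rule context="//jd:csc|//jd:bold|//jd:ital|//jd:underscore">
  <!-- Hypothetical variables: concatenated text of all preceding / following
       siblings (elements and text nodes only, so comments and processing
       instructions are ignored) -->
  <iso:let name="before" value="string-join(preceding-sibling::* | preceding-sibling::text(), '')"/>
  <iso:let name="after" value="string-join(following-sibling::* | following-sibling::text(), '')"/>
  <!-- Passes if nothing precedes, or the last preceding character is non-alpha,
       or the first character inside the element is non-alpha -->
  <iso:assert test="
      string-length($before) = 0
      or matches(substring($before, string-length($before)), '[^a-zA-Z]')
      or matches(substring(., 1, 1), '[^a-zA-Z]')
      ">
      {WS1046} An .alpha character cannot both immediately precede and follow &lt;<iso:value-of select="name()"/>&gt; tag
  </iso:assert>
  <!-- Passes if nothing follows, or the first following character is non-alpha,
       or the last character inside the element is non-alpha -->
  <iso:assert test="
      string-length($after) = 0
      or matches(substring($after, 1, 1), '[^a-zA-Z]')
      or matches(substring(., string-length(.)), '[^a-zA-Z]')
      ">
      {WS1046} An .alpha character cannot both immediately precede and follow &lt;/<iso:value-of select="name()"/>&gt; tag
  </iso:assert>
</iso:rule>
```

Under these assumptions, both bold elements in p38 and the one in p39 should fail, while p34 to p37 should pass, because each test now sees the full text on either side of the element instead of a single sibling text node.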

Source

2013-11-22 17:02:57
