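BASH script to iterate over a list of IDs in an XML file and print/output the name to the shell or an output file?

I want to iterate over a list of ID numbers, match each one against the ID numbers in an XML data file, and use BASH (and AWK) to print the line below each match to the shell or redirect it to a third output file (output.txt).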

Here's the breakdown:

ID_list.txt (shortened for this example - it has 100 IDs)

XML_example.txt (thousands of entries)

<book> 
    <ID>4414</ID> 
    <name>Name of first book</name> 
</book> 
<book> 
    <ID>4561</ID> 
    <name>Name of second book</name> 
</book>

I want the output of my script to be the names for the 100 IDs from the first file:

Name of first book 
Name of second book 
etc

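I believe this can be done with bash and AWK in a for loop (for each ID in file 1, find the corresponding name in file 2). I think you can grep for the ID number and then use AWK to print the line below it. Even if the output looks like this, I can strip the XML tags afterwards: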

<name>Name of first book</name> 
<name>Name of second book</name>

This is on a Linux server, but I could port it to PowerShell on Windows. I think BASH/GREP and AWK are the way to go.

Can someone help me script this?

Source

2014-01-21 Mike J

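Show us what you've tried and what problem you're having - otherwise it looks like you want us to write it for you. – 2014-01-21 17:54:31

Shell and/or awk are not the right choice for parsing XML. – chepner

@user2062950, you're right, apologies for not posting my version before asking. I had a while read; do ...; done < ID_list.txt solution that did the job for me, but dogbane's solution below is cleaner. –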

Here's one way:

while IFS= read -r id 
do 
    grep -A1 "<ID>$id</ID>" XML_example.txt | grep "<name>" 
done < ID_list.txt

Here's another way (a one-liner). It is more efficient because it uses a single grep to extract all the IDs instead of looping:

grep -E -A1 "$(sed -e 's/^/<ID>/g' -e 's/$/<\/ID>/g' ID_list.txt | sed -e :a -e '$!N;s/\n/|/;ta')" XML_example.txt | grep "<name>"

Output:

<name>Name of first book</name> 
<name>Name of second book</name>
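
The question also asks for the bare names and for the option of writing them to output.txt. A minimal sketch of one way to combine the single-grep idea with tag stripping follows. It assumes, as in the example data, that each <name> line directly follows its <ID> line; extract_names is a hypothetical helper name, not part of the answer above.

```shell
#!/bin/bash
# Sketch only: extract_names is a hypothetical helper, not from the original answer.
# Assumes every <name> line sits directly below its <ID> line.
extract_names() {
    local id_file=$1 xml_file=$2 patterns
    patterns=$(mktemp)
    # Turn each ID into a fixed-string pattern such as <ID>4414</ID>.
    sed 's|.*|<ID>&</ID>|' "$id_file" > "$patterns"
    # One grep pass over the XML: match any pattern, keep one trailing line,
    # then print only the text between <name> and </name>.
    grep -F -A1 -f "$patterns" "$xml_file" |
        sed -n 's|.*<name>\(.*\)</name>.*|\1|p'
    rm -f "$patterns"
}

# Usage: extract_names ID_list.txt XML_example.txt > output.txt
```

Using fixed-string patterns (-F) avoids building one long alternation and keeps any regex metacharacters in the ID file from being interpreted.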

Source

2014-01-21 17:56:45 dogbane

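Thanks @dogbane. Both work as expected. I find the first one easier to read, but both do exactly what I asked for. –

Every answer on this platform should be like this. – intumwa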

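$ awk '
# First file (ID_list.txt): store each ID, wrapped in its tags, as an array key.
NR==FNR{ ids["<ID>" $0 "</ID>"]; next }
# Line right after a wanted ID: strip the <name> tags and print the rest.
found { gsub(/^.*<name>|<[/]name>.*$/,""); print; found=0 }
# Flag lines whose first field is one of the wanted IDs.
$1 in ids { found=1 }
' ID_list.txt XML_example.txt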
Name of first book 
Name of second book

Source

2014-01-21 17:56:22

Given an ID, you can get the name using an XPath expression and the xmllint command, like this:

id=4414 
name=$(xmllint --xpath "string(//book[ID[text()='$id']]/name)" books.xml)

With this, you could write something like:

while IFS= read -r id; do
    name=$(xmllint --xpath "string(//book[ID[text()='$id']]/name)" books.xml)
    echo "$name"
done < ID_list.txt

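Unlike the solutions involving awk, grep, and friends, this uses an actual XML parsing tool. This means that while most of the other solutions might break if they encounter: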

<book><ID>4561</ID><name>Name of second book</name></book>

...this will work just fine.

xmllint is part of the libxml2 package, which is available for most distributions.
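
One caveat when applying this to the question's data: xmllint needs a well-formed document with a single root element, while the XML_example.txt excerpt shows sibling <book> elements and no root. The following is a minimal sketch, assuming the real file has neither a root element nor an XML declaration; wrap_fragment, lookup_name, and the <books> root name are hypothetical choices made for illustration.

```shell
#!/bin/bash
# Sketch only: wrap_fragment and lookup_name are hypothetical helper names.
# Assumes the input has no root element and no <?xml ...?> declaration.

# Wrap a rootless list of <book> elements in a single <books> root.
wrap_fragment() {
    echo '<books>'
    cat "$1"
    echo '</books>'
}

# Print the name of the book whose ID equals $1 in the XML file $2.
lookup_name() {
    xmllint --xpath "string(//book[ID[text()='$1']]/name)" "$2"
}

# Usage:
#   wrap_fragment XML_example.txt > books.xml
#   while IFS= read -r id; do
#       printf '%s\n' "$(lookup_name "$id" books.xml)"
#   done < ID_list.txt > output.txt
```

If the real file already has a root element, the wrapping step is unnecessary and the loop can point at the file directly.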

Note also that recent versions of GNU awk offer native XML parsing through an extension.

Source

2014-01-21 18:00:27 larsks

I'd go the BASH_REMATCH route if I had to do this in bash

BASH_REMATCH 
      An array variable whose members are assigned by the =~ binary 
      operator to the [[ conditional command. The element with index 
      0 is the portion of the string matching the entire regular 
      expression. The element with index n is the portion of the 
      string matching the nth parenthesized subexpression. This
      variable is read-only.
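
As a quick illustration of that description, here is a small sketch applying =~ to a single line shaped like the ones in XML_example.txt. The function name name_from_line is a hypothetical helper introduced only for this example.

```shell
#!/bin/bash
# Sketch only: name_from_line is a hypothetical helper name.
# Prints the text between <name> and </name>; returns non-zero if there is no match.
name_from_line() {
    # The quoted parts of the pattern are matched literally;
    # the parenthesized group captures the text between the tags.
    if [[ $1 =~ "<name>"(.*)"</name>" ]]; then
        # Index 0 would hold the whole match, index 1 the first group.
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

name_from_line '    <name>Name of first book</name> '   # prints: Name of first book
```

The tags are quoted because an unquoted < or > inside [[ ]] would be parsed as a comparison operator.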

So, something like the following

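#!/bin/bash

# Load the wanted IDs as keys of an associative array (bash 4+).
declare -A ids
while IFS= read -r id; do
    [[ $id ]] && ids[$id]=1
done < "ID_list.txt"

# read trims surrounding whitespace, so the ID pattern below can be anchored.
while read -r line; do
    [[ $print ]] && [[ $line =~ "<name>"(.*)"</name>" ]] && echo "${BASH_REMATCH[1]}"

    # Arm printing only when this line holds an ID from the list.
    if [[ $line =~ ^"<ID>"(.+)"</ID>"$ && ${ids[${BASH_REMATCH[1]}]} ]]; then
        print=:
    else
        print=
    fi
done < "XML_example.txt"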

Example output

> abovescript 
Name of first book 
Name of second book

Source

2014-01-22 10:29:17 BroSlow
