2013-03-08

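I have an .xml file in which I have to search for the "<reviseddate>" tag. It can occur several times in the file. If it does, I have to replace the "<reviseddate>" tags with "<reviseddate1>", "<reviseddate2>", and so on, with the closing tags numbered to match. I need a shell script that replaces the string with an incrementing value.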

A sample of the text is as follows:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised    
<reviseddate> February 4, 2006 </reviseddate>, <reviseddate> August 14, 2006 </reviseddate>, 
and <reviseddate> October 7, 2006 </reviseddate>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California 
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para> 

The output should be as follows:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised    
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,   
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California 
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para> 

I have tried:

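for i in $c; do
    # Write the substituted text to a temporary file, then copy it back over the input.
    sed -e "s/<reviseddate>/<reviseddate$i>/g" "$path/$input_file" > "$path/input_new.xml"
    cp "$path/input_new.xml" "$path/$input_file"
    rm -f "$path/input_new.xml"
done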
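The loop above cannot produce the requested output, and the reason shows up on a single line: because of the g flag, the first iteration renames every <reviseddate> to <reviseddate1>, so later iterations find nothing left to replace, and the closing </reviseddate> tags are never renamed at all. A minimal demonstration of one iteration (attempt_once is a hypothetical helper name, not part of the question):

```shell
#!/bin/sh
# Hypothetical helper reproducing one iteration of the loop above:
# $1 is the counter value; the text to edit arrives on stdin.
attempt_once() {
    sed -e "s/<reviseddate>/<reviseddate$1>/g"
}

# First iteration (i=1): both opening tags become <reviseddate1>,
# and the closing tags keep their original name.
printf '%s\n' '<reviseddate>A</reviseddate>, <reviseddate>B</reviseddate>' | attempt_once 1
# prints: <reviseddate1>A</reviseddate>, <reviseddate1>B</reviseddate>
```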

Please clarify your question. – Anubhab 2013-03-08 07:47:18

for i in $c do sed -e "s/<reviseddate>/<reviseddate$i>/g" $path/$input_file > $path/input_new.xml cp $path/input_new.xml $path/$input_file rm -f input_new.xml done – Mallik 2013-03-08 08:19:01

Use an XML parser; they are available for many languages. – chepner 2013-03-08 16:00:31

Answer

I would use a Perl script like this to do the job:

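#!/usr/bin/env perl 
use strict; 
use warnings; 

# Counter shared across the whole input, so numbering continues from line to line.
my $i = 1; 
while (<>) 
{ 
    # Replace the first remaining <reviseddate>...</reviseddate> pair on this
    # line with a numbered pair; repeat until no un-numbered pair is left.
    while (s%<reviseddate>([^<]*)</reviseddate>%<reviseddate$i>$1</reviseddate$i>%) 
    { 
        $i++; 
    } 
    print; 
} 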

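For each line, every <reviseddate>...</reviseddate> pair is replaced by a suitably numbered pair, and the counter carries on from one line to the next.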

Sample output:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised    
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>, 
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California 
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para> 

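You can adapt the processing to handle alternatives, such as the start tag on one line and the end tag on the next. There is no need to fuss over that until you need it. Using regular expressions is an art: you have to balance your immediate needs against resilience in every possible scenario.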
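As a sketch of that kind of adaptation in plain shell, assuming <reviseddate> elements never nest (so the n-th opening tag always pairs with the n-th closing tag), awk can number opening and closing tags with separate counters in a single pass, which also copes with an element split across two lines. The function name number_revised_dates is hypothetical; sub() and index() are standard POSIX awk functions.

```shell
#!/bin/sh
# Hypothetical helper: number <reviseddate> elements in one pass with awk.
# Assumes the elements never nest, so opening and closing tags can be
# counted independently and still match up, even across line breaks.
# Reads the named files, or stdin when no file is given.
number_revised_dates() {
    awk '
    {
        # Rename opening tags one at a time, left to right.
        while (index($0, "<reviseddate>") > 0) {
            opening++
            sub(/<reviseddate>/, "<reviseddate" opening ">")
        }
        # Rename closing tags with their own counter.
        while (index($0, "</reviseddate>") > 0) {
            closing++
            sub(/<\/reviseddate>/, "</reviseddate" closing ">")
        }
        print
    }' "$@"
}
```

Usage would be along the lines of number_revised_dates input.xml > output.xml. The pairing assumption is the weak point: nothing checks that the tags are balanced, which is where a real XML parser, as chepner suggests, earns its keep.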


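Since Perl apparently isn't "shell" (but sed is), you can arrange to process the file as many times as needed to find all the entries and change them: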

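tmp=$(mktemp ./revise.XXXXXXXXXXXX) 
# Remove the temporary file if the script exits early or is interrupted.
trap 'rm -f "$tmp"; exit 1' 0 1 2 3 13 15 

i=1 
# Keep going while any un-numbered <reviseddate> tag remains in the file.
while grep -q '<reviseddate>' filename 
do 
    # Number only the first remaining pair: the range ends at the first line
    # containing <reviseddate>, and the s%%% command has no g flag.
    sed "1,/<reviseddate>/ s%<reviseddate>\([^<]*\)</reviseddate>%<reviseddate$i>\1</reviseddate$i>%" filename > "$tmp" 
    mv "$tmp" filename 
    i=$((i + 1)) 
done 

rm -f "$tmp" # Should be a no-op 
trap 0 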

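This updates the file repeatedly. The 1,/<reviseddate>/ address range ensures that only the first remaining <reviseddate> tag is updated (it is important that there is no g on the s%%% command). One caveat: sed only starts looking for the end of a 1,/regex/ range on line 2, so a tag on the very first line would stretch the range to the next line containing a tag, and both would get the same number; GNU sed's 0,/regex/ address avoids this. The trap code ensures the temporary file is not left behind.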
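To see the effect of the address range and the missing g flag, here is a single pass of the same sed command on a three-line input; number_first is a hypothetical wrapper name used only for this demonstration.

```shell
#!/bin/sh
# Hypothetical wrapper: one pass of the answer's sed command over stdin,
# using $1 as the number to insert.
number_first() {
    sed "1,/<reviseddate>/ s%<reviseddate>\([^<]*\)</reviseddate>%<reviseddate$1>\1</reviseddate$1>%"
}

# Line 1 has no tag, so the range 1,/<reviseddate>/ ends at line 2. Only the
# first pair on line 2 is renamed; the rest is left for later passes.
printf '%s\n' 'revised' '<reviseddate>A</reviseddate>, <reviseddate>B</reviseddate>,' 'and <reviseddate>C</reviseddate>.' | number_first 1
```

Feeding the result back through number_first 2 and then number_first 3 numbers the remaining pairs, which is what the while loop does with the file.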

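This works on your sample data and gives the same output. For small files it is fine, but the loop rewrites the whole file once per tag, so if you are handling multi-gigabyte files, Perl would be better because it processes the file only once.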
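The cost is easy to estimate: the loop makes one full read-and-rewrite pass per un-numbered opening tag, so counting those tags tells you how many passes to expect. A quick sketch, assuming a grep that supports -o (GNU and BSD grep do; it is not in POSIX); count_passes is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count un-numbered <reviseddate> opening tags in the
# named files (or stdin), i.e. the number of passes the sed loop will make.
# grep -o prints each match on its own line; tr strips the padding some
# wc implementations add.
count_passes() {
    grep -o '<reviseddate>' "$@" | wc -l | tr -d ' '
}
```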

Thank you, but how can this be done in shell? Any help with this... – Mallik 2013-03-08 08:55:48
