2012-01-13 28 views
0

Remove lines from an XML file if they contain the same word (Perl)

I have a file "frequencies.xml" that contains lines of this form:

<?xml version="1.0"?> 
<!DOCTYPE stationlist PUBLIC "-//xxxxx//DTD stationlist 1.0//EN" "http://xxxxxxxxx/DTD/xxxxxxxx.dtd"> 
<frequencies xmlns="http://xxxxxxxxxxxxxxxx/DTD/"> 
<list norm="PAL" frequencies="Custom" audio="bg"> 
.............................................................. 
<station name="A" active="1" channel="48.25MHz" norm="PAL"/> 
<station name="B" active="1" channel="55.25MHz" norm="PAL"/> 
<station name="C" active="1" channel="62.25MHz" norm="PAL"/> 
<station name="D" active="1" channel="112.25MHz" norm="PAL"/> 
.............................................................. 
<station name="E" active="1" channel="119.25MHz" norm="PAL"/> 
<station name="F" active="0" channel="48.25MHz" norm="PAL"/> 
.............................................................. 
<station name="G" active="1" channel="55.25MHz" norm="PAL"/> 
<station name="H" active="0" channel="62.25MHz" norm="PAL"/> 
.............................................................. 
    </list> 
</frequencies> 

I want to treat a line as a duplicate and remove it if an earlier line has the same frequency.

Expected output:

<station name="A" active="1" channel="48.25MHz" norm="PAL"/> 
<station name="B" active="1" channel="55.25MHz" norm="PAL"/> 
<station name="C" active="1" channel="62.25MHz" norm="PAL"/> 
<station name="D" active="1" channel="112.25MHz" norm="PAL"/> 
<station name="E" active="1" channel="119.25MHz" norm="PAL"/> 

I wrote this script to do it:

for i in `cat frequencies.xml | sed 's/.*channel="\([^"]*\)".*/\1/; /</ d' |grep MHz`; do 
cat frequencies.xml | awk -v i="channel=\"$i" ' 
    BEGIN  { a=0 } 
    $0 ~ i  { if (a == "1") { print i"\" - duplicate" > "/dev/stderr" ; next ;} ; a=1 } 
      { print $_ }' > frequencies.xml.tmp && \ 
mv frequencies.xml.tmp frequencies.xml 
done 

How can I translate this into Perl?

Thanks

Update: I want to preserve the XML structure.

My code:

open (FH, "+< frequencies.xml") or die "Opening: $!"; 
my $out = ''; 
my %seen =(); 
foreach my $line (<FH>) { 
    if ($line =~ m/<station/) { 
     my ($freq) = ($line =~ m/channel="([^"]+)"/); 
      $out .= $line unless $seen{$freq}++; 
    } else { 
     $out .= $line; 
    } 
} 
seek(FH,0,0)     or die "Seeking: $!"; 
print FH $out     or die "Printing: $!"; 
truncate(FH, tell(FH))   or die "Truncating: $!"; 
close(FH)      or die "Closing: $!"; 

Answers

3

Keep a hash to track the frequencies you have seen, and don't emit a line if its frequency has already been seen:
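A minimal sketch of that idea, assuming each station sits on its own line and the frequency is the value of the `channel` attribute. The helper name `unique_stations` and the sample lines are made up for illustration; only lines carrying a `channel` attribute are emitted here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the station lines with every repeated
# channel="..." value dropped, keeping the first occurrence.
sub unique_stations {
    my @lines = @_;
    my %seen;    # frequencies already emitted
    my @kept;
    for my $line (@lines) {
        # Skip anything that carries no channel attribute.
        next unless $line =~ /channel="([^"]+)"/;
        # Post-increment returns 0 the first time a frequency shows up.
        push @kept, $line unless $seen{$1}++;
    }
    return @kept;
}

my @sample = (
    qq{<station name="A" active="1" channel="48.25MHz" norm="PAL"/>\n},
    qq{<station name="B" active="1" channel="55.25MHz" norm="PAL"/>\n},
    qq{<station name="F" active="0" channel="48.25MHz" norm="PAL"/>\n},
);
print unique_stations(@sample);    # prints the A and B lines only
```

The `$seen{$key}++` idiom does the lookup and the bookkeeping in one expression, so no separate `exists` test is needed.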

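Update

If there are other lines you want to keep, you just need to print them too. The simplest way is probably to test whether the line is a <station> element and print everything else. Once it gets much more complex than that, though, you will probably want to use one of the real XML parsers. So, using Zaid's suggestion as well: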

open INPUT, '<', 'frequencies.xml' or die "Can't read file : $!"; 
my %seen =(); 
foreach my $line (<INPUT>) { 
    if ($line =~ m/<station/) { 
     my ($freq) = ($line =~ m/channel="([^"]+)"/); 
     print $line unless $seen{$freq}++; 
    } else { 
     print $line; 
    } 
} 
close INPUT; 
+1

It works fine. Thanks. But how do I keep the XML header? – user1148015 2012-01-13 16:52:25

+1

`print $line unless $seen{$freq}++;` can also be used – Zaid 2012-01-13 16:54:14

0

One way:

perl -ne '($freq) = m/(?i)channel="([^"]+)/; print unless exists $arr{ $freq }; $arr{ $freq } = 1' infile 
0
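open(IN, '<', 'frequencies.xml') or die "Can't read file: $!"; 
while ($inline = <IN>) { 
    # Lines without a frequency are kept unchanged. 
    if ($inline =~ /([\d.]+)MHz/) { 
     $freq = $1; 
     # Compare the whole quoted value so that 12.25 does not match 112.25. 
     next if grep(/"\Q$freq\EMHz"/, @out); 
    } 
    push(@out, $inline); 
} 
print @out; 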
+0

No need for a backslash inside the character class: `/([\d.]+)MHz/` – pilcrow 2012-01-13 16:21:51
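A quick check of that point, as a sketch: inside a bracketed character class the dot is already literal, so `[\d.]` and `[\d\.]` capture the same text. The sample string is taken from the question's data; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = 'channel="48.25MHz"';

# Same capture with and without the backslash before the dot.
my ($plain)   = $line =~ /([\d.]+)MHz/;
my ($escaped) = $line =~ /([\d\.]+)MHz/;
print "$plain $escaped\n";    # prints "48.25 48.25"

# Inside the class, '.' is not a wildcard, so letters do not match.
print "no match\n" unless 'xxMHz' =~ /([\d.]+)MHz/;
```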

0
Using a one-liner script:

$ perl -i.tmp -ane 'print unless /<station/ && $seen{ $F[3] }++' frequencies.xml 
0

Using XML::XSH2:

use XML::XSH2; 
xsh q{ 
    open so-8853324.xml; 
    $ch := hash @channel //station; 
    for { keys %$ch } ls xsh:lookup("ch", .)[1]; 
}; 

I removed the namespace from the data to simplify the code.
