Splitting an XML file with a Perl script

Hi, I'm using a Perl script to split a big XML file into small chunks. I have already looked at this link: Split file by XML tag.

My code looks like this:

if($line =~ /^</row>/) 
{ 
$count++; 
}

But I get this error:

works\filesplit.pl line 20. 
Bareword found where operator expected at E:\Work\perl works\filesplit.pl line 20, near "/^</row" 
     (Missing operator before row?) 
syntax error at E:\Work\perl works\filesplit.pl line 20, near "/^</row" 
Search pattern not terminated at E:\Work\perl works\filesplit.pl line 20.

Can anyone help me?

Update

<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row> 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row> 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row>

Source

2013-11-28 Backtrack

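How do you want to split this file, and what do you want to do with the chunks? – Kenosis

@Kenosis ... "Five" ... will be chunked into a single file – Backtrack

@Kenosis Actually my file is too large, so I want it chunked, 5 in a single file, like that – Backtrack

You need ^<\/row>, assuming you are trying to match </row> at the start of the line. The unescaped / in </row> otherwise ends the pattern early, which is what produces the syntax error. Here is my test code.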

#!/usr/bin/perl 
use strict; 
use warnings; 

my $line = "</row> something"; 
if ($line =~ /^<\/row>/) 
{ 
    print "found a match \n"; 
}

Output:

# perl test.pl 
found a match
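
As a small variation on the test above, the backslash can be avoided altogether by choosing a different regex delimiter. This is only a sketch; the helper name starts_with_row_close is made up for illustration and is not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the line starts with a closing </row> tag.
# m{...} uses braces as the pattern delimiters, so the slash in </row>
# no longer ends the pattern and needs no backslash.
sub starts_with_row_close {
    my ($line) = @_;
    return $line =~ m{^</row>} ? 1 : 0;
}

print starts_with_row_close("</row> something")
    ? "found a match\n"
    : "no match\n";
```

Any paired or non-slash delimiter works the same way, for example m!^</row>! or m#^</row>#.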

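Update

Posting this update after the OP provided sample data.

You need to use ^\s*<\/row> in your regex, because the closing tags do not all start at the beginning of the line. Some of them have one space before them. So we need to match zero or more whitespace characters at the start of the line before the actual match.

Code:

#!/usr/bin/perl -w 
use strict; 
use warnings; 

while (my $line = <DATA>) 
{ 
    # \s* allows optional leading whitespace before the closing tag 
    if ($line =~ /^\s*<\/row>/) 
    { 
        print "found a match \n"; 
    } 
} 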

__DATA__ 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row> 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row> 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row>

Output:

# perl test.pl 
found a match 
found a match 
found a match
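
Building on that match, here is a rough sketch of how the counter could drive the actual chunking the OP describes in the comments (five rows per output file). The function name split_rows, the 'chunk' prefix, and the file naming scheme are assumptions made for this example, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in into numbered files, starting a
# new file after every $rows_per_file closing </row> tags.
# Returns the names of the files that were written.
sub split_rows {
    my ($in, $rows_per_file, $prefix) = @_;
    my ($count, $part, $out, @files) = (0, 0);

    while (my $line = <$in>) {
        # Open the next chunk lazily, only when there is a line to write.
        if (!$out) {
            my $name = sprintf '%s_%03d.xml', $prefix, ++$part;
            open $out, '>', $name or die "Cannot write $name: $!";
            push @files, $name;
        }
        print {$out} $line;

        # Close the current chunk once it holds enough rows.
        if ($line =~ m{^\s*</row>} && ++$count % $rows_per_file == 0) {
            close $out or die "Cannot close chunk: $!";
            undef $out;
        }
    }
    close $out if $out;
    return @files;
}

# Example call: perl split.pl inFile.xml (the file name is a placeholder).
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for split_rows($in, 5, 'chunk');
}
```

Note that this works line by line, so it assumes each </row> sits on its own line as in the sample data, and the resulting chunks have no root element around the rows.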

Source

2013-11-28 05:19:44 slayedbylucifer

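But it doesn't work if we have multiple of them – Backtrack

Can you provide sample data to work with? – slayedbylucifer

+1. Yes, I've added it – Backtrack

Have you tried xml_split? It's a tool that comes with XML::Twig, designed specifically to split large XML files based on various criteria (tag name, level, size).

Source

2013-11-28 06:07:24 mirod

Perhaps the following will help: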

use strict; 
use warnings; 

my $i = 1; 
local $/ = '<row>'; 

while (<>) { 
    chomp; 
    s!</row>!! or next; 

    open my $fh, '>', 'File_' . (sprintf '%05d', $i++) . '.xml' or die $!; 
    print $fh $_; 
}

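Usage: perl script.pl inFile.xml

This sets Perl's record separator $/ to <row>, so the XML file is read in 'chunks' delimited by <row>. It removes </row> from each chunk, then writes the chunk to a file using the "File_nnnnn.xml" naming scheme.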
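
To make the effect of $/ easier to see, here is a small sketch that reads from an in-memory string instead of a file. The helper read_records and the two-row sample string are invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read "records" from a filehandle using a custom
# input record separator, and return them with the separator chomped off.
sub read_records {
    my ($fh, $separator) = @_;
    local $/ = $separator;      # restored automatically when the sub returns
    my @records;
    while (my $record = <$fh>) {
        chomp $record;          # chomp strips the current value of $/
        # The text before the first <row> is an empty record; skip it.
        push @records, $record if $record =~ /\S/;
    }
    return @records;
}

my $xml = "<row><id>1</id></row>\n<row><id>2</id></row>\n";
open my $fh, '<', \$xml or die "Cannot open in-memory handle: $!";
my @rows = read_records($fh, '<row>');
print scalar(@rows), " records\n";
```

Each record here still ends with </row>, which is the part the script above strips with s!</row>!!; the empty first record is the one its "or next" skips.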

Source

2013-11-28 07:03:42 Kenosis

I get a black screen. Nothing happens – Backtrack

Check the directory for the generated files. – Kenosis

#!/bin/perl -w 
use strict; 

## splitting xml files using perl script 

print "Input File ? "; 
chomp(my $XmlFile = <STDIN>); 

open my $XmlFileHandle, '<', $XmlFile or die "Cannot open $XmlFile: $!"; 

print "\nSplit By which Tag ? "; 
chomp(my $splitby = <STDIN>); 

open my $OutputHandle, '>', 'OutputFile_'.$splitby or die "Cannot open OutputFile_$splitby: $!"; 

## to split by <user>...</user> 
## skip ahead to the first opening tag 
while(<$XmlFileHandle>){ 
    if(/<\Q$splitby\E>/){ 
        print $OutputHandle "<$splitby>\n"; 
        last; 
    } 
} 

## copy lines until the first closing tag 
while(my $line = <$XmlFileHandle>){ 
    if($line =~ m/<\/\Q$splitby\E>/){ 
        print $OutputHandle "</$splitby>"; 
        last; 
    } 
    print $OutputHandle $line; 
} 

print "\nOutput File is : OutputFile_$splitby\n";

Source

2013-11-28 07:09:53 prashant
