2013-06-12

Parse only the date from one kind of line in a text file

I have a text file that uses spaces at the start of each line as delimiters.

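Lines with no leading spaces should go into the first column of the CSV file; lines with two leading spaces should go into the second column; lines with four leading spaces should go into the third column.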
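A minimal sketch of that mapping, assuming indentation always comes in steps of two spaces (0, 2 or 4). The helper name column_for is hypothetical and not part of the code below:

```perl
use strict;
use warnings;

# Hypothetical helper: map a line's leading-space count to a 0-based column.
# Assumes indentation is always a multiple of two spaces (0, 2 or 4).
sub column_for {
    my ($line) = @_;
    my ($indent) = $line =~ /^( *)/;   # capture the run of leading spaces ('' if none)
    return length($indent) / 2;        # 0 -> column 1, 1 -> column 2, 2 -> column 3
}

for my $line ('Component1', '  (111) Amar Sen ... 2013/04/01', '    /Com/src2') {
    printf "column %d: %s\n", column_for($line) + 1, $line;
}
```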

All of this is already working as required.

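For lines that begin with two spaces, I want only the date to go into the second column, discarding the rest of that line. Everything else should stay as it is.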
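As a sketch, assuming every two-space line carries a date in YYYY/MM/DD form, the date can be pulled out with a single capture. date_only is a hypothetical helper name, and the sample line is abbreviated (the e-mail address is replaced by <...>):

```perl
use strict;
use warnings;

# Hypothetical helper: return only the YYYY/MM/DD date from a two-space
# "change" line, or undef if the line contains no such date.
sub date_only {
    my ($line) = @_;
    return $line =~ m{(\d{4}/\d{2}/\d{2})} ? $1 : undef;
}

my $line = '  (1199) Prashant Singh <...> <No comment> 2013/04/24';
print date_only($line), "\n";   # prints 2013/04/24
```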

For clarity, I have marked each leading space with a #.

Text file:

Component1 
##(111) Amar Sen <[email protected]> <No comment> 2013/04/01 
####/Com/src/folder1/folder2/newfile.txt 
##(1199) Prashant Singh <[email protected]> <No comment> 2013/04/24 
####/Com/src/folder1/folder2/testfile24 
####/Com/src/folder1/folder2/testfile25 
####/Com/src/folder1/folder2/testfile26 
##(1204) Anthony Li <[email protected]> <No comment> 2013/04/25 
####/Com/src2 
Component2(added) 
Component3 

Output format:

Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt 
      2013/04/24,/Com/src/folder1/folder2/testfile24 
        /Com/src/folder1/folder2/testfile25 
         /Com/src/folder1/folder2/testfile26 
      2013/04/25,/Com/src2 
Component2(added) 
Component3 

Here is my code. It works fine, apart from the change described above.

use strict; 
use warnings; 

my $previous_count   = -1; # initially, assume no leading spaces have been seen
my $current_count    = 0;  # current default value
my $maximum_count    = 3; 
my $to_written    = ""; 
my $delimiter_between_columns = ","; 
my $newline_separator   = ";"; 

my $file = 'C:\\textfile.txt'; 
open (my $fh, '<:encoding(UTF-8)', $file) or die "Could not open file '$file' $!"; 

while (my $row = <$fh>) { 

    # line read; remove the trailing newline
    chomp($row); 

    # print "row is : $row\n"; 
    if ($row =~ m/^(\s*)/) {

        #print length($1);
        $current_count = length($1)/2; # take the number of spaces divided by 2
        $row =~ s/^\s+//;

        if ($previous_count >= $current_count || $previous_count == $maximum_count) {

            # output here
            print "$to_written" . $newline_separator . "\n";

            $previous_count = 0;
            $to_written     = "";
        }
        $previous_count = 0 if ($previous_count == -1);
        $to_written .= $delimiter_between_columns x ($current_count - $previous_count) . "$row";

        $previous_count = $current_count;

        #print"\n";
    }
} 

print "$to_written" . $newline_separator . "\n"; 

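The "Output format" you posted doesn't match what you say you want (where you have ...). Since CSV fields are separated by commas, any line that contains no commas puts everything in the first column. – doubleDown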

Answer


You seem to have tied yourself in knots with your own solution.

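This program seems to do what you need. I have added some commas to your "Output format", because your example has no placeholders for the empty leading fields.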

I have kept the hash characters here. Obviously it is simple to switch to spaces: just use s/^(\s*)// instead of s/^(#*)//.

use strict; 
use warnings; 

my @row; 

while (<DATA>) { 

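    chomp;
    s/^(#*)//;                  # strip the leading '#' markers, capturing them in $1
    my $i = length($1)/2;       # 0, 1 or 2: the target column index

    if ($i == 1 and m<(\d{4}/\d{2}/\d{2})>) {
        $row[$i] = $1;          # for change lines keep only the date
    }
    else {
        $row[$i] = $_;
    }

    if ($i == 2) {              # a path line completes one CSV row
        print join(',', @row), ";\n";
        @row = ('') x 3;        # clear all columns for the next row
    }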
} 


__DATA__ 
Component1 
##(111) Amar Sen <[email protected]> <No comment> 2013/04/01 
####/Com/src/folder1/folder2/newfile.txt 
##(1199) Prashant Singh <[email protected]> <No comment> 2013/04/24 
####/Com/src/folder1/folder2/testfile24 
####/Com/src/folder1/folder2/testfile25 
####/Com/src/folder1/folder2/testfile26 
##(1204) Anthony Li <[email protected]> <No comment> 2013/04/25 
####/Com/src2 

Output

Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt; 
,2013/04/24,/Com/src/folder1/folder2/testfile24; 
,,/Com/src/folder1/folder2/testfile25; 
,,/Com/src/folder1/folder2/testfile26; 
,2013/04/25,/Com/src2; 
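For reference, a sketch of the program above adapted to real leading spaces. It is wrapped in a hypothetical sub rows_from_spaces that takes the lines as a list (instead of reading __DATA__) and returns the CSV lines rather than printing them; note it strips spaces only, not tabs:

```perl
use strict;
use warnings;

# Sketch: the answer's loop, but on real leading spaces instead of '#'.
sub rows_from_spaces {
    my (@row, @out);
    for my $line (@_) {
        (my $text = $line) =~ s/^( *)//;   # copy, then strip and capture leading spaces
        my $i = length($1) / 2;            # 0, 1 or 2 -> column index
        if ($i == 1 and $text =~ m{(\d{4}/\d{2}/\d{2})}) {
            $row[$i] = $1;                 # keep only the date in column 2
        }
        else {
            $row[$i] = $text;
        }
        if ($i == 2) {                     # a path line completes one CSV row
            push @out, join(',', @row) . ';';
            @row = ('') x 3;               # blank columns for the next row
        }
    }
    return @out;
}

my @lines = (
    'Component1',
    '  (111) Amar Sen <...> <No comment> 2013/04/01',
    '    /Com/src/folder1/folder2/newfile.txt',
    '  (1204) Anthony Li <...> <No comment> 2013/04/25',
    '    /Com/src2',
);
print "$_\n" for rows_from_spaces(@lines);
```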

Update

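It makes more sense to cascade the values of columns one and two down to the following lines that don't supply them. If you remove the line @row = ('') x 3 from my program it will do exactly that, giving this output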

Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt; 
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24; 
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25; 
Component1,2013/04/24,/Com/src/folder1/folder2/testfile26; 
Component1,2013/04/25,/Com/src2; 
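A self-contained sketch of that cascading variant, again on real leading spaces; the sub name cascading_rows is hypothetical. The only behavioural difference from the original program is that @row is never cleared, so columns one and two keep their last values:

```perl
use strict;
use warnings;

# Sketch of the "cascading" variant: @row is never reset, so the component
# and date columns carry over until a new component or date line replaces them.
sub cascading_rows {
    my (@row, @out);
    for my $line (@_) {
        (my $text = $line) =~ s/^( *)//;   # copy, then strip leading spaces
        my $i = length($1) / 2;            # 0, 1 or 2 -> column index
        $row[$i] = ($i == 1 && $text =~ m{(\d{4}/\d{2}/\d{2})}) ? $1 : $text;
        push @out, join(',', @row) . ';' if $i == 2;   # emit on every path line
    }
    return @out;
}
```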

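Thanks for your reply. It works the way I need, except in one case where it fails. That was my mistake: I should have given more sample data. If I have Component2 and Component3 (no leading spaces), the code should output Component2 and Component3 whether or not they have any data, but it only outputs components that have data. I have updated the text file and the output format in my question; please take a look. –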
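One possible way to handle components with no indented lines, sketched under the assumption that such a component should still get a row of its own: remember whether the current component has produced a row yet, and flush it when the next top-level line (or the end of input) arrives. The sub name rows_with_empty_components and the $pending flag are hypothetical, not from the thread:

```perl
use strict;
use warnings;

# Sketch: like the answer's loop, but a component with no data lines
# is still emitted on its own row.
sub rows_with_empty_components {
    my (@row, @out);
    my $pending = 0;                           # true while the current component has no row yet
    for my $line (@_) {
        (my $text = $line) =~ s/^( *)//;       # copy, then strip leading spaces
        my $i = length($1) / 2;                # 0, 1 or 2 -> column index
        if ($i == 0) {
            push @out, "$row[0];" if $pending; # previous component had no data
            @row = ($text, '', '');
            $pending = 1;
            next;
        }
        $row[$i] = ($i == 1 && $text =~ m{(\d{4}/\d{2}/\d{2})}) ? $1 : $text;
        if ($i == 2) {                         # a path line completes one CSV row
            push @out, join(',', @row) . ';';
            @row = ('') x 3;
            $pending = 0;
        }
    }
    push @out, "$row[0];" if $pending;         # flush a trailing empty component
    return @out;
}

my @sample = (
    'Component1',
    '  (111) Amar Sen <...> <No comment> 2013/04/01',
    '    /Com/src/folder1/folder2/newfile.txt',
    'Component2(added)',
    'Component3',
);
print "$_\n" for rows_with_empty_components(@sample);
# prints:
# Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt;
# Component2(added);
# Component3;
```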