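perl - Creating a file name from column names

I'm new to Perl, and I want to create the name of an output file from the column names present in the input file. Say my input file header looks like this: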

#identifier (%)composition

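and I want my output file name to be identifier_composition. These identifiers and compositions can be any alphanumeric character sequence, for example #E2FAR4 for the identifier or (%)MhDE4 for the composition. In that case, the output file name should be E2FAR4_MhDE4. So far I am able to get the identifier but not the composition. Here is the code I have tried: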

if ($line =~ /^#\s*(\S+)\t\(%)s*(\S+)/){ 
    my $ID = $1; 
    my $comp = $2; 
    my $out_file = "${ID}_${comp}" 
}

but I get the identifier as the second argument as well. Any help would be greatly appreciated.

2016-11-15 Marius

Try '$line =~ /^#\W*(\w+)\t\W*(\w+)/' –

Parentheses are special characters in a regex. – Toto

Use a regular expression like the following:

^#\s*(\S+)\t\(%\)(\S+)

Sample code:

#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
    my $line = $_;
    chomp $line;
    # The two header columns are separated by a tab; \(%\) matches a literal "(%)".
    if ($line =~ /^#\s*(\S+)\t\(%\)(\S+)/) {
        my $ID       = $1;
        my $comp     = $2;
        my $out_file = "${ID}_${comp}";
        print "Filename: $out_file\n";
    }
}

__DATA__
#identifier	(%)composition

Output:

Filename: identifier_composition
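To check that the same pattern also handles other header values, such as the #E2FAR4 / (%)MhDE4 example from the question, here is a minimal sketch. The sub name file_name_from_header is made up for illustration, and the sketch assumes the two columns are separated by exactly one tab and that the second column always starts with a literal "(%)".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): build "ID_composition"
# from a header line such as "#E2FAR4<TAB>(%)MhDE4".
# Returns undef when the line does not look like such a header.
sub file_name_from_header {
    my ($line) = @_;
    # \(%\) matches the literal text "(%)"; the two \S+ groups are captured.
    if ($line =~ /^#\s*(\S+)\t\(%\)(\S+)/) {
        return "${1}_${2}";
    }
    return;
}

for my $header ("#identifier\t(%)composition", "#E2FAR4\t(%)MhDE4") {
    my $name = file_name_from_header($header);
    print defined $name ? "Filename: $name\n" : "No match\n";
}
```

This should print one "Filename: ..." line per header, with identifier_composition for the first and E2FAR4_MhDE4 for the second.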

2016-11-15 08:52:14

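Thanks. It works and can easily be adapted to any value that 'identifier' and 'composition' may take, as well as (eventually) to the special characters preceding them. – Marius

It looks like you are overthinking your regex. You are looking for two sequences of word characters separated by some non-word characters.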

if ($line =~ /(\w+)\W+(\w+)/) { 
    say "$1/$2"; 
}

An even simpler approach is to match all sequences of word characters:

if (my @words = $line =~ /(\w+)/g) { 
    say join '/', @words; 
}
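Applied to the file-name problem in the question, the word-matching idea could look roughly like the sketch below. The sub name words_to_file_name is hypothetical, and the approach assumes that identifiers and compositions consist only of word characters (letters, digits, underscore). A value containing, say, a hyphen or a dot would be split into extra pieces.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: join every run of word characters with '_'.
# Returns undef if the line contains no word characters at all.
sub words_to_file_name {
    my ($line) = @_;
    my @words = $line =~ /(\w+)/g;   # list context: all matches
    return @words ? join('_', @words) : undef;
}

# '#', the tab, and "(%)" are all non-word characters, so they are skipped.
say words_to_file_name("#E2FAR4\t(%)MhDE4");
```

For the header shown, the two captured words are E2FAR4 and MhDE4, so the joined name is E2FAR4_MhDE4.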

Update: I put your regex into this regex explainer. Here is what came out:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  #                        '#'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \t                       '\t' (tab)
--------------------------------------------------------------------------------
  \^                       '^'
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    %                        '%'
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  s*                       's' (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \3

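I think your biggest problem is that you are trying to match a literal ^ in the middle of the regex, but the unescaped parentheses around the % are a problem too. And the 's*' is pointless and confusing :-)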
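The difference between escaped and unescaped parentheses can be seen in a small, self-contained sketch. The strings and sub names here are invented purely for illustration and are not taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unescaped: (%) is a capture group around '%', so this pattern really
# looks for "x%y" and does not match literal parenthesis characters.
sub matches_unescaped { return $_[0] =~ /x(%)y/ ? 1 : 0 }

# Escaped: \( and \) match the parenthesis characters themselves.
sub matches_escaped { return $_[0] =~ /x\(%\)y/ ? 1 : 0 }

my $text = "x(%)y";
print "unescaped: ", (matches_unescaped($text) ? "match" : "no match"), "\n";
print "escaped:   ", (matches_escaped($text)   ? "match" : "no match"), "\n";

# Against "x%y" the unescaped form matches and captures '%' in $1.
if ("x%y" =~ /x(%)y/) {
    print "captured: $1\n";
}
```

Only the escaped pattern matches the text that contains literal parentheses. The unescaped one silently turns them into a capture group, which is why the capture numbers in the explainer output above are shifted by one.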

2016-11-15 09:06:09

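Hi. Thanks for the detailed response. Yes, that '^' was actually a copy/paste error in my question. It is not really part of my code, but I left it there because your answer refers to it. Indeed, this works for the example I gave, but the identifier and composition are not necessarily the literal words "identifier" and "composition"; they can be any sequence of alphanumeric characters, preceded by some special characters and separated by a tab. But thanks again for your time. – Marius

@Marius: If you want a meaningful answer, your question should include sample data that demonstrates the range of possibilities :-) –

Yes, you are right. I have edited the question text accordingly. Thanks again – Marius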
