2011-11-13 69 views
4

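Looking for a better solution in awk or perl: avoiding pipes, xargs, etc.

I have to parse a file that lists the eigenvectors of a square matrix in a seven-column format, and turn it into a square matrix in which each eigenvector is a column.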

Eigenvector file: COVAR 
    72 72 
    42.27674 53.43516 43.10335 43.43889 53.15094 43.77146 43.17536 
    52.49170 45.07565 42.10424 52.75460 45.74721 41.66882 52.21836 
    47.00361 40.21403 51.86627 47.05245 39.75512 50.92583 47.83411 
    38.36019 50.61541 48.00747 37.56547 51.66199 48.72199 36.29018 
    51.70312 48.54869 35.35773 52.59045 49.19493 34.14085 51.90543 
    49.78376 33.43961 52.55997 50.66576 32.13812 52.14743 51.17284 
    31.02647 52.41422 50.19470 30.02426 51.60068 50.14591 28.86206 
    51.70417 49.28895 27.52769 51.49614 49.94867 27.52460 50.99136 
    51.12215 26.37751 50.74786 51.93507 25.23025 50.04549 51.26765 
    25.46212 49.27591 50.30035 24.47349 48.61017 49.51955 23.64720 
    49.41136 48.60875 
**** 
    1  3.28044 
    0.06504 -0.20409 -0.08035 0.04603 -0.02034 -0.02343 0.03885 
    0.14025 0.01970 -0.00569 0.11391 -0.05271 -0.00874 0.25005 
    -0.02425 0.03969 0.13327 0.01054 0.09958 0.20857 0.08647 
    0.13883 0.12003 0.12859 0.05634 0.06415 0.02570 0.07466 
    -0.06541 0.04636 0.01246 -0.13691 -0.04270 0.03791 -0.15341 
    -0.02595 -0.01027 -0.15604 -0.08393 -0.00526 -0.16938 -0.09027 
    0.01573 -0.25999 -0.09350 0.01121 -0.24367 -0.01033 0.03059 
    -0.31268 -0.00040 0.02074 -0.17927 -0.01689 -0.02183 -0.03912 
    -0.01481 -0.03982 0.10507 -0.03446 -0.06896 0.20946 -0.00450 
    -0.17669 0.17617 0.08755 -0.21143 0.25313 0.12818 -0.13896 
    0.16625 0.06539 
**** 
    2  1.17147 
    0.05028 0.24209 0.07571 0.07015 0.26226 0.10552 0.09788 
    0.15535 0.10020 0.06248 0.07167 0.09337 0.06555 -0.05258 
    0.07777 0.05163 -0.08617 -0.01580 0.05087 -0.17374 -0.06483 
    0.03157 -0.18854 -0.12423 0.02388 -0.15753 -0.07304 0.00221 
    -0.12406 -0.11678 -0.00030 -0.07568 -0.07783 -0.00225 -0.10201 
    -0.09521 0.00373 -0.10066 -0.06755 -0.00386 -0.10808 -0.08343 
    -0.01420 -0.03899 -0.11123 -0.06186 -0.02282 -0.11633 -0.07596 
    0.03656 -0.14599 -0.07542 0.13621 -0.11299 -0.07350 0.22728 
    -0.02254 -0.07473 0.32577 0.01167 -0.09106 0.17148 0.10912 
    -0.01607 0.00303 0.19984 -0.01223 -0.16824 0.28827 -0.00879 
    -0.23259 0.16630 
**** 
    3 et cetera .... 

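I managed to solve my problem, but only with a lot of pipes... Here is an extract of my script, which also extracts the eigenvalues (the number next to the natural number on the line below ****):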

local dimensions=$(awk 'NR==2 {print$1}' ${ptraj_eigvect[$k]}) #in the second line of the file it is written the dimension of the rotation matrix 
#Ptraj produces a file in seven columns format 
#      || 
#      \/ 
if [[ $((${dimensions} % 7)) == 0 ]] 
then 
     local -i n_rows_eigvect_ptraj=$((${dimensions}/7)) 
else 
     local -i n_rows_eigvect_ptraj=$(((${dimensions}/7) + 1)) 
fi 
#  headers   matrix   **** 
#   || ||||||||||||||||||||||| || 
#   \/ \/\/\/\/\/\/\//\/\/\/\/ \/ 
awk 'NR>'$((2 + ${n_rows_eigvect_ptraj} + 1))' && NR%'$((2 + ${n_rows_eigvect_ptraj}))'==2' ${ptraj_eigvect[$k]} >${eigval_file} 

awk 'NR>'$((2 + ${n_rows_eigvect_ptraj} + 2))' && NR%'$((2 + ${n_rows_eigvect_ptraj}))'!=2 && NR%'$((2 + ${n_rows_eigvect_ptraj}))'!=1' ${ptraj_eigvect[$k]} | xargs printf "%s\n" | awk '($0=$NF x)&&ORS=NR%'${dimensions}'?FS:RS' | awk -f ${script_PA}/transpose.awk >${rotmatr_file} 

if [[ $(wc -l <${rotmatr_file}) != ${dimensions} ]] || [[ $(wc -w <${rotmatr_file}) != $((${dimensions} * ${dimensions})) ]] 
then 
     echo 'ERROR!!!' 
     exit 1 
fi 
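
A side note on the row-count arithmetic in the script above: the if/else on the remainder can be collapsed into a single ceiling division. This is only a minimal sketch, assuming POSIX shell arithmetic; the function names rows_per_block and lines_per_block are made up for illustration and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: number of 7-column rows needed to hold DIM values,
# i.e. ceil(DIM / 7) computed with integer arithmetic.
rows_per_block() {
    echo $(( ($1 + 6) / 7 ))
}

# Hypothetical helper: lines per eigenvector block, counting the header
# line ("index eigenvalue") and the "****" separator as in the script above.
lines_per_block() {
    echo $(( $(rows_per_block "$1") + 2 ))
}

rows_per_block 72     # 72 values: 10 full rows plus 1 partial row
lines_per_block 72
```

For a dimension of 72 this prints 11 and 13, which matches the sample file: one header line, eleven data rows and one **** line per eigenvector.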

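The script requires the transpose.awk file; for a 72 x 72 square matrix it produces the output linked here.

Below I show only the first 2 columns of my script's output. As can be seen, the numbers are those that follow 1 3.28044 and 2 1.17147 in the input file.

EDIT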

0.06504 0.05028 
-0.20409 0.24209 
-0.08035 0.07571 
0.04603 0.07015 
-0.02034 0.26226 
-0.02343 0.10552 
0.03885 0.09788 
0.14025 0.15535 
0.01970 0.10020 
-0.00569 0.06248 
0.11391 0.07167 
-0.05271 0.09337 
-0.00874 0.06555 
0.25005 -0.05258 
-0.02425 0.07777 
0.03969 0.05163 
0.13327 -0.08617 
0.01054 -0.01580 
0.09958 0.05087 
0.20857 -0.17374 
0.08647 -0.06483 
0.13883 0.03157 
0.12003 -0.18854 
0.12859 -0.12423 
0.05634 0.02388 
0.06415 -0.15753 
0.02570 -0.07304 
0.07466 0.00221 
-0.06541 -0.12406 
0.04636 -0.11678 
0.01246 -0.00030 
-0.13691 -0.07568 
-0.04270 -0.07783 
0.03791 -0.00225 
-0.15341 -0.10201 
-0.02595 -0.09521 
-0.01027 0.00373 
-0.15604 -0.10066 
-0.08393 -0.06755 
-0.00526 -0.00386 
-0.16938 -0.10808 
-0.09027 -0.08343 
0.01573 -0.01420 
-0.25999 -0.03899 
-0.09350 -0.11123 
0.01121 -0.06186 
-0.24367 -0.02282 
-0.01033 -0.11633 
0.03059 -0.07596 
-0.31268 0.03656 
-0.00040 -0.14599 
0.02074 -0.07542 
-0.17927 0.13621 
-0.01689 -0.11299 
-0.02183 -0.07350 
-0.03912 0.22728 
-0.01481 -0.02254 
-0.03982 -0.07473 
0.10507 0.32577 
-0.03446 0.01167 
-0.06896 -0.09106 
0.20946 0.17148 
-0.00450 0.10912 
-0.17669 -0.01607 
0.17617 0.00303 
0.08755 0.19984 
-0.21143 -0.01223 
0.25313 -0.16824 
0.12818 0.28827 
-0.13896 -0.00879 
0.16625 -0.23259 
0.06539 0.16630 

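Since I want to learn awk, and maybe Perl in the future, I kindly ask you to show me how to write an awk or perl script that performs the same task.

Thank you very much for your attention.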

+0

I would like to help you, but I don't know what an *eigenvector* is, or whether that is important information. – TLP

+0

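I agree with TLP. It would also make it easier for us to help if you included sample output for your sample input (and thanks for formatting your message). Good luck. – shellter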

+0

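I have edited. Sorry for the poor clarity. By the way, there is no need to know what an eigenvector is. I just want to parse this data file to create a square matrix dim x dim, where dim is equal to the first field of the second line of the data file, and whose columns are the numbers listed from the second line after each '****' onward. – Mareczek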

Answers

0

If you want to write the code in C++, you can use Boost::regex or flex/bison.

+0

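Sounds interesting, it would be very nice to do the regex work in C++... if only I knew C++ ;-) I'm just a noob xD! I would be glad if I could find the time to learn serious programming... maybe little by little... – Mareczek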

+0

Your code looks pretty l33t to me :-) – Homer6

+0

What does l33t mean? – Mareczek

2

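I worked on this for a while and did not come up with anything very pretty, but the code below seems to work, clunky as it is. It assumes your data is completely uniform, and it does not care about the headers.

On the plus side, if you change <DATA> to <>, it will work on your data file with: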

> script.pl input > output 

This assumes that your data file has the same format as your example and that your vectors appear in numerical order.

Code:

use strict; 
use warnings; 
use v5.10; 

my @data; 
my $tmp; 

while (<DATA>) { 
    if (/^\*+/) {                    # a line of asterisks separates vectors 
        push @data, $tmp if $tmp;    # push buffer to array 
        <DATA>;                      # discard header (index and eigenvalue) 
        $tmp = "";                   # reset buffer 
    } else { 
        $tmp .= $_;                  # buffer a new line 
    } 
} 
push @data, $tmp;                    # push remaining buffer onto array 
@data = map { [ split ] } @data;     # split each string into an array of numbers 
for my $num (0 .. $#{$data[0]}) { 
    # 0 .. $#data rather than keys @data, which needs perl 5.12 or later 
    say join " ", map $data[$_][$num], 0 .. $#data; 
} 


__DATA__ 
**** 
1  3.28044 
0.06504 -0.20409 -0.08035 0.04603 -0.02034 -0.02343 0.03885 
0.14025 0.01970 -0.00569 0.11391 -0.05271 -0.00874 0.25005 
-0.02425 0.03969 0.13327 0.01054 0.09958 0.20857 0.08647 
0.13883 0.12003 0.12859 0.05634 0.06415 0.02570 0.07466 
-0.06541 0.04636 0.01246 -0.13691 -0.04270 0.03791 -0.15341 
-0.02595 -0.01027 -0.15604 -0.08393 -0.00526 -0.16938 -0.09027 
0.01573 -0.25999 -0.09350 0.01121 -0.24367 -0.01033 0.03059 
-0.31268 -0.00040 0.02074 -0.17927 -0.01689 -0.02183 -0.03912 
-0.01481 -0.03982 0.10507 -0.03446 -0.06896 0.20946 -0.00450 
-0.17669 0.17617 0.08755 -0.21143 0.25313 0.12818 -0.13896 
0.16625 0.06539 
**** 
2  1.17147 
0.05028 0.24209 0.07571 0.07015 0.26226 0.10552 0.09788 
0.15535 0.10020 0.06248 0.07167 0.09337 0.06555 -0.05258 
0.07777 0.05163 -0.08617 -0.01580 0.05087 -0.17374 -0.06483 
0.03157 -0.18854 -0.12423 0.02388 -0.15753 -0.07304 0.00221 
-0.12406 -0.11678 -0.00030 -0.07568 -0.07783 -0.00225 -0.10201 
-0.09521 0.00373 -0.10066 -0.06755 -0.00386 -0.10808 -0.08343 
-0.01420 -0.03899 -0.11123 -0.06186 -0.02282 -0.11633 -0.07596 
0.03656 -0.14599 -0.07542 0.13621 -0.11299 -0.07350 0.22728 
-0.02254 -0.07473 0.32577 0.01167 -0.09106 0.17148 0.10912 
-0.01607 0.00303 0.19984 -0.01223 -0.16824 0.28827 -0.00879 
-0.23259 0.16630 
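
The two assumptions stated above (uniform data, vectors in numerical order) can be checked before transposing. The following is a rough sketch in shell and awk rather than Perl, not part of the answer itself; the function name check_vectors is hypothetical, and it assumes that every block starts with a **** line followed by an "index eigenvalue" header line, as in the sample data.

```shell
#!/bin/sh
# Hypothetical helper: check_vectors FILE
# Succeeds (exit status 0) only if FILE contains at least one vector, the
# header indices run 1, 2, 3, ... and every vector has the same length.
check_vectors() {
    awk '
        hdr {                       # line right after "****": "index eigenvalue"
            hdr = 0
            if ($1 != nvec) bad = 1 # index must match the running count
            next
        }
        $1 == "****" {              # separator: close previous vector, open next
            if (nvec) len[nvec] = n
            nvec++; n = 0; hdr = 1
            next
        }
        nvec { n += NF }            # count the values of the current vector
        END {
            if (nvec) len[nvec] = n
            for (j = 2; j <= nvec; j++)
                if (len[j] != len[1]) bad = 1
            exit (bad || !nvec)
        }
    ' "$1"
}
```

It could be used as check_vectors input && script.pl input > output, so that the transpose only runs on well-formed input.
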
1

An awk solution: please try the following. Save these commands in a file named s.awk:

/\*\*\*/{i++;accInd=0;next} 
(i>0){for (k=1;k <= NF;k++){ 
     I=k+accInd 
     a[i,I]=$k 
    } 
    accInd=accInd+(k-1) 
} 
END{for (n=3;n<=I;n++){ 
     for (m=1;m<=i;m++){ 
      printf "%f\t", a[m,n] 
     } 
     printf "\n" 
    } 
} 

Then run the following at the command line:

$ awk -f s.awk file 

HTH, Chris
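
One detail worth knowing about the script above: it prints with the %f format, which converts each field to a number and prints it with six decimals, so the output no longer matches the input text character for character. A small illustration, with the function name fmt_demo made up for the purpose:

```shell
#!/bin/sh
# Hypothetical demo: the same input token printed with %f and with %s.
fmt_demo() {
    awk 'BEGIN { v = "0.06504"; printf "%f %s\n", v, v }'
}

fmt_demo
```

This prints 0.065040 0.06504. If the five-decimal form of the input should be preserved, %s (or an explicit %.5f) in the printf call is one way to do it.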

1

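If I have understood the question correctly, I think this awk script will do the job. I tried to make it easy to read and understand, hence the rather verbose script: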

#### 
# Use like: 
# 
# awk -f transpose.awk <Eigenvector file> 
# 
# This script assumes that all Eigenvectors in the file, have the same number 
# of values. The script will output all Eigenvectors into columns e.g if three 
# Eigenvectors it will produce three columns of values. 
# 
#### 

BEGIN { 
    # Keeps track of the number of Eigenvectors 
    currentEV = 0; 
} 

# Signifies a new Eigenvector (EV) 
$1 == "****" { 
    newEV = "true"; 
    transpose = "true"; 
    next; 
} 

# Get the EV's number 
newEV == "true" { 
    newEV = "false"; 
    currentEV = $1; 
    currentEVCol = 0; 
    next; 
} 

# Add all the values on the line, for the current EV, into the EV array 
transpose == "true" { 
    for (i=1; i<=NF; i++) { 
        ev[currentEV,++currentEVCol] = $i; 
    } 
} 

END { 
    # Loop through the array and print the EVs out in columns 
    for (i=1; i<=currentEVCol; i++) { 
        for (j=1; j<=currentEV; j++) { 
            # explicit format, so the data is never used as a format string 
            printf "%s ", ev[j,i]; 
        } 
        print ""; 
    } 
} 

For a terse version, copy the following into a file named transpose.awk:

skip { skip = 0; next; } 

$1 == "****" { 
    EV++; EVC = 0; skip = 1; 
    next; 
} 

NF && EV { 
    for (i=1; i<=NF; i++) { 
        EVA[EV,++EVC] = $i; 
    } 
} 

END { 
    for (i=1; i<=EVC; i++) { 
        for (j=1; j<=EV; j++) { 
            printf "%s ", EVA[j,i]; 
        } 
        print ""; 
    } 
} 

and invoke it like $ awk -f transpose.awk file > transposedFile
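
The question also extracts the eigenvalues into a separate file. Both outputs can be produced in a single awk pass, without pipes or xargs. What follows is only a sketch under the same layout assumptions as the scripts above (a **** line, then an "index eigenvalue" line, then the values); the function name split_eigen and the file names in the usage comment are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split_eigen INPUT EIGVAL_OUT MATRIX_OUT
# Writes one eigenvalue per line to EIGVAL_OUT and the eigenvectors,
# one per column, to MATRIX_OUT.
split_eigen() {
    awk -v eig="$2" -v mat="$3" '
        hdr {                          # line after "****": "index eigenvalue"
            print $2 > eig
            hdr = 0
            next
        }
        $1 == "****" {                 # start of a new eigenvector
            nvec++; n = 0; hdr = 1
            next
        }
        nvec {                         # data line: append its values
            for (i = 1; i <= NF; i++) v[nvec, ++n] = $i
        }
        END {                          # row i holds component i of each vector
            for (i = 1; i <= n; i++) {
                line = v[1, i]
                for (j = 2; j <= nvec; j++) line = line " " v[j, i]
                print line > mat
            }
        }
    ' "$1"
}

# Usage (file names are placeholders):
#   split_eigen evecs.dat eigval.txt rotmatr.txt
```

The lines before the first **** (title, dimensions and the first block of numbers) are skipped because nvec is still zero there. Note that awk processes backslash escapes in values passed with -v, so output paths containing backslashes would need different handling.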

0

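Rather similar to TLP's, but a little cleaner in my opinion. It also keeps the eigenvalues in a separate array. As he said, you can change <DATA> to <> and run it as scriptname.pl mydata.dat (you can then remove the __DATA__ tag and everything after it).

It uses the add-on module Array::Transpose to perform the transpose (install it with the cpan command). The Data::Dumper module and its Dumper function are used for visualization. The grep { length } bit removes the empty elements found by split; this could probably be avoided by stripping the leading whitespace instead, but this way seems more robust.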

#!/usr/bin/env perl 

use strict; 
use warnings; 

use Data::Dumper; 
use Array::Transpose; 

my $row = -1; 
my @eigen; 
my @data; 

while(<DATA>) { 
    if (/\*+/) { 
        #increment row number 
        $row++; 
        #next line is eigenvalue, keep it in @eigen 
        my @line = grep { length } split(/\s+/, <DATA>); 
        push @eigen, $line[-1]; 
        # move on to next line 
        next; 
    } 
    next if $row < 0; #skip first block 

    push @{ $data[$row] }, grep { length } split(/\s+/); 
} 

my @transpose = transpose(\@data); 

print Dumper \@eigen; 
print Dumper \@transpose; 

__DATA__ 
Eigenvector file: COVAR 
    72 72 
    42.27674 53.43516 43.10335 43.43889 53.15094 43.77146 43.17536 
    52.49170 45.07565 42.10424 52.75460 45.74721 41.66882 52.21836 
    47.00361 40.21403 51.86627 47.05245 39.75512 50.92583 47.83411 
    38.36019 50.61541 48.00747 37.56547 51.66199 48.72199 36.29018 
    51.70312 48.54869 35.35773 52.59045 49.19493 34.14085 51.90543 
    49.78376 33.43961 52.55997 50.66576 32.13812 52.14743 51.17284 
    31.02647 52.41422 50.19470 30.02426 51.60068 50.14591 28.86206 
    51.70417 49.28895 27.52769 51.49614 49.94867 27.52460 50.99136 
    51.12215 26.37751 50.74786 51.93507 25.23025 50.04549 51.26765 
    25.46212 49.27591 50.30035 24.47349 48.61017 49.51955 23.64720 
    49.41136 48.60875 
**** 
    1  3.28044 
    0.06504 -0.20409 -0.08035 0.04603 -0.02034 -0.02343 0.03885 
    0.14025 0.01970 -0.00569 0.11391 -0.05271 -0.00874 0.25005 
    -0.02425 0.03969 0.13327 0.01054 0.09958 0.20857 0.08647 
    0.13883 0.12003 0.12859 0.05634 0.06415 0.02570 0.07466 
    -0.06541 0.04636 0.01246 -0.13691 -0.04270 0.03791 -0.15341 
    -0.02595 -0.01027 -0.15604 -0.08393 -0.00526 -0.16938 -0.09027 
    0.01573 -0.25999 -0.09350 0.01121 -0.24367 -0.01033 0.03059 
    -0.31268 -0.00040 0.02074 -0.17927 -0.01689 -0.02183 -0.03912 
    -0.01481 -0.03982 0.10507 -0.03446 -0.06896 0.20946 -0.00450 
    -0.17669 0.17617 0.08755 -0.21143 0.25313 0.12818 -0.13896 
    0.16625 0.06539 
**** 
    2  1.17147 
    0.05028 0.24209 0.07571 0.07015 0.26226 0.10552 0.09788 
    0.15535 0.10020 0.06248 0.07167 0.09337 0.06555 -0.05258 
    0.07777 0.05163 -0.08617 -0.01580 0.05087 -0.17374 -0.06483 
    0.03157 -0.18854 -0.12423 0.02388 -0.15753 -0.07304 0.00221 
    -0.12406 -0.11678 -0.00030 -0.07568 -0.07783 -0.00225 -0.10201 
    -0.09521 0.00373 -0.10066 -0.06755 -0.00386 -0.10808 -0.08343 
    -0.01420 -0.03899 -0.11123 -0.06186 -0.02282 -0.11633 -0.07596 
    0.03656 -0.14599 -0.07542 0.13621 -0.11299 -0.07350 0.22728 
    -0.02254 -0.07473 0.32577 0.01167 -0.09106 0.17148 0.10912 
    -0.01607 0.00303 0.19984 -0.01223 -0.16824 0.28827 -0.00879 
    -0.23259 0.16630