2016-04-07
1

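I want to extract certain columns from a file based on their header names. Some of the names contain spaces (the file is tab-delimited). I cannot remove or replace these spaces because downstream applications would be affected. What I am looking for is a way to print certain columns by header name, including names that contain spaces, in the order I specify (awk, sed).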

Here is an example of my file:

Sample Note Intragenic Rate ABCDE_177447 
1032 NA 0.97867626 0.9300704670625763 0.72782564 
ABCDE_177447 NA 0.97836965 1.0 0.87218356 
ABCDE_188399 NA 0.97859967 0.905527730405171 0.81188565 
ABCDE_189595 NA 0.9787659 0.9059075892313707 0.8089241 
ABCDE_189596 NA 0.9788054 0.9065243881070291 0.8092951 

My desired output:

Sample Intragenic ABCDE_177447 
1032 0.97867626 0.9300704670625763 0.72782564 
ABCDE_177447 0.97836965 0.87218356 
ABCDE_188399 0.97859967 0.81188565 
ABCDE_189595 0.9787659 0.8089241 
ABCDE_189596 0.9788054 0.8092951 

I tried the solution from "AWK extract columns from file based on header selected from 2nd file", but it does not work with names that contain spaces. The same is true of the solution from "Extracting columns from a file".

I also tried this:

$ cat cols.awk

BEGIN { 
n=split(cols,col) 
for (i=1; i<=n; i++) s[col[i]]=i 
} 
NR==1 { 
for (f=1; f<=NF; f++) 
    if ($f in s) c[s[$f]]=f 
next 
} 
{ sep="" 
for (f=1; f<=n; f++) { 
    printf("%c%s",sep,$c[f]) 
    sep=FS 
} 
print "" 
} 

But when I run my script like awk -F\t -f cols.awk.sh -v cols="Note,Sample,Intragenic Rate" metrics.txt I get the following error:

awk: illegal field $(), name "1" 
input record number 2, file metrics.txt 
source line number 12 

Answers

1

You can use this awk:

awk -v cols='Sample,Intragenic,ABCDE_177447' 'BEGIN{FS=OFS="\t"; nc=split(cols, a, ",")} NR==1{for (i=1; i<=NF; i++) hdr[$i]=i} {for (i=1; i<=nc; i++) if (a[i] in hdr) printf "%s%s", $hdr[a[i]], (i<nc?OFS:ORS)}' file 

Sample Intragenic ABCDE_177447 
1032 0.97867626 0.72782564 
ABCDE_177447 0.97836965 0.87218356 
ABCDE_188399 0.97859967 0.81188565 
ABCDE_189595 0.9787659 0.8089241 
ABCDE_189596 0.9788054 0.8092951 

With the cols command-line argument you can pass a comma-separated list of the columns to print.
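
As a quick check that this approach also copes with header names containing spaces and with an arbitrary output order, here is a small sketch. The file name demo.tsv and its contents are made up for illustration, and the tab-separated layout is assumed from the question; only the awk program itself is the one shown above.

# Build a tiny tab-separated sample file (hypothetical data, for illustration only).
printf 'Sample\tNote\tIntragenic Rate\tABCDE_177447\n' >  demo.tsv
printf '1032\tNA\t0.97\t0.72\n'                        >> demo.tsv
printf 'S2\tNA\t0.98\t0.87\n'                          >> demo.tsv

# Same awk program as above; only the cols list differs.
# Columns come out in the order given in cols, and "Intragenic Rate"
# matches because fields are split on tabs, not on spaces.
out=$(awk -v cols='Intragenic Rate,Sample' 'BEGIN{FS=OFS="\t"; nc=split(cols, a, ",")} NR==1{for (i=1; i<=NF; i++) hdr[$i]=i} {for (i=1; i<=nc; i++) if (a[i] in hdr) printf "%s%s", $hdr[a[i]], (i<nc?OFS:ORS)}' demo.tsv)
printf '%s\n' "$out"

Because both the header lookup and the field splitting use the tab as separator, a space inside a name is just an ordinary character.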

Here is a more readable version of the awk command:

awk -v cols='Sample,Intragenic,ABCDE_177447' 'BEGIN {
    FS=OFS="\t"
    nc=split(cols, a, ",")
}
NR==1 {
    for (i=1; i<=NF; i++)
        hdr[$i]=i
}
{
    for (i=1; i<=nc; i++)
        if (a[i] in hdr)
            printf "%s%s", $hdr[a[i]], (i<nc?OFS:ORS)
}' file
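
One caveat with the command above: if the last name in cols is not found in the header, the line terminator is never printed, so the rows run together. A possible variant, sketched here on the assumption that stopping with a message is the preferred behaviour, checks the requested names once on the header line. The helper name select_cols, the file demo2.tsv and its data are all hypothetical.

# Hypothetical tab-separated input, for illustration only.
printf 'Sample\tNote\tIntragenic Rate\n1032\tNA\t0.97\n' > demo2.tsv

# select_cols is a made-up helper: $1 = comma-separated header names, $2 = input file.
select_cols() {
    awk -v cols="$1" 'BEGIN {
        FS=OFS="\t"
        nc=split(cols, a, ",")
    }
    NR==1 {
        for (i=1; i<=NF; i++)
            hdr[$i]=i
        # Stop early if any requested name is absent from the header.
        for (i=1; i<=nc; i++)
            if (!(a[i] in hdr)) {
                print "missing column: " a[i] > "/dev/stderr"
                exit 1
            }
    }
    {
        for (i=1; i<=nc; i++)
            printf "%s%s", $hdr[a[i]], (i<nc?OFS:ORS)
    }' "$2"
}

good=$(select_cols 'Intragenic Rate,Sample' demo2.tsv)
printf '%s\n' "$good"

# A misspelled name makes awk exit with status 1 and print nothing on stdout.
bad=$(select_cols 'Sample,Nope' demo2.tsv 2>/dev/null)
status=$?
echo "exit status: $status"

Whether to abort, warn, or print an empty field for a missing name depends on what the downstream application expects; this sketch only shows the abort option.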
+0

Thanks @anubhava, can you tell me how to do this with command-line arguments, like yourfile.sh "cols1,COLS 2" inputfile? – user2380782

+0

Inside yourfile.sh you can save this awk command: awk -v cols="$1" 'BEGIN{FS=OFS="\t"; nc=split(cols, a, ",")} NR==1{for (i=1; i<=NF; i++) hdr[$i]=i} {for (i=1; i<=nc; i++) if (a[i] in hdr) printf "%s%s", $hdr[a[i]], (i… – anubhava

+1

Thank you very much @anubhava – user2380782
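
The command in the comment above is cut off, so the following is only a sketch of what such a wrapper might look like. The script name yourfile.sh and the use of "$1" for the column list come from the comments; taking the input file from the second argument ("$2") is an assumption, as are the file demo3.tsv and its data.

# Write the hypothetical wrapper script. The quoted 'EOF' keeps $1 and $2
# literal so that they are expanded when the script runs, not now.
cat > yourfile.sh <<'EOF'
#!/bin/sh
# Usage: yourfile.sh "col1,col 2" inputfile
awk -v cols="$1" 'BEGIN{FS=OFS="\t"; nc=split(cols, a, ",")}
NR==1{for (i=1; i<=NF; i++) hdr[$i]=i}
{for (i=1; i<=nc; i++) if (a[i] in hdr) printf "%s%s", $hdr[a[i]], (i<nc?OFS:ORS)}' "$2"
EOF

# Hypothetical sample input to try it on.
printf 'Sample\tNote\tIntragenic Rate\n1032\tNA\t0.97\n' > demo3.tsv
result=$(sh yourfile.sh 'Sample,Intragenic Rate' demo3.tsv)
printf '%s\n' "$result"

Quoting the column list on the command line keeps a name such as "Intragenic Rate" together as a single argument.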

0
awk '{sub(/Note Intragenic Rate/,"Intragenic")}{sub(/NA/, "")}NR>2{sub($3, "")}1' file 

Sample Intragenic ABCDE_177447 
1032 0.97867626 0.9300704670625763 0.72782564 
ABCDE_177447 0.97836965 0.87218356 
ABCDE_188399 0.97859967 0.81188565 
ABCDE_189595 0.9787659 0.8089241 
ABCDE_189596 0.9788054 0.8092951