我想基于标题名称从文件中提取某些列。某些名称有空格(该文件是制表符分隔的)。由于下游应用程序将受到影响,我无法删除或替换这些空间。我所寻找的是提取输出它们的顺序基于标题名称的一些列和我的愿望通过标题名称打印某些列,其上带空格(awk,sed)
这里是我的文件的一个例子:
Sample Note Intragenic Rate ABCDE_177447
1032 NA 0.97867626 0.9300704670625763 0.72782564
ABCDE_177447 NA 0.97836965 1.0 0.87218356
ABCDE_188399 NA 0.97859967 0.905527730405171 0.81188565
ABCDE_189595 NA 0.9787659 0.9059075892313707 0.8089241
ABCDE_189596 NA 0.9788054 0.9065243881070291 0.8092951
我的期望输出;
Sample Intragenic ABCDE_177447
1032 0.97867626 0.9300704670625763 0.72782564
ABCDE_177447 0.97836965 0.87218356
ABCDE_188399 0.97859967 0.81188565
ABCDE_189595 0.9787659 0.8089241
ABCDE_189596 0.9788054 0.8092951
我曾尝试这种解决方案AWK extract columns from file based on header selected from 2nd file
,但它不与名称空间和同此解决方案的工作Extracting columns from a file
我也试过这种
$cat cols.awk
BEGIN {
n=split(cols,col)
for (i=1; i<=n; i++) s[col[i]]=i
}
NR==1 {
for (f=1; f<=NF; f++)
if ($f in s) c[s[$f]]=f
next
}
{ sep=""
for (f=1; f<=n; f++) {
printf("%c%s",sep,$c[f])
sep=FS
}
print ""
}
但是当我跑我的脚本像awk -F\t -f cols.awk.sh -v cols="Note,Sample,Intragenic Rate" metrics.txt
我得到了以下错误:
awk: illegal field $(), name "1"
input record number 2, file metrics.txt
source line number 12
感谢@anubhava,你能告诉我该怎么做了,包括命令行参数,像yourfile.sh “cols1,COLS 2” inputfile中? – user2380782
'yourfile.sh'里面可以保存这个awk命令:'awk -v cols =“$ 1”'BEGIN {FS = OFS =“\ t”; nc = split(cols,a,“,”)} NR == 1 {for(i = 1; i <= NF; i ++)hdr [$ i] = i} {for(i = 1; i <= nc ; i ++)if(a [i] in hdr)printf“%s%s”,$ hdr [a [i]],(i
anubhava
非常感谢@anubhava – user2380782