Need help formatting a line using sed


>VFG000676(gb|AAD32411)_(lef)_anthrax_toxin_lethal_factor_precursor_[Anthrax_toxin_(VF0142)]_[Bacillus_anthracis_str._Sterne]

Given the line above, I want the output to be:

>VFG000676\t(gb|AAD32411)\t(lef)\tanthrax_toxin_lethal_factor_precursor\t [Anthrax_toxin_(VF0142)]\t[Bacillus_anthracis_str._Sterne]

I used this command:

grep '>' x.fa | sed 's/^>\(.*\) (gi.*) \(.*\) \[\(.*\)\].*/\1\t\2\t\3/' | sed 's/ /_/g' > output.tsv

but the output is not what I want.

Update: I finally solved the problem with the following command:

grep '>' VFs_no_block.fa | sed 's/^>\(.*\)\((.*)\) \((.*)\) \(.*\) \(\[.*(.*)]\) \(\[.*]\).*/\1\t\2\t\3\t\4\t\5\t\6/' | sed 's/ /_/g' > VFDB_annotation_reference.tsv


2017-01-18 Mahdi

Please add a short description as the title (one that carries more information than just the [tags]).

Can you (also) describe (in words) how you want to split the input string?

Change OFS="\\t" to OFS="\t" if you actually want literal tab characters:

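$ cat tst.awk
BEGIN { OFS="\\t" }    # separator printed between fields: a literal backslash-t
{
    c=0
    # repeatedly peel off the next [..] group, (..) group, or run of other text
    while (match($0,/\[[^][]+\]|\([^)(]+\)|[^][)(]+/)) {
        tgt = substr($0,RSTART,RLENGTH)
        gsub(/^_+|_+$/,"",tgt)    # trim leading/trailing underscores
        if (tgt != "") {
            printf "%s%s", (c++ ? OFS : ""), tgt
        }
        $0 = substr($0,RSTART+RLENGTH)    # continue with the rest of the line
    }
    print    # $0 is normally empty by now, so this just ends the output line
}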

$ awk -f tst.awk file 
>VFG000676\t(gb|AAD32411)\t(lef)\tanthrax_toxin_lethal_factor_precursor\t[Anthrax_toxin_(VF0142)]\t[Bacillus_anthracis_str._Sterne]
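
For comparison, here is a minimal sketch of the same split done with a single sed substitution. It assumes every header has exactly the six-part shape of the sample line (an ID, two parenthesized groups, a description, two bracketed groups) with the parts joined by underscores; the function name split_header is hypothetical. Unlike the awk loop, it will not adapt to headers with a different number of groups.

```shell
# Hypothetical helper: split one header line of the six-part shape shown in the question.
# Assumes underscore-joined parts: >ID(..)_(..)_description_[..]_[..]
split_header() {
    sed -E 's/^(>[^(]+)(\([^)]*\))_(\([^)]*\))_(.*)_(\[[^]]*\])_(\[[^]]*\])$/\1\\t\2\\t\3\\t\4\\t\5\\t\6/'
}

# Usage: reads header lines on stdin; lines that do not match pass through unchanged.
printf '%s\n' '>VFG000676(gb|AAD32411)_(lef)_anthrax_toxin_lethal_factor_precursor_[Anthrax_toxin_(VF0142)]_[Bacillus_anthracis_str._Sterne]' | split_header
```

The `\\t` in the replacement emits the two characters backslash and t, matching the output requested in the question. With GNU sed, writing `\t` there instead should produce real tab characters.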


2017-01-18 18:08:26

Thanks for your response. – Mahdi

You're welcome. See http://stackoverflow.com/help/someone-answers for what to do next.
