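Printing multiple lines from a parsing loop

I have written a loop to parse a few lines in a file and extract the information I want in a friendlier format, but I am getting duplicate prints of the strings I am parsing. I think I am doing something wrong (and stupid) with the echo | sed commands, but I can't see it at the moment. Can anyone point out where I am going wrong?

The file to parse looks (roughly) like this: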

##################################### topd Tree0 - Tree6 ####################################### 
* Percentage of taxa in common: 100.0% 
* Split Distance [differents/possibles]: 0.461538461538462 [ 12/26 ] 
* Disagreement [ taxa disagree/all taxa ]: [ 9/16 ], New Split Distance: 0, Taxa disagree: (PAUlopT PAKU2 PAKlopT PAUU4 PLTU1 PLTU3 PLTU4 PLTcif PLTlopT) 

##################################### topd Tree0 - Tree7 ####################################### 
* Percentage of taxa in common: 100.0% 
* Split Distance [differents/possibles]: 0.538461538461538 [ 14/26 ] 
* Disagreement [ taxa disagree/all taxa ]: [ 9/16 ], New Split Distance: 0, Taxa disagree: (PAKU2 PAKlopT PAUU4 PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT) 

##################################### topd Tree0 - Tree8 ####################################### 
* Percentage of taxa in common: 100.0% 
* Split Distance [differents/possibles]: 0.230769230769231 [ 6/26 ] 
* Disagreement [ taxa disagree/all taxa ]: [ 4/16 ], New Split Distance: 0, Taxa disagree: (PLTU1 PLTU2 PLTU3 PLTU4)

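I want only the header and the disagreeing taxa (i.e. line 1 and the end of line 4).

What I get instead has the lines in triplicate (and in some cases with a different taxa list, though if that is a separate issue I haven't worked it out yet):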

Tree0 - Tree6 PAKlopT PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
Tree0 - Tree6 PAKlopT PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
Tree0 - Tree6 PAKlopT PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
Tree0 - Tree6 PAUlopT PAKU2 PAKlopT PAUU4 PLTU1 PLTU3 PLTU4 PLTcif PLTlopT 
Tree0 - Tree6 PAUlopT PAKU2 PAKlopT PAUU4 PLTU1 PLTU3 PLTU4 PLTcif PLTlopT 
Tree0 - Tree7 PAUlopT PAKU2 PAKlopT PAUU4 PLTU1 PLTU3 PLTU4 PLTcif PLTlopT 
Tree0 - Tree7 PAUlopT PAKU2 PAKlopT PAUU4 PLTU1 PLTU3 PLTU4 PLTcif PLTlopT 
Tree0 - Tree7 PAUlopT PAKU2 PAKlopT PAUU4 PLTU1 PLTU3 PLTU4 PLTcif PLTlopT 
Tree0 - Tree7 PAKU2 PAKlopT PAUU4 PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
Tree0 - Tree7 PAKU2 PAKlopT PAUU4 PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
Tree0 - Tree8 PAKU2 PAKlopT PAUU4 PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
Tree0 - Tree8 PAKU2 PAKlopT PAUU4 PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
Tree0 - Tree8 PAKU2 PAKlopT PAUU4 PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
Tree0 - Tree8 PLTU1 PLTU2 PLTU3 PLTU4 
Tree0 - Tree8 PLTU1 PLTU2 PLTU3 PLTU4

This is the code I wrote (I doubt it is particularly elegant or efficient):

#!/bin/bash 

file="$1" 
### 

while read LINE ; 
do 
if [[ $LINE == "#"* ]] 
    then 
    header=$(echo $LINE | sed 's/\#//g' | sed 's/\ topd\ //g') 
fi 
if [[ $LINE == "* Disagreement"* ]] ; 
    then 
    taxa=$(echo $LINE | sed 's/.*(\(\ .*\ \))/\1/' | grep "^ " |sed 's/\ /\t/g') 
fi 

echo "$header""$taxa" 

done < $file

Edit:

The actual file I am trying to process: https://drive.google.com/open?id=0Bz_H3y-7pX9FX0lZTWNBdlpIQmc

Source

2016-07-13 Joe Healey

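I would suggest using a text-processing language such as awk or sed rather than bash. – 123

There is a logic error in your script: you print a line for every line you process. Print only after processing a "* Disagreement" line. –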
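As a rough sketch of the fix described in the comment above (print only once a Disagreement line has been handled), the loop from the question could be reshaped as below. This assumes the input format shown in the question. The function name print_disagreements is made up for illustration, and `case` is used in place of `[[ ]]` so the sketch also runs under a plain POSIX shell.

```shell
# Hypothetical helper, not part of the original script.
# Usage: print_disagreements FILE
print_disagreements() {
    header=""
    while IFS= read -r line; do
        case "$line" in
            "#"*)
                # Header line: drop the '#' padding, the " topd " label
                # and any trailing spaces. Nothing is printed yet.
                header=$(printf '%s\n' "$line" |
                    sed -e 's/#//g' -e 's/ topd //' -e 's/ *$//')
                ;;
            "* Disagreement"*)
                # Keep only the text between the parentheses, then print
                # exactly one tab-separated line for this tree comparison.
                taxa=$(printf '%s\n' "$line" | sed 's/.*(\(.*\)).*/\1/')
                printf '%s\t%s\n' "$header" "$taxa"
                ;;
        esac
    done < "$1"
}
```

Because the print now sits inside the Disagreement branch, the other lines of each block (blank lines, "* Percentage", "* Split Distance") no longer trigger any output, which is what produced the repeated lines.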

bash may not be the best language for this, but using bash regular-expression matching makes it much simpler.

#!/bin/bash 

file="$1" 
### 

header_regex='# topd (.*) #' 
taxa_regex='Taxa disagree: \((.*)\)' 
while read line; do
    if [[ $line =~ $header_regex ]]; then
        header=${BASH_REMATCH[1]}
    elif [[ $line =~ $taxa_regex ]]; then
        taxa=${BASH_REMATCH[1]}
        echo "$header $taxa"
    fi
done < "$file"

Source

2016-07-13 14:23:19 chepner

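You might want to escape those '#'. – 123

Yes, for some reason I thought you couldn't inside '[['. – chepner

I think you have to escape the spaces too. Putting the regex in a variable would probably be better. – 123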

You can do this purely with sed. I propose two steps:

sed -n -e 's/#* \(.*\) #*$/\1/p' -e 's/.*(\(.*\))$/\1/p' < file.txt

This leaves you with output like this:

topd Tree0 - Tree6 
PAUlopT PAKU2 PAKlopT PAUU4 PLTU1 PLTU3 PLTU4 PLTcif PLTlopT 
topd Tree0 - Tree7 
PAKU2 PAKlopT PAUU4 PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
topd Tree0 - Tree8

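You then need to merge each pair of lines, which can also be done with sed as a second step, simply by piping the previous output through: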

... | sed 'N;s/\n/\t/'

Maybe the second step could somehow be integrated into the first, but I don't know how.

Source
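One possible way to fold both steps into a single sed call is to park the header in sed's hold space and join it with the taxa list when the Disagreement line arrives. The sketch below assumes the input format from the question. The function name topd_summary is hypothetical, and the header substitution is adapted so that it also drops the "topd" label and tolerates trailing spaces.

```shell
# Hypothetical helper: one sed pass, one tab-separated line per comparison.
# Usage: topd_summary FILE
topd_summary() {
    # A literal tab, so the script does not depend on sed understanding \t.
    tab=$(printf '\t')
    sed -n \
        -e '/^#/{ s/^#* *topd *//; s/ *#* *$//; h; }' \
        -e '/Taxa disagree:/{ s/.*(\(.*\)).*/\1/; H; x; s/\n/'"$tab"'/; p; }' \
        "$1"
}
```

The first expression cleans the header and copies it to the hold space with `h` without printing. The second extracts the text between the parentheses, appends it to the held header with `H`, swaps the two buffers with `x`, replaces the joining newline with a tab and prints the result.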

2016-07-13 14:28:09

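Yes, I was struggling with multiple regexes in sed (I have never really done it). This solution is very close, but what I am ultimately after is one tab-separated line per tree comparison. This leaves all the trailing ### and spreads the result over 2 lines. –

Sorry, I didn't read carefully. Updated. –

Many thanks. The following ultimately gives what I was looking for, albeit as a lengthy one-liner: sed -n -e 's/#* \(.*\) #*$/\1/p' -e 's/.*(\(.*\))$/\1/p'

The shell is not for manipulating text; it is for sequencing calls to tools. See https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

The right way to do what you want in UNIX is to use the standard UNIX general-purpose text-processing tool, awk: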

$ cat tst.awk 
/####/ { hdr = $3 " - " $5 } 
/Disagreement/ { gsub(/.*\( *| *\).*/,""); print hdr, $0 }

$ awk -f tst.awk file 
Tree0 - Tree6 PAUlopT PAKU2 PAKlopT PAUU4 PLTU1 PLTU3 PLTU4 PLTcif PLTlopT 
Tree0 - Tree7 PAKU2 PAKlopT PAUU4 PAUlopT PLTU1 PLTU2 PLTU3 PLTU4 PLTlopT 
Tree0 - Tree8 PLTU1 PLTU2 PLTU3 PLTU4

Source

2016-07-13 17:39:06
