2017-03-22 23 views
2

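How do I print the data between specific tags?

I have a set of files (hundreds of them) containing this kind of data (pipe as the column separator):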

000|FILE___V20170307-003792 
102|000|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|0001|KLJDFLKJBDL|00|ADGAHA||00|ASYAHA|||DAGHAH|0|GAFDGA|18||3|N|1||AHA|ASGAN|ASFAN||82|1||2|300|||0|0|0|0|10|0||0|0|KLJDFLKJBDL|2|||||||| 
102|0100|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|00|KLJDFLKJBDL|00|ASDGAHA||00|ASYAHA|||DAGHAH|0|AGAH|5||3|N|1||AHA|ASGAN|ASDHAH||82|1||2|300|||0|0|0|0|54|0||0|0|KLJDFLKJBDL|2|||||||| 
010|ENDOFFILE|10 

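How can I get only the rows between the first row and the last row? The first row has 000 in its first column, and the last row has 010 in its first column. I tried using awk: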

awk '/000/,/010/ { print > "output.txt" }' input_file.txt 

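But it does not work: it does not check that 000 and 010 are found in the first column. Or maybe simply omitting the first and last lines would work somehow?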

+0

http://stackoverflow.com/q/17988756/632407? http://stackoverflow.com/q/22221277/632407? http://stackoverflow.com/q/19177721/632407 – jm666

Answers

1

You can use this sed command:

sed -n '/^000|/,/^010|/{/^0[01]0|/!p;}' file 

102|000|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|0001|KLJDFLKJBDL|00|ADGAHA||00|ASYAHA|||DAGHAH|0|GAFDGA|18||3|N|1||AHA|ASGAN|ASFAN||82|1||2|300|||0|0|0|0|10|0||0|0|KLJDFLKJBDL|2|||||||| 
102|0100|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|00|KLJDFLKJBDL|00|ASDGAHA||00|ASYAHA|||DAGHAH|0|AGAH|5||3|N|1||AHA|ASGAN|ASDHAH||82|1||2|300|||0|0|0|0|54|0||0|0|KLJDFLKJBDL|2|||||||| 

Using it in a find command:

find . -name '*.txt' -exec sed -i '' -n '/^000|/,/^010|/{/^0[01]0|/!p;}' {} \; 
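The `-i ''` form above is the BSD/macOS spelling of in-place editing; GNU sed takes `-i` without a separate empty argument. One way to sidestep the difference is to write to a temporary file and move it back, roughly as sketched below. The working directory and the demo file `a.txt` are hypothetical, created only so the snippet runs on its own.

```shell
# Demo setup (hypothetical): one file in the format from the question.
workdir=$(mktemp -d)
printf '%s\n' '000|FILE' '102|000|A' '102|0100|B' '010|ENDOFFILE|2' > "$workdir/a.txt"

# Filter each file into a temp file; replace the original only if sed succeeded.
find "$workdir" -name '*.txt' -exec sh -c '
  for f in "$@"; do
    tmp="$f.tmp.$$"
    sed -n "/^000|/,/^010|/{/^0[01]0|/!p;}" "$f" > "$tmp" && mv "$tmp" "$f" || rm -f "$tmp"
  done
' sh {} +

cat "$workdir/a.txt"
```

This should behave the same with either sed flavour, at the cost of one temporary file per input.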
+0

Thanks. How can I replace the contents of the current file? I tried `find . -name '*.txt' -exec sed -n '/^000/,/^010/{/^0[01]0/!p;}' '{}' \;` but it only prints the result. – jrara

+0

You can use: `find . -name '*.txt' -exec sed -i '' -n '/^000|/,/^010|/{/^0[01]0|/!p;}' {} \;` – anubhava

1

You can try,

awk -v FS="|" '$1=="000",$1=="010" {print > "output.txt"}' input_file.txt 

You get,

 
000|FILE___V20170307-003792 
102|000|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|0001|KLJDFLKJBDL|00|ADGAHA||00|ASYAHA|||DAGHAH|0|GAFDGA|18||3|N|1||AHA|ASGAN|ASFAN||82|1||2|300|||0|0|0|0|10|0||0|0|KLJDFLKJBDL|2|||||||| 
102|0100|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|00|KLJDFLKJBDL|00|ASDGAHA||00|ASYAHA|||DAGHAH|0|AGAH|5||3|N|1||AHA|ASGAN|ASDHAH||82|1||2|300|||0|0|0|0|54|0||0|0|KLJDFLKJBDL|2|||||||| 
010|ENDOFFILE|10 
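As a side note on why the range in the question misfires: `/000/` and `/010/` are unanchored regular expressions tested against the whole line, so a later field such as `0100` can close the range early. A minimal sketch with made-up, shortened sample data (the file `sample.txt` is hypothetical) compares the two approaches:

```shell
# Hypothetical sample data, shortened from the format in the question.
tmpdir=$(mktemp -d)
printf '%s\n' '000|FILE' '102|000|A' '102|0100|B' '102|7|C' '010|ENDOFFILE|4' > "$tmpdir/sample.txt"

# Unanchored regexes match anywhere in the line: "0100" in the third
# line contains "010", so the range closes before the real trailer.
regex_out=$(awk '/000/,/010/' "$tmpdir/sample.txt")

# Comparing the first field exactly only reacts to the real markers.
field_out=$(awk -F'|' '$1=="000",$1=="010"' "$tmpdir/sample.txt")

printf 'regex range:\n%s\n\nfield range:\n%s\n' "$regex_out" "$field_out"
rm -rf "$tmpdir"
```

With this sample, the regex range stops after the third line, while the field comparison runs through to the trailer.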

To get only the rows between the first row and the last row,

awk -v FS="|" '$1=="010"{f=0} f{print > "output.txt"} $1=="000"{f=1}' input_file.txt 

You get,

 
102|000|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|0001|KLJDFLKJBDL|00|ADGAHA||00|ASYAHA|||DAGHAH|0|GAFDGA|18||3|N|1||AHA|ASGAN|ASFAN||82|1||2|300|||0|0|0|0|10|0||0|0|KLJDFLKJBDL|2|||||||| 
102|0100|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|00|KLJDFLKJBDL|00|ASDGAHA||00|ASYAHA|||DAGHAH|0|AGAH|5||3|N|1||AHA|ASGAN|ASDHAH||82|1||2|300|||0|0|0|0|54|0||0|0|KLJDFLKJBDL|2|||||||| 
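Since the question mentions hundreds of files, note that a fixed `output.txt` would collect the rows of every input into a single file. One possible per-file variant of the flag approach is sketched below; the `.out` suffix and the demo files are assumptions made for illustration only.

```shell
# Demo setup (hypothetical file names and contents).
demo=$(mktemp -d)
printf '%s\n' '000|FILE1' '102|000|A' '010|ENDOFFILE|1' > "$demo/one.txt"
printf '%s\n' '000|FILE2' '102|0100|B' '102|7|C' '010|ENDOFFILE|2' > "$demo/two.txt"

awk -F'|' '
  FNR == 1    { f = 0; if (out != "") close(out); out = FILENAME ".out" }  # new input file: reset state
  $1 == "010" { f = 0 }          # trailer: stop before printing it
  f           { print > out }    # inside the block: write to the per-file output
  $1 == "000" { f = 1 }          # header: start printing from the next line
' "$demo"/*.txt

cat "$demo/one.txt.out" "$demo/two.txt.out"
```

Closing the previous output on each new input keeps the number of open files small when many files are processed in one awk run.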
1

To get the lines between the first and the last line regardless of their content, use awk:

$ awk 'NR>2{print p} {p=$0}' file 
102|000|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|0001|KLJDFLKJBDL|00|ADGAHA||00|ASYAHA|||DAGHAH|0|GAFDGA|18||3|N|1||AHA|ASGAN|ASFAN||82|1||2|300|||0|0|0|0|10|0||0|0|KLJDFLKJBDL|2|||||||| 
102|0100|DDKSB=DAGA;DAGA=ADGA;DAG-FGSA=ADGA|00|KLJDFLKJBDL|00|ASDGAHA||00|ASYAHA|||DAGHAH|0|AGAH|5||3|N|1||AHA|ASGAN|ASDHAH||82|1||2|300|||0|0|0|0|54|0||0|0|KLJDFLKJBDL|2|||||||| 

Using head and tail:

$ head -n -1 file |tail -n +2 

man head

-n, --lines=[-]K 
      print the first K lines instead of the first 10; with the 
      leading '-', print all but the last K lines of each file 

man tail

-n, --lines=K 
      output the last K lines, instead of the last 10; or use -n +K to 
      output lines starting with the Kth 

If you have multiple files, you can do:

for f in files* ; do head -n -1 "$f" |tail -n +2 > newpath/"$f" ; done 
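Negative counts for `head -n` are documented in the GNU man page excerpt above and may not be available in other implementations. If that is a concern, one alternative is to let sed delete the first and the last line, sketched here with a throwaway file whose contents are made up:

```shell
# Throwaway sample file (contents are made up for the demo).
sample=$(mktemp)
printf '%s\n' '000|FILE' '102|000|A' '102|0100|B' '010|ENDOFFILE|2' > "$sample"

# 1d drops the first line, $d drops the last; everything else is printed.
middle=$(sed '1d;$d' "$sample")
printf '%s\n' "$middle"

rm -f "$sample"
```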
1

Another approach with sed:

sed -n '/^000/,/^010/{//d;p}' file 
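  • /^000/,/^010/: the range from a line starting with 000 to the next line starting with 010
  • //d: delete the lines matched by the patterns of the address range above
  • p: print the pattern space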
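The empty regex `//` reuses the most recently used regular expression, which keeps the command short. A spelled-out sketch of the same idea that names both marker patterns explicitly could look like this (the sample records are made up):

```shell
# Made-up sample records in the format from the question.
records=$(printf '%s\n' '000|FILE' '102|000|A' '102|0100|B' '010|ENDOFFILE|2')

# Within the 000..010 range, delete the two marker lines explicitly and print the rest.
body=$(printf '%s\n' "$records" | sed -n '/^000/,/^010/{
/^000/d
/^010/d
p
}')
printf '%s\n' "$body"
```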
0

I would write it in a more 'C-like' way:

awk 'BEGIN{ ok = 0; FS = "|" } { if($1 == "000" && ok == 0) { ok = 1; } if(ok == 1) { print; } if($1 == "010") { ok = -1; } }' file