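Formatting specific data using awk or sed

I'm currently working with a large dataset that contains file information formatted as blocks of data. I'm trying to take a piece of data from the "File path" line and add it as a new column on specific lines. The dataset contains file information formatted like this: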
File path: /d9b50a6f54d5a1f8/7b3d459a3454703c/a6d1040ea2c84e10/afcbe93ced71e5e6/2b517a561f5da8a6/aab17eb15d782d7b/af38f2bcc4998af0/0d8eb680024af333.jar
Inode Num: 22525898
Chunk Hash Chunk Size (bytes) Compression Ratio (tenth)
45:97:2a:60:e3:69 3208 10
7a:8b:8e:20:7b:38 1982 10
b9:45:3d:f4:97:88 1849 10
Whole File Hash: 865999b40fd9
File path: /d9b50a6f54d5a1f8/7b3d459a3454703c/a6d1040ea2c84e10/afcbe93ced71e5e6/2b517a561f5da8a6/1e82b13443330bb3/12fd3e87b2f62dc8/6e1a9f0b0a281564.c
Inode Num: 31881221
Chunk Hash Chunk Size (bytes) Compression Ratio (tenth)
e8:b0:cb:6f:76:ff 1344 10
19:c5:b2:aa:b3:60 613 10
11:7c:7e:76:4b:d5 1272 10
36:e0:59:49:b6:4a 581 10
9c:31:bc:8a:39:94 3296 10
01:f0:56:3a:e1:a9 1140 10
Whole File Hash: 4b28b44ae03d
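What I want to do is take the file type (.jar and .c in this example) and append it to each of the corresponding chunk hash lines, so that the final output looks like this: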
File path: /d9b50a6f54d5a1f8/7b3d459a3454703c/a6d1040ea2c84e10/afcbe93ced71e5e6/2b517a561f5da8a6/aab17eb15d782d7b/af38f2bcc4998af0/0d8eb680024af333.jar
Inode Num: 22525898
Chunk Hash Chunk Size (bytes) Compression Ratio (tenth)
45:97:2a:60:e3:69 3208 10 .jar
7a:8b:8e:20:7b:38 1982 10 .jar
b9:45:3d:f4:97:88 1849 10 .jar
Whole File Hash: 865999b40fd9
File path: /d9b50a6f54d5a1f8/7b3d459a3454703c/a6d1040ea2c84e10/afcbe93ced71e5e6/2b517a561f5da8a6/1e82b13443330bb3/12fd3e87b2f62dc8/6e1a9f0b0a281564.c
Inode Num: 31881221
Chunk Hash Chunk Size (bytes) Compression Ratio (tenth)
e8:b0:cb:6f:76:ff 1344 10 .c
19:c5:b2:aa:b3:60 613 10 .c
11:7c:7e:76:4b:d5 1272 10 .c
36:e0:59:49:b6:4a 581 10 .c
9c:31:bc:8a:39:94 3296 10 .c
01:f0:56:3a:e1:a9 1140 10 .c
Whole File Hash: 4b28b44ae03d
I already have awk code that pulls out the file type and the chunk hash lines:
awk 'match($0,/\..+/) {print substr($0,RSTART,RLENGTH)}'
awk '/Chunk Hash/{flag=1;next}/Whole File Hash:/{flag=0}flag'
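I just don't know how to combine these pieces in awk (or sed) so that the file type is appended as a new column to every line of its respective data block. One more thing to note: I'm trying to do this inside a bash script, in case that makes a difference.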
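One way the two snippets might be combined is a single awk pass that remembers the extension from each "File path" line and appends it while the flag is set. This is only a sketch under a few assumptions: the function name `append_ext` is made up for illustration, the extension is taken to be everything from the last dot in the final path component, and the input has no trailing whitespace or carriage returns.

```shell
#!/bin/bash
# Hypothetical helper: reads the dataset from the named files (or from
# stdin when no file is given) and writes the annotated version to stdout.
append_ext() {
    awk '
    # Remember the extension of the current file; reset it for every block.
    /^File path:/ {
        ext = ""
        if (match($0, "[.][^./]+$"))
            ext = substr($0, RSTART, RLENGTH)
    }
    # The header line opens the chunk section; print it unchanged.
    /^Chunk Hash/ { flag = 1; print; next }
    # The whole-file hash closes the chunk section.
    /^Whole File Hash:/ { flag = 0 }
    # Inside the chunk section, append the extension as a new column.
    flag { print (ext == "" ? $0 : ($0 " " ext)); next }
    # Every other line passes through untouched.
    { print }
    ' "$@"
}
```

It could then be called as `append_ext dataset.txt > dataset_with_types.txt`, where both file names are placeholders. Keeping everything in one awk program avoids having to stitch the output of two separate commands back together.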
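Some lines are doubled; the 'p' command should be removed from the address range block. – SLePort
@Kenavoz Uh, yes, 'N' prints when the '-n' option isn't used... thanks! –
This works great, thanks! –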