2016-11-19

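Suppose we have a line of text stored in a file. What is a short way in Linux to extract one pattern string together with another pattern string that comes later on the line?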

// In the actual file this will be one line 
{unrelated_text1,ID:13, unrelated_text2,TIMESTAMP:1476280500,unrelated_text3}, 
{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600}, 
{ID:30,more_unrelated_text1,TIMESTAMP:1476280700}, 
{ID:40,final_unrelated_text} 

For this particular input, I want to extract these 3 entries:

// The details, such as whether to put { character in front or not do not matter. 
// Any form of output which extracts only these 3 entries and groups them in a 
// visually nice way will do the job. 
{ID:13, TIMESTAMP:1476280500} 
{ID:25, TIMESTAMP:1476280600} 
{ID:30, TIMESTAMP:1476280700} 
// I do not want the last entry, because it does not contain timestamp field. 

The closest command I have found so far is

grep -Po '{ID:[0-9]+(.+?)}' input_file 

which gives the output

{unrelated_text1,ID:13,unrelated_text2,TIMESTAMP:1476280500,unrelated_text3} 
{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600} 
{ID:30,more_unrelated_text1,TIMESTAMP:1476280700} 
{ID:40,final_unrelated_text} 

The next improvement I am looking for is how to remove the unrelated_text from each entry and how to drop the last entry.

Question: What is the simplest way to do this in Linux?
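Staying with the `grep -P` command from the question, one possible way to get that improvement is a lookahead that keeps an `ID` only when a `TIMESTAMP` follows it inside the same `{...}` entry. This is a sketch, not a definitive answer: it assumes GNU grep built with PCRE support, that no field contains `{` or `}`, and that every `TIMESTAMP` has an `ID` earlier in its entry. The helper name `extract_pairs_grep` and the way the sample file is written are hypothetical, for illustration only.

```shell
#!/bin/sh
# Sample input from the question, joined onto one line (file name is illustrative)
printf '%s\n' '{unrelated_text1,ID:13, unrelated_text2,TIMESTAMP:1476280500,unrelated_text3}, {other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600}, {ID:30,more_unrelated_text1,TIMESTAMP:1476280700}, {ID:40,final_unrelated_text}' > input_file

# Hypothetical helper; needs GNU grep with PCRE support (-P).
extract_pairs_grep() {
  # Keep an ID only if a TIMESTAMP follows it before the entry's closing brace,
  # and keep every TIMESTAMP; \b stops matches inside longer words.
  # paste then joins each ID line with the TIMESTAMP line after it,
  # and sed adds the comma and the surrounding braces.
  grep -oP '\bID:[0-9]+(?=[^{}]*\bTIMESTAMP:[0-9]+)|\bTIMESTAMP:[0-9]+' "$1" |
    paste -d' ' - - |
    sed 's/ /, /; s/.*/{&}/'
}

extract_pairs_grep input_file
```

Because `ID:40` has no `TIMESTAMP` before its closing brace, the lookahead fails and it is never printed, so the pairing of the remaining lines stays aligned.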

Answer


With GNU awk for multi-char RS and RT, and for word boundaries:

$ awk -v RS='\\<(ID|TIMESTAMP):[0-9]+' 'NR%2{id=RT;next} RT{printf "{%s, %s}\n", id, RT}' file 
{ID:13, TIMESTAMP:1476280500} 
{ID:25, TIMESTAMP:1476280600} 
{ID:30, TIMESTAMP:1476280700} 

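The above works regardless of whether the input is on one line or many, and regardless of what other text the file contains. All it relies on is an ID appearing before each associated TIMESTAMP, and that would not be hard to change if necessary.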
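The `NR%2` test pairs matches purely by position, so it also assumes IDs and TIMESTAMPs strictly alternate. `ID:40` is harmless here only because it is last; for an input such as `{ID:40,x},{ID:13,y,TIMESTAMP:1}` the gawk command prints `{ID:40, ID:13}`. As a hedged alternative that swaps gawk's RS/RT for plain `split()` and so should run in any POSIX awk, here is a sketch that works entry by entry. It assumes entries are `{...}` groups of comma-separated fields and that no entry spans lines; the helper name `extract_pairs` is hypothetical.

```shell
#!/bin/sh
# Sample input from the question, joined onto one line (file name is illustrative)
printf '%s\n' '{unrelated_text1,ID:13, unrelated_text2,TIMESTAMP:1476280500,unrelated_text3}, {other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600}, {ID:30,more_unrelated_text1,TIMESTAMP:1476280700}, {ID:40,final_unrelated_text}' > input_file

# Hypothetical helper: print {ID:n, TIMESTAMP:n} for every entry that has both fields.
# Assumes comma-separated fields inside {...} entries, each entry on one line.
extract_pairs() {
  awk '{
    n = split($0, entries, "}")              # one chunk per {...} entry
    for (i = 1; i <= n; i++) {
      m = split(entries[i], fields, ",")
      id = ""; ts = ""
      for (j = 1; j <= m; j++) {
        f = fields[j]
        gsub(/[{ \t]/, "", f)                # drop the opening brace and blanks
        if (f ~ /^ID:[0-9]+$/) id = f
        else if (f ~ /^TIMESTAMP:[0-9]+$/) ts = f
      }
      if (id != "" && ts != "") printf "{%s, %s}\n", id, ts
    }
  }' "$1"
}

extract_pairs input_file
```

Because each entry is judged on its own, an ID without a TIMESTAMP can appear anywhere, and the two fields may come in either order; the trade-off is that the sketch depends on the comma and brace layout of the sample.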
