Regular expression matching multiple lines in AWK. The && operator?

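I'm not sure whether the && operator works inside regular expressions. What I'm trying to do is match a line that starts with a number and contains the letter 'a', followed by a line that starts with a number and contains the letter 'b', followed by a line that starts with a number and contains the letter 'c'. This abc sequence will be used as a unique identifier marking where to start reading the file.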

Below is what I'd like to write in awk.

/(^[0-9]+ .*a)&&\n(^[0-9]+ .*b)&&\n(^[0-9]+ .*c) { 
print $0 
}

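Each of these regular expressions works on its own, e.g. (^[0-9]+ .*a), but I don't know how to string them together so that each one applies to the next line.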

My file looks like this:

JUNK UP HERE NOT STARTING WITH NUMBER 
1  a   0.110  0.069   
2  a   0.062  0.088   
3  a   0.062  0.121   
4  b   0.062  0.121   
5  c   0.032  0.100   
6  d   0.032  0.100   
7  e   0.032  0.100

And what I want is:

3  a   0.062  0.121   
4  b   0.062  0.121   
5  c   0.032  0.100   
6  d   0.032  0.100   
7  e   0.032  0.100

Source

2012-10-03 chimpsarehungry

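In your case, since your "clauses" (the three conditions you want to hold together) don't overlap, you don't really need any operator at all; just "eat up" the rest of each line, as @m.buettner suggests. In cases where the conditions _do_ overlap, for example if you want to check that a line contains both a symbol and a digit (but you don't know in which order), you would use so-called "lookahead assertions" to get that kind of match.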

The only lookahead I know of is the next() function in Python. I tried it in my answer below. – chimpsarehungry

I'm not familiar with Python, but what I'm talking about are the lookahead and lookbehind constructs, which I know Python supports: http://www.regular-expressions.info/lookaround.html.
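To make the lookahead idea from these comments concrete, here is a minimal Python sketch. It assumes "symbol" means any character that is neither a word character nor whitespace; the name `has_digit_and_symbol` is made up for illustration. As far as I know, awk's regular expressions (POSIX ERE) have no lookaround, so this only applies to engines such as Python's `re`.

```python
import re

# Two lookaheads anchored at the start of the line. Each one scans ahead
# independently, so the digit and the symbol may come in either order.
# Assumption: a "symbol" is any character that is not a word character
# (letter, digit, underscore) and not whitespace.
BOTH = re.compile(r'^(?=.*\d)(?=.*[^\w\s])')

def has_digit_and_symbol(line):
    """Return True if the line contains at least one digit and one symbol."""
    return BOTH.search(line) is not None

for sample in ["abc 7 #", "# then 7", "only 7", "only #"]:
    print(repr(sample), has_digit_and_symbol(sample))
```

Neither lookahead consumes any text, which is why two overlapping conditions can be checked at the same position.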

[Updated based on clarification]

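The high-order bit is that awk is a line-oriented language, so you can't really do a normal pattern match that spans lines. The usual approach is to match each line separately and let a later clause or statement work out whether all the right pieces matched.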

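What I'm doing here is looking for an a in the second field of one line, a b in the second field of another line, and a c in the second field of a third line. In the first two cases, I stash the contents of the line along with the line number it occurred on. When the third line matches and we haven't yet found the whole sequence, I go back and check that the other two lines were seen and that their line numbers are acceptable (that is, they are the two lines immediately before). If all is well, I print the buffered previous lines and set a flag indicating that everything from here on should be printed.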

Here's the script:

$2 == "a" { a = $0; aLine = NR; }
$2 == "b" { b = $0; bLine = NR; }
$2 == "c" && !keepPrinting {
    if ((bLine == (NR - 1)) && (aLine == (NR - 2))) {
        print a;
        print b;
        keepPrinting = 1;
    }
}
keepPrinting { print; }

And here's a file I tested it with:

JUNK UP HERE NOT STARTING WITH NUMBER 
1  a   0.110  0.069 
2  a   0.062  0.088 
3  a   0.062  0.121 
4  b   0.062  0.121 
5  c   0.032  0.100 
6  d   0.032  0.100 
7  e   0.032  0.100 
8  a   0.099  0.121 
9  b   0.098  0.121 
10 c   0.097  0.100 
11 x   0.000  0.200

Here's what I get when I run it:

$ awk -f blort.awk blort.txt 
3  a   0.062  0.121 
4  b   0.062  0.121 
5  c   0.032  0.100 
6  d   0.032  0.100 
7  e   0.032  0.100 
8  a   0.099  0.121 
9  b   0.098  0.121 
10 c   0.097  0.100 
11 x   0.000  0.200

Source

2012-10-04 00:44:06 danfuzz

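This is similar to what I want. I should have mentioned that in my file abc will be a unique sequence. I'll use it as the starting point for reading. So the output I'd want from your test file is the lines with a, b, c, d, e, a, b, c, x. – chimpsarehungry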

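I updated my answer based on your comment. The state machine solution you posted is interesting from an academic point of view, but perhaps one like this is more practical? – danfuzz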

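Thanks, danfuzz. Your script is easier for me to explain to my boss than my state machine. All I did was add {if ((keepPrinting > 0) && (++keepPrinting <= 50)) print $0} to get the number of lines I want after the match. – chimpsarehungry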

No, it doesn't work that way. You could try something like this:

/(^[0-9]+.*a[^\n]*)\n([0-9]+.*b[^\n]*)\n([0-9]+.*c[^\n]*)/

and repeat that for as many letters as you need.

[^\n]* will match as many non-newline characters as possible (that is, everything up to the line break).
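One caveat: with its default record separator, awk hands each pattern a single line with no newline in it, so a pattern containing \n has nothing to match against. A multi-line pattern like this needs the whole text in one string. Here is a minimal sketch of that in Python, using a slightly tightened pattern that looks for the letter in the second column. The sample text and the name `from_first_abc` are assumptions for illustration only.

```python
import re

# Hypothetical sample text, modelled on the file in the question.
text = """JUNK UP HERE NOT STARTING WITH NUMBER
1  a   0.110  0.069
2  a   0.062  0.088
3  a   0.062  0.121
4  b   0.062  0.121
5  c   0.032  0.100
6  d   0.032  0.100
7  e   0.032  0.100
"""

# re.MULTILINE lets ^ match at the start of every line. By default "."
# does not match a newline, so each ".*" stays on its own line.
ABC = re.compile(
    r'^[0-9]+[ \t]+a\b.*\n[0-9]+[ \t]+b\b.*\n[0-9]+[ \t]+c\b.*',
    re.MULTILINE,
)

def from_first_abc(text):
    """Return the text from the first a, b, c run onward, or '' if none."""
    m = ABC.search(text)
    return text[m.start():] if m else ""

print(from_first_abc(text), end="")
```

This reads the entire file into memory, which is fine for small inputs; for large files, a line-by-line approach is probably the better fit.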

Source

2012-10-03 23:46:57

That didn't work. Thanks for telling me, though. – chimpsarehungry

What do you get?

Nothing at all. – chimpsarehungry

I'd like to do this in Python, by creating an iterator over the lines and trying to match the next few lines with next().

lines = iter([line for line in open("FILE").readlines() if re.match(r'^([0-9])',line)])

for line in lines:
    count = 50
    if line.find('a'):
        if next(lines).find('b'):
            if next(lines).find('c'):
                while count > 0:
                    print line
                    count -=1

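But it just isn't right. Ideally, I'd find the match and print the next 50 lines starting from the 'a' line. Maybe I need to implement some kind of state machine.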
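For what it's worth, a few things go wrong in the attempt above. `str.find` returns an index, or -1 when nothing is found, so `if line.find('a')` is true for a miss (-1) and false for a hit at position 0. Each `next(lines)` also consumes a line, so a run such as a, a, b, c can be skipped over, and the inner loop prints the same line 50 times. Below is a minimal Python 3 sketch of one way around this, assuming the letter sits in the second whitespace-separated field (as in the awk answer); the name `lines_from_abc` and the sample data are made up for illustration.

```python
import re

def lines_from_abc(lines, limit=50):
    """Return up to `limit` numbered lines, starting at the first a, b, c run."""
    # Keep only lines that start with a digit, as in the original attempt.
    numbered = [ln for ln in lines if re.match(r'[0-9]', ln)]
    # Second field of each line; '' when the line has fewer than two fields.
    keys = [(ln.split() + ["", ""])[1] for ln in numbered]
    # Slide a three-line window over the keys so overlapping candidates
    # (for example a, a, b, c) are all examined.
    for i in range(len(keys) - 2):
        if keys[i:i + 3] == ["a", "b", "c"]:
            return numbered[i:i + limit]
    return []

# Hypothetical sample data, modelled on the file in the question.
sample = [
    "JUNK UP HERE NOT STARTING WITH NUMBER\n",
    "1  a   0.110  0.069\n",
    "2  a   0.062  0.088\n",
    "3  a   0.062  0.121\n",
    "4  b   0.062  0.121\n",
    "5  c   0.032  0.100\n",
    "6  d   0.032  0.100\n",
]

for ln in lines_from_abc(sample):
    print(ln, end="")
```

Since an open file is an iterable of lines, passing a file object in place of `sample` should work the same way.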

Source

2012-10-04 16:55:52 chimpsarehungry

A friend wrote this awk program for me. It's a state machine. It works.

#!/usr/bin/awk -f 

BEGIN { 
    # We start out in the "idle" state. 
    state = "idle" 
} 

/^[0-9]+[[:space:]]+q/ {
    # Every time we encounter a "# q" we either print it (when already
    # printing) or go to the "q_found" state.
    if (state != "printing") {
        state = "q_found"
        line_q = $0
    }
}

/^[0-9]+[[:space:]]+r/ {
    # If we are in the q_found state and "# r" immediately follows,
    # advance to the r_found state. Else, return to "idle" and
    # wait for the "# q" to start us off.
    if (state == "q_found") {
        state = "r_found"
        line_r = $0
    } else if (state != "printing") {
        state = "idle"
    }
}

/^[0-9]+[[:space:]]+l/ {
    # If we are in the r_found state and "# l" immediately follows,
    # advance to the l_found state. Else, return to "idle" and
    # wait for the "# q" to start us off.
    if (state == "r_found") {
        state = "l_found"
        line_l = $0
    } else if (state != "printing") {
        state = "idle"
    }
}

/^[0-9]+[[:space:]]+i/ {
    # If we are in the l_found state and "# i" immediately follows,
    # we're ready to start printing. First, display the lines we
    # squirrelled away then move to the "printing" state. Else,
    # go to "idle" and wait for the "# q" to start us off.
    if (state == "l_found") {
        state = "printing"
        print line_q
        print line_r
        print line_l
        line = 0
    } else if (state != "printing") {
        state = "idle"
    }
}

/^[0-9]+[[:space:]]+/ {
    # If in state "printing", print 47 more lines (starting with the
    # "# i" line itself), for 50 in total with the 3 buffered lines,
    # then stop printing.
    if (state == "printing") {
        if (++line < 48) print
    }
}

Source

2012-10-04 18:44:11 chimpsarehungry
