Using a multiline regular expression

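So I want to process the text below. For each course, I want the data that starts at its credits line and ends at the season and year. For the first class it would look like this: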

3 credits in Philosophical Perspectives 
PHIL 101L 
PHILOSOPHICAL PERSPECTIVES 
B 
3 
Fall 2014

I also need to get the classes they still need. Notice, for example, that they are missing 3 credits in History. Here is my text:

3 credits in Philosophical Perspectives 
PHIL 101L 
PHILOSOPHICAL PERSPECTIVES 
B 
3 
Fall 2014 
Student View 
3 credits in Fine Arts 
ART 160L 
HIST WEST ART I 
B+ 
3 
Fall 2014 
3 credits in History 
Still Needed: 
Click here to see classes that satisfy this requirement. 
3 credits in Literature 
ENG 201L 
INTRO LINGUISTIC 
IP 
(3) 
Spring 2016 
3 credits in Math 
Still Needed: 
Click here to see classes that satisfy this requirement. 
3 credits in Natural Science 
BIOL 225L 
TOPICS IN NUTRITION 
A- 
3 
Spring 2015 
3 credits Ethics/Applied Ethics/Religious Studies 
REST 209L 
WORLD RELIGIONS 
A- 
3 
Spring 2015 
3 credits in Social Science 
ECON 104L 
PRINC MACROECONOM 
T 
3 
Fall 2014

2016-03-13 MrCokeman

Also, what have you tried? Regular expressions have a multiline modifier. –

This is as far as I got: (\d credits)(.*)(?=\n). It only grabs the first line. I'm very new to regex and don't really have the hang of it yet. – MrCokeman
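The comments above mention a "multiline modifier" and an attempt that stops after the first line. A minimal sketch of the flags involved, assuming Python's re module and a two-block excerpt of the question's text (the attempt's trailing (?=\n) lookahead is left out to keep the comparison simple). In Python, re.MULTILINE only changes what ^ and $ match; letting . match newlines is a separate flag, re.DOTALL.

```python
import re

# Two requirement blocks taken from the question's text.
text = "3 credits in Fine Arts\nART 160L\nB+\nFall 2014\n3 credits in History\nStill Needed:"

# By default "." does not match "\n", so ".*" stops at the end of the first line.
first_line = re.search(r"\d credits.*", text).group()

# re.DOTALL lets "." cross newlines, but a greedy ".*" then runs to the end of the string.
greedy = re.search(r"\d credits.*", text, re.DOTALL).group()

# A lazy ".*?" plus a lookahead stops right before the next "N credits" line (or the end).
first_block = re.search(r"\d credits.*?(?=\n\d credits|\Z)", text, re.DOTALL).group()

# re.MULTILINE is a different flag: it makes "^" match at the start of every line.
headers = re.findall(r"^\d credits.*", text, re.MULTILINE)

print(repr(first_line))
print(repr(first_block))
print(headers)
```

The lazy-quantifier-plus-lookahead combination in first_block is the same idea the first answer below relies on.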

(?:^|(?<=\n))\d+\s+credits[\s\S]*?(?=\n\d+\s+credits|$)

You can use findall. See the demo:

https://regex101.com/r/gK9aI6/1

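import re

# Each match starts at an "N credits" line and lazily runs up to the next "N credits" line or the end of the string.
p = re.compile(r'(?:^|(?<=\n))\d+\s+credits[\s\S]*?(?=\n\d+\s+credits|$)')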
test_str = "3 credits in Philosophical Perspectives\nPHIL 101L\nPHILOSOPHICAL PERSPECTIVES\nB\n3\nFall 2014\nStudent View\n3 credits in Fine Arts\nART 160L\nHIST WEST ART I\nB+\n3\nFall 2014\n3 credits in History\nStill Needed:\nClick here to see classes that satisfy this requirement.\n3 credits in Literature\nENG 201L\nINTRO LINGUISTIC\nIP\n(3)\nSpring 2016\n3 credits in Math\nStill Needed:\nClick here to see classes that satisfy this requirement.\n3 credits in Natural Science\nBIOL 225L\nTOPICS IN NUTRITION\nA-\n3\nSpring 2015\n3 credits Ethics/Applied Ethics/Religious Studies\nREST 209L\nWORLD RELIGIONS\nA-\n3\nSpring 2015\n3 credits in Social Science\nECON 104L\nPRINC MACROECONOM\nT\n3\nFall 2014" 

for block in p.findall(test_str):
    print(block)
    print("---")
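The question also asks which requirements are still needed. A minimal sketch built on the pattern above, assuming that every unmet requirement contains the literal line "Still Needed:" as in the sample text; the names completed and still_needed are illustrative, and the input is a shortened excerpt of the question's text.

```python
import re

# Excerpt of the question's text; the full test_str above works the same way.
test_str = """3 credits in Fine Arts
ART 160L
HIST WEST ART I
B+
3
Fall 2014
3 credits in History
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Math
Still Needed:
Click here to see classes that satisfy this requirement."""

# Same pattern as the answer: one block per "N credits" line.
p = re.compile(r'(?:^|(?<=\n))\d+\s+credits[\s\S]*?(?=\n\d+\s+credits|$)')

completed, still_needed = [], []
for block in p.findall(test_str):
    # The first line of a block names the requirement, e.g. "3 credits in History".
    requirement = block.split("\n", 1)[0]
    if "Still Needed:" in block:
        still_needed.append(requirement)
    else:
        completed.append(block)

print("Still needed:", still_needed)
```

Because this pattern splits on the "N credits" lines rather than on the term line, requirements without a course still come out as their own blocks, which is what makes this check possible.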

2016-03-13 17:37:38 vks

Thanks, this answer works best! – MrCokeman

You can combine a non-greedy "anything" sequence with the known structure of the last line of each group to parse the text into chunks:

/((?:.\n?)*?(?:Fall|Summer|Spring|Winter)\s\d{4})/g

(?:.\n?)*? - lazily consumes any character (optionally followed by a newline), one at a time
then simply matches the final sequence: (?:Fall|Summer|Spring|Winter)\s\d{4}

See the demo here, and note that each credit block is actually a single regex match.
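The pattern above is written in JavaScript literal syntax. A minimal Python port, assuming the /g flag is emulated with re.finditer; the sample string is an excerpt of the question's text. Running it shows a property of this approach: a requirement with no term line (History) is folded into the following chunk, and a leading line such as "Student View" attaches to the next chunk.

```python
import re

text = ("Student View\n3 credits in Fine Arts\nART 160L\nB+\n3\nFall 2014\n"
        "3 credits in History\nStill Needed:\n"
        "Click here to see classes that satisfy this requirement.\n"
        "3 credits in Literature\nENG 201L\nIP\n(3)\nSpring 2016")

# Python port of the JavaScript pattern; finditer plays the role of the /g flag.
pattern = re.compile(r"((?:.\n?)*?(?:Fall|Summer|Spring|Winter)\s\d{4})")

chunks = [m.group(1) for m in pattern.finditer(text)]
for chunk in chunks:
    print(chunk)
    print("---")
```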

2016-03-13 17:58:54 sweaver2112

Try the snippet below:

import re 

courses = r"....your...content" 

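# Lazily match from a number up to the first FALL/SPRING + year; DOTALL lets .*? span lines.
# Note: SUMMER and WINTER terms are not matched by this pattern.
rx = re.compile(r"\d+.*?(?:FALL|SPRING)\s*\d{4}", re.IGNORECASE | re.DOTALL) 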
for course in rx.finditer(courses): 
    print(course.group()) 
    print("----------------------------\n")

If courses contains the sample content, the output will be:

3 credits in Philosophical Perspectives 
PHIL 101L 
PHILOSOPHICAL PERSPECTIVES 
B 
3 
Fall 2014 
---------------------------- 

3 credits in Fine Arts 
ART 160L 
HIST WEST ART I 
B+ 
3 
Fall 2014 
---------------------------- 

3 credits in History 
Still Needed: 
Click here to see classes that satisfy this requirement. 
3 credits in Literature 
ENG 201L 
INTRO LINGUISTIC 
IP 
(3) 
Spring 2016 
---------------------------- 

... omitting rest....
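If the goal is the individual fields rather than raw chunks, a stricter sketch can describe a completed requirement line by line with named groups. It assumes each completed requirement is exactly six lines in the order shown in the question and that the lines carry no trailing spaces; the group names (requirement, code, title, grade, credits, term) are illustrative.

```python
import re

text = """3 credits in Fine Arts
ART 160L
HIST WEST ART I
B+
3
Fall 2014
3 credits in History
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Literature
ENG 201L
INTRO LINGUISTIC
IP
(3)
Spring 2016"""

# Assumption: a completed requirement is exactly these six lines, in this order.
# re.MULTILINE makes ^ and $ match at every line boundary; "." still stops at "\n".
course_re = re.compile(
    r"^(?P<requirement>\d+ credits.*)\n"
    r"(?P<code>[A-Z]+ \d+\w*)\n"
    r"(?P<title>.+)\n"
    r"(?P<grade>\S+)\n"
    r"(?P<credits>\(?\d+\)?)\n"
    r"(?P<term>(?:Fall|Spring|Summer|Winter) \d{4})$",
    re.MULTILINE,
)

courses = [m.groupdict() for m in course_re.finditer(text)]
for course in courses:
    print(course["code"], course["grade"], course["term"])
```

Requirements that are still needed simply fail to match here, because their second line ("Still Needed:") does not look like a course code.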

2016-03-13 18:06:16 Saleem
