2017-07-23
0

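I have a block of text in which several paragraphs are separated by dashed lines of varying length. I want to use Python to match the boundaries between the paragraphs. My requirements for matching dashes of various lengths are as follows: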

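  1. A matching line contains only dashes, and may be of any length
  2. Lines that contain dashes together with any other character(s) are excluded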

Here is a sample block of text:

Believing neglected so so allowance existence departure in. 
In design active temper be uneasy. Thirty for remove plenty 
regard you summer though. He preference connection astonished 
on of yet. ------ Partiality on or continuing in particular principles as. 
Do believing oh disposing to supported allowance we. 
------- 
Admiration we surrounded possession frequently he. 
Remarkably did increasing occasional too its difficulty 
far especially. Known tiled but sorry joy balls. Bed sudden 

manner indeed fat now feebly. Face do with in need of 
wife paid that be. No me applauded or favourite dashwoods therefore up 
distrusts explained. 
----t-- 
------ 
And produce say the ten moments parties. Simple innate summer 
fat appear basket his desire joy. Outward clothes promise at gravity 
do excited. 
Sufficient particular impossible by reasonable oh expression is. Yet 
preference 
connection unpleasant yet melancholy but end appearance. And 
excellence partiality 
estimating terminated day everything. 
---------  

I have tried the following:

r"-*.-"g or (.*?)-+ 

However, these match every line that contains two or more dashes, including lines that also contain other characters.

+0

You can match something of a specific length with 'CHAR{MINLENGTH,MAXLENGTH}' or 'CHAR{LENGTH}' – Luke
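As a small illustration of the bounded quantifiers mentioned in this comment, the sketch below filters a hypothetical list of candidate lines. The bounds 3 and 10 and the exact length 6 are arbitrary values chosen for the example, not part of the question.

```python
import re

# Hypothetical candidate lines, already split and stripped.
lines = ["--", "------", "------------", "----t--"]

# -{3,10} accepts between 3 and 10 dashes; fullmatch requires the
# whole string to consist of that run and nothing else.
bounded = [s for s in lines if re.fullmatch(r"-{3,10}", s)]

# -{6} accepts exactly 6 dashes.
exact = [s for s in lines if re.fullmatch(r"-{6}", s)]

print(bounded)
print(exact)
```

Since the question asks for dashed lines of any length, the unbounded `-+` is probably the better fit there; the bounded forms help only when the separator length is known.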

+1

You can always use '(?m)^-+[^\S\r\n]*$' – sln

Answers

1

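Simply r"^[-]+$" should work. Just remember to specify MULTILINE mode, so that ^ and $ match the start and end of each line respectively, rather than only the start and end of the whole string.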
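A minimal sketch of this pattern in use. The `sample` string is a shortened, hypothetical stand-in for the text in the question, with no trailing whitespace on any line.

```python
import re

# Shortened stand-in for the question's text (an assumption for this example).
sample = (
    "first paragraph ------ still text\n"
    "-------\n"
    "second paragraph\n"
    "----t--\n"
    "------\n"
    "third paragraph"
)

# With MULTILINE, ^ and $ anchor at every line boundary.
separators = re.findall(r"^[-]+$", sample, re.MULTILINE)

# Without the flag, ^ and $ anchor only to the whole string,
# so no single line can match here.
no_flag = re.findall(r"^[-]+$", sample)

print(separators)
print(no_flag)
```

The line with dashes in the middle of a sentence and the line `----t--` are both skipped, because the anchors force the entire line to consist of dashes.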

Actually, the last line does not match, because it has trailing spaces. If spaces are allowed after the dashes, you can use r"^[-]+[ ]*$"
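The difference between the two patterns can be seen on a short, hypothetical input whose last separator carries two trailing spaces:

```python
import re

# Hypothetical input; the final dashed line ends with two spaces.
text = "para one\n-------\npara two\n---------  \n"

# Strict form: dashes must run right up to the end of the line.
strict = re.findall(r"^[-]+$", text, re.MULTILINE)

# Lenient form: optional spaces may follow the dashes.
lenient = re.findall(r"^[-]+[ ]*$", text, re.MULTILINE)

print(strict)
print(lenient)
```

Note that the lenient match includes the trailing spaces in the matched text, which matters if the match is later compared or measured.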

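One more thing: if you want to match only the boundaries between paragraphs, and not the one at the very end, you can use r"^[-]+[ ]*$[^\Z]". Note that inside a character class \Z is not an end-of-string anchor, and current Python 3 releases of re reject it as a bad escape; a negative lookahead such as r"^[-]+[ ]*$(?!\s*\Z)" expresses the same intent.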
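One way to skip a separator that sits at the very end of the text is a negative lookahead instead of a character class. The sketch below assumes that "at the end" means "followed only by whitespace up to the end of the string"; this is an interpretation, not something the question states.

```python
import re

# Hypothetical input: two inner separators and one trailing separator.
text = "para one\n-------\npara two\n------\npara three\n---------  \n"

# (?!\s*\Z) fails the match when nothing but whitespace remains
# between the end of the dashed line and the end of the string.
inner_only = re.compile(r"^-+[ ]*$(?!\s*\Z)", re.MULTILINE)

boundaries = inner_only.findall(text)
print(boundaries)
```

Unlike a trailing character class, the lookahead consumes no characters, so the newline after each separator is not included in the match.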

Edit: here are some nuances, taken from @sln's comment, that I forgot:

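  1. You can set the MULTILINE flag by using (?m) at the start of the pattern.
  2. The character class [^\S\r\n] matches all whitespace except line breaks. You can use it instead of [ ], which matches only spaces.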
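Both points can be combined into one pattern. The sketch below also uses it with `split` to recover the paragraphs, which goes slightly beyond the question; the input string and the `SEPARATOR` name are assumptions for illustration.

```python
import re

# (?m) at the start of the pattern turns on MULTILINE inline;
# [^\S\r\n] matches any whitespace except carriage return and newline.
SEPARATOR = re.compile(r"(?m)^-+[^\S\r\n]*$")

# Hypothetical input: the first separator is followed by a tab and a space.
text = "alpha\n-------\t \nbeta ------ gamma\n----t--\n------\ndelta\n"

matches = SEPARATOR.findall(text)

# Split on the separators and drop empty pieces.
paragraphs = [p.strip() for p in SEPARATOR.split(text) if p.strip()]

print(matches)
print(paragraphs)
```

A pattern using `[ ]*` would miss the first separator here, because a tab follows its dashes.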