Split a string into sections based on headers

I have a string containing several sections named "Section 1" ... "Section 20", and I want to split this string into those individual sections. Here is an example:

Stuff we don't care about 

Section 1 
Text within this section, may contain the word section. 

And go on for quite a bit. 

Section 15 
Another section

I want to split this into

["Section 1\n Text within this section, may contain the word section.\n\nAnd go on for quite a bit.",
"Section 15 Another section"]

I feel rather stupid for not getting this right. My attempts always capture everything. Right now I have

/(Section.+\d+$[\s\S]+)/

but I can't get the greediness out of it.

Source

2014-01-17 MattW.

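Once "Section 1" is encountered, do you want to capture everything after it? Or do you want to ignore the text after Section 20? Will the section text *always* be on the line immediately following the header, or can there be paragraphs/blank lines within a section? –

The example is clear. He wants each section (header + text) as an element of the array. – robertodecurnex

Does this help? –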

In my opinion, the Regexp to split the text on is:

/(?:\n\n|^)Section/

So the code is:

str = "
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
"

newstr = str.split(/(?:\n\n|^)Section/, -1)[1..-1].map {|l| "Section " + l.strip }
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", "Section 15\nAnother section"]
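A variation on the split above, offered only as a sketch: splitting on a zero-width lookahead keeps each header inside its piece, so "Section " does not have to be glued back on. The names text and sections are made up for this illustration, and the pattern assumes every header sits alone on its own line as "Section <number>".

```ruby
text = "Stuff we don't care about\n\n" \
       "Section 1\nText within this section, may contain the word section.\n\n" \
       "And go on for quite a bit.\n\n" \
       "Section 15\nAnother section\n"

# Split at every line start that is followed by a "Section <number>" header.
# The lookahead consumes nothing, so the header stays in the resulting piece.
parts = text.split(/^(?=Section \d+$)/)

# Drop the preamble before the first header and trim surrounding whitespace.
sections = parts.select { |part| part.start_with?("Section ") }.map(&:strip)
```

Because the pattern requires a number and the end of the line, a body line that merely begins with the word "Section" would not start a new piece.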

Source

2014-01-17 16:46:08

Sorry, the text in each section is more complex and may contain newlines etc. I'll update it. –

@MattW. I have updated the answer –

You can use this expression:

(?m)(Section\s*\d+)(.*?\1)$

Live demo

Source

2014-01-17 17:22:19 revo

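I couldn't get this to work. It ignores the "Another section" at the end and gives strange matches – robertodecurnex

@robertodecurnex You're mistaken. "Another section" stands for something like "Section 16"; it does work. – revo

No, I just took the sample and used your link -> http://www.rubular.com/r/euxXwqo03d – robertodecurnex

You can use scan with this regex: /Section\s\d+\n(?:.(?!Section\s\d+\n))*/m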

string.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m)

Section\s\d+\n will match any section header.

(?:.(?!Section\s\d+\n))* will match anything except another section header.

m makes the dot match newlines as well.

sample = <<SAMPLE
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
SAMPLE

sample.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m)
#=> ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n", "Section 15\nAnother section\n"]
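Each match above keeps its trailing newline. If that is unwanted, one option is to strip every match after scanning. This is a minimal sketch that assumes Ruby 2.3 or later for the squiggly heredoc; the variable name sections is made up here.

```ruby
sample = <<~SAMPLE
  Stuff we don't care about

  Section 1
  Text within this section, may contain the word section.

  And go on for quite a bit.

  Section 15
  Another section
SAMPLE

# Same pattern as above; String#strip removes the newline left at the
# end of each match (and any other leading/trailing whitespace).
sections = sample.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m).map(&:strip)
```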

Source

2014-01-17 18:18:39 robertodecurnex

I think the simplest approach is:

str = "Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section"

str[/^Section 1.+/m] # => "Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n\nSection 15\nAnother section"

If you want to break the text at the Section headers, start the same way and then take advantage of Enumerable's slice_before:

str = "Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section"

str[/^Section 1.+/m].split("\n").slice_before(/^Section \d+/m).map{ |a| a.join("\n") }
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n",
#  "Section 15\nAnother section"]

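The slice_before documentation says:

Creates an enumerator for each chunked elements. The beginnings of chunks are defined by pattern and the block.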
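To make that description concrete, here is a small sketch on a made-up array of lines (the names lines, chunks and sections are illustrative only). It also shows that anything before the first matching element lands in a chunk of its own, which appears to be why the code above first cuts the string down to start at the first header.

```ruby
lines = ["preamble", "Section 1", "body a", "", "Section 15", "body b"]

# Every element matching the pattern starts a new chunk; elements before
# the first match are collected into a leading chunk of their own.
chunks = lines.slice_before(/^Section \d+/).to_a

# Keep only the chunks that actually begin with a header.
sections = chunks.select { |chunk| chunk.first =~ /^Section \d+/ }
```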

Source

2014-01-17 19:06:49

Note that there is a comma at the right of the first line. There are 2 elements in the example. – robertodecurnex

That only makes it easier. Thanks. –
