2012-01-24 27 views
4

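Match everything before a specific word in a multi-line string

I'm trying to use a regular expression to filter some junk text out of a string, but I can't seem to get it to work. I'm not a regex expert (not even close), and I've searched for similar examples, but none of them seem to solve my problem.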

I need a regex that matches everything from the beginning of the string up to a specific word in the string, but not the word itself.

Here's an example:

<p>This is the string I want to process with as you can see also contains HTML tags like <i>this</i> and <strong>this</strong></p> 
<p>I want to remove everything in the string BEFORE the word "giraffe" (but not "giraffe" itself and keep everything after it.</p> 

So, how do I match everything in the string that comes before the word "giraffe"?

Thanks!

Answers

5
resultString = Regex.Replace(subjectString, 
    @"\A    # Start of string 
    (?:    # Match... 
    (?!""giraffe"") # (unless the text ""giraffe"" starts at this position) 
    .    # any character (including newlines) 
    )*    # zero or more times", 
    "", RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace); 

This should work.

4

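Why use a regex at all?

string s = "blagiraffe"; 
// Keep everything from the first "giraffe" onward.
// Assumes the word is present; IndexOf returns -1 otherwise and Substring would throw.
s = s.Substring(s.IndexOf("giraffe")); 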
+0

+1, at least as long as "giraffe" remains a fixed string.

+0

This gives me the output after the specified string... – plast1K
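
As a follow-up to the `Substring`/`IndexOf` suggestion, here is a minimal sketch of a guarded version. The class name `SubstringFromWord` and the method `From` are hypothetical, introduced only for illustration. The sketch assumes that when the word is missing you want the input back unchanged rather than an exception.

```csharp
using System;

public static class SubstringFromWord
{
    // Returns the part of `input` starting at the first occurrence of `word`.
    // Assumption: if `word` is not found, the input is returned unchanged.
    public static string From(string input, string word)
    {
        // Ordinal comparison: exact, culture-independent match of the fixed word.
        int index = input.IndexOf(word, StringComparison.Ordinal);
        return index >= 0 ? input.Substring(index) : input;
    }
}
```

Calling `SubstringFromWord.From(s, "giraffe")` keeps "giraffe" and everything after it, which is the result the question asks for.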

0

A look-ahead will do the trick:

^.*(?=\s+giraffe) 
0

You can use a pattern with a lookahead, like this:

^.*?(?=giraffe)
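
On a multi-line input, `.` does not match newlines by default in .NET, so a lookahead pattern like this probably needs `RegexOptions.Singleline` to reach a word on a later line. Below is a minimal sketch of that variant. The class `BeforeWord` and the method `TrimBefore` are hypothetical names, and anchoring with `\A` instead of `^` is my own choice, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BeforeWord
{
    // Removes everything before the first occurrence of `word`.
    // If `word` does not occur, the lookahead never succeeds and the input is returned unchanged.
    public static string TrimBefore(string input, string word)
    {
        // \A anchors at the very start of the input.
        // .*? is lazy, so the match stops at the first occurrence of the word.
        // Regex.Escape guards against regex metacharacters in the word.
        string pattern = @"\A.*?(?=" + Regex.Escape(word) + ")";

        // Singleline makes '.' match newline characters as well.
        return Regex.Replace(input, pattern, "", RegexOptions.Singleline);
    }
}
```

With the sample from the question, `BeforeWord.TrimBefore(s, "giraffe")` should leave the text starting at the first `giraffe`, with the preceding quotation mark removed.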

1

Try this:

var s = 
     @"<p>This is the string I want to process with as you can see also contains HTML tags like <i>this</i> and <strong>this</strong></p> 
     <p>I want to remove everything in the string BEFORE the word ""giraffe"" (but not ""giraffe"" itself and keep everything after it.</p>"; 
    var ex = new Regex("giraffe.*$", RegexOptions.Multiline); 
    Console.WriteLine(ex.Match(s).Value); 

This code snippet produces the following output:

giraffe" (but not "giraffe" itself and keep everything after it.</p> 
+0

Nice use of Match()!
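
One caveat about the `Match()` approach: with `RegexOptions.Multiline`, `$` also matches at the end of each line, and `.` still does not match newlines, so `giraffe.*$` stops at the end of the line that contains the word. That is enough for the sample, where "giraffe" sits on the last line. If text on later lines must be kept too, a variant along these lines might work. The names `FromWordToEnd` and `Extract` are hypothetical, and returning the input unchanged when nothing matches is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FromWordToEnd
{
    // Returns the text from the first occurrence of `word` to the very end of `input`.
    // Assumption: if `word` is not found, the input is returned unchanged.
    public static string Extract(string input, string word)
    {
        // Singleline: '.' also matches '\n'; \z anchors at the absolute end of the input.
        Match m = Regex.Match(input, Regex.Escape(word) + @".*\z", RegexOptions.Singleline);
        return m.Success ? m.Value : input;
    }
}
```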
