Remove text between delimiters in a string (using a regex?)

Consider the requirement to find a matched set of delimiter characters and remove any characters between them, as well as the delimiters themselves.

Here are the sets of delimiters:

[] square brackets 
() parentheses 
"" double quotes 
'' single quotes

Here are some examples of strings that should match:

Given:                        Results In:
-------------------------------------------
Hello "some" World            Hello World
Give [Me Some] Purple         Give Purple
Have Fifteen (Lunch Today)    Have Fifteen
Have 'a good'day              Have day

And here are some examples of strings that should not match:

Does Not Match: 
------------------ 
Hello "world 
Brown]co[w 
Cheese'factory

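If the given string doesn't contain a matching set of delimiters, it isn't modified. The input string may have many matching pairs of delimiters. If two sets of delimiters overlap (i.e. he[llo "worl]d"), that is an edge case we can ignore here.

The algorithm would look something like this: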

string myInput = "Give [Me Some] Purple (And More) Elephants"; 
string pattern; //some pattern 
string output = Regex.Replace(myInput, pattern, string.Empty);

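Q: How would you achieve this with C#? I am leaning towards a regex.

Bonus: Is there an easy way of matching those start and end delimiters in a constant or in a list of some kind? The solution I'm looking for should make it easy to change the delimiters in case the business analysts come up with new sets of delimiters.

Source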

2009-08-31 p.campbell

A simple regex would be:

string input = "Give [Me Some] Purple (And More) Elephants"; 
string regex = "(\\[.*\\])|(\".*\")|('.*')|(\\(.*\\))"; 
string output = Regex.Replace(input, regex, "");

As for doing it in a custom way where you build up the regex yourself, you just need to build up the parts:

('.*') // example of the single quote check

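Then concatenate each individual regex part with an OR (the | in regex), as in my original example. Once you have built the regex string, just run it once. The key is to get the regex into a single check, because performing many regex matches on one item and then iterating through a lot of items will probably cause a significant drop in performance.

In my first example, that would take the place of the following lines: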

string input = "Give [Me Some] Purple (And More) Elephants"; 
string regex = "Your built up regex here"; 
string sOutput = Regex.Replace(input, regex, "");

I am sure someone will post a cool LINQ expression that builds the regex from an array of delimiter objects to match, or something like that.
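The idea of building one combined pattern from a list of delimiter pairs could be sketched roughly as follows. This is only an illustration, not the answerer's code: the names `DelimiterStripper`, `DefaultPairs`, `BuildRegex` and `Strip` are hypothetical, and it assumes a lazy `.*?` between each pair so that several pairs on one line are matched separately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterStripper
{
    // Hypothetical default list; in practice it could be loaded from configuration.
    public static readonly Tuple<string, string>[] DefaultPairs =
    {
        Tuple.Create("[", "]"),
        Tuple.Create("(", ")"),
        Tuple.Create("\"", "\""),
        Tuple.Create("'", "'")
    };

    // Builds one "open.*?close" alternative per pair and joins them with "|",
    // so the input only has to be scanned by a single Regex.
    public static Regex BuildRegex(IEnumerable<Tuple<string, string>> pairs)
    {
        IEnumerable<string> alternatives = pairs.Select(p =>
            Regex.Escape(p.Item1) + ".*?" + Regex.Escape(p.Item2));
        return new Regex(string.Join("|", alternatives));
    }

    // Removes every matched delimiter pair together with the text between them.
    public static string Strip(string input, Regex regex)
    {
        return regex.Replace(input, string.Empty);
    }
}
```

With the default pairs, `Strip("Give [Me Some] Purple (And More) Elephants", regex)` yields "Give  Purple  Elephants" (the spaces on either side of each removed fragment are kept), and a string with unmatched delimiters such as "Brown]co[w" comes back unchanged. Adding a new delimiter pair then only means adding one entry to the list.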

Source

2009-08-31 21:44:43 Kelsey

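This will (most probably) not work as expected for "Give [Me Some] Purple (And More) [Big] Elephants". This can be solved by using '.*?' instead of '.*' in the expression provided above. – mayu 2012-09-19 02:23:10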
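To illustrate the comment, here is a minimal sketch of the same four alternatives with the lazy quantifier. The class name `LazyDelimiterDemo` and the method name `Strip` are made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class LazyDelimiterDemo
{
    // Same alternatives as the pattern in the answer, but each ".*" is
    // replaced by the lazy ".*?" so a match ends at the nearest closing delimiter.
    private const string Pattern = @"\[.*?\]|"".*?""|'.*?'|\(.*?\)";

    public static string Strip(string input)
    {
        return Regex.Replace(input, Pattern, string.Empty);
    }
}
```

For "Give [Me Some] Purple (And More) [Big] Elephants" this removes the three bracketed fragments one by one and keeps "Purple". A greedy `\[.*\]` would instead match everything from the first `[` to the last `]` in a single match, taking "Purple" with it.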

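I have to add the old adage: "You have a problem and you want to use regular expressions. Now you have two problems."

That said, I came up with a quick regex that will hopefully point you in the direction you are looking for: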

[.]*(\(|\[|\"|').*(\]|\)|\"|')[.]*

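The parentheses, square brackets and double quotes are escaped, while the single quote can be left alone.

To put the above expression into English: I allow any number of characters before and after, and match the expression between the matching delimiters.

The opening delimiter phrase is (\(|\[|\"|'), and it has a matching closing phrase. To make this more extensible in the future, you could remove the actual delimiters from the pattern and keep them in a config file, a database, or wherever you choose.

Source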

2009-08-31 21:29:14

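+1 The regex looks like it will do what he needs. Just a simple Regex.Replace is needed to finish it off. – James 2009-08-31 21:35:31

Bump for "...now you have two problems." LOL – 2009-08-31 22:57:20

A simple way would be to do this: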

string RemoveBetween(string s, char begin, char end) 
{ 
    Regex regex = new Regex(string.Format("\\{0}.*?\\{1}", begin, end)); 
    return regex.Replace(s, string.Empty); 
} 

string s = "Give [Me Some] Purple (And More) \\Elephants/ and .hats^"; 
s = RemoveBetween(s, '(', ')'); 
s = RemoveBetween(s, '[', ']'); 
s = RemoveBetween(s, '\\', '/'); 
s = RemoveBetween(s, '.', '^');

Changing the return statement to the following will avoid duplicate spaces:

return new Regex(" +").Replace(regex.Replace(s, string.Empty), " ");

The final result of this would be:

"Give Purple and "

Disclaimer: a single regex would probably be faster than this.
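One caveat with prefixing each delimiter with a backslash is that a letter delimiter changes meaning (for example `\d` becomes "any digit"). A hedged variant using `Regex.Escape`, which only escapes actual regex metacharacters, might look like this; the class name `RemoveBetweenSketch` is hypothetical and the space-collapsing step mirrors the modified return statement above.

```csharp
using System.Text.RegularExpressions;

public static class RemoveBetweenSketch
{
    public static string RemoveBetween(string s, char begin, char end)
    {
        // Regex.Escape leaves ordinary letters alone, so a delimiter such as
        // 'd' stays a literal 'd' instead of becoming the digit class \d.
        string pattern = Regex.Escape(begin.ToString()) + ".*?" + Regex.Escape(end.ToString());
        string stripped = Regex.Replace(s, pattern, string.Empty);

        // Collapse the runs of spaces left behind by the removal.
        return Regex.Replace(stripped, " +", " ");
    }
}
```

For instance, `RemoveBetween("Give [Me Some] Purple", '[', ']')` returns "Give Purple", and the method still has to be called once per delimiter pair, as in the answer.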

Source

2009-08-31 21:30:11

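The OP made no mention of 'and hats'. "Give me purple and more elephants" is what the OP explicitly asked for. Why would you twist his words and add hats to the equation? – 2012-09-27 10:13:46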

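+1. Found myself back at this thread and didn't realize I had posted the above comment! A poor attempt at humor. Thanks for your answer. – 2013-10-07 10:42:24

Why the hats? I guess that was my own poor attempt at humor ;). Glad to see this is still useful. – 2013-10-07 13:43:14

Use the following regex: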

(\{\S*\})

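What this expression does is replace any number of occurrences of a {word} placeholder with the modified word you want to substitute for it.

Some C# sample code: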

// Requires: using System.Collections.Specialized; using System.Text.RegularExpressions;
// Both members must be declared inside a static class (ReplaceMatch is an extension method).
static readonly Regex re = new Regex(@"(\{\S*\})", RegexOptions.Compiled);

/// <summary>
/// Pass text and a collection of key/value pairs. The text placeholders will be substituted with the collection values.
/// </summary>
/// <param name="text">Text that contains placeholders such as {fullname}</param>
/// <param name="fields">A collection of key/value pairs. Pass <code>fullname</code> and the value <code>Sarah</code>.
/// DO NOT PASS keys with curly brackets <code>{}</code> in the collection.</param>
/// <returns>Substituted text</returns>
public static string ReplaceMatch(this string text, StringDictionary fields)
{
    // Group 1 includes the surrounding braces, so strip them before the key lookup.
    return re.Replace(text, match => fields[match.Groups[1].Value.Trim('{', '}')]);
}

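In a sentence such as

Regex Hero is a real-time {online {Silverlight} Regular} Expression Tester.

it will replace only {Silverlight}, and not everything from the first { bracket to the last } bracket.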

Source

2015-04-16 21:10:54 jaxxbo

Building on Bryan Menard's regular expression, I made an extension method that also works for nested replacements such as "[Test 1 [Test2] Test3] Hello World":

/// <summary>
/// Method used to remove the characters between certain letters in a string.
/// </summary>
/// <param name="rawString">The string to clean up</param>
/// <param name="enter">The opening delimiter</param>
/// <param name="exit">The closing delimiter</param>
/// <returns>The string without the delimited fragments</returns>
public static string RemoveFragmentsBetween(this string rawString, char enter, char exit)
{
    if (rawString.Contains(enter) && rawString.Contains(exit))
    {
        int substringStartIndex = rawString.IndexOf(enter) + 1;
        int substringLength = rawString.LastIndexOf(exit) - substringStartIndex;

        if (substringLength > 0 && substringStartIndex > 0)
        {
            // Recurse into the text between the first opening and the last closing delimiter
            string substring = rawString.Substring(substringStartIndex, substringLength).RemoveFragmentsBetween(enter, exit);
            if (substring.Length != substringLength) // This would mean that letters have been removed
            {
                rawString = rawString.Remove(substringStartIndex, substringLength).Insert(substringStartIndex, substring).Trim();
            }
        }

        //Source: https://stackoverflow.com/a/1359521/3407324
        Regex regex = new Regex(String.Format("\\{0}.*?\\{1}", enter, exit));
        return new Regex(" +").Replace(regex.Replace(rawString, string.Empty), " ").Trim(); // Removing duplicate and trailing/leading spaces
    }
    else
    {
        return rawString;
    }
}

Using this method in the suggested case would look like this:

string testString = "[Test 1 [[Test2] Test3]] Hello World"; 
testString.RemoveFragmentsBetween('[',']');

This returns the string "Hello World".
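For comparison, nested delimiters can also be handled without a regex by tracking the nesting depth in a single pass. The sketch below uses a different technique from the recursive method above; `NestedStripper` and `RemoveNested` are hypothetical names. It assumes the opening and closing characters differ (so it does not apply to quotes), and it returns the input unchanged when an opening delimiter is never closed.

```csharp
using System.Text;

public static class NestedStripper
{
    public static string RemoveNested(string input, char open, char close)
    {
        var result = new StringBuilder(input.Length);
        int depth = 0; // number of currently open delimiters

        foreach (char c in input)
        {
            if (c == open)
            {
                depth++;
            }
            else if (c == close && depth > 0)
            {
                depth--;
            }
            else if (depth == 0)
            {
                // Outside every delimiter pair (this includes a stray closing delimiter).
                result.Append(c);
            }
        }

        // An opening delimiter without a partner: leave the string unmodified.
        return depth == 0 ? result.ToString() : input;
    }
}
```

Unlike the extension method above, this sketch does not trim or collapse spaces, so `RemoveNested("[Test 1 [[Test2] Test3]] Hello World", '[', ']')` gives " Hello World" with the leading space intact.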

Source

2016-08-23 08:53:38

Gold! Thanks! – 2016-09-23 09:54:26
