Splitting a string into three columns using a regular expression

I have a string like the one below:

rta_geo5: 09/24/14 15:10:38 - Reset_count = 6 
rta_geo5: 09/24/14 15:10:38 - restarting 
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines

My goal is to split this string into three columns so that I can put it into a database table:

----------------------------------------------------------------
| COL1     | COL2              | COL3                          |
----------------------------------------------------------------
| rta_geo5 | 09/24/14 15:10:38 | Reset_count = 6               |
----------------------------------------------------------------
| rta_geo5 | 09/24/14 15:10:38 | restarting                    |
----------------------------------------------------------------
| rta_geo5 | 09/24/14 15:10:38 | memory allocation: 3500 lines |
----------------------------------------------------------------

Would this be possible using the following statement?

string[] substrings = Regex.Split(input, pattern);

I just need the appropriate regular expression.

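Source

2014-09-25 ironcurtain

Have you tried building the pattern yourself? How did it go? – Utkanos 2014-09-25 11:19:16

How do you want to distinguish 'rta_geo5:' from 'allocation:'? What strict rule do you want to split on? – 2014-09-25 11:30:52

This looks like it might be fixed width. If so, I personally would just pull out the required substrings. – juharr 2014-09-25 11:47:37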

Instead of splitting, you can use named groups in regex.

Pattern:

Regex ptrn = new Regex(@"^(?<col1>[^:]+):\s+(?<col2>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+-\s+(?<col3>[^\r\n]+?)\s*$", 
    RegexOptions.ExplicitCapture|RegexOptions.IgnoreCase|RegexOptions.Multiline);

Usage:

string s = @"rta_geo5: 09/24/14 15:10:38 - Reset_count = 6 
rta_geo5: 09/24/14 15:10:38 - restarting 
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines"; 

var matches = ptrn.Matches(s);

Accessing the groups (requires `using System.Linq;`):

matches.OfType<Match>() 
    .Select(match => new string[] 
     { 
     match.Groups["col1"].Value, 
     match.Groups["col2"].Value, 
     match.Groups["col3"].Value 
     }) 
    .ToList().ForEach(a=>System.Console.WriteLine(string.Join("\t|\t",a)));

Or:

foreach (Match match in matches) 
     { 
      string col1 = match.Groups["col1"].Value; 
      string col2 = match.Groups["col2"].Value; 
      string col3 = match.Groups["col3"].Value; 
      System.Console.WriteLine(col1 + "\t|\t" + col2 + "\t|\t" + col3); 
     }

Output:

rta_geo5 | 09/24/14 15:10:38 | Reset_count = 6 
rta_geo5 | 09/24/14 15:10:38 | restarting 
rta_geo5 | 09/24/14 15:10:38 | memory allocation: 3500 lines
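Since the stated goal is a database table, here is a minimal sketch of how the matched groups could be collected into an in-memory `System.Data.DataTable` first. The class name `LogTableBuilder`, the table name `LogLines` and the column names COL1 to COL3 are placeholders chosen for illustration (the column names come from the question's layout); how the table is then written to the real database is not covered here.

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

public static class LogTableBuilder
{
    // Same pattern as in the answer above.
    private static readonly Regex Ptrn = new Regex(
        @"^(?<col1>[^:]+):\s+(?<col2>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+-\s+(?<col3>[^\r\n]+?)\s*$",
        RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Builds a table with one row per log line that matches the pattern.
    // Lines that do not match are silently skipped.
    public static DataTable Build(string input)
    {
        var table = new DataTable("LogLines");          // placeholder table name
        table.Columns.Add("COL1", typeof(string));
        table.Columns.Add("COL2", typeof(string));
        table.Columns.Add("COL3", typeof(string));

        foreach (Match match in Ptrn.Matches(input))
        {
            table.Rows.Add(
                match.Groups["col1"].Value,
                match.Groups["col2"].Value,
                match.Groups["col3"].Value);
        }
        return table;
    }

    public static void Main()
    {
        string s = @"rta_geo5: 09/24/14 15:10:38 - Reset_count = 6 
rta_geo5: 09/24/14 15:10:38 - restarting 
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";

        DataTable table = Build(s);
        foreach (DataRow row in table.Rows)
            Console.WriteLine(string.Join(" | ", row.ItemArray));
    }
}
```

Keeping COL2 as a string mirrors the answer above; if the target column is a date type, it would have to be parsed before inserting.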

Source

2014-09-25 11:56:40 Arie

This worked for me. Thanks! – ironcurtain 2014-09-25 12:40:29

Split on this:

(?:(?<=geo5):\s|(?<=\d{2}:\d{2}:\d{2})\s-\s)

Demo here:

http://regex101.com/r/xF7iD7/1
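For reference, a rough sketch of how that pattern could be plugged into the `Regex.Split` call from the question, applied line by line. The class and method names (`SplitDemo`, `SplitLine`) are made up for this example, and trimming each line before splitting is an assumption so that the trailing space in the sample does not end up in the last column.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Pattern from the answer above, unchanged. It splits on ": " only when
    // preceded by "geo5", and on " - " only when preceded by a hh:mm:ss time.
    private const string Pattern =
        @"(?:(?<=geo5):\s|(?<=\d{2}:\d{2}:\d{2})\s-\s)";

    // Splits one log line into its parts (three for the sample lines).
    public static string[] SplitLine(string line)
    {
        return Regex.Split(line.Trim(), Pattern);
    }

    public static void Main()
    {
        string input =
            "rta_geo5: 09/24/14 15:10:38 - Reset_count = 6 \n" +
            "rta_geo5: 09/24/14 15:10:38 - restarting \n" +
            "rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";

        foreach (string line in input.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] substrings = SplitLine(line);
            Console.WriteLine(string.Join(" | ", substrings));
        }
    }
}
```

Note that the lookbehind on `geo5` ties the pattern to identifiers ending in exactly that text, so other identifiers would need a different lookbehind.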

Source

2014-09-25 11:21:00 aelor

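I wouldn't use a regex (or String.Split) for this, but a loop in which you analyze each line. I would also use a custom class that maps to the database table, to increase reusability.

Class (simplified):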

public class Data 
{ 
    public string Token1 { get; set; } // use a meaningful name 
    public string Token2 { get; set; } // use a meaningful name 
    public DateTime Date { get; set; } // use a meaningful name 

    public override string ToString() 
    { 
     return string.Format("Token1:[{0}] Date:[{1}] Token2:[{2}]", 
      Token1, 
      Date.ToString("MM/dd/yy HH:mm:ss", CultureInfo.InvariantCulture), 
      Token2); 
    } 
}

Your sample string:

string data = @"rta_geo5: 09/24/14 15:10:38 - Reset_count = 6 
rta_geo5: 09/24/14 15:10:38 - restarting 
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";

Now you can use plain string methods to parse the text into a List<Data> with this loop:

string[] lines = data.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries); 
List<Data> allData = new List<Data>(); 
foreach (string line in lines) 
{ 
    string token1 = null, token2 = null; 
    DateTime dt; 
    int firstColonIndex = line.IndexOf(": "); 
    if (firstColonIndex >= 0) 
    { 
     token1 = line.Remove(firstColonIndex); 
     firstColonIndex += 2; // start next search after first token to find DateTime 
     int indexOfMinus = line.IndexOf(" - ", firstColonIndex); 
     if (indexOfMinus >= 0) 
     { 
      string datePart = line.Substring(firstColonIndex, indexOfMinus - firstColonIndex); 
      if (DateTime.TryParseExact(datePart, "MM/dd/yy HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.None, out dt)) 
      { 
       indexOfMinus += 3; // start next search after DateTime to get last token 
       token2 = line.Substring(indexOfMinus); 
       Data d = new Data { Token1 = token1, Token2 = token2, Date = dt }; 
       allData.Add(d); 
      } 
     } 
    } 
}

Test:

foreach (Data d in allData) 
    Console.WriteLine(d.ToString()); 

Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[Reset_count = 6] 
Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[restarting] 
Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[memory allocation: 3500 lines]

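This approach is more verbose than the others, but it is more efficient and easier to maintain. It also lets you log a line that fails to parse, or fall back to another way of parsing it.

Source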

2014-09-25 11:49:24

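Not sure what's wrong, but on my PC the output looks like this: row1: Token1:[data1] Date:[date] Token2:[data2 row2: data3 date data3] – ironcurtain 2014-09-25 12:43:26

@ironcurtain: I don't know. Did you use the sample data ('string data = @...')? I tested the code again and it correctly shows the result above. What does your string[] 'lines' contain? Did you copy and paste the line breaks? – 2014-09-25 12:45:33

I think the problem was that the string was retrieved from a UNIX system; when I checked, some lines had not been broken. I decided to copy the file to my local machine and then split the columns. I haven't tested your solution, but I think it will work. – ironcurtain 2014-10-03 08:54:48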
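A likely explanation for the output in the comments above (an assumption, since the actual input was never posted): text coming from a UNIX system uses "\n" as the line break, while `Environment.NewLine` is "\r\n" on Windows, so splitting on `Environment.NewLine` alone leaves several log lines glued together. A minimal sketch that accepts both conventions follows; `LineSplitter` and `SplitLines` are names invented for this example.

```csharp
using System;

public static class LineSplitter
{
    // Splits text on both Windows ("\r\n") and UNIX ("\n") line breaks,
    // dropping empty entries such as a trailing blank line.
    public static string[] SplitLines(string text)
    {
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Mixed line endings, as might happen with a file copied from UNIX.
        string data =
            "rta_geo5: 09/24/14 15:10:38 - Reset_count = 6\n" +
            "rta_geo5: 09/24/14 15:10:38 - restarting\r\n" +
            "rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";

        string[] lines = SplitLines(data);
        Console.WriteLine(lines.Length);
        foreach (string line in lines)
            Console.WriteLine("[" + line + "]");
    }
}
```

The resulting `lines` array could then be fed to the parsing loop from the answer in place of the `data.Split(new[] { Environment.NewLine }, ...)` call.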

OK, I had a think about this. Not sure it is 100% right, but try:

(rta_geo5): (.*?) - (.*)

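That should split it into 3 groups as needed. However, it assumes the leading identifier is always (rta_geo5).

[Edit] - I noticed that one of the other answers references an online regex service, so you can try my regex there: http://regex101.com/r/xF7iD7/1 (sorry, I don't have an account yet, but will create one right away). Also, regarding the rta_geo5 part, you could of course go fully generic with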

(.*): (.*) - (.*)

See how it works either way.

Source
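One caveat with the fully generic version: all three groups are greedy, so a message that itself contains ": " and " - " can be split in the wrong place. The sketch below compares the greedy pattern with a lazy, anchored variant on a hypothetical log line; that line, the lazy pattern and the helper name `Groups` are not from the original thread and are only there to illustrate the difference.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Returns the three captured groups, or an empty array if there is no match.
    public static string[] Groups(string pattern, string line)
    {
        Match m = Regex.Match(line, pattern);
        if (!m.Success)
            return new string[0];
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    public static void Main()
    {
        // Hypothetical line whose message contains both ": " and " - ".
        string line = "rta_geo5: 09/24/14 15:10:38 - state: a - b";

        // Greedy: the first group runs up to the LAST ": " in the line.
        Console.WriteLine(string.Join(" | ", Groups(@"^(.*): (.*) - (.*)$", line)));
        // prints: rta_geo5: 09/24/14 15:10:38 - state | a | b

        // Lazy: the first two groups stop at the FIRST ": " and the first " - " after it.
        Console.WriteLine(string.Join(" | ", Groups(@"^(.*?): (.*?) - (.*)$", line)));
        // prints: rta_geo5 | 09/24/14 15:10:38 | state: a - b
    }
}
```

On the three sample lines from the question both variants give the same result, so the difference only matters for messages like the hypothetical one.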

2014-09-25 11:55:58
