2012-08-29 188 views
1

Find and replace text in a string using C#

Does anyone know how I would find & replace text in a string? Basically I have two strings:

string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDABQODxIPDRQSERIXFhQYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f3//"; 

string secondS = "abcdefg2wBDABQODxIPDRQSERIXFh/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/abcdefg"; 

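I want to search firstS to see whether it contains any sequence of characters that also appears in secondS, and then replace that sequence. The replacement should be the number of characters replaced, in square brackets:

[NUMBER-OF-CHARACTERS-REPLACED]

For example, because firstS and secondS both contain "2wBDABQODxIPDRQSERIXFh" and "/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/", those two sequences need to be replaced. firstS then becomes: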

string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/[22]QYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39[61]f3//"; 

I hope that makes sense. I think I could do this with Regex, but I don't like how inefficient that would be. Does anyone know of another, faster way?


http://en.wikipedia.org/wiki/Longest_common_substring_problem

Answers

3

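"Does anyone know of another, faster way?"

Yes, this problem actually has a proper name. It is called the Longest Common Substring problem, and it has a reasonably fast solution: the standard dynamic-programming approach takes time proportional to the product of the two string lengths.

Here is an implementation on ideone. It finds and replaces all common substrings that are ten characters or longer.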

// This follows the dynamic-programming table from the Wikipedia article linked above.
// L[i,j] is the length of the longest common suffix of s[0..i] and t[0..j].
private static string FindLcs(string s, string t) {
    var L = new int[s.Length, t.Length]; // all elements start at 0
    var z = 0;    // length of the longest match found so far
    var end = -1; // index in s of the last character of that match
    for (var i = 0 ; i != s.Length ; i++) {
        for (var j = 0 ; j != t.Length ; j++) {
            if (s[i] != t[j]) continue; // L[i,j] stays 0
            L[i,j] = (i == 0 || j == 0) ? 1 : L[i-1,j-1] + 1;
            if (L[i,j] > z) {
                z = L[i,j];
                end = i;
            }
        }
    }
    // Keep only the first longest match: CutLcs needs a string that actually
    // occurs in s, so tied matches must not be concatenated.
    return z == 0 ? string.Empty : s.Substring(end - z + 1, z);
}
// With the LCS in hand, building the answer is easy
public static string CutLcs(string s, string t) {
    for (;;) {
        var lcs = FindLcs(s, t);
        if (lcs.Length < 10) break; // stop once no shared run of 10+ characters is left
        s = s.Replace(lcs, string.Format("[{0}]", lcs.Length));
    }
    return s;
}
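
If memory is a concern for long strings, note that the recurrence only ever reads the previous row of the table, so two rows are enough. Below is a minimal sketch of that idea, not the ideone code itself. The names LcsSketch, FindLcsTwoRows and CutShared and the minLength parameter are made up for illustration, and it assumes minLength is comfortably larger than the "[n]" markers (ten, as above, works).

```csharp
using System;

public static class LcsSketch
{
    // Hypothetical helper: same recurrence as FindLcs, but only two rows of the table are kept.
    public static string FindLcsTwoRows(string s, string t)
    {
        var prev = new int[t.Length]; // row for s[i-1]; all zeros before the first row
        var curr = new int[t.Length]; // row for s[i]
        int best = 0, end = -1;       // longest match length and its last index in s
        for (var i = 0; i < s.Length; i++)
        {
            for (var j = 0; j < t.Length; j++)
            {
                if (s[i] == t[j])
                {
                    curr[j] = (j == 0) ? 1 : prev[j - 1] + 1;
                    if (curr[j] > best) { best = curr[j]; end = i; }
                }
                else
                {
                    curr[j] = 0; // every cell is rewritten, so stale values never leak
                }
            }
            var tmp = prev; prev = curr; curr = tmp; // reuse the old row as the next one
        }
        return best == 0 ? string.Empty : s.Substring(end - best + 1, best);
    }

    // Hypothetical wrapper: replaces every shared run of at least minLength characters.
    public static string CutShared(string s, string t, int minLength)
    {
        for (;;)
        {
            var lcs = FindLcsTwoRows(s, t);
            if (lcs.Length < minLength) break;
            s = s.Replace(lcs, "[" + lcs.Length + "]");
        }
        return s;
    }

    public static void Main()
    {
        // "HELLOWORLD" (10 characters) is the only shared run here.
        Console.WriteLine(CutShared("xxHELLOWORLDyy", "aaHELLOWORLDbb", 10)); // xx[10]yy
    }
}
```

This keeps the running time the same and only reduces the table from s.Length × t.Length cells to 2 × t.Length.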
0

I had a similar problem, but for occurrences of words! So I hope this helps. I used a SortedDictionary and a binary search tree.

/* Application counts the number of occurrences of each word in a string
   and stores them in a generic sorted dictionary. */
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;

public class SortedDictionaryTest
{
    public static void Main(string[] args)
    {
        // create sorted dictionary
        SortedDictionary<string, int> dictionary = CollectWords();

        // display sorted dictionary content
        DisplayDictionary(dictionary);
    }

    // create sorted dictionary
    private static SortedDictionary<string, int> CollectWords()
    {
        // create a new sorted dictionary
        SortedDictionary<string, int> dictionary =
            new SortedDictionary<string, int>();

        Console.WriteLine("Enter a string: "); // prompt for user input
        string input = Console.ReadLine() ?? string.Empty; // ReadLine returns null at end of input

        // split input text into tokens
        string[] words = Regex.Split(input.Trim(), @"\s+");

        // processing input words
        foreach (var word in words)
        {
            if (word.Length == 0) continue; // blank input yields a single empty token

            string wordKey = word.ToLower(); // get word in lowercase

            // if the dictionary contains the word
            if (dictionary.ContainsKey(wordKey))
            {
                ++dictionary[wordKey];
            }
            else
            {
                // add new word with a count of 1 to the dictionary
                dictionary.Add(wordKey, 1);
            }
        }

        return dictionary;
    }

    // display dictionary content
    private static void DisplayDictionary<K, V>(
        SortedDictionary<K, V> dictionary)
    {
        Console.WriteLine("\nSorted dictionary contains:\n{0,-12}{1,-12}",
            "Key:", "Value:");

        /* generate output for each key in the sorted dictionary
           by iterating through the Keys property with a foreach statement */
        foreach (K key in dictionary.Keys)
            Console.WriteLine("{0,-12}{1,-12}", key, dictionary[key]);

        Console.WriteLine("\nsize: {0}", dictionary.Count);
    }
}
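
As a side note, the ContainsKey check followed by the indexer looks each word up twice. A minimal sketch of the same counting with a single TryGetValue lookup follows. WordCountSketch and CountWords are hypothetical names, and the input is passed in as a string rather than read from the console so it is easy to try out.

```csharp
using System;
using System.Collections.Generic;

public static class WordCountSketch
{
    // Hypothetical helper: counts case-insensitive word occurrences in input.
    public static SortedDictionary<string, int> CountWords(string input)
    {
        var counts = new SortedDictionary<string, int>();

        // A null separator array means "split on whitespace"; empty tokens are dropped.
        var words = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        foreach (var word in words)
        {
            var key = word.ToLowerInvariant();
            int current;
            counts.TryGetValue(key, out current); // current is 0 when the key is absent
            counts[key] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        foreach (var pair in CountWords("the cat and The dog"))
            Console.WriteLine("{0,-12}{1,-12}", pair.Key, pair.Value);
    }
}
```

For the sample sentence this gives four keys, with "the" counted twice.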
0

This may be dog slow, but if you are willing to take on some technical debt and need something to prototype with right now, you can use LINQ.

// requires: using System.Linq;
string firstS = "123abc";
string secondS = "456cdeabc123";
int minLength = 3;

// One result string per minLength-wide window of firstS that also occurs in secondS
var result =
    from start in Enumerable.Range(0, firstS.Length)
    where firstS.Length - start >= minLength
    let subStr = firstS.Substring(start, minLength)
    where secondS.Contains(subStr)
    select secondS.Replace(subStr, "[" + subStr.Length + "]");

Result:

456cdeabc[3] 
456cde[3]123
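
Note that the query above yields one string per match and rewrites secondS. If what you want is a single string with every match replaced inside firstS, as in the question, one possible sketch is to fold the matches with Aggregate. LinqReplaceSketch and ReplaceShared are hypothetical names. It still only looks at fixed-width windows, so a longer shared run is not collapsed into one marker, and overlapping windows may no longer match once an earlier one has been replaced.

```csharp
using System;
using System.Linq;

public static class LinqReplaceSketch
{
    // Hypothetical helper: replaces every minLength-wide window of source
    // that also occurs in other with "[minLength]".
    public static string ReplaceShared(string source, string other, int minLength)
    {
        return Enumerable.Range(0, Math.Max(0, source.Length - minLength + 1))
            .Select(start => source.Substring(start, minLength)) // every window
            .Where(sub => other.Contains(sub))                   // keep shared ones
            .Distinct()
            .Aggregate(source, (acc, sub) => acc.Replace(sub, "[" + sub.Length + "]"));
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceShared("123abc", "456cdeabc123", 3)); // [3][3]
    }
}
```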