2013-09-24 60 views
-1

我使用this算法来查找2个字符串之间的公共子字符串。请帮助我做到这一点,但使用Array该字符串的常见子字符串,我应该忽略我的函数。最长的子字符串排除字符串列表

我的代码在Java中:

public static String longestSubstring(String str1, String str2) { 

     StringBuilder sb = new StringBuilder(); 
     if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) { 
      return ""; 
     } 

     // java initializes them already with 0 
     int[][] num = new int[str1.length()][str2.length()]; 
     int maxlen = 0; 
     int lastSubsBegin = 0; 

     for (int i = 0; i < str1.length(); i++) { 
      for (int j = 0; j < str2.length(); j++) { 
       if (str1.charAt(i) == str2.charAt(j)) { 
        if ((i == 0) || (j == 0)) { 
         num[i][j] = 1; 
        } else { 
         num[i][j] = 1 + num[i - 1][j - 1]; 
        } 

        if (num[i][j] > maxlen) { 
         maxlen = num[i][j]; 
         // generate substring from str1 => i 
         int thisSubsBegin = i - num[i][j] + 1; 
         if (lastSubsBegin == thisSubsBegin) { 
          //if the current LCS is the same as the last time this block ran 
          sb.append(str1.charAt(i)); 
         } else { 
          //this block resets the string builder if a different LCS is found 
          lastSubsBegin = thisSubsBegin; 
          sb = new StringBuilder(); 
          sb.append(str1.substring(lastSubsBegin, i + 1)); 
         } 
        } 
       } 
      } 
     } 

     return sb.toString(); 
    } 

所以,我的功能应该是这样的:

public static String longestSubstring(String str1, String str2, String[] ignore) 
+0

您目前面临的解决方案有哪些问题?似乎对我来说代码很好。 – Meesh

+0

代码没有问题。阅读最后一条语句 – Prateek

+0

附注:在许多情况下,您忽略的一组字符串(停用词)最好存储在散列表/字典数据结构中。这是因为如果每次都必须遍历它,大量被忽略的单词会削弱您的算法。我对你的算法的建议是构建这个HashMap,然后在你的循环的深度,当你生成子串时,ping这个单词以查看它是否存在于被忽略的单词Hash中,并且只有当它不存在时才添加它。 – DRobinson

回答

0

据我了解,你要忽略那些包含至少一个串子ignore

if (str1.charAt(i) == str2.charAt(j)) { 
    if ((i == 0) || (j == 0)) { 
     num[i][j] = 1; 
    } else { 
     num[i][j] = 1 + num[i - 1][j - 1]; 
    } 


    // we must update `sb` on every step so that we can compare it with `ignore` 
    int thisSubsBegin = i - num[i][j] + 1; 
    if (lastSubsBegin == thisSubsBegin) { 
     sb.append(str1.charAt(i)); 
    } else { 
     lastSubsBegin = thisSubsBegin; 
     sb = new StringBuilder(); 
     sb.append(str1.substring(lastSubsBegin, i + 1)); 
    } 

    // check whether current substring contains any string from `ignore`, 
    // and if it does, find the longest one 
    int biggestIndex = -1; 
    for (String s : ignore) { 
     int startIndex = sb.lastIndexOf(s); 
     if (startIndex > biggestIndex) { 
      biggestIndex = startIndex;  
     } 
    }  

    //Then sb.substring(biggestIndex + 1) will not contain strings to be ignored 
    sb = sb.substring(biggestIndex + 1); 
    num[i][j] -= (biggestIndex + 1); 

    if (num[i][j] > maxlen) { 
     maxlen = num[i][j]; 
    } 
} 

如果忽略那些正是一样ignore任何串子, 然后当最长公共子候选发现,迭代ignore,并检查是否有电流子它。

0

为其中一个字符串创建一个后缀树,并运行第二个字符串以查看哪个子字符串可以在后缀树中找到。

关于后缀树的信息:http://en.wikipedia.org/wiki/Suffixtree