String matching in Java

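I'm currently working on my "dirty word" filter and have run into partial matches.

For example, if I pass these two arguments, replaceWord("ass", "passing pass passed ass"),

to this method: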

private static String replaceWord(String word, String input) { 
    Pattern legacyPattern = Pattern.compile(word, Pattern.CASE_INSENSITIVE); 
    Matcher matcher = legacyPattern.matcher(input); 
    StringBuilder returnString = new StringBuilder(); 
    int index = 0; 
    while(matcher.find()) { 
     returnString.append(input.substring(index,matcher.start())); 
     for(int i = 0; i < word.length() - 1; i++) { 
      returnString.append('*'); 
     } 
     returnString.append(word.substring(word.length()-1)); 

     index = matcher.end(); 
    } 
    if(index < input.length() - 1){ 
     returnString.append(input.substring(index)); 
    } 
    return returnString.toString(); 
}

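I get "p**sing p**s p**sed **s"

when I really only want "passing pass passed **s". Does anyone know how to avoid this partial matching with this method? Any help would be greatly appreciated!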

Source

2014-02-17 atsituab

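So you want some whitespace in front of 'ass'?

You want to look at [word boundaries](http://docs.oracle.com/javase/tutorial/essential/regex/bounds.html)

This tutorial from Oracle should point you in the right direction.

You want to use a word boundary in your pattern: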

Pattern p = Pattern.compile("\\bword\\b", Pattern.CASE_INSENSITIVE);

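But note that this is still problematic (as profanity filtering always is). A "non-word character" that defines a boundary is anything not in [0-9A-Za-z_].

So, for example, _ass will not match, because the underscore counts as a word character.

You also have the problem of derived profanity, where the term is prepended to, say, "hole", "wipe", etc.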
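The boundary behaviour can be checked with a small test harness. This is only a sketch: the class name WordBoundaryDemo and the helper containsWholeWord are made up for illustration, and Pattern.quote is added on the assumption that the dirty word should be treated as literal text rather than as a regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question or answers.
class WordBoundaryDemo {

    // Returns true only if 'word' occurs with a word boundary (\b) on both sides.
    static boolean containsWholeWord(String word, String input) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        // Surrounded by letters: no boundary, so no match.
        System.out.println(containsWholeWord("ass", "passing"));
        // Preceded by a space and followed by end of input: match.
        System.out.println(containsWholeWord("ass", "kick ass"));
        // '_' is a word character, so there is no boundary before 'a'.
        System.out.println(containsWholeWord("ass", "_ass"));
        // '!' is a non-word character, so the boundary after 's' exists.
        System.out.println(containsWholeWord("ass", "ASS!"));
    }
}
```

The third case is the limitation mentioned above: anyone who prefixes the word with an underscore or a digit slips past a plain \b check.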

Source

2014-02-17 21:28:18

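I'm working on a dirty word filter as we speak, and the option I chose was Soundex plus some regex.

I first strip out strange characters, keeping only those that match \w, which is [a-zA-Z_0-9].

Then I use soundex(String) to create a string that can be compared against the Soundex string of the word you want to test.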

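// Soundex.soundex(String) is assumed to be a static helper that returns the Soundex code.
String soundExOfDirtyWord = Soundex.soundex(dirtyWord);
String soundExOfTestWord = Soundex.soundex(testWord);
// Equal codes mean the two words sound alike under Soundex.
if (soundExOfTestWord.equals(soundExOfDirtyWord)) {
    System.out.println("The test word sounds like " + dirtyWord);
}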

I just keep a list of dirty words in the program and let Soundex run through them to check. The algorithm is worth a look.
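For readers without a Soundex class at hand, here is a minimal sketch of the classic American Soundex rules in plain Java. The class SoundexSketch and its methods are hypothetical names, not the helper used in the snippet above, and the implementation is simplified (it ignores everything outside A to Z).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified Soundex; a sketch rather than a reference implementation.
class SoundexSketch {

    // Digit for each letter A..Z; '0' marks letters that produce no digit.
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String s) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char c : s.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') continue;        // skip non-letters
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {                 // first letter is kept as-is
                out.append(c);
                last = code;
                continue;
            }
            if (c == 'H' || c == 'W') continue;      // H and W do not separate equal codes
            if (code != '0' && code != last) out.append(code);
            last = code;                             // vowels reset 'last' to '0'
            if (out.length() == 4) break;
        }
        while (out.length() < 4) out.append('0');    // pad to letter + three digits
        return out.toString();
    }

    // True if 'testWord' has the same Soundex code as any word in the list.
    static boolean soundsLikeAny(String testWord, List<String> dirtyWords) {
        String code = soundex(testWord);
        for (String dirty : dirtyWords) {
            if (soundex(dirty).equals(code)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> dirtyWords = Arrays.asList("ass");
        System.out.println(soundex("Robert"));                 // R163
        System.out.println(soundsLikeAny("azz", dirtyWords));  // true
    }
}
```

Soundex is coarse: with this sketch "ace" also maps to A200, the same code as "ass", so a filter built this way will likely need a whitelist or a second check to keep false positives down.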

Source

2014-02-17 21:58:44

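You can also use the replaceAll() method of the Matcher class. It replaces every occurrence of the pattern with the replacement you specify. Something like the following.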

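private static String replaceWord(String word, String input) {
    // \b on both sides restricts matches to whole words; Pattern.quote treats the word literally.
    Pattern legacyPattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
    Matcher matcher = legacyPattern.matcher(input);
    // Build the mask: a star for every character except the last one.
    StringBuilder replacement = new StringBuilder();
    for (int i = 0; i < word.length() - 1; i++) {
        replacement.append('*');
    }
    replacement.append(word.charAt(word.length() - 1));
    // quoteReplacement keeps '$' and '\' in the word from being read as group references.
    return matcher.replaceAll(Matcher.quoteReplacement(replacement.toString()));
}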
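One thing the replaceAll() approach does not do is keep the casing of the text that was actually matched, since the mask is built from the word parameter. A possible variation, sketched below with Matcher.appendReplacement and Matcher.appendTail, builds the mask from each match instead. The class name MaskWholeWords and the method mask are illustrative only, and the sketch assumes a non-empty word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variation on replaceWord; assumes 'word' is non-empty.
class MaskWholeWords {

    static String mask(String word, String input) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String hit = m.group();                   // the text as it appears in the input
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < hit.length() - 1; i++) {
                masked.append('*');
            }
            masked.append(hit.charAt(hit.length() - 1));
            // appendReplacement copies the text before the match, then the mask.
            m.appendReplacement(out, Matcher.quoteReplacement(masked.toString()));
        }
        m.appendTail(out);                            // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("ass", "passing pass passed ASS"));
    }
}
```

Because appendTail copies the whole remainder of the input, this version also avoids hand-written index arithmetic for the trailing text.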

Source

2014-02-17 22:08:29
