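How to replace parts of a string with a regular expression

I'm not new to regular expressions, but using them in Perl seems quite different from using them in Java.

Anyway, I basically have a dictionary of shorthand words and their definitions. I want to iterate over the words in the dictionary and replace each occurrence with its meaning. What is the best way to do this in Java?

I have looked at String.replaceAll(), String.replace(), and the Pattern/Matcher classes. I'm hoping to do a case-insensitive replacement along the lines of: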

$word =~ s/\s?\Q$short_word\E\s?/ $short_def /sig;
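For reference, the Perl pieces map onto Java's Pattern/Matcher roughly as follows. This is a minimal sketch: replaceWord is a hypothetical helper name, it assumes the short words consist of word characters, and it deliberately swaps the \s? around the word for \b word boundaries (a change from the Perl above) so that a short word is not matched inside a longer one. The /s flag is omitted because the pattern contains no '.'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerlStyleReplace {
    // Rough Java counterpart of Perl's s/\b\Q$short_word\E\b/$short_def/gi
    //   \Q...\E -> Pattern.quote(...)
    //   /i      -> Pattern.CASE_INSENSITIVE
    //   /g      -> Matcher.replaceAll(...) replaces every match, not just the first
    static String replaceWord(String text, String shortWord, String shortDef) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(shortWord) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // quoteReplacement keeps '$' and '\' in the definition from being read as group references
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(shortDef));
    }

    public static void main(String[] args) {
        System.out.println(replaceWord("LOL, that was a fall", "lol", "laugh out loud"));
        // "ll" inside "fall" is not at a word boundary, so it is left alone
        System.out.println(replaceWord("LOL, that was a fall", "ll", "like lemons"));
    }
}
```

Calling this once per dictionary entry still rescans text that earlier entries inserted, so it answers the syntax question more than the design question.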

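While I'm at it, do you think it is better to extract all the words from the string and then apply my dictionary, or to just apply the dictionary to the string? I know I need to be careful, because a shorthand word can match part of another shorthand word's meaning.

Hope this all makes sense.

Thanks.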

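Clarification:

The definitions look like this: lol: laugh out loud, ROFL: rolling on the floor laughing, LL: like lemons

The string is: lol, I'm ROFL

The replaced text should be: laugh out loud, I'm rolling on the floor laughing

Note that "like lemons" was not added anywhere, even though "rolling" contains "ll".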


2010-09-24 ekawas

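Clarification: do you mean you want to iterate over the words in the string and replace each short word with its definition, for example replacing "e.g." with "for example" throughout a long passage of text? If not, please give a before-and-after example. – 2010-09-24 14:17:02

I updated my question. The example is at the bottom. – ekawas 2010-09-24 15:06:04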

The danger is false positives inside ordinary words: "fall" != "falike lemons".
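To make the danger concrete, here is a deliberately naive sketch (class and method names are hypothetical) that applies each dictionary entry to the whole string in turn with a case-insensitive substring replacement. With the question's dictionary it produces exactly the unwanted results: "ll" matches inside "fall", and also inside "rolling", which the ROFL expansion has just inserted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveReplace {
    // The example dictionary from the question, in a fixed iteration order
    static Map<String, String> exampleDictionary() {
        Map<String, String> dict = new LinkedHashMap<>();
        dict.put("lol", "laugh out loud");
        dict.put("rofl", "rolling on the floor laughing");
        dict.put("ll", "like lemons");
        return dict;
    }

    // Deliberately naive: case-insensitive substring replacement, one entry at a time,
    // with no word boundaries. Later entries also rewrite text inserted by earlier ones.
    static String naiveExpand(String text, Map<String, String> dict) {
        String result = text;
        for (Map.Entry<String, String> e : dict.entrySet()) {
            Pattern p = Pattern.compile(Pattern.quote(e.getKey()), Pattern.CASE_INSENSITIVE);
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> dict = exampleDictionary();
        System.out.println(naiveExpand("lol, I'm ROFL", dict));
        // laugh out loud, I'm rolike lemonsing on the floor laughing
        System.out.println(naiveExpand("fall", dict));
        // falike lemons
    }
}
```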

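One approach is to split the text into words on whitespace (do runs of multiple spaces need to be preserved?) and then loop with an 'if contains() {replace} else {output original}' idea.

My output class would be a StringBuffer: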

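StringBuffer outputBuffer = new StringBuffer();
// split(): your own helper that also returns the delimiters between words
for (String s : split(inputText)) {
    // dictionary is a Map<String, String>; Map has containsKey(), not contains()
    outputBuffer.append(dictionary.containsKey(s) ? dictionary.get(s) : s);
}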

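Make your split method smart, so that it also returns the word delimiters:

split("now is the  time") -> now, <space>, is, <space>, the, <space><space>, time

Then you don't have to worry about preserving whitespace: the loop above appends anything that isn't a dictionary word to the StringBuffer unchanged.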

Here is a recent SO thread on retaining delimiters when regexing.
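The split-and-loop idea above can be sketched as follows, assuming the dictionary keys are stored in lower case so the lookup is case-insensitive. splitKeepingDelimiters is a hypothetical name, and instead of a split() with lookarounds it tokenizes with Matcher.find() over \w+|\W+: every character lands in exactly one token, so joining the tokens reproduces the input, and punctuation such as the comma in "lol," does not stick to the word. It uses StringBuilder (no synchronization needed) and Map.of, which requires Java 9+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShorthandExpander {
    // A token is a run of word characters or a run of non-word characters,
    // so joining all tokens gives back the original input.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\W+");

    // Hypothetical helper: returns the words AND the delimiters between them.
    static List<String> splitKeepingDelimiters(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Assumes the dictionary keys are stored in lower case (case-insensitive lookup).
    static String expand(String input, Map<String, String> dictionary) {
        StringBuilder out = new StringBuilder(input.length());
        for (String token : splitKeepingDelimiters(input)) {
            String definition = dictionary.get(token.toLowerCase(Locale.ROOT));
            out.append(definition != null ? definition : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = Map.of(
                "lol", "laugh out loud",
                "rofl", "rolling on the floor laughing",
                "ll", "like lemons");
        System.out.println(expand("lol, I'm ROFL", dictionary));
        // laugh out loud, I'm rolling on the floor laughing
    }
}
```

Because only whole tokens are looked up and inserted definitions are never rescanned, "ll" is applied neither inside "fall" nor inside "rolling".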


2010-09-24 15:38:17

The first thing that came to mind was this:

...
// e.g.: lol -> laugh out loud
Map<String, String> dictionary;             // populated elsewhere

ArrayList<String> originalText;             // the input, already split into words
ArrayList<String> replacedText = new ArrayList<>();

for (String string : originalText) {
    if (dictionary.containsKey(string)) {   // Map has containsKey(), not contains()
        replacedText.add(dictionary.get(string));
    } else {
        replacedText.add(string);
    }
}
...

Or you could use a StringBuffer instead of replacedText.


2010-09-24 15:09:29

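Are you suggesting I explode my original text? Also, this seems to carry a lot of overhead. Do you think exploding the text and keeping these arrays is better (more efficient) than using a regular expression? – ekawas 2010-09-24 15:17:29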

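In Java the String class is immutable: once a String is created and initialized, it cannot be changed through the same reference, so every replace call creates a new String. The other reason I suggest this implementation is that it is easy to read and understand: you just break your big string into a list and keep the two lists in memory. – 2010-09-24 15:31:41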
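A small illustration of the immutability point: each replace call hands back a new String and leaves the original untouched, so chaining one call per dictionary entry can allocate a new String for every entry that matches. expandAll is a hypothetical helper that hard-codes two entries from the question's example; note that String.replace is literal and case-sensitive.

```java
public class ImmutableStrings {
    // Each String.replace call returns a new String; the receiver is never modified.
    static String expandAll(String text) {
        String step1 = text.replace("lol", "laugh out loud");                  // new String
        String step2 = step1.replace("ROFL", "rolling on the floor laughing"); // another new String
        return step2;
    }

    public static void main(String[] args) {
        String original = "lol, I'm ROFL";
        String expanded = expandAll(original);
        System.out.println(original); // lol, I'm ROFL  (unchanged)
        System.out.println(expanded); // laugh out loud, I'm rolling on the floor laughing
    }
}
```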

Thanks. I like your answer, but I went with another one. – ekawas 2010-09-24 16:10:00

If you insist on using regular expressions, this will work (using Zoltan Balazs's dictionary-map approach):

Map<String, String> substitutions = loadDictionaryFromSomewhere(); // your own loader
int lengthOfShortestKeyInMap = 3; // placeholder: compute from substitutions.keySet()
int lengthOfLongestKeyInMap = 3;  // placeholder: compute from substitutions.keySet()

StringBuffer output = new StringBuffer(input.length()); 
Pattern pattern = Pattern.compile("\\b(\\w{" + lengthOfShortestKeyInMap + "," + lengthOfLongestKeyInMap + "})\\b"); 
Matcher matcher = pattern.matcher(input); 
while (matcher.find()) { 
    String candidate = matcher.group(1); 
    String substitute = substitutions.get(candidate); 
    if (substitute == null) {
        substitute = candidate; // no match, use original
    }
    matcher.appendReplacement(output, Matcher.quoteReplacement(substitute)); 
} 
matcher.appendTail(output); 
// output now contains the text with substituted words

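If you plan to process many inputs, a precompiled Pattern is more efficient than String.split(), which compiles a new Pattern on every call.

(Edit) Compiling all the keys into a single pattern gives a more efficient approach, like this: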

Pattern pattern = Pattern.compile("\\b(lol|rtfm|rofl|wtf)\\b"); 
// rest of the method unchanged, don't need the shortest/longest key stuff

This lets the regex engine skip any words that happen to be short enough but aren't in the list, saving a lot of map lookups.
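A minimal sketch of the edited approach that builds the alternation from the map's keys instead of hard-coding it, and also adds the case-insensitive matching the question asked for. Class and method names are hypothetical; it assumes the dictionary keys are stored in lower case and uses Java 9+ APIs (Map.of and Matcher.appendReplacement(StringBuilder, String)). Definitions are only appended to the output and never rescanned, so "ll" is not applied inside "rolling".

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AlternationExpander {
    // Builds \b(key1|key2|...)\b from the dictionary keys. Pattern.quote makes any
    // regex metacharacters in a key literal; longer keys are tried first.
    static Pattern buildPattern(Map<String, String> dictionary) {
        String alternation = dictionary.keySet().stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    // Assumes dictionary keys are lower case; each match is looked up lower-cased.
    static String expand(String input, Map<String, String> dictionary, Pattern pattern) {
        Matcher matcher = pattern.matcher(input);
        StringBuilder output = new StringBuilder(input.length());
        while (matcher.find()) {
            String matched = matcher.group(1);
            String definition = dictionary.get(matched.toLowerCase(Locale.ROOT));
            // Fall back to the original text if a key was not stored in lower case
            String replacement = (definition != null) ? definition : matched;
            matcher.appendReplacement(output, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = Map.of(
                "lol", "laugh out loud",
                "rofl", "rolling on the floor laughing",
                "ll", "like lemons");
        Pattern pattern = buildPattern(dictionary); // compile once, reuse for every input
        System.out.println(expand("lol, I'm ROFL, not a fall", dictionary, pattern));
        // laugh out loud, I'm rolling on the floor laughing, not a fall
    }
}
```

The lookup inside the loop is also where the "which keyword matched?" check happens: the matched text itself is the key.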


2010-09-24 15:42:10 Barend

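I don't think OR-ing ('|') every keyword in my dictionary together is a good approach, because before inserting my definition I would need to check which keyword matched. – ekawas 2010-09-24 16:08:48

That check is implicit in 'substitute = substitutions.get(candidate)'. – Barend 2010-09-24 19:05:25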
