Remove a list of strings from a string - C# .NET

I have a list of stop words that I need to remove from a string.

List<string> stopwordsList = stopwords.getStopWordList(); 
string text = PDF.getText(); 
foreach (string stopword in stopwordsList) 
{ 
    text = text.Replace(stopword, ""); 
} 
PDF.setText(text);

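In the debugger I can see that stopwordsList is populated correctly, but text.Replace() does not seem to have any effect.

What am I doing wrong?

Edit: note that I have also tried text.Replace() on its own, rather than text = text.Replace(). Neither worked.

Source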

2013-12-10 John ' Mark' Smith

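What does the getText function return? – Max

I can't reproduce your problem. – ken2k

Have you debugged it and checked the stop word in each iteration of the foreach loop? I'm fairly sure those are not what you expect, because otherwise the code looks fine. – Tobberoth

Although I don't think there is anything wrong with your code, I would do something like this.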

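// Requires: using System; using System.Collections.Generic; using System.Linq;
string someText = "this is some text just some dummy text Just text";
List<string> stopwordsList = new List<string> { "some", "just", "text" };
// Split on whitespace, drop every word found in the stop list (ignoring case), then rejoin.
someText = string.Join(" ", someText.Split().Where(w => !stopwordsList.Contains(w, StringComparer.InvariantCultureIgnoreCase)));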

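If casing matters, you can leave out the StringComparer.InvariantCultureIgnoreCase part.

"Note that I have also tried text.Replace() on its own, rather than text = text.Replace()"

You should know that Replace does not modify the string it is called on; it returns the updated string, so you have to assign the result back if you want to keep it. That is basically what you are doing now, i.e. text = text.Replace()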
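To make the point about the return value concrete, here is a minimal sketch. The class name ReplaceDemo and the helper RemoveAll are made up for illustration and are not part of the question's code. It also shows that string.Replace matches case-sensitively, which may be why some stop words appear to be ignored.

```csharp
using System;

public static class ReplaceDemo
{
    // Hypothetical helper: removes every exact (case-sensitive) occurrence of each stop word.
    public static string RemoveAll(string text, string[] stopwords)
    {
        foreach (string stopword in stopwords)
        {
            // Replace returns a new string, so the result must be assigned back.
            text = text.Replace(stopword, "");
        }
        return text;
    }

    public static void Main()
    {
        string text = "Just some text";

        text.Replace("some", "");   // result is discarded, text is unchanged
        Console.WriteLine(text);    // prints "Just some text"

        // "some" is removed; "just" does not match "Just" because Replace is case-sensitive.
        Console.WriteLine(RemoveAll(text, new[] { "some", "just" }));
    }
}
```

Note that removing a substring this way leaves the surrounding spaces behind and also hits partial words.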

Source

2013-12-10 12:52:03 Ehsan

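Why the downvote? –

Wondering the same thing. @huMptyduMpty – Ehsan

With no further input from the OP, I'm also inclined to think case sensitivity is the culprit. By the way, if the stop word list is very large, I would use a HashSet<string> rather than a List. –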
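The HashSet suggestion could look roughly like the sketch below. The names StopwordFilter and RemoveStopwords are invented for this example, and the sample data is borrowed from the answer above. Passing a case-insensitive comparer to the set keeps the same matching behaviour as the answer while making each lookup a hash lookup instead of a scan of the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopwordFilter
{
    // Hypothetical helper: drops whitespace-separated words that appear in the stop word set.
    public static string RemoveStopwords(string text, IEnumerable<string> stopwords)
    {
        // The comparer makes lookups ignore case, as in the answer above.
        var stopSet = new HashSet<string>(stopwords, StringComparer.InvariantCultureIgnoreCase);

        // A null separator array means "split on any whitespace".
        var kept = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                       .Where(word => !stopSet.Contains(word));

        return string.Join(" ", kept);
    }

    public static void Main()
    {
        string someText = "this is some text just some dummy text Just text";
        var stopwords = new List<string> { "some", "just", "text" };

        Console.WriteLine(RemoveStopwords(someText, stopwords)); // prints "this is dummy"
    }
}
```

For a handful of stop words the difference is unlikely to be measurable; the set mainly helps when the list is long or the text is large.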

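There is one problem, though: none of the previous solutions take word boundaries into account. For example, 'hell' may be a bad word, but 'hello' is perfectly valid. Replacement should only be applied to whole words, otherwise you may get strange results.

Here is code that takes word boundaries into account: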

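// Requires: using System.Collections.Generic; using System.Diagnostics;
//           using System.Linq; using System.Text.RegularExpressions;
var text = "Hello world, this is a great test!";
var badWords = new List<string>()
{
    "Hello",
    "great"
};

// Find every whole word, then process the matches from last to first
// so that removing one word does not shift the indexes of the others.
var wordMatches = Regex.Matches(text, "\\w+")
    .Cast<Match>()
    .OrderByDescending(m => m.Index);

foreach (var m in wordMatches)
    if (badWords.Contains(m.Value))
        text = text.Remove(m.Index, m.Length);

Debug.WriteLine(text);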

Source

2013-12-10 13:01:24

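Nice. Anyway, this works better ;) : text = text.Remove(m.Index, m.Length + 1); – Dragouf

Not always. You are assuming that the removed word is followed by a [space], but it could also be a punctuation mark that completes the sentence (full stop, question mark, exclamation mark, etc.). It would be smarter to process the string as in the original answer and then remove the duplicate spaces. There are plenty of samples showing how to do that. One is here: [link](http://stackoverflow.com/questions/206717/how-do-i-replace-multiple-spaces-with-a-single-space-in-c) –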
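One way to combine the two steps described in this comment is sketched below: whole-word removal as in the answer, followed by collapsing runs of whitespace. The class and method names are invented for illustration, and the whitespace step is just one common approach, not necessarily the one used in the linked question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordRemover
{
    // Hypothetical helper: removes whole words found in badWords, then tidies the spacing.
    public static string Remove(string text, ICollection<string> badWords)
    {
        // Materialise the matches first, ordered from last to first,
        // so earlier indexes stay valid while later words are removed.
        var matches = Regex.Matches(text, @"\w+")
            .Cast<Match>()
            .OrderByDescending(m => m.Index)
            .ToList();

        foreach (Match m in matches)
        {
            if (badWords.Contains(m.Value))
            {
                text = text.Remove(m.Index, m.Length);
            }
        }

        // Collapse each run of whitespace into a single space and trim the ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        var badWords = new List<string> { "Hello", "great" };
        Console.WriteLine(Remove("Hello world, this is a great test!", badWords));
        // prints "world, this is a test!"
    }
}
```

This still leaves a space in front of punctuation when the removed word was the last one before it, so it is a starting point rather than a complete clean-up.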

You are right. – Dragouf
