C# - Remove duplicate lines from a text file

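Can someone demonstrate how a file is checked for duplicate lines, with any duplicates then removed, either by overwriting the existing file or by creating a new file with the duplicate lines removed?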

2011-06-17 Michael

@Felice Pollano No mate, not unless I'm a 28-year-old student :D – Michael

OK, but anyway, you are asking for a job to be done... –

If you're using .NET 4, you can use a combination of File.ReadLines and File.WriteAllLines:

var previousLines = new HashSet<string>(); 

File.WriteAllLines(destinationPath, File.ReadLines(sourcePath) 
             .Where(line => previousLines.Add(line)));

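This functions almost the same as LINQ's Distinct method, with one important difference: the output of Distinct is not guaranteed to be in the same order as the input sequence. Using a HashSet<T> explicitly does provide that guarantee, because each line is kept or dropped at the point where it is read.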
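The question also mentions overwriting the existing file. Passing the same path as both sourcePath and destinationPath would likely fail or lose data, because File.ReadLines is still streaming the file while File.WriteAllLines is trying to write to it. A minimal sketch of one way around this, assuming a temporary file next to the original is acceptable (the class name, method name and ".tmp" suffix are illustrative, not part of the answer):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class InPlaceDedup
{
    // Removes duplicate lines from the file at 'path', keeping the first
    // occurrence of each line and preserving the original order.
    public static void RemoveDuplicateLines(string path)
    {
        // Hypothetical temp-file name; assumed not to clash with a real file.
        string tempPath = path + ".tmp";
        var previousLines = new HashSet<string>();

        // HashSet<T>.Add returns false when the line was already seen,
        // so Where() lets only the first occurrence through.
        File.WriteAllLines(
            tempPath,
            File.ReadLines(path).Where(line => previousLines.Add(line)));

        // The source has been fully read and closed by now, so it can be replaced.
        File.Copy(tempPath, path, true);
        File.Delete(tempPath);
    }
}
```

Calling InPlaceDedup.RemoveDuplicateLines(somePath) rewrites the file in place. Writing to a separate file first means the original stays intact if reading fails part-way through.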


2011-06-17 15:04:50 LukeH

Pseudocode:

open file reading only 

List<string> list = new List<string>(); 

for each line in the file: 
    if(!list.contains(line)): 
     list.append(line) 

close file 
open file for writing 

for each string in list: 
    file.write(string);
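As a rough C# rendering of the pseudocode above (a sketch only; the class and method names are made up for illustration), the same read-collect-rewrite steps might look like this. Note that List<string>.Contains scans the whole list on every call, so this version is best suited to small files.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ListDedup
{
    // Reads every line, keeps the first occurrence of each,
    // then overwrites the same file with the de-duplicated lines.
    public static void RemoveDuplicateLines(string path)
    {
        var list = new List<string>();

        // File.ReadAllLines opens the file, reads it fully and closes it.
        foreach (string line in File.ReadAllLines(path))
        {
            if (!list.Contains(line))
            {
                list.Add(line);
            }
        }

        // The file is closed by now, so it is safe to open it for writing.
        File.WriteAllLines(path, list.ToArray());
    }
}
```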


2011-06-17 14:59:01 mrK

Man, thanks a lot, your pseudocode helped me a great deal – BOSS

No problem, man. – mrK

File.WriteAllLines(topath, File.ReadAllLines(frompath).Distinct().ToArray());

Edit: modified for .NET 3.5


2011-06-17 15:09:03 Blindy

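How big are the files we're talking about?

One strategy could be to read the file one line at a time and load each line into a data structure that makes it easy to check for existing items, such as a HashSet<int>. I know I can reliably hash each string line of the file using GetHashCode() (which is used internally when checking string equality, and equality is what we want in order to identify duplicates) and check it against the known hashes. So, something like: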

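// Caveat: two different lines can share a hash code, so a collision
// would drop a line that is not actually a duplicate.
var known = new HashSet<int>();
using (var dupe_free = new StreamWriter(@"c:\path\to\dupe_free.txt"))
{
    foreach (var line in File.ReadLines(@"c:\path\to\has_dupes.txt"))
    {
        var hash = line.GetHashCode();
        if (!known.Contains(hash))
        {
            known.Add(hash);
            // WriteLine rather than Write, so each kept line stays on its own line
            dupe_free.WriteLine(line);
        }
    }
}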
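GetHashCode() returns a 32-bit int, so two different lines can in principle produce the same value, and a loop keyed on hash codes alone would then treat the second line as a duplicate. A minimal variation that avoids this, assuming it is acceptable to hold the distinct lines themselves in memory, stores the strings in the set instead (the class, method and parameter names below are illustrative only):

```csharp
using System.Collections.Generic;
using System.IO;

public static class StreamingDedup
{
    // Copies sourcePath to destinationPath line by line, skipping any line
    // that has already been written. Input order is preserved.
    public static void RemoveDuplicateLines(string sourcePath, string destinationPath)
    {
        // The set compares the full strings, so hash collisions cannot
        // cause a non-duplicate line to be dropped.
        var known = new HashSet<string>();

        using (var writer = new StreamWriter(destinationPath))
        {
            foreach (string line in File.ReadLines(sourcePath))
            {
                // Add returns true only the first time a given line is seen.
                if (known.Add(line))
                {
                    writer.WriteLine(line);
                }
            }
        }
    }
}
```

The trade-off is memory: the set holds every distinct line rather than one int per line, which may matter for very large files.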

Alternatively, you could take advantage of LINQ's Distinct() method and do it in one line, as Blindy suggested:

File.WriteAllLines(@"c:\path\to\dupe_free.txt", File.ReadAllLines(@"c:\path\to\has_dupes.txt").Distinct().ToArray());


2011-06-17 15:10:35

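@LukeH Right, that's why my main answer reads and writes them in a hand-written loop; a hash set is a cheap lookup, and with GetHashCode it guarantees correct order and uniqueness. –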

// Requires .NET 3.5 
private void RemoveDuplicate(string sourceFilePath, string destinationFilePath) 
{ 
    var readLines = File.ReadAllLines(sourceFilePath, Encoding.Default); 

    File.WriteAllLines(destinationFilePath, readLines.Distinct().ToArray(), Encoding.Default); 
}


2011-06-17 15:10:53
