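Checking an array of objects for uniqueness

I am reading data from files (such as CSV and Excel) and need to make sure that every row in the file is unique.

Each row is represented as an object[]. Because of the current architecture, this cannot be changed. Each object in this array can have a different type (decimal, string, int, etc.).

A file can look like this: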

foo 1  5 // Not unique 
bar 1  5 
bar 2  5 
foo 1  5 // Not unique

A file may have 200,000+ rows and between 4 and 91 columns.

The code I have right now looks like this:

IList<object[]> rows = new List<object[]>();

using (var reader = _deliveryObjectReaderFactory.CreateReader(deliveryObject))
{
    // Read the row.
    while (reader.Read())
    {
        // Get the values from the file.
        var values = reader.GetValues();

        // Check uniqueness for row
        foreach (var row in rows)
        {
            bool rowsAreDifferent = false;

            // Check uniqueness for column.
            for (int i = 0; i < row.Length; i++)
            {
                var earlierValue = row[i];
                var newValue = values[i];
                if (earlierValue.ToString() != newValue.ToString())
                {
                    rowsAreDifferent = true;
                    break;
                }
            }
            if (!rowsAreDifferent)
                throw new Exception("Rows are not unique");
        }
        rows.Add(values);
    }
}

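So my question is: can this be done more efficiently? For example, by computing a hash for each row and checking the hashes for uniqueness?

Source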

2016-05-17 smoksnes

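You do realize that two objects can have the same hash and still not be equal, don't you? In other words, even if your hashing is correct, a file might contain duplicate hashes and still have only unique rows. – phoog

How about using a HashSet with a custom equality comparer? – Jehof

@phoog, yes, I'm well aware of that. The solution would check the hash first, and only if the hashes are equal would the other values have to be checked. But checking the hash first may be more efficient than always checking all the values. – smoksnes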
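The "hash first, then compare values" idea from the comments above can be sketched roughly as follows. This is only a minimal illustration, not code from the thread: HashFirstCheck, RowHash, RowsEqual and AllUnique are hypothetical names, and the sketch assumes that elements should be compared by value with object.Equals rather than by their ToString() output as in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class HashFirstCheck
{
    // Combines the element hashes of a row; a null element counts as 0.
    public static int RowHash(object[] row)
    {
        unchecked
        {
            int hash = 0;
            foreach (object o in row)
                hash = (hash * 397) ^ (o == null ? 0 : o.GetHashCode());
            return hash;
        }
    }

    // Element-wise value comparison; only reached when two row hashes match.
    public static bool RowsEqual(object[] x, object[] y)
    {
        if (x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (!object.Equals(x[i], y[i])) return false;
        return true;
    }

    // Returns true if no two rows are equal.
    public static bool AllUnique(IEnumerable<object[]> rows)
    {
        // Rows that share a hash are kept together, so a new row is
        // compared only against earlier rows with the same hash.
        var buckets = new Dictionary<int, List<object[]>>();
        foreach (object[] row in rows)
        {
            int hash = RowHash(row);
            List<object[]> candidates;
            if (!buckets.TryGetValue(hash, out candidates))
            {
                candidates = new List<object[]>();
                buckets.Add(hash, candidates);
            }
            foreach (object[] earlier in candidates)
                if (RowsEqual(earlier, row)) return false;
            candidates.Add(row);
        }
        return true;
    }
}
```

Because rows with equal hashes are still compared element by element, a hash collision between two different rows does not produce a false "not unique" result, which addresses the concern phoog raises. A HashSet with a custom comparer does the same bookkeeping internally.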

You can use a HashSet<object[]> with a custom IEqualityComparer<object[]>, like this:

HashSet<object[]> rows = new HashSet<object[]>(new MyComparer());

while (reader.Read())
{
    // Get the values from the file.
    var values = reader.GetValues();
    if (!rows.Add(values))
        throw new Exception("Rows are not unique");
}

MyComparer can be implemented like this:

using System.Collections.Generic;
using System.Linq;

public class MyComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null) || x.Length != y.Length) return false;
        // Compare element values with object.Equals; a == b on operands
        // typed as object would only compare references.
        return x.Zip(y, (a, b) => object.Equals(a, b)).All(c => c);
    }
    public int GetHashCode(object[] obj)
    {
        unchecked
        {
            // this returns 0 if obj is null
            // otherwise it combines the hashes of all elements
            // like hash = (hash * 397) ^ nextHash
            // if an array element is null its hash is assumed as 0
            // (this is the ReSharper suggestion for GetHashCode implementations)
            return obj?.Aggregate(0, (hash, o) => (hash * 397) ^ (o?.GetHashCode() ?? 0)) ?? 0;
        }
    }
}

The elements are compared with object.Equals rather than a == b. For operands typed as object, == only compares references, so equal values from two different rows would never match.
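On the question of how a == b behaves for elements typed as object, here is a small sketch of the difference between the == operator and object.Equals. EqualityDemo and its two methods are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical demo class, not part of the original answer.
public static class EqualityDemo
{
    // With both operands typed as object, == is a reference comparison.
    public static bool ByOperator(object a, object b)
    {
        return a == b;
    }

    // object.Equals handles nulls and then calls the Equals override
    // of the runtime type, so boxed values are compared by value.
    public static bool ByEquals(object a, object b)
    {
        return object.Equals(a, b);
    }
}
```

Calling EqualityDemo.ByOperator(5, 5) boxes each 5 into a separate object, so the reference comparison is false, while EqualityDemo.ByEquals(5, 5) is true. The same applies to two separately created strings with identical content, which is the usual situation for values read from a file. Note that object.Equals still treats an int 1 and a long 1 as different, because their runtime types differ.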

Source

2016-05-17 06:34:37

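Oh, I just saw that @Jehof had already suggested this while I was writing, so you probably already know how to do it... –

Yes, I'm trying it out now. But without the fancy C# 6 features. ;) – smoksnes

The final return statement looks scary. I would probably need a lot of coffee and 15 minutes to figure out why it does what it does. Would you mind adding a line or two of comments on the '?' operator, and on why you multiply by 391? – Marco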
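As a rough answer to the two comments above, the one-line return statement can be unrolled into a plain loop that needs no C# 6 features. This is a sketch of what the expression appears to do, not the answerer's own code; RowHashing and GetRowHashCode are hypothetical names. The multiplier in the answer's code is 397, a prime that the code comment attributes to ReSharper's GetHashCode template.

```csharp
using System;

// Hypothetical helper showing the hash expression step by step.
public static class RowHashing
{
    public static int GetRowHashCode(object[] obj)
    {
        // "obj?.Aggregate(...) ?? 0": a null array hashes to 0.
        if (obj == null) return 0;

        // unchecked: let the multiplication wrap around on overflow
        // instead of throwing an OverflowException.
        unchecked
        {
            int hash = 0; // the seed value passed to Aggregate
            foreach (object o in obj)
            {
                // "o?.GetHashCode() ?? 0": a null element contributes 0.
                int next = (o == null) ? 0 : o.GetHashCode();

                // Multiply the running hash, then XOR in the element hash,
                // so the result depends on the order of the elements.
                hash = (hash * 397) ^ next;
            }
            return hash;
        }
    }
}
```

The ?. operator returns null instead of throwing when its left-hand side is null, and ?? supplies a fallback value for that null. Multiplying before each XOR means that rows such as { 1, 5 } and { 5, 1 } generally get different hashes, which a plain XOR of the element hashes would not achieve.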
