How do I find the most common int in a 2D int array?

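OK, so I'm just starting to think about how to implement a new graphics plugin for Paint.NET, and I need to know how to find the most common integer in a 2D array of integers. Is there a built-in C# way to do this? Or does anyone have a slick way to do it?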

The array will look something like this:

    300 300 300 300 300 300 300
    0 150 300 300 300 300 300
    0 0 150 300 300 300 300
    0 0 0 0 300 300 300
    0 0 0 0 150 300 300
    0 0 0 0 0 150 300
    0 0 0 0 0 0 300

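I need to find out that 300 is the most common number in the array. If there is no "most common" number, then just return the center number (the array dimensions will always be odd x odd), which is 0 in the example above.

I'll be implementing this with a "brute force" algorithm unless you experts can come up with something faster.

Any help would be very much appreciated.

Thanks!

EDIT: More information...

The values will almost always be VERY diverse (more diverse than my example array). The values will be in the range 0-360. The size of the array will be 5x5 to about 17x17, depending on the speed of the algorithm. The result will be calculated once for each pixel in a large image... so faster is better. ;)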

Source

2009-01-29 BoltBait

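Sounds like a fun problem - I bet there's an answer out there. It has me interested. – Jeffrey 2009-01-29 18:59:39

What do you do in the case of a tie (e.g. 300 and 125 both have the same number of hits)? – 2009-01-29 19:39:23

@Michael, the original question says: "If there is no "most common" number, then just return the center number", which means that none of the solutions posted so far meet the requirements. – BoltBait 2009-01-29 19:42:07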

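Take a look at the LocalHistogramEffect code in Paint.NET, notably LocalHistogramEffect.RenderRect.

It walks the input image, maintaining a histogram of intensities for each source pixel within 'r' pixels of the destination pixel. As the output pixels are traversed, it adds the leading edge to the histogram and subtracts the trailing edge. It handles all the edge cases well, and it is quite fast. It's the basis for the Median, Unfocus, Outline, and Remove Noise effects.

Adapting this to support Hue instead of RGB intensity would be rather trivial.

The performance is quite good, and for your purposes it runs in O(r^2 + w r + n w), where r is the radius, w is the width of the image, and n is the number of levels in the histogram.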

-tjackson

Source

2009-04-03 17:30:03 ddrcoder

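It's at least O(n*m) any way you slice it - you are going to have to look at each cell at least once. The place to economize is where you accumulate the counts of each value before looking for the most common one; if your integers vary over a relatively small range (they are uint16, let's say), then you can simply use a flat array instead of a map.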

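I guess you could also keep running counts x and y for the current top and runner-up candidates for "most common", and early-out as soon as fewer than (x - y) cells are left to look at, since at that point there is no way the runner-up could overtake the top candidate.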
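As a rough illustration of that early-out idea, here is a minimal C# sketch. The class and method names are hypothetical, and it assumes a non-empty odd x odd int[,] with values in 0..360 as described in the question. It tracks the leader's count and the best count among all other values, and stops once the gap exceeds the number of unread cells.

```csharp
using System;

static class EarlyExitMode
{
    // Hypothetical helper: returns the most common value in 'data', or the
    // center cell when the top count is tied. Assumes values in 0..360.
    public static int MostCommonOrCenter(int[,] data)
    {
        var counts = new int[361];      // flat histogram, one slot per value
        int total = data.Length;        // rows * columns
        int seen = 0;
        int best = -1;                  // current leader (none yet)
        int bestCount = 0;              // leader's count
        int secondCount = 0;            // highest count among all other values

        foreach (int v in data)         // row-major walk over the 2D array
        {
            int c = ++counts[v];
            seen++;

            if (v == best)
            {
                bestCount = c;
            }
            else if (c > bestCount)
            {
                // v overtakes the leader; the old leader becomes the runner-up.
                secondCount = bestCount;
                best = v;
                bestCount = c;
            }
            else if (c > secondCount)
            {
                secondCount = c;
            }

            // Even if every remaining cell went to the runner-up,
            // it could no longer reach the leader: stop early.
            if (bestCount - secondCount > total - seen)
                return best;
        }

        // No early exit by the last cell means the top count is tied.
        return data[data.GetLength(0) / 2, data.GetLength(1) / 2];
    }
}
```

Whether the extra bookkeeping per cell pays off for windows as small as 5x5 to 17x17 is something only measurement can tell.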

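Integer operations like this are pretty fast; even for a megapixel image, the brute-force algorithm should take only a few milliseconds.

I notice you've edited your original question to say that the pixel values range over 0..255 - in that case, definitely go with a simple flat array; it is small enough to fit easily into the L1 dcache, and a lookup in a flat array is very quick.

Edit: once you've built the histogram array, handling the "no most common number" case is quite simple: all you have to do is walk through it and find the "most" and "second most" common numbers; if they are equally frequent, then by definition there is no single most common one.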

const int numLevels = 360; // you said each cell contains a number [0..360)
int levelFrequencyCounts[numLevels]; // assume this has been populated such that levelFrequencyCounts[i] = number of cells containing "i"
int mostCommon = 0;    // level with the highest count seen so far
int runnerUpCount = 0; // highest count among all other levels (ties included)
for (int i = 1; i < numLevels; ++i)
{
    if (levelFrequencyCounts[i] > levelFrequencyCounts[mostCommon])
    {
        runnerUpCount = levelFrequencyCounts[mostCommon]; // old leader becomes runner-up
        mostCommon = i;
    }
    else if (levelFrequencyCounts[i] > runnerUpCount)
    {
        runnerUpCount = levelFrequencyCounts[i]; // also catches a tie with the leader
    }
}

if (levelFrequencyCounts[mostCommon] != runnerUpCount)
{
    return mostCommon;
}
else
{
    return CenterOfInputData; // (something like InputData[n/2][m/2])
}

Source

2009-01-29 19:04:38 Crashworks

How would I do something like this in C#?

Something like this:

// count how many times each value occurs
Dictionary<int, int> d = new Dictionary<int, int>();
foreach (int value in matrix)
{
    if (!d.ContainsKey(value))
        d.Add(value, 1);
    else
        d[value] = d[value] + 1;
}
// KeyValuePair is a struct and cannot be null, so start from an empty (0, 0) pair
KeyValuePair<int, int> biggest = new KeyValuePair<int, int>(0, 0);
foreach (KeyValuePair<int, int> found in d)
{
    if (biggest.Value < found.Value)
        biggest = found;
}

Source

2009-01-29 19:06:59 ChrisW

One option is LINQ - a bit inefficient, but OK for non-huge arrays:

var max = (from cell in data.Cast<int>()
           group cell by cell into grp
           select new { Key = grp.Key, Count = grp.Count() } into agg
           orderby agg.Count descending
           select agg).First();
Console.WriteLine(max.Key + ": " + max.Count);

Or with a jagged array:

var max = (from row in data
           from cell in row
           group cell by cell into grp
           select new { Key = grp.Key, Count = grp.Count() } into agg
           orderby agg.Count descending
           select agg).First();
Console.WriteLine(max.Key + ": " + max.Count);

In reality, I would probably use a dictionary/count. Here is an example without LINQ, just "because":

Dictionary<int, int> counts = new Dictionary<int, int>();
foreach (int value in data)
{
    int count;
    counts.TryGetValue(value, out count); // count stays 0 when the key is missing
    counts[value] = count + 1;
}
int maxCount = -1, maxValue = 0;
foreach (KeyValuePair<int, int> pair in counts)
{
    if (pair.Value > maxCount)
    {
        maxCount = pair.Value;
        maxValue = pair.Key;
    }
}
// same "value: count" order as the LINQ versions above
Console.WriteLine(maxValue + ": " + maxCount);

Source

2009-01-29 19:07:37

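If speed is your primary concern, do not use a dictionary. Stick with a flat array of counters (a short[] here, since a 17x17 window can hold up to 289 hits, which would overflow a byte). Try this: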

// stores hit counts for each possible value (0-360)
short[] hitCounts = new short[361];

// iterate through the 2D (jagged) array and increment hit counts
for (int i = 0; i < toEvaluate.Length; i++)
{
    for (int j = 0; j < toEvaluate[i].Length; j++)
        hitCounts[toEvaluate[i][j]]++;
}

int greatestHitCount = 0; // the hit count of the current greatest value
int greatest = -1; // the current greatest value, or -1 while the top count is tied

// iterate through values (0-360) and evaluate hit counts
for (int i = 0; i < hitCounts.Length; i++)
{
    // this value's hit count is higher than the current greatest hit count
    if (hitCounts[i] > greatestHitCount)
    {
        greatestHitCount = hitCounts[i]; // store the new hit count
        greatest = i; // store the greatest value
    }
    // another value already has the same (greatest) hit count
    else if (hitCounts[i] == greatestHitCount)
        greatest = -1; // tie: no single winner unless a later value beats this count
}

if (greatest >= 0) // a unique greatest value was found
    return greatest;

// figure out the middle row and column (dimensions are odd, so Length / 2 is the center index)
int x = toEvaluate.Length / 2;
int y = toEvaluate[x].Length / 2;

// return the value at the center of the 2D array
return toEvaluate[x][y];

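When speed becomes a bigger concern than readability, you end up with some ugly code. The above could definitely benefit from refactoring (hence the excessive commenting), but it should run fast. If it isn't fast enough, you can gain further optimization by moving it to unmanaged code.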

Source

2009-01-29 19:54:30

Michael beat me to the post, but I'd do it in much the same way, something like this:

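int MaxValueIn2dArray(int[,] matrix)
{
    var d = new int[361]; // one counter per possible value, 0..360 inclusive
    for (int x = 0; x <= matrix.GetUpperBound(0); x++)
    {
        for (int y = 0; y <= matrix.GetUpperBound(1); y++)
        {
            d[matrix[x, y]]++;
        }
    }
    int maxValue = 0; // the value (not the count) that occurs most often
    for (int value = 1; value < d.Length; value++)
    {
        if (d[value] > d[maxValue]) maxValue = value;
    }
    return maxValue; // note: ties are not detected here
}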

This would need to be optimized for your particular needs.

Source

2009-01-29 20:03:37 pro3carp3

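Your image:

    300+ 300+ 300+ 300 300 300 300
    0+ 150+ 300+ 300 300 300 300
    0+ 0+ 150+ 300 300 300 300
    0 0 0 0 300 300 300
    0 0 0 0 150 300 300
    0 0 0 0 0 150 300
    0 0 0 0 0 0 300

The numbers marked with (+) are your window; w and h are your window dimensions. Apply bucket sorting (as others have suggested, since your value range is quite limited). Don't cut the evaluation short halfway through, as Crashworks suggests, and don't throw your result away. This is the first step.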

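    300- 300- 300- 300 300 300 300
    0. 150. 300. 300 300 300 300
    0. 0. 150. 300 300 300 300
    0+ 0+ 0+ 0 300 300 300
    0 0 0 0 150 300 300
    0 0 0 0 0 150 300
    0 0 0 0 0 0 300

Shift your window. Instead of counting everything again, subtract the buckets for the last row/column you passed and add the buckets for the new one. This way you examine each pixel 2(w + h) times, i.e. when it crosses the window boundary, instead of w * h times, i.e. for as long as that pixel is in the window, as in a naive implementation.

In other words, you need to move your window like this: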

| ^->|^
| | | | 
| | | | 
V->| V->|
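The add/subtract step can be sketched in C# as follows. This is a simplified illustration under stated assumptions, not a definitive implementation: the names are hypothetical, values are assumed to lie in 0..360, border pixels are simply copied, and the window slides only left to right, rebuilding the histogram at the start of each row instead of following the serpentine path drawn above. A tie for the top count falls back to the center pixel, as the question requires.

```csharp
using System;

static class SlidingModeFilter
{
    const int Levels = 361; // values 0..360 inclusive

    // Hypothetical helper: for every interior pixel, writes the most common
    // value of the (2r+1) x (2r+1) window centered on it.
    public static int[,] Apply(int[,] src, int r)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        var dst = (int[,])src.Clone();          // border pixels stay unchanged
        if (h < 2 * r + 1 || w < 2 * r + 1) return dst;

        var hist = new int[Levels];
        for (int y = r; y < h - r; y++)
        {
            // Build the histogram for the first window of this row from scratch.
            Array.Clear(hist, 0, hist.Length);
            for (int dy = -r; dy <= r; dy++)
                for (int x = 0; x <= 2 * r; x++)
                    hist[src[y + dy, x]]++;

            for (int x = r; x < w - r; x++)
            {
                if (x > r)
                {
                    // Slide right: subtract the column that left the window
                    // and add the column that entered it.
                    for (int dy = -r; dy <= r; dy++)
                    {
                        hist[src[y + dy, x - r - 1]]--;
                        hist[src[y + dy, x + r]]++;
                    }
                }
                dst[y, x] = ModeOrFallback(hist, src[y, x]);
            }
        }
        return dst;
    }

    // Returns the unique most frequent level, or 'fallback' when the top count is tied.
    static int ModeOrFallback(int[] hist, int fallback)
    {
        int best = 0;
        bool tie = false;
        for (int v = 1; v < hist.Length; v++)
        {
            if (hist[v] > hist[best]) { best = v; tie = false; }
            else if (hist[v] == hist[best]) tie = true;
        }
        return tie ? fallback : best;
    }
}
```

Each horizontal step touches 2(2r+1) cells instead of (2r+1)^2, but it still scans all 361 buckets to find the mode, so for small windows that scan may dominate the cost.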

I assume you are trying to implement a nonlinear convolution filter.

Corrections welcome.

Source

2009-01-29 22:43:29 artificialidiot

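All I'd offer is that any algorithm that checks every cell (which is pretty much what you'd expect to do) should do two extra things:

1) Make sure the routine exits when the count for the current most common value is > (M x N / 2). If something covers more than 50% of your grid, then it is the most common value and there is no need to continue. If your routine only needs to be right MOST of the time, you could lower the percentage and treat it as a heuristic. You could even run some analysis that tells you something like "if coverage is > 37.6%, then 99.9% of the time it will be the most common value", and then use that percentage.

2) If there is any way to determine on which side, in which corner, or in which general location (outer edges, middle, etc.) the most common values are likely to be, you could scan in that order, which together with optimization 1 above could save a lot of scanning. For instance, in your example the top right is heavy on the common value. If that could be determined by some heuristic, you could scan from the top right to the bottom left in some fashion. If the required scan pattern is complex, pre-generate it.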
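A minimal C# sketch of the first suggestion, with hypothetical names and assuming values in 0..360: the scan stops as soon as one value covers more than a given share of the grid. A threshold of 0.5 is exact; anything lower is the heuristic variant, and the 37.6% figure above is only the answer's illustration, not a measured number.

```csharp
using System;

static class MajorityExit
{
    // Hypothetical helper: returns true and sets 'value' as soon as some value
    // covers strictly more than 'threshold' (e.g. 0.5) of the cells in 'data'.
    // Returns false when no value reaches that share; the caller then falls
    // back to a full histogram comparison.
    public static bool TryFindDominant(int[,] data, double threshold, out int value)
    {
        var counts = new int[361];
        // Smallest count that is strictly greater than threshold * cell count.
        int needed = (int)Math.Floor(data.Length * threshold) + 1;

        foreach (int v in data)     // row-major scan
        {
            if (++counts[v] >= needed)
            {
                value = v;
                return true;        // early exit: no need to read the rest
            }
        }

        value = -1;
        return false;
    }
}
```

With the 7x7 example from the question, 300 covers 23 of 49 cells, so a threshold of 0.5 does not trigger the early exit, while a threshold of 0.4 does.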

Source

2009-02-20 17:22:15
