2012-09-02 37 views
0

我知道在处理vb6之前我已经问过这种问题,并且它太慢了,所以我决定使用C#来完成这项工作;现在相同的代码以两倍的速度运行,但仍然太慢。使用C#的数组排列算法的辞典排序算法#

它慢的原因是它从每列的末尾开始检查所有行的词典排序。

我相信会加快这一点,如果我从第一列开始排序过程检查所有行,并检测该列的第一个字节的最低行,并可能多个行具有相同的第一个低字节并将它们分组对于检查第二个(下一个)列的下一个步骤,如果它们都是相同的移动到下一个列等等,则检查第二个字节中哪个是最低字节。如果它检测到下一个行字节的不同,那么列代码是为第一个字节完成的,然后继续找到第二个最低的..这实际上是我认为这个过程应该工作以获得一个很好的速度提升..但不幸的是,我对这个排序技术有很大的困惑,并最终使用了什么有人帮助我。

当前代码通过从最后一列对所有行进行蛮力排序来工作......然后它将一列向左移动并重新对每一行重新排序直到它到达第一列并对其排序。这很慢,因为它没有明显的原因进行迭代。

假设有256列和256行共65536个数组元素,使用当前代码并且说它必须多次对每行进行排序,直到每行得到适当的排序顺序。对于每列可能需要65,536次迭代。因此,每次调用该函数时总共估计256 * 65536 = 16,777,216迭代,这就是缓慢的原因。

我知道这是很多要求,但如果任何人有空闲时间,也许已经做到了这一点,可以帮助我,我会很感激。

这是我到目前为止所使用的代码。

byte[] sortArrayOfArraysLexicoGraphically(ref byte[] data) { 
    byte[] lexicoGraphicalIndexes; 
    long dataSize = data.Length; 
    long squareRootMinusOne; 
    int squareRoot; 
    int row = 0; 
    bool rowSwapped; 
    byte[] tmpRow; 

    squareRoot = (int)Math.Sqrt(dataSize); 
    tmpRow = new byte[squareRoot]; 
    squareRootMinusOne = squareRoot - 1; 
    lexicoGraphicalIndexes = new byte[squareRoot]; 

    for(short column = 0; column < lexicoGraphicalIndexes.Length; column++) { 
     lexicoGraphicalIndexes[column] = (byte)column; 
    } 

    for(long column = squareRootMinusOne; column >= 0; column -= 1) { 
     do { 
      rowSwapped = false; 
      do { 
       if(data[(row * squareRoot) + column] > data[((row + 1) * squareRoot) + column]) { 
        //Swaps a full row in a few copies. 
        //Copies full row to tmpRow 
        Buffer.BlockCopy(data, (row * squareRoot), tmpRow, 0, squareRoot); 
        //Replace first row with second row. 
        Buffer.BlockCopy(data, ((row + 1) * squareRoot), data, (row * squareRoot), squareRoot); 
        //Replace second row with tmpRow 
        Buffer.BlockCopy(tmpRow, 0, data, ((row + 1) * squareRoot), squareRoot); 
        swapBytes(ref lexicoGraphicalIndexes, row, row + 1); 
        rowSwapped = true; 
       } 
       row++; 
      } while (row < squareRootMinusOne); 
      row = 0; 
     } while (rowSwapped != false); 
    } 
    return lexicoGraphicalIndexes; 
} 

public void swapBytes(ref byte[] data, long firstIndex, long secondIndex) { 
    byte tmpFirstByte = data[firstIndex]; 
    data[firstIndex] = data[secondIndex]; 
    data[secondIndex] = tmpFirstByte; 
} 

回答

0

最后写这么长的怪物,但它似乎对一些测试运行,这样的伎俩..不确定它是否完美需要更多测试,我会在更多测试时更新。

int[] sortArrayOfArraysLexicoGraphically(ref byte[] data) { 
     int[] lexicoGraphicalIndexes; 
     long dataSize = data.Length; 
     int squareRoot; 
     bool rowSwapped; 

     squareRoot = (int)Math.Sqrt(dataSize); 
     lexicoGraphicalIndexes = new int[squareRoot]; 

     for(int column = 0; column < lexicoGraphicalIndexes.Length; column++) { 
      lexicoGraphicalIndexes[column] = column; 
     } 

     byte currentLowestRowByte = 255; //set to highest to avoid unassigned local variable error. 
     int previousLowestRowByte = -1; //this is only used after the second pass. 
     int lowestRowIndex = -1; //hopefully this won't mess anything up. 
     List<int> lowestRowIndexes = new List<int>(); 
     bool stillSorting = true; 
     int startRow = 0; //which row to start with, as the sorting process gets more sorted this number increases. 
     int startColumn = 0; //first column 

     while(stillSorting) { 
      //Resets 
      lowestRowIndexes.Clear(); 
      startColumn = 0; 
      currentLowestRowByte = 255; 
      lowestRowIndex = -1; 

      //first step finds the lowest row in the first column 
      for(int row = 0; row < squareRoot; row += 1) { 
       if(data[(row * squareRoot) + startColumn] <= currentLowestRowByte && 
        data[(row * squareRoot) + startColumn] > previousLowestRowByte) { 
        currentLowestRowByte = data[(row * squareRoot) + startColumn]; 
        lowestRowIndex = row; 
       } 
      } 

      //Resets for next pass. 
      previousLowestRowByte = currentLowestRowByte; 

      //Check if sorting process is already finished. (No matches found from step 1). 
      if(lowestRowIndex == -1) { 
       stillSorting = false; 
       break; 
      } 

      //second step finds all the similar rows with the current lowestRowByte. 
      for(int row = 0; row < squareRoot; row += 1) { 
       if(data[(row * squareRoot) + startColumn] == currentLowestRowByte) { 
        lowestRowIndexes.Add(row); 
       } 
      } 

      //third step loops through all lowestRowIndexes to find which one comes first, second, third, etc... 
      if(lowestRowIndexes.Count > 1) { 
       //This sorts the same rows, rows*rows amount of times, until they are sorted correctly. 
       rowSwapped = true; 
       while(rowSwapped != false) { 
        rowSwapped = false; 
        for (int row = 0; row < lowestRowIndexes.Count; row++) 
        { 
         if((row+1) >= lowestRowIndexes.Count) 
          break; 
         //Current first row byte checked with Next first row byte in lowestRowIndexes. 
         //If both are equal keep going unto next column until a break is found, if any break. 
         startColumn = 1; 
         while(rowSwapped == false) { 
          //Reached beyond the last column. 
          if(startColumn == squareRoot) 
           break; 

          if(data[(lowestRowIndexes[row] * squareRoot) + startColumn] == data[(lowestRowIndexes[row+1] * squareRoot) + startColumn]) 
           startColumn++; 

          if(data[(lowestRowIndexes[row] * squareRoot) + startColumn] < data[(lowestRowIndexes[row+1] * squareRoot) + startColumn]) { 
           break; //Sorted already, get out. 
          } else if(data[(lowestRowIndexes[row] * squareRoot) + startColumn] > data[(lowestRowIndexes[row+1] * squareRoot) + startColumn]) { 
           swapBytesRow(ref data, lowestRowIndexes[row], lowestRowIndexes[row+1], squareRoot); 
           swapBytes(ref lexicoGraphicalIndexes, lowestRowIndexes[row], lowestRowIndexes[row+1]); 
           rowSwapped = true; //a swap has occurred. 
           break; 
          } 
         } 
        } 
       } 

       //forth step re-sorts all the pre-sorted lowestRowIndexes into master array, using startRow variable. 
       foreach(int row in lowestRowIndexes) { 

        //First checks if row is already in the proper sorted location. 
        if(row != startRow) { 
         swapBytesRow(ref data, startRow, row, squareRoot); 
         swapBytes(ref lexicoGraphicalIndexes, startRow, row); 
         startRow++; //skip Rows starting from value < startRow as they are perfectly sorted. 
        } else { 
         startRow++; //skip Rows starting from value < startRow as they are perfectly sorted. 
        }      
       } 
      } else { 
       //Only one instance of this lowestRowByte existed. so obviously this is the next best sorted match. 
       swapBytesRow(ref data, startRow, lowestRowIndex, squareRoot); 
       swapBytes(ref lexicoGraphicalIndexes, startRow, lowestRowIndex); 
       startRow++; //skip Rows starting from value < startRow as they are perfectly sorted. 
      } 
     } 
     return lexicoGraphicalIndexes; 
    } 

public void swapBytes(ref byte[] data, long firstIndex, long secondIndex) { 
     byte tmpFirstByte = data[firstIndex]; 
     data[firstIndex] = data[secondIndex]; 
     data[secondIndex] = tmpFirstByte; 
    } 

public void swapBytes(ref int[] data, long firstIndex, long secondIndex) { 
     int tmpFirstByte = data[firstIndex]; 
     data[firstIndex] = data[secondIndex]; 
     data[secondIndex] = tmpFirstByte; 
    } 

public void swapBytesRow(ref byte[] data, int firstRowIndex, int secondRowIndex, int rowSize) { 
     byte[] tmpFirstRowBytes = new byte[rowSize]; 
     //Copies full row to tmpFirstRowBytes 
     Buffer.BlockCopy(data, (firstRowIndex * rowSize), tmpFirstRowBytes, 0, rowSize); 
     //Replace first row with second row. 
     Buffer.BlockCopy(data, (secondRowIndex * rowSize), data, (firstRowIndex * rowSize), rowSize); 
     //Replace second row with tmpFirstRowBytes 
     Buffer.BlockCopy(tmpFirstRowBytes, 0, data, (secondRowIndex * rowSize), rowSize); 
    } 
+1

您的代码存在的问题是您使用了具有O(N * N)复杂性的排序算法(类似于Bubble Sort)。您应该像[QuickSort](http://en.wikipedia.org/wiki/Quicksort)一样实施更好的排序算法,或像我一样使用.Net的排序功能。 –

+0

真的看不到其他任何方式。我排序共同具有相同的第一个字节的行,然后只是交换第一行,第二个等等。按顺序,这就是它。罕见情况下的主要排序代码最终可能最多排序5-7行。只包含共同第一个字节的行只是立即添加到顶部,而不进行排序只是交换先前在那里的内容。 – SSpoke

+1

我强烈建议从[这里]下载morelinq的二进制文件(http://code.google.com/p/morelinq/)并测试上述代码。我真的不能说我非常了解你的算法,我的代码也做了同样的事情,但至少你可以看到速度的不同。 (PS:你不必使用更多的LINQ,将输入数组分割成Sqrt(N)部分,我只是为了“易于使用”而使用它) –

1

我必须说你的排序算法非常糟糕。即使没有任何优化和使用基本的LINQ,你可以获得几十倍的加速。

我做了一个试验用的大小N *的数据N,其中N = 200(我不知道下面的代码是否完全匹配你的,是100%正确的,但至少你可以试试,看看结果)

List<byte[]> result = data.Batch(N) 
          .OrderBy(b => b, new ArrayComparer()) 
          .Select(b => b.ToArray()) 
          .ToList(); 

编辑

就地排序,甚至更快。

var list = data.Batch(N).Select(x => x.ToArray()).ToList(); 
list.Sort(new ArrayComparer()); 

-

public class ArrayComparer : IComparer<IEnumerable<byte>> 
{ 
    public int Compare(IEnumerable<byte> x, IEnumerable<byte> y) 
    { 
     var xenum = x.GetEnumerator(); 
     var yenum = y.GetEnumerator(); 
     while (xenum.MoveNext() && yenum.MoveNext()) 
     { 
      if (xenum.Current != yenum.Current) 
        return xenum.Current - yenum.Current; 
     } 
     return 0; 
    } 
} 

PS:Batchmorelinq扩展方法

+0

我想要一些自给自足的东西,我知道我没有提到那是我的错。真的不喜欢这些新技术像LinQ加上我测试后所需的额外文件,仍然没有工作无法理解'ThrowIfNull','ThrowIfNonPositive'可能还有一些其他文件,我不得不包括太..我结束了自己写这篇文章花了我4个小时,但似乎现在工作得更快..我将它作为答案张贴。 – SSpoke