2009-08-11 17 views
65

我想要某种保留自然排序顺序的字符串比较函数。有没有像Java这样内置的东西?我在String class中找不到任何东西,而Comparator class只知道两个实现。Java中的自然排序顺序字符串比较 - 是否内置?

我可以推出自己的(这不是一个很难的问题),但我宁愿没有,如果我没有重新发明轮子。

在我的特定情况下,我有我要排序的软件版本字符串。所以我想“1.2.10.5”被认为比“1.2.9.1”更大。


“天然”的排序顺序,我的意思是比较字符串人将它们进行比较,而不是“ASCII-betical”排序顺序,只有有意义的程序员的方式。换句话说,“image9.jpg”小于“image10.jpg”,“album1set2page9photo1.jpg”小于“album1set2page10photo5.jpg”,“1.2.9.1”小于“1.2.10.5”

+0

搞笑,每个人 - 重读的问题,并删除该贴的答案! .. :)我想这是DOWNVOTE的力量! :);) – OscarRyz 2009-08-11 19:01:09

+2

顺便说一句。在讨论字符串时,数字不是自然顺序,所以这个问题有点令人误解。 – OscarRyz 2009-08-11 19:02:33

+1

没有像我所知的内置。编码时比花时间花费的时间少,所以我一般都跑自己的轮子...... :) – glmxndr 2009-08-11 19:10:40

回答

49

在Java中的“自然”命令的意思是“辞书”命令,所以像你要找的一个核心没有实现对于。

有开源的实现。

这里有一个:

NaturalOrderComparator.java

请务必阅读:

Cougaar Open Source License

我希望这有助于!

+0

谢谢。我修改了这个问题,以澄清我的意思是“自然的” – Kip 2009-08-11 19:38:33

+1

我喜欢SO--只是随意地拨动,这里是我过去一周推迟推出的一个快速解决方案。这是一个非常有用的链接 - 谢谢! (并且许可与我们的项目兼容) – 2009-08-11 20:55:24

+4

当选择像这样的开源实现时,您应该确保它们完全符合您的期望。许多人专注于在用户界面中提供看起来直观和美观的订单。有时他们接受并跳过空格,跳过前导零,最重要的是在较长的字符串之前放置较短的字符串,当它们相等时。字符串1.020将被放置在1.20之后。如果您使用它来确定两个版本是否相同,您可以在这种情况下得到一个错误的否定。即当检查compareTo()返回0. – sigget 2011-09-27 22:50:34

8

String实现Comparable,这就是Java中的自然排序(使用可比较的接口进行比较)。您可以将字符串放入TreeSet中,或使用Collections或Arrays类进行排序。

然而,在你的情况,你不想“自然排序”你真的想要一个自定义的比较,然后你就可以在Collections.sort方法或Arrays.sort方法,需要一个比较使用。

关于您在比较器中实现的具体逻辑(数字以点分隔),我不知道任何现有的标准实现,但正如您所说,这不是一个难题。

编辑:在您的意见,您的链接让你here,它做了体面的工作,如果你不介意的事实,它是区分大小写。下面是修改了代码,让您在String.CASE_INSENSITIVE_ORDER传:

/* 
    * The Alphanum Algorithm is an improved sorting algorithm for strings 
    * containing numbers. Instead of sorting numbers in ASCII order like 
    * a standard sort, this algorithm sorts numbers in numeric order. 
    * 
    * The Alphanum Algorithm is discussed at http://www.DaveKoelle.com 
    * 
    * 
    * This library is free software; you can redistribute it and/or 
    * modify it under the terms of the GNU Lesser General Public 
    * License as published by the Free Software Foundation; either 
    * version 2.1 of the License, or any later version. 
    * 
    * This library is distributed in the hope that it will be useful, 
    * but WITHOUT ANY WARRANTY; without even the implied warranty of 
    * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 
    * Lesser General Public License for more details. 
    * 
    * You should have received a copy of the GNU Lesser General Public 
    * License along with this library; if not, write to the Free Software 
    * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA 
    * 
    */ 

    import java.util.Comparator; 

    /** 
    * This is an updated version with enhancements made by Daniel Migowski, 
    * Andre Bogus, and David Koelle 
    * 
    * To convert to use Templates (Java 1.5+): 
    * - Change "implements Comparator" to "implements Comparator<String>" 
    * - Change "compare(Object o1, Object o2)" to "compare(String s1, String s2)" 
    * - Remove the type checking and casting in compare(). 
    * 
    * To use this class: 
    * Use the static "sort" method from the java.util.Collections class: 
    * Collections.sort(your list, new AlphanumComparator()); 
    */ 
    public class AlphanumComparator implements Comparator<String> 
    { 
     private Comparator<String> comparator = new NaturalComparator(); 

     public AlphanumComparator(Comparator<String> comparator) { 
      this.comparator = comparator; 
     } 

     public AlphanumComparator() { 

     } 

     private final boolean isDigit(char ch) 
     { 
      return ch >= 48 && ch <= 57; 
     } 

     /** Length of string is passed in for improved efficiency (only need to calculate it once) **/ 
     private final String getChunk(String s, int slength, int marker) 
     { 
      StringBuilder chunk = new StringBuilder(); 
      char c = s.charAt(marker); 
      chunk.append(c); 
      marker++; 
      if (isDigit(c)) 
      { 
       while (marker < slength) 
       { 
        c = s.charAt(marker); 
        if (!isDigit(c)) 
         break; 
        chunk.append(c); 
        marker++; 
       } 
      } else 
      { 
       while (marker < slength) 
       { 
        c = s.charAt(marker); 
        if (isDigit(c)) 
         break; 
        chunk.append(c); 
        marker++; 
       } 
      } 
      return chunk.toString(); 
     } 

     public int compare(String s1, String s2) 
     { 

      int thisMarker = 0; 
      int thatMarker = 0; 
      int s1Length = s1.length(); 
      int s2Length = s2.length(); 

      while (thisMarker < s1Length && thatMarker < s2Length) 
      { 
       String thisChunk = getChunk(s1, s1Length, thisMarker); 
       thisMarker += thisChunk.length(); 

       String thatChunk = getChunk(s2, s2Length, thatMarker); 
       thatMarker += thatChunk.length(); 

       // If both chunks contain numeric characters, sort them numerically 
       int result = 0; 
       if (isDigit(thisChunk.charAt(0)) && isDigit(thatChunk.charAt(0))) 
       { 
        // Simple chunk comparison by length. 
        int thisChunkLength = thisChunk.length(); 
        result = thisChunkLength - thatChunk.length(); 
        // If equal, the first different number counts 
        if (result == 0) 
        { 
         for (int i = 0; i < thisChunkLength; i++) 
         { 
          result = thisChunk.charAt(i) - thatChunk.charAt(i); 
          if (result != 0) 
          { 
           return result; 
          } 
         } 
        } 
       } else 
       { 
        result = comparator.compare(thisChunk, thatChunk); 
       } 

       if (result != 0) 
        return result; 
      } 

      return s1Length - s2Length; 
     } 

     private static class NaturalComparator implements Comparator<String> { 
      public int compare(String o1, String o2) { 
       return o1.compareTo(o2); 
      } 
     } 
    } 
+6

“自然排序”是排序“image9”的广泛使用术语。 jpg“小于”image10.jpg“。它是“自然的”,因为它是人类如何分类它们的,而不是计算机默认做的非自然的“ascii-betical”分类。 http://www.codinghorror.com/blog/archives/001018.html – Kip 2009-08-11 19:28:38

+0

我已经更新了这个问题,在这方面更加明确 – Kip 2009-08-11 19:39:34

+0

在您发布的codinghorror链接中,它有一个链接,它可以获取到这个链接Java实现:http://www.davekoelle.com/alphanum.html – Yishai 2009-08-11 20:57:57

9

我已经测试过其他人在这里提到的三个Java实现,发现他们的工作稍有不同,但没有像我期望的那样。

AlphaNumericStringComparatorAlphanumComparator都不会忽略空格,因此pic2被放置在pic 1之前。

另一方面NaturalOrderComparator不仅忽略空格而且还忽略所有前导零以便sig[1]sig[0]之前。

关于性能AlphaNumericStringComparator比另外两个慢了〜x10。

+0

它必须是这样的,因为AlphaNumericStringComparator使用的是正则表达式(无论如何这是一个坏主意) – 2013-05-08 13:39:29

2

如何使用String中的split()方法解析单个数字字符串,然后逐个比较它们?

@Test 
public void test(){ 
    System.out.print(compare("1.12.4".split("\\."), "1.13.4".split("\\."),0)); 
} 


public static int compare(String[] arr1, String[] arr2, int index){ 
    // if arrays do not have equal size then and comparison reached the upper bound of one of them 
    // then the longer array is considered the bigger (--> 2.2.0 is bigger then 2.2) 
    if(arr1.length <= index || arr2.length <= index) return arr1.length - arr2.length; 
    int result = Integer.parseInt(arr1[index]) - Integer.parseInt(arr2[index]); 
    return result == 0 ? compare(arr1, arr2, ++index) : result; 
} 

我没有检查角落案件,但应该工作,这是相当紧凑

+0

这更受限制:它只处理由点分隔的整数列表的字符串。我可能想比较“user1album12photo4.jpg”到“user1album13photo4.jpg” – Kip 2012-07-05 15:08:52

+0

true ...我只专注于软件版本字符串。对不起,但仍然,我个人喜欢解决方案 – bennidi 2012-07-27 12:48:40

1

它concats的数字,然后进行比较。如果它不适用,它会继续。

public int compare(String o1, String o2) { 
if(o1 == null||o2 == null) 
    return 0; 
for(int i = 0; i<o1.length()&&i<o2.length();i++){ 
    if(Character.isDigit(o1.charAt(i)) || Character.isDigit(o2.charAt(i))) 
    { 
    String dig1 = "",dig2 = "";  
    for(int x = i; x<o1.length() && Character.isDigit(o1.charAt(i)); x++){        
     dig1+=o1.charAt(x); 
    } 
    for(int x = i; x<o2.length() && Character.isDigit(o2.charAt(i)); x++){ 
     dig2+=o2.charAt(x); 
    } 
    if(Integer.valueOf(dig1) < Integer.valueOf(dig2)) 
     return -1; 
    if(Integer.valueOf(dig1) > Integer.valueOf(dig2)) 
     return 1; 
    }  
if(o1.charAt(i)<o2.charAt(i)) 
    return -1; 
if(o1.charAt(i)>o2.charAt(i)) 
    return 1; 
} 
return 0; 

}

0

可能是一个迟到的答复。但我的答案可以帮助需要比较器的其他人。

我也验证了其他比较器的几个。但是我的比较起来似乎比其他人有效。也尝试了伊has发布的那个。对于100个条目的字母数字数据集的数据,我只用了一半的时间。

/** 
* Sorter that compares the given Alpha-numeric strings. This iterates through each characters to 
* decide the sort order. There are 3 possible cases while iterating, 
* 
* <li>If both have same non-digit characters then the consecutive characters will be considered for 
* comparison.</li> 
* 
* <li>If both have numbers at the same position (with/without non-digit characters) the consecutive 
* digit characters will be considered to form the valid integer representation of the characters 
* will be taken and compared.</li> 
* 
* <li>At any point if the comparison gives the order(either > or <) then the consecutive characters 
* will not be considered.</li> 
* 
* For ex., this will be the ordered O/P of the given list of Strings.(The bold characters decides 
* its order) <i><b>2</b>b,<b>100</b>b,a<b>1</b>,A<b>2</b>y,a<b>100</b>,</i> 
* 
* @author kannan_r 
* 
*/ 
class AlphaNumericSorter implements Comparator<String> 
{ 
    /** 
    * Does the Alphanumeric sort of the given two string 
    */ 
    public int compare(String theStr1, String theStr2) 
    { 
     char[] theCharArr1 = theStr1.toCharArray(); 
     char[] theCharArr2 = theStr2.toCharArray(); 
     int aPosition = 0; 
     if (Character.isDigit(theCharArr1[aPosition]) && Character.isDigit(theCharArr2[aPosition])) 
     { 
      return sortAsNumber(theCharArr1, theCharArr2, aPosition++); 
     } 
     return sortAsString(theCharArr1, theCharArr2, 0); 
    } 

    /** 
    * Sort the given Arrays as string starting from the given position. This will be a simple case 
    * insensitive sort of each characters. But at any given position if there are digits in both 
    * arrays then the method sortAsNumber will be invoked for the given position. 
    * 
    * @param theArray1 The first character array. 
    * @param theArray2 The second character array. 
    * @param thePosition The position starting from which the calculation will be done. 
    * @return positive number when the Array1 is greater than Array2<br/> 
    *   negative number when the Array2 is greater than Array1<br/> 
    *   zero when the Array1 is equal to Array2 
    */ 
    private int sortAsString(char[] theArray1, char[] theArray2, int thePosition) 
    { 
     int aResult = 0; 
     if (thePosition < theArray1.length && thePosition < theArray2.length) 
     { 
      aResult = (int)theArray1[thePosition] - (int)theArray2[thePosition]; 
      if (aResult == 0) 
      { 
       ++thePosition; 
       if (thePosition < theArray1.length && thePosition < theArray2.length) 
       { 
        if (Character.isDigit(theArray1[thePosition]) && Character.isDigit(theArray2[thePosition])) 
        { 
         aResult = sortAsNumber(theArray1, theArray2, thePosition); 
        } 
        else 
        { 
         aResult = sortAsString(theArray1, theArray2, thePosition); 
        } 
       } 
      } 
     } 
     else 
     { 
      aResult = theArray1.length - theArray2.length; 
     } 
     return aResult; 
    } 

    /** 
    * Sorts the characters in the given array as number starting from the given position. When 
    * sorted as numbers the consecutive characters starting from the given position upto the first 
    * non-digit character will be considered. 
    * 
    * @param theArray1 The first character array. 
    * @param theArray2 The second character array. 
    * @param thePosition The position starting from which the calculation will be done. 
    * @return positive number when the Array1 is greater than Array2<br/> 
    *   negative number when the Array2 is greater than Array1<br/> 
    *   zero when the Array1 is equal to Array2 
    */ 
    private int sortAsNumber(char[] theArray1, char[] theArray2, int thePosition) 
    { 
     int aResult = 0; 
     int aNumberInStr1; 
     int aNumberInStr2; 
     if (thePosition < theArray1.length && thePosition < theArray2.length) 
     { 
      if (Character.isDigit(theArray1[thePosition]) && Character.isDigit(theArray1[thePosition])) 
      { 
       aNumberInStr1 = getNumberInStr(theArray1, thePosition); 
       aNumberInStr2 = getNumberInStr(theArray2, thePosition); 

       aResult = aNumberInStr1 - aNumberInStr2; 

       if (aResult == 0) 
       { 
        thePosition = getNonDigitPosition(theArray1, thePosition); 
        if (thePosition != -1) 
        { 
         aResult = sortAsString(theArray1, theArray2, thePosition); 
        } 
       } 
      } 
      else 
      { 
       aResult = sortAsString(theArray1, theArray2, ++thePosition); 
      } 
     } 
     else 
     { 
      aResult = theArray1.length - theArray2.length; 
     } 
     return aResult; 
    } 

    /** 
    * Gets the position of the non digit character in the given array starting from the given 
    * position. 
    * 
    * @param theCharArr /the character array. 
    * @param thePosition The position after which the array need to be checked for non-digit 
    *  character. 
    * @return The position of the first non-digit character in the array. 
    */ 
    private int getNonDigitPosition(char[] theCharArr, int thePosition) 
    { 
     for (int i = thePosition; i < theCharArr.length; i++) 
     { 
      if (!Character.isDigit(theCharArr[i])) 
      { 
       return i; 
      } 
     } 
     return -1; 
    } 

    /** 
    * Gets the integer value of the number starting from the given position of the given array. 
    * 
    * @param theCharArray The character array. 
    * @param thePosition The position form which the number need to be calculated. 
    * @return The integer value of the number. 
    */ 
    private int getNumberInStr(char[] theCharArray, int thePosition) 
    { 
     int aNumber = 0; 
     for (int i = thePosition; i < theCharArray.length; i++) 
     { 
      if(!Character.isDigit(theCharArray[i])) 
      { 
       return aNumber; 
      } 
      aNumber += aNumber * 10 + (theCharArray[i] - 48); 
     } 
     return aNumber; 
    } 
} 
+0

调用'sortAsNumber()'中的'++'没有任何作用(并且是错误的;它会跳过第一个数字)。 – 2017-07-26 12:45:30

6

看看这个实现。它应该尽可能快,没有任何正则表达式或数组操作或方法调用,只是一对标志和很多情况。

这应该排序字符串内数字的任意组合,并正确支持相等的数字并继续前进。

public static int naturalCompare(String a, String b, boolean ignoreCase) { 
    if (ignoreCase) { 
     a = a.toLowerCase(); 
     b = b.toLowerCase(); 
    } 
    int aLength = a.length(); 
    int bLength = b.length(); 
    int minSize = Math.min(aLength, bLength); 
    char aChar, bChar; 
    boolean aNumber, bNumber; 
    boolean asNumeric = false; 
    int lastNumericCompare = 0; 
    for (int i = 0; i < minSize; i++) { 
     aChar = a.charAt(i); 
     bChar = b.charAt(i); 
     aNumber = aChar >= '0' && aChar <= '9'; 
     bNumber = bChar >= '0' && bChar <= '9'; 
     if (asNumeric) 
      if (aNumber && bNumber) { 
       if (lastNumericCompare == 0) 
        lastNumericCompare = aChar - bChar; 
      } else if (aNumber) 
       return 1; 
      else if (bNumber) 
       return -1; 
      else if (lastNumericCompare == 0) { 
       if (aChar != bChar) 
        return aChar - bChar; 
       asNumeric = false; 
      } else 
       return lastNumericCompare; 
     else if (aNumber && bNumber) { 
      asNumeric = true; 
      if (lastNumericCompare == 0) 
       lastNumericCompare = aChar - bChar; 
     } else if (aChar != bChar) 
      return aChar - bChar; 
    } 
    if (asNumeric) 
     if (aLength > bLength && a.charAt(bLength) >= '0' && a.charAt(bLength) <= '9') // as number 
      return 1; // a has bigger size, thus b is smaller 
     else if (bLength > aLength && b.charAt(aLength) >= '0' && b.charAt(aLength) <= '9') // as number 
      return -1; // b has bigger size, thus a is smaller 
     else if (lastNumericCompare == 0) 
      return aLength - bLength; 
     else 
      return lastNumericCompare; 
    else 
     return aLength - bLength; 
} 
0

使用RuleBasedCollator可能是一种选择,以及。虽然您必须提前添加所有排序顺序规则,但如果您想要考虑更大的数字,它也不是一个好的解决方案。

2 < 10添加特定的自定义是虽然很容易的,可能是分拣特殊版本标识符像Trusty < Precise < Xenial < Yakkety有用。

RuleBasedCollator localRules = (RuleBasedCollator) Collator.getInstance(); 

String extraRules = IntStream.range(0, 100).mapToObj(String::valueOf).collect(joining(" < ")); 
RuleBasedCollator c = new RuleBasedCollator(localRules.getRules() + " & " + extraRules); 

List<String> a = asList("1-2", "1-02", "1-20", "10-20", "fred", "jane", "pic01", "pic02", "pic02a", "pic 5", "pic05", "pic 7", "pic100", "pic100a", "pic120", "pic121"); 
shuffle(a); 

a.sort(c); 
System.out.println(a); 
相关问题