2012-02-07 14 views
1

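Fast conversion of a byte[] containing an ASCII string to int/double/date etc. without a new String

I have a FIX message string (ASCII) as a ByteBuffer. I parse the tag-value pairs and store the values as primitive objects in a TreeMap, with the tag as the key. So I need to convert each byte[] value to int/double/date etc., depending on its type.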

The simplest way is to create a new String and pass it to the standard conversion functions. For example:

int convertToInt(byte[] buffer, int offset, int length) 
{ 
    String valueStr = new String(buffer, offset, length); 
    return Integer.parseInt(valueStr); 
} 

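As far as I know, creating new objects is very cheap in Java. Still, is there any way to convert this ASCII byte[] directly to a primitive type? I tried hand-written functions for this, but writing them was time-consuming and they did not give better performance.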

Is there any third-party library that does this and, most importantly, is it worth doing?

+2

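Measuring performance, i.e. microbenchmarking, is hard and almost always done wrong. If you need overall performance, stringifying is a bad idea. You should use 'ByteBuffer.putInt' instead. Apart from that, hand-written 'ByteBuffer' parsing will do. If you use a 'ByteBuffer', don't convert it to byte[]; that defeats the purpose of the ByteBuffer itself. – bestsss 2012-02-07 07:24:53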

+0

Thanks bestsss, but it is an ASCII ByteBuffer, not a binary one, so getInt and putInt can't be used. – Mahendra 2012-02-07 07:51:23

+0

What is it you call an ASCII ByteBuffer? (There is no such class in the standard JDK.) – bestsss 2012-02-07 07:57:46

Answers

2

"Most importantly, is it worth doing?"

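Almost certainly not. You should measure to check that this really is a performance bottleneck before going to significant effort to alleviate it.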

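What is your performance like now? What does it need to be? ("As fast as possible" is not a good goal, or you will never stop. Work out when you can say you are "done".)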

Profile the code: is the problem really in the string creation? Check how often you are garbage collecting (again, with a profiler).
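As a rough complement to a profiler, the collection count can also be read from inside the JVM. The sketch below is only an illustration: the class name `GcCountProbe`, the sample field "123400" and the iteration count are assumptions, and a single loop like this is not a reliable benchmark (the JIT may even remove the allocation). It uses the standard `java.lang.management` API to compare the number of collections before and after running the String-based conversion from the question.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;

class GcCountProbe {
    /** Sums the collection counts of all collectors; a collector reporting -1 (undefined) is skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** The String-based conversion from the question, with the charset made explicit. */
    static int convertToInt(byte[] buffer, int offset, int length) {
        return Integer.parseInt(new String(buffer, offset, length, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] field = "123400".getBytes(StandardCharsets.US_ASCII); // hypothetical sample value
        long before = totalGcCount();
        long sink = 0; // accumulate results so the loop is not trivially dead code
        for (int i = 0; i < 5_000_000; i++) {
            sink += convertToInt(field, 0, field.length);
        }
        long after = totalGcCount();
        System.out.println("sum=" + sink + ", collections during loop: " + (after - before));
    }
}
```

If the difference stays at or near zero under a realistic message load, string creation is probably not where the time goes.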

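Each type you parse is likely to have different characteristics. For example, when parsing integers, if you find that a significant proportion of the time you have a single digit, you might want to special-case that: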

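if (length == 1) 
{ 
    // buffer is a byte[]; Java does not implicitly convert byte to char, 
    // so compare the byte directly against the ASCII digit range 
    byte c = buffer[index]; 
    if (c >= '0' && c <= '9') 
    { 
        return c - '0'; 
    } 
    // Invalid - throw an exception or whatever 
} 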

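...but check how often this actually occurs before you go down that path. Adding lots of checks for particular optimizations that never actually come up is counter-productive.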
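Put together with the String-based conversion from the question, the special case might look like the following. This is a minimal sketch, not a measured improvement: the class name `SingleDigitFastPath` is made up for illustration, and the fallback simply keeps the original `Integer.parseInt` route for everything that is not a single byte.

```java
import java.nio.charset.StandardCharsets;

class SingleDigitFastPath {
    /** Converts ASCII digits in buffer[offset .. offset+length) to an int. */
    static int convertToInt(byte[] buffer, int offset, int length) {
        if (length == 1) {
            // Fast path: one byte, no String allocation
            byte c = buffer[offset];
            if (c >= '0' && c <= '9') {
                return c - '0';
            }
            throw new NumberFormatException("bad integer byte: " + c);
        }
        // General case: fall back to the standard library
        return Integer.parseInt(new String(buffer, offset, length, StandardCharsets.US_ASCII));
    }
}
```

Whether the extra branch pays for itself depends entirely on how many fields in the real message mix are one digit long.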

+0

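I agree with you. I realize it would be too much effort for a performance improvement of single-digit microseconds. The profiler says that new String causes a lot of minor collections. But I think it makes more sense to profile the parsing library in the context of the application to get a clearer picture. – Mahendra 2012-02-07 07:55:54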

+0

Currently, it takes about 20 microseconds to build the TreeMap from 40+ tag-value pairs. – Mahendra 2012-02-07 08:02:28

1

Take a look at ByteBuffer. It has facilities for doing this, including handling byte order (endianness).

+0

I don't think ByteBuffer has anything for parsing *text* data, does it? – 2012-02-07 07:38:29

+0

@JonSkeet - No, but the OP said: "I need to convert the byte[] value to int/double/date etc." – 2012-02-07 07:44:53

+2

Thanks Ted! It does say that the byte[] contains an ASCII string. – Mahendra 2012-02-07 07:51:35
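The distinction discussed in these comments can be shown in a few lines. In this sketch (the class and method names are illustrative only), the same four ASCII bytes "1234" are read once with `ByteBuffer.getInt()`, which interprets them as a binary big-endian integer, and once as text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BinaryVersusText {
    /** Interprets the first four bytes as a binary int (big-endian by default). */
    static int readAsBinary(byte[] ascii) {
        return ByteBuffer.wrap(ascii).getInt();
    }

    /** Interprets the bytes as decimal ASCII text. */
    static int readAsText(byte[] ascii) {
        return Integer.parseInt(new String(ascii, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] ascii = "1234".getBytes(StandardCharsets.US_ASCII); // bytes 0x31 0x32 0x33 0x34
        System.out.println(readAsBinary(ascii)); // prints 825373492 (0x31323334)
        System.out.println(readAsText(ascii));   // prints 1234
    }
}
```

So `getInt` and `putInt` only help when the wire format itself is binary, which FIX tag-value text is not.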

1

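Generally I have no liking for pasting code like this, but anyway, here is how it is done in about 100 lines (production code). Use it or not, but having some reference code is nice (usually).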

package t1; 

import java.io.UnsupportedEncodingException; 
import java.nio.ByteBuffer; 

public class IntParser { 
    final static byte[] digits = { 
     '0' , '1' , '2' , '3' , '4' , '5' , 
     '6' , '7' , '8' , '9' , 'a' , 'b' , 
     'c' , 'd' , 'e' , 'f' , 'g' , 'h' , 
     'i' , 'j' , 'k' , 'l' , 'm' , 'n' , 
     'o' , 'p' , 'q' , 'r' , 's' , 't' , 
     'u' , 'v' , 'w' , 'x' , 'y' , 'z' 
    }; 

    static boolean isDigit(byte b) { 
    return b>='0' && b<='9'; 
    } 

    static int digit(byte b){ 
     //negative = error 

     int result = b-'0'; 
     if (result>9) 
      result = -1; 
     return result; 
    } 

    static NumberFormatException forInputString(ByteBuffer b){ 
     byte[] bytes=new byte[b.remaining()]; 
     b.duplicate().get(bytes); // read through a duplicate so the caller's position is not consumed 
     try { 
      return new NumberFormatException("bad integer: "+new String(bytes, "8859_1")); 
     } catch (UnsupportedEncodingException e) { 
      throw new RuntimeException(e); 
     } 
    } 
    public static int parseInt(ByteBuffer b){ 
     return parseInt(b, 10, b.position(), b.limit()); 
    } 
    public static int parseInt(ByteBuffer b, int radix, int i, int max) throws NumberFormatException{ 
     int result = 0; 
     boolean negative = false; 


     int limit; 
     int multmin; 
     int digit;  

     if (max > i) { 
      if (b.get(i) == '-') { 
       negative = true; 
       limit = Integer.MIN_VALUE; 
       i++; 
      } else { 
       limit = -Integer.MAX_VALUE; 
      } 
      multmin = limit/radix; 
      if (i < max) { 
       digit = digit(b.get(i++)); 
       if (digit < 0) { 
        throw forInputString(b); 
       } else { 
        result = -digit; 
       } 
      } 
      while (i < max) { 
       // Accumulating negatively avoids surprises near MAX_VALUE 
       digit = digit(b.get(i++)); 
       if (digit < 0) { 
        throw forInputString(b); 
       } 
       if (result < multmin) { 
        throw forInputString(b); 
       } 
       result *= radix; 
       if (result < limit + digit) { 
        throw forInputString(b); 
       } 
       result -= digit; 
      } 
     } else { 
      throw forInputString(b); 
     } 
     if (negative) { 
      if (i > b.position()+1) { 
       return result; 
      } else { /* Only got "-" */ 
       throw forInputString(b); 
      } 
     } else { 
      return -result; 
     } 
    } 

} 
2

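I would generally agree with Jon; however, when processing a lot of FIX messages, this quickly adds up. The method below allows numbers padded with spaces. If you need to handle decimals, the code will be slightly different. The speed difference between the two methods is more than a factor of 10 (about 12x by the tick counts below). ConvertToLong triggers no additional GCs: the cumulative GC count stays constant across its runs. The code below is in C#: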

///<summary> 
///Converts a byte[] of characters that represent a number into a .net long type. Numbers can be padded from left 
/// with spaces. 
///</summary> 
///<param name="buffer">The buffer containing the number as characters</param> 
///<param name="startIndex">The startIndex of the number component</param> 
///<param name="endIndex">The EndIndex of the number component</param> 
///<returns>The price will be returned as a long from the ASCII characters</returns> 
public static long ConvertToLong(this byte[] buffer, int startIndex, int endIndex) 
{ 
    long result = 0; 
    for (int i = startIndex; i <= endIndex; i++) 
    { 
     if (buffer[i] != 0x20) 
     { 
      // 48 is the decimal value of the '0' character. So to convert the char value 
      // of an int to a number we subtract 48. e.g '1' = 49 -48 = 1 
      result = result * 10 + (buffer[i] - 48); 
     } 
    } 
    return result; 
} 

/// <summary> 
/// Same as above but converting to string then to long 
/// </summary> 
public static long ConvertToLong2(this byte[] buffer, int startIndex, int endIndex) 
{ 
    for (int i = startIndex; i <= endIndex; i++) 
    { 
     if (buffer[i] != 0x20) // 0x20 is the ASCII space, the same padding byte ConvertToLong skips 
     { 
      return long.Parse(System.Text.Encoding.UTF8.GetString(buffer, i, (endIndex - i) + 1)); 
     } 
    } 
    return 0; 
} 

[Test] 
public void TestPerformance(){ 
    const int iterations = 200 * 1000; 
    const int testRuns = 10; 
    const int warmUp = 10000; 
    const string number = " 123400"; 
    byte[] buffer = System.Text.Encoding.UTF8.GetBytes(number); 

    double result = 0; 
    for (int i = 0; i < warmUp; i++){ 
     result = buffer.ConvertToLong(0, buffer.Length - 1); 
    } 
    for (int testRun = 0; testRun < testRuns; testRun++){ 
     Stopwatch sw = new Stopwatch(); 
     sw.Start(); 
     for (int i = 0; i < iterations; i++){ 
      result = buffer.ConvertToLong(0, buffer.Length - 1); 
     } 
     sw.Stop(); 
     Console.WriteLine("Test {4}: {0} ticks, {1}ms, 1 conversion takes = {2}μs or {3}ns. GCs: {5}", sw.ElapsedTicks, 
      sw.ElapsedMilliseconds, (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000, 
      (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000*1000, testRun, 
      GC.CollectionCount(0) + GC.CollectionCount(1) + GC.CollectionCount(2)); 
    } 
} 
RESULTS 
ConvertToLong: 
Test 0: 9243 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 1: 8339 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 2: 8425 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 3: 8333 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 4: 8332 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 5: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 6: 8409 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 7: 8334 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 8: 8335 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 9: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
ConvertToLong2: 
Test 0: 109067 ticks, 55ms, 1 conversion takes = 0.275000μs or 275.000000ns. GCs: 4 
Test 1: 109861 ticks, 56ms, 1 conversion takes = 0.28000μs or 280.00000ns. GCs: 8 
Test 2: 102888 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 9 
Test 3: 105164 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 10 
Test 4: 104083 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 11 
Test 5: 102756 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 13 
Test 6: 102219 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 14 
Test 7: 102086 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 15 
Test 8: 102672 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 17 
Test 9: 102025 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 18
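
Since the question is about Java, a rough Java counterpart of ConvertToLong might look like the sketch below. It is an assumption-laden port, not the author's code: the class name `AsciiLongs` is made up, and unlike the C# version it rejects non-digit bytes instead of silently folding them into the result. Like the original, it takes an inclusive end index, skips spaces, handles no sign and does no overflow check.

```java
class AsciiLongs {
    /**
     * Parses the ASCII digits in buffer[startIndex .. endIndex] (end inclusive),
     * skipping space padding, without allocating a String.
     */
    static long convertToLong(byte[] buffer, int startIndex, int endIndex) {
        long result = 0;
        for (int i = startIndex; i <= endIndex; i++) {
            byte b = buffer[i];
            if (b == ' ') {
                continue; // padding
            }
            if (b < '0' || b > '9') {
                throw new NumberFormatException("unexpected byte " + b + " at index " + i);
            }
            // '0' is 48, so subtracting it maps '0'..'9' to 0..9
            result = result * 10 + (b - '0');
        }
        return result;
    }
}
```

For example, `AsciiLongs.convertToLong(" 123400".getBytes(java.nio.charset.StandardCharsets.US_ASCII), 0, 6)` yields 123400. The timings above are for the C# code on .NET and say nothing about how this Java version would perform.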