Inline code slower than function call/static function in Java

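I have been running some tests to see how inlining function code (writing the function's algorithm explicitly in the calling code) affects performance. I wrote a simple byte-array-to-integer conversion, then wrapped it in a function, called it statically from another class, and called it statically from the class itself. The code is as follows: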

public class FunctionCallSpeed { 
    public static final int numIter = 50000000; 

    public static void main (String [] args) { 
     byte [] n = new byte[4]; 

     long start; 

     System.out.println("Function from Static Class ================="); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      StaticClass.toInt(n); 
     } 
     System.out.println("Elapsed time: " + (double)(System.nanoTime() - start)/1000000000 + "s"); 

     System.out.println("Function from Class ========================"); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      toInt(n); 
     } 
     System.out.println("Elapsed time: " + (double)(System.nanoTime() - start)/1000000000 + "s"); 

     int actual = 0; 

     int len = n.length; 

     System.out.println("Inline Function ============================"); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      for (int j = 0; j < len; j++) { 
       actual += n[len - 1 - j] << 8 * j; 
      } 
     } 
     System.out.println("Elapsed time: " + (double)(System.nanoTime() - start)/1000000000 + "s"); 
    } 

    public static int toInt(byte [] num) { 
     int actual = 0; 

     int len = num.length; 

     for (int i = 0; i < len; i++) { 
      actual += num[len - 1 - i] << 8 * i; 
     } 

     return actual; 
    } 
}

The results are as follows:

Function from Static Class ================= 
Elapsed time: 0.096559931s 
Function from Class ======================== 
Elapsed time: 0.015741711s 
Inline Function ============================ 
Elapsed time: 0.837626286s

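Is something strange going on with the bytecode? I have looked at the bytecode myself, but I am not very familiar with it and cannot make heads or tails of it.

EDIT

I added assert statements to read the output, then randomized the bytes being read, and the benchmark now behaves the way I thought it would. Thanks to Tomasz Nurkiewicz, who pointed me to the micro-benchmark article. The resulting code is: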

public class FunctionCallSpeed { 
public static final int numIter = 50000000; 

public static void main (String [] args) { 
    byte [] n; 

    long start, end; 
    int checker, calc; 

    end = 0; 
    System.out.println("Function from Object ================="); 
    for (int i = 0; i < numIter; i++) { 
     checker = (int)(Math.random() * 65535); 
     n = toByte(checker); 
     start = System.nanoTime(); 
     calc = StaticClass.toInt(n); 
     end += System.nanoTime() - start; 
     assert calc == checker; 
    } 
    System.out.println("Elapsed time: " + (double)end/1000000000 + "s"); 
    end = 0; 
    System.out.println("Function from Class =================="); 
    start = System.nanoTime(); 
    for (int i = 0; i < numIter; i++) { 
     checker = (int)(Math.random() * 65535); 
     n = toByte(checker); 
     start = System.nanoTime(); 
     calc = toInt(n); 
     end += System.nanoTime() - start; 
     assert calc == checker; 
    } 
    System.out.println("Elapsed time: " + (double)end/1000000000 + "s"); 


    int len = 4; 
    end = 0; 
    System.out.println("Inline Function ======================"); 
    start = System.nanoTime(); 
    for (int i = 0; i < numIter; i++) { 
     calc = 0; 
     checker = (int)(Math.random() * 65535); 
     n = toByte(checker); 
     start = System.nanoTime(); 
     for (int j = 0; j < len; j++) { 
      calc += n[len - 1 - j] << 8 * j; 
     } 
     end += System.nanoTime() - start; 
     assert calc == checker; 
    } 
    System.out.println("Elapsed time: " + (double)(System.nanoTime() - start)/1000000000 + "s"); 
} 

public static byte [] toByte(int val) { 
    byte [] n = new byte[4]; 

    for (int i = 0; i < 4; i++) { 
     n[i] = (byte)((val >> 8 * i) & 0xFF); 
    } 
    return n; 
} 

public static int toInt(byte [] num) { 
    int actual = 0; 

    int len = num.length; 

    for (int i = 0; i < len; i++) { 
     actual += num[len - 1 - i] << 8 * i; 
    } 

    return actual; 
} 
}

Results:

Function from Static Class ================= 
Elapsed time: 9.276437031s 
Function from Class ======================== 
Elapsed time: 9.225660708s 
Inline Function ============================ 
Elapsed time: 5.9512E-5s
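
A note on the assert checks in the code above: Java only evaluates `assert` statements when the JVM is started with `-ea`, and in the posted code `toByte` writes the low byte to index 0 while `toInt` reads the low byte from the last index, so `calc == checker` would not hold in general. The sketch below is one possible way to make the round trip consistent; the class name `RoundTrip` and the choice of big-endian order with an `& 0xFF` mask are assumptions for illustration, not the asker's code.

```java
public class RoundTrip {
    // Assumption: store the most significant byte first (big-endian),
    // so that it matches the order toInt reads below.
    public static byte[] toByte(int val) {
        byte[] n = new byte[4];
        for (int i = 0; i < 4; i++) {
            n[3 - i] = (byte) (val >> 8 * i);
        }
        return n;
    }

    public static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            // Java bytes are signed; & 0xFF keeps values >= 0x80 from sign-extending.
            actual |= (num[len - 1 - i] & 0xFF) << 8 * i;
        }
        return actual;
    }

    public static void main(String[] args) {
        // Explicit check instead of assert, so it runs even without -ea.
        for (int v : new int[] {0, 1, 128, 255, 256, 65534, -1}) {
            if (toInt(toByte(v)) != v) {
                throw new AssertionError("round trip failed for " + v);
            }
        }
        System.out.println("round trip ok");
    }
}
```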

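Source

2012-08-30 ddukki

[How do I write a correct micro-benchmark in Java?](http://stackoverflow.com/questions/504103) –

@TomaszNurkiewicz, thanks for the link. I think I have fixed my benchmark, at least for the case I wanted to check. – ddukki

I ported your test case to caliper: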

import com.google.caliper.SimpleBenchmark; 

public class ToInt extends SimpleBenchmark { 

    private byte[] n; 
    private int total; 

    @Override 
    protected void setUp() throws Exception { 
     n = new byte[4]; 
    } 

    public int timeStaticClass(int reps) { 
     for (int i = 0; i < reps; i++) { 
      total += StaticClass.toInt(n); 
     } 
     return total; 
    } 

    public int timeFromClass(int reps) { 
     for (int i = 0; i < reps; i++) { 
      total += toInt(n); 
     } 
     return total; 
    } 

    public int timeInline(int reps) { 
     for (int i = 0; i < reps; i++) { 
      int actual = 0; 
      int len = n.length; 
      for (int i1 = 0; i1 < len; i1++) { 
       actual += n[len - 1 - i1] << 8 * i1; 
      } 
      total += actual; 
     } 
     return total; 
    } 

    public static int toInt(byte[] num) { 
     int actual = 0; 
     int len = num.length; 
     for (int i = 0; i < len; i++) { 
      actual += num[len - 1 - i] << 8 * i; 
     } 
     return actual; 
    } 
} 

class StaticClass { 
    public static int toInt(byte[] num) { 
     int actual = 0; 

     int len = num.length; 

     for (int i = 0; i < len; i++) { 
      actual += num[len - 1 - i] << 8 * i; 
     } 

     return actual; 
    } 

}

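And indeed it looks as if the inlined version is the slowest, while the two static versions are almost identical (as expected):

[Figure: caliper benchmark results]

The reasons are hard to pin down. Two factors I can think of:

The JVM micro-optimizes best when code blocks are small and simple to reason about. When the function is inlined by hand, the code as a whole becomes more complex and the JVM gives up. With the smaller toInt() function, the JIT is smarter.
Cache locality: somehow the JVM performs better with two small chunks of code (the loop and the method) than with one larger one.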

Source

2012-08-30 17:21:06

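It is always hard to make guarantees about what the JIT is doing, but if I had to guess, it noticed that the function's return value was never used and optimized much of the work away.

If you actually use the function's return value, I bet the timings will change.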
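
As a minimal sketch of that suggestion, the loop below adds every result of `toInt` to a running total and then publishes the total to a field, so the calls have an observable effect. The class name `UseReturnValue`, the `sink` field, and the sample input are assumptions made for illustration; with a constant input the JIT may still simplify the loop, so this only removes the most obvious reason for the work to be discarded.

```java
public class UseReturnValue {
    // Written once after the loop so the accumulated result is observable.
    static volatile int sink;

    public static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Calls toInt `iterations` times and folds every result into one checksum.
    public static int runAndConsume(byte[] n, int iterations) {
        int total = 0;
        for (int i = 0; i < iterations; i++) {
            total += toInt(n);
        }
        sink = total;
        return total;
    }

    public static void main(String[] args) {
        byte[] n = {0, 0, 1, 2};
        long start = System.nanoTime();
        int total = runAndConsume(n, 50_000_000);
        long elapsed = System.nanoTime() - start;
        // Printing the checksum is a second use of the result.
        System.out.println("checksum " + total + ", elapsed " + elapsed / 1e9 + "s");
    }
}
```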

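Source

2012-08-30 16:43:06 corsiKa

Yes, that solved the problem, thanks. – ddukki

You have several problems, but the main one is that you are testing a single iteration of optimized code. That is sure to give you mixed results. I suggest running the test for 2 seconds and ignoring the first 10,000 iterations.

If the result of a loop is not kept, the entire loop can be discarded after some random interval.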
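
One way to follow that advice is sketched here: run a number of untimed warm-up calls first, then time batches of calls until a fixed wall-clock budget is used up, keeping the results in a checksum. The class name `WarmupTimer`, the batch size of 1000, and the `sink` field are assumptions for illustration, and the reported figure includes loop and clock overhead.

```java
public class WarmupTimer {
    // Receives the checksum so the timed calls are not dead code.
    static volatile int sink;

    public static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Average nanoseconds per call, measured for roughly `measureNanos`
    // after `warmupIterations` untimed calls.
    public static double averageNanos(byte[] n, int warmupIterations, long measureNanos) {
        int total = 0;
        for (int i = 0; i < warmupIterations; i++) {
            total += toInt(n); // warm-up: results kept, time ignored
        }
        long calls = 0;
        long elapsed;
        long start = System.nanoTime();
        do {
            // Work in batches so the clock is not read on every call.
            for (int i = 0; i < 1000; i++) {
                total += toInt(n);
            }
            calls += 1000;
            elapsed = System.nanoTime() - start;
        } while (elapsed < measureNanos);
        sink = total;
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        // 10,000 warm-up iterations, then about 2 seconds of measurement.
        double ns = averageNanos(new byte[4], 10_000, 2_000_000_000L);
        System.out.println(ns + " ns per call");
    }
}
```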

Breaking each test out into a separate method:

public class FunctionCallSpeed { 
    public static final int numIter = 50000000; 
    private static int dontOptimiseAway; 

    public static void main(String[] args) { 
     byte[] n = new byte[4]; 

     for (int i = 0; i < 10; i++) { 
      test1(n); 
      test2(n); 
      test3(n); 
      System.out.println(); 
     } 
    } 

    private static void test1(byte[] n) { 
     System.out.print("from Static Class: "); 
     long start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      dontOptimiseAway = FunctionCallSpeed.toInt(n); 
     } 
     System.out.print((System.nanoTime() - start)/numIter + "ns "); 
    } 

    private static void test2(byte[] n) { 
     long start; 
     System.out.print("from Class: "); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      dontOptimiseAway = toInt(n); 
     } 
     System.out.print((System.nanoTime() - start)/numIter + "ns "); 
    } 

    private static void test3(byte[] n) { 
     long start; 
     int actual = 0; 

     int len = n.length; 

     System.out.print("Inlined: "); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      for (int j = 0; j < len; j++) { 
       actual += n[len - 1 - j] << 8 * j; 
      } 
      dontOptimiseAway = actual; 
     } 
     System.out.print((System.nanoTime() - start)/numIter + "ns "); 
    } 

    public static int toInt(byte[] num) { 
     int actual = 0; 

     int len = num.length; 

     for (int i = 0; i < len; i++) { 
      actual += num[len - 1 - i] << 8 * i; 
     } 

     return actual; 
    } 
}

prints

from Class: 7ns Inlined: 11ns from Static Class: 9ns 
from Class: 6ns Inlined: 8ns from Static Class: 8ns 
from Class: 6ns Inlined: 9ns from Static Class: 6ns

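This suggests that when the inner loop is optimized separately, it is slightly more efficient.

However, if I use an optimized conversion from bytes to int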

public static int toInt(byte[] num) { 
    return num[0] + (num[1] << 8) + (num[2] << 16) + (num[3] << 24); 
}

all the tests report

from Static Class: 0ns from Class: 0ns Inlined: 0ns 
from Static Class: 0ns from Class: 0ns Inlined: 0ns 
from Static Class: 0ns from Class: 0ns Inlined: 0ns

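as it has worked out that the test doesn't do anything useful. ;)

Source

2012-08-30 16:43:33

Yes, it was being optimized away. Thanks! – ddukki

Your test is flawed. The second test benefits from the first test having already run. You need to run each test case in its own JVM invocation.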
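
A rough sketch of running each case in its own JVM, assuming a hypothetical main class `SingleTest` that runs only the test named by its first argument (it is not part of the code in this thread). The launcher reuses the current `java.home` and class path; if `SingleTest` is not on the class path, each child JVM simply reports an error and exits with a non-zero code.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class IsolatedRunner {
    // Builds the command line that starts a fresh JVM with the current class path.
    public static List<String> buildCommand(String mainClass, String testName) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        String classPath = System.getProperty("java.class.path");
        return Arrays.asList(javaBin, "-cp", classPath, mainClass, testName);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Test names are illustrative; "SingleTest" is a hypothetical class.
        for (String test : new String[] {"staticClass", "fromClass", "inline"}) {
            Process p = new ProcessBuilder(buildCommand("SingleTest", test))
                    .inheritIO() // show the child's output in this console
                    .start();
            int exit = p.waitFor();
            System.out.println(test + " exited with code " + exit);
        }
    }
}
```

Because every case starts from a cold JVM, no test inherits JIT profiles or compiled code from an earlier one, at the cost of paying start-up and warm-up time for each run.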

Source

2012-08-30 17:07:51
