2013-08-22 35 views
4

我在Haskell和Scala中有一段非常简单的代码。此代码旨在以非常严格的循环运行,因此性能很重要。问题是Haskell比Scala慢10倍左右。这里是Haskell代码。Haskell矢量性能与Scala相比

{-# LANGUAGE BangPatterns #-} 
import qualified Data.Vector.Unboxed as VU 

newtype AffineTransform = AffineTransform {get :: (VU.Vector Double)} deriving (Show) 

{-# INLINE runAffineTransform #-} 
runAffineTransform :: AffineTransform -> (Double, Double) -> (Double, Double) 
runAffineTransform affTr (!x, !y) = (get affTr `VU.unsafeIndex` 0 * x + get affTr `VU.unsafeIndex` 1 * y + get affTr `VU.unsafeIndex` 2, 
             get affTr `VU.unsafeIndex` 3 * x + get affTr `VU.unsafeIndex` 4 * y + get affTr `VU.unsafeIndex` 5) 

testAffineTransformSpeed :: AffineTransform -> Int -> (Double, Double) 
testAffineTransformSpeed affTr count = go count (0.5, 0.5) 
    where go :: Int -> (Double, Double) -> (Double, Double) 
     go 0 res = res 
     go !n !res = go (n-1) (runAffineTransform affTr res) 

还可以做些什么来改进此代码?

+0

你是如何编译它以及使用哪个编译器的? –

+0

它是用ghc 7.6.3编译的。选项是“-O2-Wall -funbox-strict-fields -threaded -rtsopts”。我期望-funbox-strict-fields就足够了,但事实并非如此。那么,我是一个完整的新手,所以我的期望可能会有所改变。 – Aurimas

+1

为什么你需要'Vector'来表示仿射变换?为它制作ADT并处理坐标数组似乎更合理。 – leventov

回答

8

的主要问题是

runAffineTransform affTr (!x, !y) = (get affTr `VU.unsafeIndex` 0 * x 
            + get affTr `VU.unsafeIndex` 1 * y 
            + get affTr `VU.unsafeIndex` 2, 
             get affTr `VU.unsafeIndex` 3 * x 
            + get affTr `VU.unsafeIndex` 4 * y 
            + get affTr `VU.unsafeIndex` 5) 

产生对的thunkrunAffineTransform被调用时,组件不会被评估,它们仍然是thunk,直到某些消费者要求对它们进行评估。

testAffineTransformSpeed affTr count = go count (0.5, 0.5) 
    where go :: Int -> (Double, Double) -> (Double, Double) 
     go 0 res = res 
     go !n !res = go (n-1) (runAffineTransform affTr res) 

不是消费者,在res轰仅评估为最外层的构造,(,),和你这是在最终只评估了

runAffineTransform affTr (runAffineTrasform affTr (runAffineTransform affTr (...))) 

结果是,当最后的正常形式是需要的。

如果强行结果的部件要立即进行评估,

runAffineTransform affTr (!x, !y) = case 
    ( get affTr `U.unsafeIndex` 0 * x 
    + get affTr `U.unsafeIndex` 1 * y 
    + get affTr `U.unsafeIndex` 2 
    , get affTr `U.unsafeIndex` 3 * x 
    + get affTr `U.unsafeIndex` 4 * y 
    + get affTr `U.unsafeIndex` 5 
) of (!a,!b) -> (a,b) 

,让它被内联,使用自定义严格对拆箱Double# s到jtobin的版本的主要区别在于:在testAffineTransformSpeed的循环中,您将使用盒装的Double s作为参数进行一次初始迭代,最后,结果的组件将被装箱,这会增加一些常量开销(每个循环的大约5纳秒)。循环的主要部分在两种情况下都采用Int#和两个Double#参数,并且除了达到n = 0时的装箱外,循环体是相同的。

当然,通过使用unboxed严格配对类型迫使组件立即评估更好。

+0

什么时候“!(!x,!y)'有用? – is7s

+0

在'let'或'where'的绑定中。但是这里只是从'!res @(!x,!y)'中删除一个字符太少而已。感谢您的注意。 –

9

我定义的以下严格/装箱对类型:

import System.Random.MWC -- for later 
import Control.DeepSeq 

data SP = SP { 
    one :: {-# UNPACK #-} !Double 
    , two :: {-# UNPACK #-} !Double 
    } deriving Show 

instance NFData SP where 
    rnf p = rnf (one p) `seq` rnf (two p) `seq`() 

并在runAffineTransform函数代替它:

runAffineTransform2 :: AffineTransform -> SP -> SP 
runAffineTransform2 affTr !(SP x y) = 
    SP ( get affTr `U.unsafeIndex` 0 * x 
     + get affTr `U.unsafeIndex` 1 * y 
     + get affTr `U.unsafeIndex` 2 ) 

    ( get affTr `U.unsafeIndex` 3 * x 
     + get affTr `U.unsafeIndex` 4 * y 
     + get affTr `U.unsafeIndex` 5 ) 
{-# INLINE runAffineTransform2 #-} 

然后跑这个基准套件:

main :: IO() 
main = do 
    g <- create 
    zs <- fmap (AffineTransform . U.fromList) 
      (replicateM 100000 (uniformR (0 :: Double, 1) g)) 

    let myConfig = defaultConfig { cfgPerformGC = ljust True } 

    defaultMainWith myConfig (return()) [ 
     bench "yours" $ nf (testAffineTransformSpeed zs) 10 
    , bench "mine" $ nf (testAffineTransformSpeed2 zs) 10 
    ] 

编译与-O2并跑,并观察到一些(~4倍)加速:

benchmarking yours 
mean: 257.4559 ns, lb 256.2492 ns, ub 258.9761 ns, ci 0.950 
std dev: 6.889905 ns, lb 5.688330 ns, ub 8.839753 ns, ci 0.950 
found 5 outliers among 100 samples (5.0%) 
    3 (3.0%) high mild 
    2 (2.0%) high severe 
variance introduced by outliers: 20.944% 
variance is moderately inflated by outliers 

benchmarking mine 
mean: 69.56408 ns, lb 69.29910 ns, ub 69.86838 ns, ci 0.950 
std dev: 1.448874 ns, lb 1.261444 ns, ub 1.718074 ns, ci 0.950 
found 4 outliers among 100 samples (4.0%) 
    4 (4.0%) high mild 
variance introduced by outliers: 14.190% 
variance is moderately inflated by outliers 

Full code is in gist here

编辑

我也贴标准的输出报告here

+1

不错,现在在我的电脑上它的性能略好于Scala(Java)。什么是要学的教训?严格标注是不够的(他们不会自动取消装箱?)?有时你必须手动创建无箱(解压缩)数据结构? – Aurimas

+1

@ user2705843幻灯片(4)和(9)与此处相关:http://johantibell.com/files/haskell-performance-patterns.html – jtobin

+2

“SP”的NFData实例完全是浪费工作。事实上,这其中很多。它隐式地创建'Double'构造函数来指向'SP'中的解压缩值,强制它们,然后抛弃它们。 'rnf a = seq a()'会更加高效和正确。 – Carl