2013-06-23 39 views
9

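Akka with Frege running slower than Scala counterpart

As an exercise, I ported these Scala and Java examples of Akka to Frege. It works fine, but it runs much slower (11 s) than the Scala counterpart (540 ms).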

module mmhelloworld.akkatutorialfregecore.Pi where 
import mmhelloworld.akkatutorialfregecore.Akka 

data PiMessage = Calculate | 
       Work {start :: Int, nrOfElements :: Int} | 
       Result {value :: Double} | 
       PiApproximation {pi :: Double, duration :: Duration} 

data Worker = private Worker where 
    calculatePiFor :: Int -> Int -> Double 
    calculatePiFor !start !nrOfElements = loop start nrOfElements 0.0 f where 
     loop !curr !n !acc f = if n == 0 then acc 
           else loop (curr + 1) (n - 1) (f acc curr) f 
     f !acc !i = acc + (4.0 * fromInt (1 - (i `mod` 2) * 2)/fromInt (2 * i + 1)) 

    onReceive :: Mutable s UntypedActor -> PiMessage -> ST s() 
    onReceive actor Work{start=start, nrOfElements=nrOfElements} = do 
     sender <- actor.sender 
     self <- actor.getSelf 
     sender.tellSender (Result $ calculatePiFor start nrOfElements) self 

data Master = private Master { 
    nrOfWorkers :: Int, 
    nrOfMessages :: Int, 
    nrOfElements :: Int, 
    listener :: MutableIO ActorRef, 
    pi :: Double, 
    nrOfResults :: Int, 
    workerRouter :: MutableIO ActorRef, 
    start :: Long } where 

    initMaster :: Int -> Int -> Int -> MutableIO ActorRef -> MutableIO UntypedActor -> IO Master 
    initMaster nrOfWorkers nrOfMessages nrOfElements listener actor = do 
     props <- Props.forUntypedActor Worker.onReceive 
     router <- RoundRobinRouter.new nrOfWorkers 
     context <- actor.getContext 
     workerRouter <- props.withRouter router >>= (\p -> context.actorOf p "workerRouter") 
     now <- currentTimeMillis() 
     return $ Master nrOfWorkers nrOfMessages nrOfElements listener 0.0 0 workerRouter now 

    onReceive :: MutableIO UntypedActor -> Master -> PiMessage -> IO Master 
    onReceive actor master Calculate = do 
     self <- actor.getSelf 
     let tellWorker start = master.workerRouter.tellSender (work start) self 
         work start = Work (start * master.nrOfElements) master.nrOfElements 
     forM_ [0 .. master.nrOfMessages - 1] tellWorker 
     return master 
    onReceive actor master (Result newPi) = do 
     let (!newNrOfResults, !pi) = (master.nrOfResults + 1, master.pi + newPi) 
     when (newNrOfResults == master.nrOfMessages) $ do 
      self <- actor.getSelf 
      now <- currentTimeMillis() 
      duration <- Duration.create (now - master.start) TimeUnit.milliseconds 
      master.listener.tellSender (PiApproximation pi duration) self 
      actor.getContext >>= (\context -> context.stop self) 
     return master.{pi=pi, nrOfResults=newNrOfResults} 

data Listener = private Listener where 
    onReceive :: MutableIO UntypedActor -> PiMessage -> IO() 
    onReceive actor (PiApproximation pi duration) = do 
     println $ "Pi approximation: " ++ show pi 
     println $ "Calculation time: " ++ duration.toString 
     actor.getContext >>= ActorContext.system >>= ActorSystem.shutdown 

calculate nrOfWorkers nrOfElements nrOfMessages = do 
    system <- ActorSystem.create "PiSystem" 
    listener <- Props.forUntypedActor Listener.onReceive >>= flip system.actorOf "listener" 
    let constructor = Master.initMaster nrOfWorkers nrOfMessages nrOfElements listener 
        newMaster = StatefulUntypedActor.new constructor Master.onReceive 
    factory <- UntypedActorFactory.new newMaster 
    masterActor <- Props.fromUntypedFactory factory >>= flip system.actorOf "master" 
    masterActor.tell Calculate 
    getLine >> return() --Not to exit until done 

main _ = calculate 4 10000 10000 

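Am I doing something wrong with Akka, or is the slowness caused by laziness in Frege? For example, when I initially had fold (strict fold) in place of loop in Worker.calculatePiFor, it took 27 s.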

Dependencies:

  1. Akka native definitions for Frege: Akka.fr
  2. Java helper to extend Akka classes, since we cannot extend a class in Frege: Actors.java

Answer

6

I am not very familiar with actors, but assuming the tightest loop is indeed loop, you could avoid passing the function f as an argument.

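For one, applications of passed functions cannot take advantage of the strictness of the function that is actually passed. Instead, code generation must conservatively assume that the passed function takes its arguments lazily and returns a lazy result.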

Second, in your case f is used just once here, so it can be inlined. (This is how it is done in the Scala code in the article you linked.)

Look at the code generated for the tail recursion in the following sample, which mimics yours:

test b c = loop 100 0 f 
    where 
     loop 0 !acc f = acc 
     loop n !acc f = loop (n-1) (acc + f (acc-1) (acc+1)) f -- tail recursion 
     f x y = 2*x + 7*y 

This yields:

// arg$2f is the accumulator 
arg$2 = arg$2f + (int)frege.runtime.Delayed.<java.lang.Integer>forced(
     f_3237.apply(PreludeBase.INum_Int._minusƒ.apply(arg$2f, 1)).apply(
      PreludeBase.INum_Int._plusƒ.apply(arg$2f, 1) 
     ).result() 
    );  

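What you see here is that f is called lazily, which causes all the argument expressions to be computed lazily as well. Note the number of method calls this requires! In your case the code should look something like this: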

(double)Delayed.<Double>forced(f.apply(acc).apply(curr).result()) 

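This means that two closures are built with the boxed values acc and curr, and then the result is computed: the function f is called with the unboxed arguments, and the result is boxed again, only to be unboxed (forced) once more in the next iteration.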
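The boxing round trip can be imitated in plain Java. The following is a hand-written analogy, not output of the Frege compiler: the class and method names are made up for illustration, and java.util.function.Function merely stands in for Frege's function objects (the Delayed wrappers of the Frege runtime are left out). The term mirrors f from the question.

```java
import java.util.function.Function;

// Hand-written analogy only; this is not code emitted by the Frege compiler.
class PassedFunctionSketch {

    // One step of the series from the question: acc plus the i-th term.
    static double term(double acc, int i) {
        return acc + 4.0 * (1 - (i % 2) * 2) / (2 * i + 1);
    }

    // f arrives as a curried function object. Each iteration boxes acc and
    // curr, allocates an intermediate closure, and unboxes the Double result.
    static double loopWithPassedFunction(
            int start, int n, Function<Double, Function<Integer, Double>> f) {
        double acc = 0.0;
        for (int curr = start; n > 0; curr++, n--) {
            acc = f.apply(acc).apply(curr);
        }
        return acc;
    }

    // f is called directly: primitives only, nothing allocated in the loop.
    static double loopWithDirectCall(int start, int n) {
        double acc = 0.0;
        for (int curr = start; n > 0; curr++, n--) {
            acc = term(acc, curr);
        }
        return acc;
    }

    public static void main(String[] args) {
        Function<Double, Function<Integer, Double>> f = acc -> i -> term(acc, i);
        System.out.println(loopWithPassedFunction(0, 10000, f));
        System.out.println(loopWithDirectCall(0, 10000));
    }
}
```

Both loops perform the same arithmetic and return the same value; only the first one pays for boxes and a closure on every iteration.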

Now compare the following, where we do not pass f but call it directly:

test b c = loop 100 0 
    where 
     loop 0 !acc = acc 
     loop n !acc = loop (n-1) (acc + f (acc-1) (acc+1)) 
     f x y = 2*x + 7*y 

We get:

arg$2 = arg$2f + f(arg$2f - 1, arg$2f + 1); 

Much better! Finally, in the case above we can do without a function call altogether:

 loop n !acc = loop (n-1) (acc + f) where 
     f = 2*x + 7*y 
     x = acc-1 
     y = acc+1 

And this gives:

final int y_3236 = arg$2f + 1; 
final int x_3235 = arg$2f - 1; 
... 
arg$2 = arg$2f + ((2 * x_3235) + (7 * y_3236)); 

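Please try this out and let us know what happens. The main performance boost should come from not passing f, whereas the inlining will probably be done by the JIT anyway.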

The extra cost with fold is probably because you also had to create a list before applying it.
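As a rough illustration of that extra cost, the Java sketch below compares a fold over a materialized list of indices with a plain counting loop. It is only an analogy under assumptions: the original fold version is not shown in the question, all names are hypothetical, and a Frege list is lazy, so its cells would be created on demand rather than up front.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written analogy only; the fold version from the question is not shown there.
class FoldVersusLoopSketch {

    // One step of the series from the question: acc plus the i-th term.
    static double term(double acc, int i) {
        return acc + 4.0 * (1 - (i % 2) * 2) / (2 * i + 1);
    }

    // Fold style: every index is first stored as a boxed Integer in a list,
    // then the list is reduced from left to right.
    static double sumViaList(int start, int n) {
        List<Integer> indices = new ArrayList<>(n);
        for (int i = start; i < start + n; i++) {
            indices.add(i);
        }
        double acc = 0.0;
        for (int i : indices) {
            acc = term(acc, i);
        }
        return acc;
    }

    // Loop style: the same arithmetic without any intermediate collection.
    static double sumViaLoop(int start, int n) {
        double acc = 0.0;
        for (int i = start; i < start + n; i++) {
            acc = term(acc, i);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sumViaList(0, 10000));
        System.out.println(sumViaLoop(0, 10000));
    }
}
```

The two methods return the same value; the list version additionally allocates one element per index before any arithmetic happens.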

+2

Excellent! It is now down to 1.3 s. I looked at the generated Java source. It has now been reduced to a while loop.
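For readers who want a picture of such a loop, here is a hand-written Java sketch of the shape a while loop for calculatePiFor could take. It is not the source generated by the Frege compiler, and the approximatePi helper is a hypothetical stand-in for the way Master splits the work into chunks.

```java
// Hand-written illustration; not the Java source emitted by the Frege compiler.
class WhileLoopSketch {

    // Sums nrOfElements terms of the series, beginning at index start.
    static double calculatePiFor(int start, int nrOfElements) {
        int curr = start;
        int n = nrOfElements;
        double acc = 0.0;
        while (n != 0) {
            acc += 4.0 * (1 - (curr % 2) * 2) / (2 * curr + 1);
            curr++;
            n--;
        }
        return acc;
    }

    // Hypothetical helper: splits the series into chunks the way Master does
    // with Work messages and adds up the partial results sequentially.
    static double approximatePi(int nrOfMessages, int nrOfElements) {
        double pi = 0.0;
        for (int m = 0; m < nrOfMessages; m++) {
            pi += calculatePiFor(m * nrOfElements, nrOfElements);
        }
        return pi;
    }

    public static void main(String[] args) {
        System.out.println(approximatePi(1000, 1000));
    }
}
```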