ALS training with PySpark throws StackOverflowError

When trying to train a machine-learning model with ALS in Spark's MLlib (1.4) on Windows, PySpark always terminates with a StackOverflowError. I tried adding checkpointing as described in https://stackoverflow.com/a/31484461/36130, but it does not seem to help (a new checkpoint directory is created on every run, but it is always empty).

Here is the training code and the stack trace:

import itertools
from pyspark.mllib.recommendation import ALS

ranks = [8, 12]
lambdas = [0.1, 10.0]
numIters = [10, 20]
bestModel = None
bestValidationRmse = float("inf")
bestRank = 0
bestLambda = -1.0
bestNumIter = -1

# Grid search over rank, regularization parameter, and iteration count
for rank, lmbda, numIter in itertools.product(ranks, lambdas, numIters):
    ALS.checkpointInterval = 2  # added per the linked answer
    model = ALS.train(training, rank, numIter, lmbda)
    validationRmse = computeRmse(model, validation, numValidation)

    if validationRmse < bestValidationRmse:
        bestModel = model
        bestValidationRmse = validationRmse
        bestRank = rank
        bestLambda = lmbda
        bestNumIter = numIter

testRmse = computeRmse(bestModel, test, numTest)
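(computeRmse is not defined in the post; a plausible definition, adapted from the Spark MovieLens example and assumed here rather than taken from the original question, would be:

from math import sqrt
from operator import add

def computeRmse(model, data, n):
    # RMSE of the model's predictions against an RDD of Rating(user, product, rating)
    predictions = model.predictAll(data.map(lambda x: (x.user, x.product)))
    predictionsAndRatings = predictions.map(lambda x: ((x.user, x.product), x.rating)) \
        .join(data.map(lambda x: ((x.user, x.product), x.rating))) \
        .values()
    return sqrt(predictionsAndRatings.map(lambda x: (x[0] - x[1]) ** 2).reduce(add) / float(n))

)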

Stack trace:

15/08/27 02:02:58 ERROR Executor: Exception in task 3.0 in stage 56.0 (TID 127) 
java.lang.StackOverflowError 
    at java.io.ObjectInputStream$BlockDataInputStream.readInt(Unknown Source) 
    at java.io.ObjectInputStream.readHandle(Unknown Source) 
    at java.io.ObjectInputStream.readClassDesc(Unknown Source) 
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) 
    at java.io.ObjectInputStream.readObject0(Unknown Source) 
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source) 
    at java.io.ObjectInputStream.readSerialData(Unknown Source) 
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) 
    at java.io.ObjectInputStream.readObject0(Unknown Source) 
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source) 
    at java.io.ObjectInputStream.readSerialData(Unknown Source) 
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) 
    at java.io.ObjectInputStream.readObject0(Unknown Source) 
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source) 
    at java.io.ObjectInputStream.readSerialData(Unknown Source) 
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) 
    at java.io.ObjectInputStream.readObject0(Unknown Source) 
    at java.io.ObjectInputStream.readObject(Unknown Source) 
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362) 
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) 
    at java.lang.reflect.Method.invoke(Unknown Source) 
    at java.io.ObjectStreamClass.invokeReadObject(Unknown Source) 
    at java.io.ObjectInputStream.readSerialData(Unknown Source) 
What is the size of your data, and what heap size have you given Spark? – eliasah

The input file is 24 MB (about 100k records); spark.executor.memory is 4G and the JVM heap is set to 2G. – atVelu

I assume you are running in local mode? – eliasah

Answer

Try setting a checkpoint directory on the SparkContext before training:

sc.setCheckpointDir("/check_point_dir")
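
For context, a minimal sketch of how the pieces fit together, assuming the setup from the linked answer (the app name, path, and toy ratings below are illustrative, not from the original post):

from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="ALSCheckpointDemo")  # illustrative app name

# Set the checkpoint directory BEFORE calling ALS.train; if no directory is
# set, the periodic checkpointing that truncates the RDD lineage is skipped,
# and the deeply nested lineage can overflow the stack during deserialization.
sc.setCheckpointDir("/check_point_dir")

# Checkpoint every 2 iterations (the interval used in the question)
ALS.checkpointInterval = 2

# Tiny illustrative dataset of (user, product, rating) triples
ratings = sc.parallelize([Rating(0, 0, 4.0), Rating(0, 1, 2.0), Rating(1, 1, 3.0)])
model = ALS.train(ratings, rank=8, iterations=20, lambda_=0.1)

If the checkpoint directory is created but stays empty, that usually suggests setCheckpointDir was never called on the active SparkContext before training started.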