I have set up a standalone Apache Spark cluster with 7 nodes. I want to run the following Scala code, a Spark job that processes a CSV dataset:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object MovieSimilarities1M {

  /** Our main function where the action happens */
  def main(args: Array[String]) {

    // Set the log level to only print errors
    Logger.getLogger("org").setLevel(Level.ERROR)

    // Create a SparkContext without much actual configuration
    // We want EMR's config defaults to be used.
    val conf = new SparkConf()
    conf.setAppName("MovieSimilarities1M")
    val sc = new SparkContext(conf)

    val input = sc.textFile("file:///home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv")
    // extractCustomerPricePairs is defined elsewhere in my code
    val mappedInput = input.map(extractCustomerPricePairs)
    val totalByCustomer = mappedInput.reduceByKey((x, y) => x + y)
    val flipped = totalByCustomer.map(x => (x._2, x._1))
    val totalByCustomerSorted = flipped.sortByKey()
    val results = totalByCustomerSorted.collect()

    // Print the results.
    results.foreach(println)
  }
}
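To show what the RDD pipeline computes, here is a plain-Scala sketch of the same logic run on a tiny in-memory sample. The three-column CSV layout (customerID, itemID, amount), the sample rows, and the body of extractCustomerPricePairs are my assumptions, not the real data:

```scala
// Plain-Scala sketch of the RDD pipeline above, on an in-memory sample.
// CSV layout and sample rows are assumed for illustration only.
object PipelineSketch {

  // Mirrors what extractCustomerPricePairs presumably does:
  // pull (customerID, amount) out of one CSV line.
  def extractCustomerPricePairs(line: String): (Int, Double) = {
    val fields = line.split(",")
    (fields(0).toInt, fields(2).toDouble)
  }

  // The whole pipeline as one function: total per customer, sorted ascending.
  def totalsSortedAscending(lines: List[String]): List[(Double, Int)] = {
    lines
      .map(extractCustomerPricePairs)                  // like input.map(...)
      .groupBy(_._1)                                   // group by customer, as reduceByKey does
      .map { case (c, ps) => (c, ps.map(_._2).sum) }   // sum amounts per customer
      .toList
      .map(_.swap)                                     // like the flip to (total, customer)
      .sortBy(_._1)                                    // like sortByKey()
  }

  def main(args: Array[String]): Unit = {
    val sample = List("44,8602,37.19", "35,5368,65.89", "44,3391,40.64")
    totalsSortedAscending(sample).foreach(println)
  }
}
```

On the sample, customer 44 appears twice, so their amounts are summed before the ascending sort by total.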
My steps are:

1. Build the .jar file with SBT.
2. Submit the job with spark-submit *.jar

But my executors cannot find the file passed to sc.textFile("file:///home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv").
The customer-orders.csv file is stored only on my master PC.
Full stack trace:
error: [Stage 0:> (0 + 2)/2]17/09/25 17:32:35 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 5, 141.225.166.191, executor 2): java.io.FileNotFoundException: File file:/home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv does not exist
How can I solve this problem?
Please modify the code so that it runs on the cluster.
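My understanding of the failure: the file exists only on the master, so every executor that tries to open the file:// path locally fails. One common fix is to put the file somewhere all nodes can reach, such as HDFS, and read it from there. A sketch, assuming an HDFS setup alongside the cluster (the /data path is an example, not a real path of mine):

```shell
# Copy the CSV into HDFS so every worker node can read it
# (assumes HDFS is available to the cluster; paths are examples)
hdfs dfs -mkdir -p /data
hdfs dfs -put /home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv /data/

# Then in the Scala code, read the hdfs:// URI instead of file://
#   val input = sc.textFile("hdfs:///data/customer-orders.csv")
```

Alternatively, I believe copying the file to the same local path on every worker node, or shipping it with spark-submit's --files option, would also make it visible cluster-wide.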