Loading the Databricks csv library in pyspark

I am trying to load the Databricks spark-csv library (see https://github.com/databricks/spark-csv) on a Spark cluster I created with Google Dataproc, all through PySpark.
I launch PySpark and type:
spark-submit --packages com.databricks:spark-csv_2.11:1.2.0 --verbose
but I get this response:
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: spark.executor.extraJavaOptions=-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.3.v20150130.jar
Adding default property: spark.history.fs.logDirectory=file:///var/log/spark/events
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.driver.maxResultSize=937m
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.yarn.historyServer.address=fb-cluster-1-m:18080
Adding default property: spark.driver.memory=1874m
Adding default property: spark.dynamicAllocation.maxExecutors=100000
Adding default property: spark.scheduler.minRegisteredResourcesRatio=0.0
Adding default property: spark.yarn.am.memory=2176m
Adding default property: spark.driver.extraJavaOptions=-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.3.v20150130.jar
Adding default property: spark.master=yarn-client
Adding default property: spark.executor.memory=2176m
Adding default property: spark.eventLog.dir=file:///var/log/spark/events
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.executor.cores=1
Adding default property: spark.yarn.executor.memoryOverhead=384
Adding default property: spark.dynamicAllocation.minExecutors=1
Adding default property: spark.dynamicAllocation.initialExecutors=100000
Adding default property: spark.akka.frameSize=512
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
This contradicts the documentation at https://github.com/databricks/spark-csv, which was merged after lebigot's post in https://github.com/databricks/spark-csv/issues/59.
Can anyone help me?
Did you launch a pyspark shell and then type the 'spark-submit' command inside it? If you haven't tried it yet, could you try 'pyspark --packages com.databricks:spark-csv_2.11:1.2.0'? It works for me locally. –
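For reference, a minimal sketch of that workflow, assuming Spark 1.x (where sqlContext is predefined in the pyspark shell) and a hypothetical input file cars.csv: --packages resolves the Maven coordinates at launch and puts the jars on the classpath, after which the com.databricks.spark.csv format is available to the DataFrame reader.

pyspark --packages com.databricks:spark-csv_2.11:1.2.0

# inside the pyspark shell; 'cars.csv' is a placeholder file name
df = sqlContext.read.format('com.databricks.spark.csv') \
    .option('header', 'true') \
    .option('inferSchema', 'true') \
    .load('cars.csv')
df.show()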
Yes, I was launching a pyspark shell and typing spark-submit inside it. I tried your command and it works, thank you. But is the package now installed for good, or is it only downloaded temporarily? – sweeeeeet
Hey @sweeeeeet, I added an answer with more information. Hope this helps! –
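On the persistence question, a hedged note rather than a definitive answer: --packages downloads the jars into the local ivy cache (~/.ivy2 by default), so they are cached between runs but are only placed on the classpath for sessions launched with the flag. One way to load the library by default, assuming the cluster's Spark version supports the spark.jars.packages property, is to set it in /usr/lib/spark/conf/spark-defaults.conf:

# hypothetical addition to spark-defaults.conf: makes every
# spark-submit / pyspark session pull in spark-csv automatically
spark.jars.packages    com.databricks:spark-csv_2.11:1.2.0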