您应该能够从您的python脚本编程式地使用pip,例如,
import pip
pip.main(['install', '--user', 'cloudant'])
这为我工作:
helloSpark.py运行后
import sys
from pyspark import SparkContext
import pip
pip.main(['install', '--user', 'cloudant'])
from cloudant.client import Cloudant
client = Cloudant('username', 'password', account='account', connect=True)
# do some spark processing
def computeStatsForCollection(sc,countPerPartitions=100000,partitions=5):
totalNumber = min(countPerPartitions * partitions, sys.maxsize)
rdd = sc.parallelize(range(totalNumber),partitions)
return (rdd.mean(), rdd.variance())
if __name__ == "__main__":
sc = SparkContext(appName="Hello Spark")
print("Hello Spark Demo. Compute the mean and variance of a collection")
stats = computeStatsForCollection(sc);
print(">>> Results: ")
print(">>>>>>>Mean: " + str(stats[0]));
print(">>>>>>>Variance: " + str(stats[1]));
sc.stop()
run.sh
./spark-submit.sh --vcap ./vcap.json --deploy-mode cluster \
--master https://169.54.219.20:8443 \
--conf spark.service.spark_version=1.6
helloSpark.py
标准输出:
$ cat stdout_1498114277669877424
no extra config
load default config from : /usr/local/src/spark160master/spark/profile/batch/
Requirement already satisfied: cloudant in /gpfs/global_fs01/sym_shared/YPProdSpark/user/s9c8-cbcae60bfa1d3e-39ca506ba762/.local/lib/python2.7/site-packages
Requirement already satisfied: requests<3.0.0,>=2.7.0 in /usr/local/src/bluemix_jupyter_bundle.v47/notebook/lib/python2.7/site-packages (from cloudant)
Traceback (most recent call last):
File "/tmp/spark-160-ego-master/work/spark-driver-380d8ae7-4ddc-452e-bb29-1665375a348c/helloSpark.py", line 8, in <module>
client = Cloudant('username', 'password', account='account', connect=True)
File "/gpfs/fs01/user/s9c8-cbcae60bfa1d3e-39ca506ba762/.local/lib/python2.7/site-packages/cloudant/client.py", line 443, in __init__
self.connect()
File "/gpfs/fs01/user/s9c8-cbcae60bfa1d3e-39ca506ba762/.local/lib/python2.7/site-packages/cloudant/client.py", line 114, in connect
self.session_login(self._user, self._auth_token)
File "/gpfs/fs01/user/s9c8-cbcae60bfa1d3e-39ca506ba762/.local/lib/python2.7/site-packages/cloudant/client.py", line 172, in session_login
resp.raise_for_status()
File "/usr/local/src/bluemix_jupyter_bundle.v47/notebook/lib/python2.7/site-packages/requests/models.py", line 840, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://account.cloudant.com/_session
不幸的是,我没有保存输出的我第一次运行该通报说,它已经安装Cloudant脚本。但在这里您可以看到Cloudant库可用,并尝试使用无效凭据连接到群集,因此Cloudant返回了401错误。
你可能不希望尝试点子安装每次运行脚本的时候,所以你可以试试这个:
try:
import cloudant
except:
import pip
pip.main(['install', '--user', 'cloudant'])
这将尝试加载Cloudant库。如果加载出错(例如,因为尚未安装),它将与pip一起安装。
您链接的文章是关于python云代工应用程序,而不是关于spark应用程序,所以不幸的是不相关。 –