
I am trying to integrate metrics from my Spark 2.1 jobs into Ganglia, but Spark is ignoring the Ganglia metrics configuration. How can I get Ganglia working for Spark 2.1 job metrics?

My spark-defaults.conf looks like:

*.sink.ganglia.class org.apache.spark.metrics.sink.GangliaSink 
*.sink.ganglia.name Name 
*.sink.ganglia.host $MASTERIP 
*.sink.ganglia.port $PORT 

*.sink.ganglia.mode unicast 
*.sink.ganglia.period 10 
*.sink.ganglia.unit seconds 

When I submit my job, I can see these warnings:

Warning: Ignoring non-spark config property: *.sink.ganglia.host=host 
Warning: Ignoring non-spark config property: *.sink.ganglia.name=Name 
Warning: Ignoring non-spark config property: *.sink.ganglia.mode=unicast 
Warning: Ignoring non-spark config property: *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink 
Warning: Ignoring non-spark config property: *.sink.ganglia.period=10 
Warning: Ignoring non-spark config property: *.sink.ganglia.port=8649 
Warning: Ignoring non-spark config property: *.sink.ganglia.unit=seconds 

My environment details:

Hadoop : Amazon 2.7.3 - emr-5.7.0 
Spark : Spark 2.1.1, 
Ganglia: 3.7.2 

If you have any input, or any alternative to Ganglia, please reply.

Answers


From the Spark monitoring documentation: https://spark.apache.org/docs/latest/monitoring.html

Spark also supports a Ganglia sink which is not included in the default build due to licensing restrictions: 

GangliaSink: Sends metrics to a Ganglia node or multicast group. 
**To install the GangliaSink you’ll need to perform a custom build of Spark**. Note that by embedding this library you will include LGPL-licensed code in your Spark package. For sbt users, set the SPARK_GANGLIA_LGPL environment variable before building. For Maven users, enable the -Pspark-ganglia-lgpl profile. In addition to modifying the cluster’s Spark build user applications will need to link to the spark-ganglia-lgpl artifact. 
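
For illustration, the build commands might look like the following (a sketch only, assuming you are building from a Spark source checkout; the exact flags depend on your Hadoop version and profiles):

# Maven: enable the Ganglia LGPL profile 
./build/mvn -Pspark-ganglia-lgpl -DskipTests clean package 

# sbt: set the environment variable before building 
SPARK_GANGLIA_LGPL=true ./build/sbt package 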

I am using Spark on EMR with Ganglia set up, so I would expect the package to be included, but it is still ignoring the metrics configuration.


According to the Spark docs:

The metrics system is configured via a configuration file that Spark expects to be present at $SPARK_HOME/conf/metrics.properties. A custom file location can be specified via the spark.metrics.conf configuration property.

So, instead of having these configs in spark-defaults.conf, move them to $SPARK_HOME/conf/metrics.properties.
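
For example, a minimal $SPARK_HOME/conf/metrics.properties might look like the following (a sketch reusing the sink settings from the question; $MASTERIP and $PORT are placeholders for your Ganglia host and port):

# Send all Spark metrics to a Ganglia node over unicast 
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink 
*.sink.ganglia.host=$MASTERIP 
*.sink.ganglia.port=$PORT 
*.sink.ganglia.mode=unicast 
*.sink.ganglia.period=10 
*.sink.ganglia.unit=seconds 

If the file lives somewhere other than $SPARK_HOME/conf/, you can point Spark at it with the spark.metrics.conf property when submitting, for example (the path here is hypothetical; the file must be readable wherever the driver and executors run):

spark-submit --conf spark.metrics.conf=/path/to/metrics.properties <other args> 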