Spark Streaming Kafka consumer doesn't like DStream

I am using the Spark shell (Scala 2.10, with Spark Streaming via org.apache.spark:spark-streaming-kafka-0-10_2.10:2.0.1) to test a Spark/Kafka consumer:
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "mykafka01.example.com:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "mykafka",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("mytopic")
def createKafkaStream(ssc: StreamingContext, topics: Array[String], kafkaParams: Map[String, Object]): DStream[(String, String)] = {
  KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))
}
def messageConsumer(): StreamingContext = {
  val ssc = new StreamingContext(SparkContext.getOrCreate(), Seconds(10))
  createKafkaStream(ssc, topics, kafkaParams).foreachRDD(rdd => {
    rdd.collect().foreach { msg =>
      try {
        println("Received message: " + msg._2)
      } catch {
        case e: Throwable =>
          println("Exception: " + e.getMessage)
          e.printStackTrace()
      }
    }
  })
  ssc
}
val ssc = StreamingContext.getActiveOrCreate(messageConsumer)
ssc.start()
ssc.awaitTermination()
When I run this, I get the following error:
<console>:60: error: type mismatch;
found : org.apache.spark.streaming.dstream.InputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]]
required: org.apache.spark.streaming.dstream.DStream[(String, String)]
KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))
^
I have checked the Scala API docs over and over, and this code looks like it should compile and run correctly. Any idea where I am going wrong?
@smeeb I don't have that version of the Kafka lib handy to try, but you can look at the overloaded `createDirectStream` methods to see their return types and the parameters they take. It looks like the method you are calling returns `DStream[ConsumerRecord[K, V]]` rather than the `DStream[(K, V)]` you expect. Alternatively, if that overload is the only option, change your code to accept `DStream[ConsumerRecord[K, V]]` and then map it to `(K, V)`. – khachik
Thanks again @khachik (+1) - when you say "*it looks like the method you are calling returns `DStream[ConsumerRecord[K, V]]`*"... where do you see evidence of that? I looked at what I believe are the [correct javadocs](http://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/streaming/kafka/KafkaUtils.html), and I don't see any overloaded `createDirectStream` method that returns `DStream[ConsumerRecord[K, V]]`. Thanks again!!! – smeeb
@smeeb See the [integration guide](https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html): the method you are calling returns a stream of `ConsumerRecord`s, and you should map it to get key/value pairs: `stream.map(record => (record.key, record.value))`. The javadocs you posted appear to be for a different version. – khachik
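Putting khachik's suggestion together: in the 0-10 integration, `createDirectStream` returns an `InputDStream[ConsumerRecord[K, V]]`, so mapping each record to a `(key, value)` tuple makes `createKafkaStream` match its declared return type. A minimal sketch of the corrected method, assuming the same imports and names as in the question above:

```scala
// createDirectStream returns InputDStream[ConsumerRecord[String, String]] in the
// 0-10 integration, so map each ConsumerRecord to a (key, value) tuple to
// produce the DStream[(String, String)] the signature promises.
def createKafkaStream(ssc: StreamingContext,
                      topics: Array[String],
                      kafkaParams: Map[String, Object]): DStream[(String, String)] = {
  KafkaUtils.createDirectStream[String, String](
    ssc,
    PreferConsistent,
    Subscribe[String, String](topics, kafkaParams)
  ).map(record => (record.key, record.value))
}
```

With this change, the rest of the question's code (including `msg._2` to read the message value) compiles unchanged, since the stream now carries tuples instead of `ConsumerRecord`s.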