2017-04-07 66 views
2

我想通过Kafka发送一个非常简单的JSON对象,并使用Python和kafka-python将其读出。但是,我总是看到以下错误:使用Kafka-Python的解串器无法使用来自Kafka的JSON消息

2017-04-07 10:28:52,030.30.9998989105:kafka.future:8228:ERROR:10620:Error processing callback 
Traceback (most recent call last): 
    File "C:\Anaconda2\lib\site-packages\kafka\future.py", line 79, in _call_backs 
    f(value) 
    File "C:\Anaconda2\lib\site-packages\kafka\consumer\fetcher.py", line 760, in _handle_fetch_response 
    unpacked = list(self._unpack_message_set(tp, messages)) 
    File "C:\Anaconda2\lib\site-packages\kafka\consumer\fetcher.py", line 539, in _unpack_message_set 
    tp.topic, msg.value) 
    File "C:\Anaconda2\lib\site-packages\kafka\consumer\fetcher.py", line 570, in _deserialize 
    return f(bytes_) 
    File "C:\Users\myUser\workspace\PythonKafkaTest\src\example.py", line 55, in <lambda> 
    value_deserializer=lambda m: json.loads(m).decode('utf-8')) 
    File "C:\Anaconda2\lib\json\__init__.py", line 339, in loads 
    return _default_decoder.decode(s) 
    File "C:\Anaconda2\lib\json\decoder.py", line 364, in decode 
    obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 
    File "C:\Anaconda2\lib\json\decoder.py", line 382, in raw_decode 
    raise ValueError("No JSON object could be decoded") 
ValueError: No JSON object could be decoded 

我做了一些研究,但此错误的最常见的原因是JSON是错误的。我已经尝试打印出JSON,然后通过将以下内容添加到我的代码中并将JSON打印出来而没有错误。

while True: 
     json_obj1 = json.dumps({"dataObjectID": "test1"}) 
     print json_obj1 
     producer.send('my-topic', {"dataObjectID": "test1"}) 
     producer.send('my-topic', {"dataObjectID": "test2"}) 
     time.sleep(1) 

这使我怀疑我可以产生json,但不会消耗它。

这里是我的代码:

import threading 
import logging 
import time 
import json 

from kafka import KafkaConsumer, KafkaProducer 


class Producer(threading.Thread): 
    daemon = True 

    def run(self): 
     producer = KafkaProducer(bootstrap_servers='localhost:9092', 
           value_serializer=lambda v: json.dumps(v).encode('utf-8')) 

     while True: 
      producer.send('my-topic', {"dataObjectID": "test1"}) 
      producer.send('my-topic', {"dataObjectID": "test2"}) 
      time.sleep(1) 


class Consumer(threading.Thread): 
    daemon = True 

    def run(self): 
     consumer = KafkaConsumer(bootstrap_servers='localhost:9092', 
           auto_offset_reset='earliest', 
           value_deserializer=lambda m: json.loads(m).decode('utf-8')) 
     consumer.subscribe(['my-topic']) 

     for message in consumer: 
      print (message) 


def main(): 
    threads = [ 
     Producer(), 
     Consumer() 
    ] 

    for t in threads: 
     t.start() 

    time.sleep(10) 

if __name__ == "__main__": 
    logging.basicConfig(
     format='%(asctime)s.%(msecs)s:%(name)s:%(thread)d:' + 
       '%(levelname)s:%(process)d:%(message)s', 
     level=logging.INFO 
    ) 
    main() 

我可以成功地发送和接收字符串,如果我删除value_serializer和value_deserializer。当我运行代码,我可以看到我在发送JSON这里是一个简短snipit:

ConsumerRecord(topic=u'my-topic', partition=0, offset=5742, timestamp=None, timestamp_type=None, key=None, value='{"dataObjectID": "test1"}', checksum=-1301891455, serialized_key_size=-1, serialized_value_size=25) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5743, timestamp=None, timestamp_type=None, key=None, value='{"dataObjectID": "test2"}', checksum=-1340077864, serialized_key_size=-1, serialized_value_size=25) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5744, timestamp=None, timestamp_type=None, key=None, value='test', checksum=1495943047, serialized_key_size=-1, serialized_value_size=4) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5745, timestamp=None, timestamp_type=None, key=None, value='\xc2Hello, stranger!', checksum=-1090450220, serialized_key_size=-1, serialized_value_size=17) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5746, timestamp=None, timestamp_type=None, key=None, value='test', checksum=1495943047, serialized_key_size=-1, serialized_value_size=4) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5747, timestamp=None, timestamp_type=None, key=None, value='\xc2Hello, stranger!', checksum=-1090450220, serialized_key_size=-1, serialized_value_size=17) 

所以,我试图从消费者移除value_deserializer,而代码执行,但没有消息传出解串器作为一个字符串,这不是我所需要的。那么,为什么value_deserializer没有工作?是否有不同的方式从我应该使用的Kafka消息中获取JSON?

回答

1

原来问题是value_deserializer=lambda m: json.loads(m).decode('utf-8')的解码部分,当我将其更改为value_deserializer=lambda m: json.loads(m),然后我看到从Kafka读取的对象的类型现在是字典。其基于从Python的JSON文件,以下信息是正确的:

|---------------------|------------------| 
|  JSON   |  Python  | 
|---------------------|------------------| 
|  object   |  dict  | 
|---------------------|------------------| 
|  array   |  list  | 
|---------------------|------------------| 
|  string   |  unicode  | 
|---------------------|------------------| 
|  number (int) |  int, long | 
|---------------------|------------------| 
|  number (real) |  float  | 
|---------------------|------------------| 
|  true   |  True  | 
|---------------------|------------------| 
|  false   |  False  | 
|---------------------|------------------| 
|  null   |  None  | 
|---------------------|------------------| 
4

我的问题是第一解码消息为UTF-8后问题,然后json.load /转储:

value_deserializer=lambda m: json.loads(m.decode('utf-8')) 

而不是:

value_deserializer=lambda m: json.loads(m).decode('utf-8') 

希望这也将工作的制片人方

相关问题