2012-10-04 54 views

UnicodeDecodeError when iterating over a MongoDB collection

I want to query a MongoDB database using Python 2.7 and pymongo-2.3, like this:

from pymongo import Connection 

connection = Connection() 
db = connection['db-name'] 
collections = db.subName 
entries = collections['collection-name'] 
print entries 
# > Collection(Database(Connection('localhost', 27017), u'db-name'), u'subName.collection-name') 

for entry in entries.find(): 
    pass 

The iteration fails even though I do nothing with the entry objects:

Traceback (most recent call last): 
File "/Users/../mongo.py", line 27, in <module> 
    for entry in entries.find(): 
File "/Library/Python/2.7/site-packages/pymongo-2.3-py2.7-macosx-10.8-intel.egg/pymongo/cursor.py", line 778, in next 
File "/Library/Python/2.7/site-packages/pymongo-2.3-py2.7-macosx-10.8-intel.egg/pymongo/cursor.py", line 742, in _refresh 
File "/Library/Python/2.7/site-packages/pymongo-2.3-py2.7-macosx-10.8-intel.egg/pymongo/cursor.py", line 686, in __send_message 
File "/Library/Python/2.7/site-packages/pymongo-2.3-py2.7-macosx-10.8-intel.egg/pymongo/helpers.py", line 111, in _unpack_response 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 744: invalid start byte 

I am not the creator of the database I am trying to query. Does anyone know what I am doing wrong and how I can fix it? Thanks.
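
For context on the error message itself: 0xfc can never start a UTF-8 sequence, but it is the letter "ü" in Latin-1. The sketch below only illustrates that difference. The helper name `describe_bytes` and the sample value are hypothetical, and whether the stored strings really are Latin-1 is an assumption that only the database's owner can confirm.

```python
def describe_bytes(raw):
    """Return (encoding, text) for a byte string, trying UTF-8 first.

    Falls back to Latin-1, which maps every byte to a character and
    therefore never fails. The fallback is a guess, not a diagnosis.
    """
    try:
        return ("utf-8", raw.decode("utf-8"))
    except UnicodeDecodeError:
        return ("latin-1", raw.decode("latin-1"))


# Hypothetical value, not taken from the database in question.
sample = b"M\xfcnchen"
encoding, text = describe_bytes(sample)
```

If the fallback yields readable text for the affected fields, that would suggest the strings were written without being encoded as UTF-8 first.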


Update: I managed to skip the failing line by adding a try-except in pymongo/helpers.py, but I would prefer a solution that does not involve data loss.

try: 
    result["data"] = bson.decode_all(response[20:], as_class, tz_aware, uuid_subtype) 
except: 
    result["data"] = [] 
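
The patch above throws away the whole batch when a single document is bad. A less lossy variant could rely on the fact that every BSON document starts with its own total length as a little-endian int32, so a response body can be cut into individual documents and each one decoded separately. This is only a sketch under stated assumptions: `split_bson_documents` and `decode_each` are hypothetical helpers, not part of pymongo, and `decode_one` stands for whatever single-document decoder you plug in.

```python
import struct


def split_bson_documents(buf):
    """Cut a buffer of back-to-back BSON documents into one byte string each."""
    docs = []
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < 4:
            raise ValueError("truncated length prefix at offset %d" % pos)
        # The first four bytes of a BSON document hold its total size.
        (size,) = struct.unpack_from("<i", buf, pos)
        if size < 5 or pos + size > len(buf):
            raise ValueError("bad document length %d at offset %d" % (size, pos))
        docs.append(buf[pos:pos + size])
        pos += size
    return docs


def decode_each(buf, decode_one):
    """Decode documents one at a time; keep the raw bytes of those that fail."""
    good, bad = [], []
    for raw in split_bson_documents(buf):
        try:
            good.append(decode_one(raw))
        except UnicodeDecodeError:
            bad.append(raw)  # kept for inspection instead of being dropped
    return good, bad
```

Keeping the raw bytes of the failing documents means they can still be inspected or repaired later, rather than silently lost.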

Answers


Can you try the same operation using the mongo shell? I want to figure out whether this is Python-specific or corruption in the database:

$ mongo db-name 
> var collection = db.getCollection('subName.collection-name') 
> collection.find().forEach(function(doc) { printjson(doc); }) 

Sorry. To speed things up I tried a variation of that, and it seems to be Python-related.


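OK, can you do 'for entry in entries.find().sort([('_id', 1)]): print entry['_id']' in Python? That will give you the _id just before the problematic document. Then put that _id into the shell, 'collection.findOne({_id: {$gt: my_id}})', and post the result here.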
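
The procedure in this comment can be sketched as a small helper that remembers the last _id seen before the decode error. The name `find_last_good_id` is hypothetical. It accepts any iterable of documents, so the cursor from 'entries.find().sort([('_id', 1)])' would be passed in. One caveat, inferred from the traceback: the error is raised in _refresh, where a whole batch is unpacked, so the bad document may sit anywhere in the batch that follows the returned _id, not necessarily right after it.

```python
def find_last_good_id(cursor):
    """Iterate documents in order and return (last_id, failed).

    last_id is the _id of the last document that was yielded, or None
    if nothing was yielded. failed is True if iteration stopped because
    of a UnicodeDecodeError.
    """
    last_id = None
    try:
        for doc in cursor:
            last_id = doc["_id"]
    except UnicodeDecodeError:
        return last_id, True
    return last_id, False
```

Assuming the 'entries' collection from the question, usage would be 'find_last_good_id(entries.find().sort([("_id", 1)]))', and the returned _id is what goes into the '$gt' query in the shell.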


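Strangely, this now fails in the console too (as well as in Python): 'collection.findOne({_id: {$gt: ObjectId("4ebcd5f0ed7c5031a103ba68")}})'. I don't know why I didn't catch that the first time I tried. 'decode failed. probably invalid utf-8 string' [a bunch of garbage] 'why: TypeError: UTF-8 character 0x1234567 too large src/mongo/shell/utils.js:1018'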