MongoDB duplicate count problem
I have a collection with the data below (the collection contains more than 100,000 records).
> db.LogBuff.find()
{ "_id" : ObjectId("578899d5d2b76f77d083f16c"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16d"), "SUBJECT" : "AA", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16e"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16f"), "SUBJECT" : "AA", "SYS" : "C" }
{ "_id" : ObjectId("578899d5d2b76f77d083f170"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f171"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f172"), "SUBJECT" : "CC", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f173"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f174"), "SUBJECT" : "CC", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f175"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f176"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f177"), "SUBJECT" : "BB", "SYS" : "C" }
{ "_id" : ObjectId("578899d5d2b76f77d083f178"), "SUBJECT" : "CC", "SYS" : "D" }
{ "_id" : ObjectId("578899d5d2b76f77d083f179"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17a"), "SUBJECT" : "AA", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17b"), "SUBJECT" : "BB", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17c"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17d"), "SUBJECT" : "CC", "SYS" : "C" }
I would like to get output of the following form:
{ "_id" : { "SUBJECT" : "CC", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "DD", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "B" }, "COUNT" : 2 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "B" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "A" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "D" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "B" }, "COUNT" : 1 }
This is my code:
db.LogBuff.mapReduce(
    function () {
        emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, this.SYS);
    },
    function (key, values) {
        return $count:1   // <- stuck here: how do I return the count for this key?
    }
)
Due to some limitations, I am unable to use the aggregation approach. I tried the aggregation code below:
db.LogBuff.aggregate([ {"$group" : {_id:{SUBJECT:"$SUBJECT",SYS:"$SYS"},COUNT:{$sum:1}}}, {$sort:{_id:1}},])
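Although this works for a limited number of records, for a large number of records it returns the following error (note: I am not the root user, so I cannot change the configuration):
assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:25:13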
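For the mapReduce attempt above, one possible way to finish the reduce step is to emit 1 per document and sum the values. This is a minimal sketch, not a confirmed fix: the names `mapFn`, `reduceFn`, and `runLocally` are made up for illustration, and `runLocally` is only a local stand-in for the server so the two functions can be tried outside MongoDB.

```javascript
// Map function: runs once per document, with `this` bound to the document.
// Emitting 1 (rather than this.SYS) means a key that occurs only once,
// for which reduce is never called, still ends up with a numeric count.
function mapFn() {
    emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, 1);
}

// Reduce function: add up the partial counts for one key. MongoDB may call
// it more than once per key, so it returns the same type it receives.
function reduceFn(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}

// Hypothetical local stand-in for the server (not part of any MongoDB API).
function runLocally(docs) {
    var groups = {};
    // emit() is normally provided by MongoDB; fake it here.
    globalThis.emit = function (key, value) {
        var id = JSON.stringify(key);
        if (!groups[id]) {
            groups[id] = { key: key, values: [] };
        }
        groups[id].values.push(value);
    };
    docs.forEach(function (doc) { mapFn.call(doc); });
    return Object.keys(groups).map(function (id) {
        var g = groups[id];
        // Like MongoDB, skip reduce for keys that have a single value.
        var value = g.values.length === 1 ? g.values[0] : reduceFn(g.key, g.values);
        return { _id: g.key, value: value };
    });
}

var sample = [
    { SUBJECT: "DD", SYS: "A" },
    { SUBJECT: "AA", SYS: "B" },
    { SUBJECT: "DD", SYS: "A" },
    { SUBJECT: "DD", SYS: "A" }
];
var localResult = runLocally(sample);
console.log(JSON.stringify(localResult));

// In the mongo shell the same two functions would be passed as:
// db.LogBuff.mapReduce(mapFn, reduceFn, { out: { inline: 1 } })
```

Note that mapReduce puts the count in a field named `value` rather than `COUNT`. Inline output is returned as a single document, so if there are very many distinct keys it may be safer to write to a collection with `out: "someCollectionName"` (a placeholder name). Also, mapReduce is deprecated as of MongoDB 5.0 in favor of the aggregation pipeline.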
Have you tried using the aggregation framework? Or can you only use MapReduce? –
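I used aggregation, but it only works for a limited number of records; for a large number it returns the following memory error (I am not the root user, so I cannot change the configuration) – Kavinda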
assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } : aggregate failed _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:13:14 – Kavinda
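The error message itself suggests passing `allowDiskUse: true`. That is an option on the `aggregate` call rather than a server configuration setting, so it may work without root access, assuming the deployment permits it. A minimal sketch using the pipeline from the question (the variable names `pipeline` and `options` are only for illustration):

```javascript
// Same pipeline as in the question: group by (SUBJECT, SYS), count, sort.
var pipeline = [
    { $group: { _id: { SUBJECT: "$SUBJECT", SYS: "$SYS" }, COUNT: { $sum: 1 } } },
    { $sort: { _id: 1 } }
];

// allowDiskUse lets stages that exceed the in-memory limit spill to
// temporary files on disk. It is passed per call as the second argument.
var options = { allowDiskUse: true };

// `db` only exists inside the mongo shell; the guard keeps this snippet
// harmless if it is run anywhere else.
if (typeof db !== "undefined") {
    db.LogBuff.aggregate(pipeline, options).forEach(printjson);
}
```

Since the message reports that the sort is what exceeded the limit, dropping the `$sort` stage may also avoid the error if the output order does not matter.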