2016-07-15 31 views

MongoDB duplicate count problem

I have a collection with the data below (the collection contains more than 100,000 records):

> db.LogBuff.find() 
{ "_id" : ObjectId("578899d5d2b76f77d083f16c"), "SUBJECT" : "DD", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f16d"), "SUBJECT" : "AA", "SYS" : "B" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f16e"), "SUBJECT" : "BB", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f16f"), "SUBJECT" : "AA", "SYS" : "C" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f170"), "SUBJECT" : "BB", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f171"), "SUBJECT" : "BB", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f172"), "SUBJECT" : "CC", "SYS" : "B" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f173"), "SUBJECT" : "AA", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f174"), "SUBJECT" : "CC", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f175"), "SUBJECT" : "DD", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f176"), "SUBJECT" : "AA", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f177"), "SUBJECT" : "BB", "SYS" : "C" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f178"), "SUBJECT" : "CC", "SYS" : "D" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f179"), "SUBJECT" : "DD", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f17a"), "SUBJECT" : "AA", "SYS" : "B" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f17b"), "SUBJECT" : "BB", "SYS" : "B" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f17c"), "SUBJECT" : "AA", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f17d"), "SUBJECT" : "CC", "SYS" : "C" } 

I would like to get output like the following:

{ "_id" : { "SUBJECT" : "CC", "SYS" : "C" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "DD", "SYS" : "A" }, "COUNT" : 3 } 
{ "_id" : { "SUBJECT" : "AA", "SYS" : "B" }, "COUNT" : 2 } 
{ "_id" : { "SUBJECT" : "AA", "SYS" : "C" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "CC", "SYS" : "B" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "BB", "SYS" : "A" }, "COUNT" : 3 } 
{ "_id" : { "SUBJECT" : "BB", "SYS" : "C" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "AA", "SYS" : "A" }, "COUNT" : 3 } 
{ "_id" : { "SUBJECT" : "CC", "SYS" : "A" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "CC", "SYS" : "D" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "BB", "SYS" : "B" }, "COUNT" : 1 } 

This is my code:

db.LogBuff.mapReduce(
    function () {
        emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, this.SYS);
    },
    function (key, values) {
        return $count:1 // <- stuck here
    }
)

Because of some limitations, I cannot use the aggregation approach. I tried the following aggregation code:

db.LogBuff.aggregate([ {"$group" : {_id:{SUBJECT:"$SUBJECT",SYS:"$SYS"},COUNT:{$sum:1}}}, {$sort:{_id:1}},]) 

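Although this works for a limited number of records, for a large number it returns this error (note: I am not the root user, so I cannot change the configuration):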

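assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:25:13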


Have you tried the aggregation framework? Or can you only use MapReduce? –


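I used aggregation, but it only works for a limited number of records; for a large number it returns the following memory error (I am not the root user, so I cannot change the configuration) – Kavinda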


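assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } : aggregate failed _getErrorWithCode@src/mongo/shell/utils.js:25:13 [email protected]/mongo/shell/assert.js:13:14 – Kavinda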

Answers

1

Try using the allowDiskUse option:

db.LogBuff.aggregate([ {"$group" : {_id:{SUBJECT:"$SUBJECT",SYS:"$SYS"},COUNT:{$sum:1}}}, {$sort:{_id:1}}], {allowDiskUse: true})
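
Note that allowDiskUse is passed with the command itself, so it should not require any change to the server configuration. As a rough illustration of what the pipeline computes, here is a plain JavaScript sketch that can be run outside MongoDB (for example in Node.js) on a few of the sample documents. The helper name `groupAndCount` is hypothetical and not a MongoDB API; it only imitates the `$group` and `$sort` stages in memory.

```javascript
// Plain JavaScript stand-in for the $group + $sort stages (not a MongoDB API).
// A subset of the sample documents from the question.
const docs = [
  { SUBJECT: "DD", SYS: "A" },
  { SUBJECT: "AA", SYS: "B" },
  { SUBJECT: "BB", SYS: "A" },
  { SUBJECT: "DD", SYS: "A" },
  { SUBJECT: "AA", SYS: "B" },
  { SUBJECT: "DD", SYS: "A" },
];

// Hypothetical helper: count documents per (SUBJECT, SYS) pair.
function groupAndCount(documents) {
  const counts = new Map();
  for (const { SUBJECT, SYS } of documents) {
    // Serialize the pair so it can be used as a Map key.
    const key = JSON.stringify([SUBJECT, SYS]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, COUNT]) => {
      const [SUBJECT, SYS] = JSON.parse(key);
      return { _id: { SUBJECT, SYS }, COUNT };
    })
    // Roughly like {$sort: {_id: 1}}: order by SUBJECT, then SYS.
    .sort((a, b) =>
      a._id.SUBJECT.localeCompare(b._id.SUBJECT) ||
      a._id.SYS.localeCompare(b._id.SYS));
}

const result = groupAndCount(docs);
console.log(result);
```

The error in the question comes from the sort stage, so if the output order does not matter, dropping `{$sort:{_id:1}}` from the pipeline may also avoid it.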

Thanks, that works fine – Kavinda
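
If the aggregation framework really were unavailable, the mapReduce attempt from the question could probably be completed by emitting 1 per document and summing the values in the reduce function. The sketch below is plain JavaScript with a small hypothetical harness (`emit`, `runMapReduce`) that imitates how MongoDB calls the two functions, so it can be run in Node.js. In the mongo shell, the same `mapFn` and `reduceFn` would instead be passed as `db.LogBuff.mapReduce(mapFn, reduceFn, { out: { inline: 1 } })`, assuming a server version that still supports mapReduce.

```javascript
// Hypothetical harness state: pairs collected from emit() calls.
let emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Map: emit a count of 1 for each (SUBJECT, SYS) pair.
function mapFn() {
  emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, 1);
}

// Reduce: sum the partial counts. The result has the same shape as the
// emitted values, so it stays correct if reduce runs on partial results.
function reduceFn(key, values) {
  return values.reduce((total, n) => total + n, 0);
}

// Hypothetical stand-in for db.collection.mapReduce with inline output.
function runMapReduce(documents, map, reduce) {
  emitted = [];
  for (const doc of documents) map.call(doc);
  const groups = new Map();
  for (const [key, value] of emitted) {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  }
  // Like MongoDB, skip reduce for keys that have only one value.
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const sample = [
  { SUBJECT: "DD", SYS: "A" },
  { SUBJECT: "AA", SYS: "B" },
  { SUBJECT: "DD", SYS: "A" },
  { SUBJECT: "DD", SYS: "A" },
];
const counts = runMapReduce(sample, mapFn, reduceFn);
console.log(counts);
```

Note that mapReduce names the result field `value` rather than `COUNT`, and that inline output is limited by the maximum BSON document size, so a large result set would need an output collection instead.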