2015-12-14 137 views

Mongo aggregation: grouping by multiple values

I have a Mongo query in which I want to use $group in the same way as GROUP BY in SQL.

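This doesn't work for me unless I set the _id of the new document to one of the grouping categories, which isn't what I need. I also can't get the values I want, which come from potentially three documents that I'm merging together in Mongo.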

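In SQL, I would write something like the following to illustrate the grouping and selection that I'm using as the basis for my aggregation in Mongo: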

SELECT entity_id, connection_id, cycle_id, objectOriginAPI,accountBalance 
FROM raw_originBusinessData 
WHERE objectStatus = 'UNPROCESSED' 
AND (objectOriginAPI = 'Profit & Loss' 
OR objectOriginAPI = 'Balance Sheet' 
OR objectOriginAPI = 'Bank Summary') 
GROUP BY entity_id, connection_id, cycle_id; 

I have rewritten and simplified my Mongo script, which works on embedded arrays.

db.getCollection('raw_originBusinessData').aggregate([ 
{ "$match": { 
    objectStatus : "UNPROCESSED" 
    , $or: [ 
    { objectOriginAPI : "Profit & Loss"} 
    ,{objectOriginAPI : "Balance Sheet"} 
    ,{objectOriginAPI : "Bank Summary"} 
    ]} 
}, 
     // don't worry about this, this is all good 
{ "$unwind": "$objectRawOriginData.Reports" } 
,{ "$unwind": "$objectRawOriginData.Reports.Rows" } 
,{ "$unwind": "$objectRawOriginData.Reports.Rows.Rows" }, 

     // this is where I believe I'm having my problem 
{ "$group": {"_id": "$entity_id" 
     // , "$connection_id" 
     // , "objectCycleID" 
, "accountBalances": { "$push": "$objectRawOriginData.Reports.Rows.Rows.Cells.Value" } 
}}, 
{$project: {objectClass: {$literal: "Source Data"} 
, objectCategory: {$literal: "Application"} 
, objectType: {$literal: "Account Balances"} 
, objectOrigin: {$literal: "Xero"} 
, entity_ID: "$_id" 
, connection_ID: "$connection_ID" 
, accountBalances: "$accountBalances"} 
} 
] 
     // ,{$out: "std_sourceBusinessData"} 
) 

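So each of the documents that I merge into a single document has the same entity_id, connection_id and cycle_id, and I want to put those into the new document. I also want to make sure the new document has its own unique object_id.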

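Thanks very much for your help. The Mongo documentation doesn't seem to cover this use of $group, but if I don't set _id to the thing I want to group by (in the script above it is set to entity_id), it doesn't group correctly.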

Answer


In short, the _id needs to be a "composite" value, and so it includes three "sub-keys":

{ "$group":{ 
    "_id": { 
     "entity_id": "$entity_id", 
     "connection_id": "$connection_id", 
     "objectCycleID": "$objectCycleID" 
    }, 
    "accountBalances": { 
     "$push": "$objectRawOriginData.Reports.Rows.Rows.Cells.Value" 
    } 
}}, 
{ "$project": { 
    "_id": 0, 
    "objectClass": { "$literal": "Source Data" }, 
    "objectCategory": { "$literal": "Application"}, 
    "objectType": { "$literal": "Account Balances"}, 
    "objectOrigin": { "$literal": "Xero"}, 
    "entity_ID": "$_id.entity_id", 
    "connection_ID": "$_id.connection_id", 
    "accountBalances": "$accountBalances" 
}} 

Then of course, referencing any of those values in the later $project requires you to prefix them with $_id, since that is now the parent key.

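As with any MongoDB document, the _id can be anything that represents a valid BSON object. So in this case, the combination means "group on all of these field values".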
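To make the "group on all of these field values" idea concrete, here is a minimal plain-JavaScript sketch that imitates a composite-key $group with a $push accumulator, followed by the flattening that the $project stage performs. It is only an illustration, not how MongoDB implements grouping: the helper groupByCompositeKey, the sample documents, and the flat "value" field are all assumptions made up for this example (the real documents carry the value deep inside objectRawOriginData).

```javascript
// Plain-JavaScript sketch of grouping on a composite key.
// Not MongoDB's implementation; groupByCompositeKey is a hypothetical helper.
function groupByCompositeKey(docs, keyFields, pushField, outField) {
  const groups = new Map();
  for (const doc of docs) {
    // Build the composite _id from every grouping field.
    const id = {};
    for (const field of keyFields) {
      id[field] = doc[field];
    }
    // Serialize the composite _id so equal combinations share one Map entry.
    const mapKey = JSON.stringify(id);
    if (!groups.has(mapKey)) {
      groups.set(mapKey, { _id: id, [outField]: [] });
    }
    // Similar in spirit to the $push accumulator.
    groups.get(mapKey)[outField].push(doc[pushField]);
  }
  return Array.from(groups.values());
}

// Made-up sample documents (assumption: a flat "value" field for brevity).
const sampleDocs = [
  { entity_id: "E1", connection_id: "C1", objectCycleID: 1, value: 100 },
  { entity_id: "E1", connection_id: "C1", objectCycleID: 1, value: 250 },
  { entity_id: "E1", connection_id: "C2", objectCycleID: 1, value: 75 },
];

const grouped = groupByCompositeKey(
  sampleDocs,
  ["entity_id", "connection_id", "objectCycleID"],
  "value",
  "accountBalances"
);

// Flatten the composite _id, as "$_id.entity_id" does in the $project stage.
const projected = grouped.map((g) => ({
  entity_ID: g._id.entity_id,
  connection_ID: g._id.connection_id,
  accountBalances: g.accountBalances,
}));

console.log(JSON.stringify(projected));
```

With these sample documents the first two rows share all three key values and end up in one group, while the third differs in connection_id and forms its own group. Note that this sketch keeps groups in first-seen order, whereas MongoDB does not guarantee any output order from $group.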


This is great and makes perfect sense. It works, you're a star!