2015-04-16 99 views
2

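MongoDB aggregation framework: sum values for the latest date in each month

I receive data from a service once a week and put it into a collection. Each record has an amount, a projectNo, and a dataDate timestamp. Using the aggregation framework, I sum the amount and group by projectNo and dataDate: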

db.collection.aggregate([ 
    {$project: {projectNo: 1, bdgtAppd: 1, dataDate: 1}}, 
    {$group: {_id: { 
       projectNo: "$projectNo", 
       dataDate: "$dataDate" 
       }, 
      amount: {$sum: "$bdgtAppd"}} 
    }, 
    {$project: {_id:0, 
       projectNo:"$_id.projectNo", 
       dataDate:"$_id.dataDate", 
       amount:"$amount" 
       } 
    }, 
    {$sort: {projectNo:1,dataDate:1}} 
]) 

This produces the following output:

[{ 
    "amount" : 7887, 
    "projectNo" : "5544A", 
    "dataDate" : "2015-01-02T08:00:00.000Z" 
}, { 
    "amount" : 137947, 
    "projectNo" : "5544A", 
    "dataDate" : "2015-01-16T08:00:00.000Z" 
}, { 
    "amount" : 137947, 
    "projectNo" : "5544A", 
    "dataDate" : "2015-01-23T08:00:00.000Z" 
}, { 
    "amount" : 137947, 
    "projectNo" : "5544A", 
    "dataDate" : "2015-01-30T08:00:00.000Z" 
}, { 
    "amount" : 130060, 
    "projectNo" : "5544A", 
    "dataDate" : "2015-02-06T08:00:00.000Z" 
}, { 
    "amount" : 130060, 
    "projectNo" : "5544A", 
    "dataDate" : "2015-02-13T08:00:00.000Z" 
}, { 
    "amount" : 130060, 
    "projectNo" : "5544A", 
    "dataDate" : "2015-02-20T08:00:00.000Z" 
}] 

What I need to do now is restrict the returned data to only the last date in each month:

[{ 
    "amount" : 137947, 
    "projectNo" : "5544A", 
    "dataDate" : "2015-01-30T08:00:00.000Z" 
}, { 
    "amount" : 130060, 
    "projectNo" : "5544A", 
    "dataDate" : "2015-02-27T08:00:00.000Z" 
}] 
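
For illustration only, here is a minimal plain-JavaScript sketch of the selection rule described above: for each project and calendar month, keep the row with the latest dataDate. It is not MongoDB code. The function name `lastPerMonth` is made up for this example, the input rows are copied from the first listing, and months are taken in UTC on the assumption that the stored timestamps are UTC.

```javascript
// Keep, for each project and calendar month, only the row with the latest dataDate.
// `lastPerMonth` is a hypothetical helper name, not part of any MongoDB API.
function lastPerMonth(rows) {
  const latest = new Map();
  for (const row of rows) {
    const d = new Date(row.dataDate);
    // Month key computed in UTC (assumption: timestamps are stored as UTC)
    const key = `${row.projectNo}|${d.getUTCFullYear()}|${d.getUTCMonth() + 1}`;
    const seen = latest.get(key);
    if (!seen || d > new Date(seen.dataDate)) {
      latest.set(key, row);
    }
  }
  // Order by projectNo, then dataDate, like the $sort stage in the pipeline
  return [...latest.values()].sort(
    (a, b) =>
      a.projectNo.localeCompare(b.projectNo) ||
      new Date(a.dataDate) - new Date(b.dataDate)
  );
}

// A few rows copied from the aggregated output shown earlier
const rows = [
  { amount: 7887, projectNo: "5544A", dataDate: "2015-01-02T08:00:00.000Z" },
  { amount: 137947, projectNo: "5544A", dataDate: "2015-01-30T08:00:00.000Z" },
  { amount: 130060, projectNo: "5544A", dataDate: "2015-02-06T08:00:00.000Z" },
  { amount: 130060, projectNo: "5544A", dataDate: "2015-02-20T08:00:00.000Z" },
];

console.log(lastPerMonth(rows));
```

The function returns one row per project and calendar month, which is the shape the aggregation should end up with.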

Edit: a sample document from the collection:

{ 
    "_id" : ObjectId("5527e724fc53ec16bc5fe57a"), 
    "projectNo" : "5544G", 
    "cpfoNo" : "1448R", 
    "cpfoDate" : ISODate("2014-10-20T07:00:00Z"), 
    "description" : "INC 6 CO 176 - Booster Pump", 
    "pcoNo" : "1510", 
    "approvedAmount" : null, 
    "days" : null, 
    "remarks" : null, 
    "itemNo" : "0005", 
    "costCode" : "5030.09900.0000.0000", 
    "itemTitle" : "Painting - Hasson", 
    "bdgtEst" : 0.0, 
    "bdgtProp" : 745.0, 
    "bdgtAprv" : 745.0, 
    "bdgtAppd" : 745.0, 
    "dataDate" : ISODate("2014-12-12T08:00:00Z") 
} 
+0

Can you show us your documents? – styvane

+0

@Michael did you get a chance to look at this? – Splitty

Answers

1

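Thanks to @chidrian for getting me started. This is the solution that worked for me. Projecting the month and year keys may be an extra step, but it works.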

db.collection.aggregate([{ 
    // Sum bdgtAppd for each project and data date 
    "$group": { 
     "_id": { 
      "projectNo": "$projectNo", 
      "dataDate": "$dataDate" 
     }, 
     "amount": { 
      "$sum": "$bdgtAppd" 
     } 
    } 
}, { 
    // Flatten the compound group key 
    "$project": { 
     "_id": 0, 
     "projectNo": "$_id.projectNo", 
     "dataDate": "$_id.dataDate", 
     "amount": 1 
    } 
}, { 
    // Derive month and year keys from dataDate 
    "$project": { 
     "_id": 0, 
     "projectNo": "$projectNo", 
     "amount": 1, 
     "dataDate": 1, 
     "month": { 
      "$month": "$dataDate" 
     }, 
     "year": { 
      "$year": "$dataDate" 
     } 
    } 
}, { 
    // Order the rows so that $last picks the latest date in each month 
    "$sort": { 
     "projectNo": 1, 
     "dataDate": 1 
    } 
}, { 
    "$group": { 
     "_id": { 
      "projectNo": "$projectNo", 
      "month": "$month", 
      "year": "$year" 
     }, 
     "dataDate": { 
      "$last": "$dataDate" 
     }, 
     "amount": { 
      "$last": "$amount" 
     } 
    } 
}, { 
    // projectNo now lives inside _id, so sort on the nested key 
    "$sort": { 
     "_id.projectNo": 1, 
     "dataDate": 1 
    } 
}, { 
    "$project": { 
     "_id": 0, 
     "projectNo": "$_id.projectNo", 
     "dataDate": 1, 
     "amount": 1 
    } 
}]) 
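
The extra projection step mentioned above can probably be avoided, since a `$group` `_id` may contain expressions such as `$year` and `$month`. The following is a sketch of that idea, not a tested replacement: the variable name `pipeline` is hypothetical, and the field names are taken from the question.

```javascript
// Sketch only: `pipeline` is a hypothetical name; field names come from the question.
const pipeline = [
  // 1. Sum bdgtAppd for each project and data date
  {
    $group: {
      _id: { projectNo: "$projectNo", dataDate: "$dataDate" },
      amount: { $sum: "$bdgtAppd" },
    },
  },
  // 2. Order the rows so the latest date of each month comes last
  { $sort: { "_id.projectNo": 1, "_id.dataDate": 1 } },
  // 3. Group by project, year and month, keeping the last (latest) row
  {
    $group: {
      _id: {
        projectNo: "$_id.projectNo",
        year: { $year: "$_id.dataDate" },
        month: { $month: "$_id.dataDate" },
      },
      dataDate: { $last: "$_id.dataDate" },
      amount: { $last: "$amount" },
    },
  },
  // 4. Flatten the key and restore a stable output order
  { $project: { _id: 0, projectNo: "$_id.projectNo", dataDate: 1, amount: 1 } },
  { $sort: { projectNo: 1, dataDate: 1 } },
];

// In the mongo shell this would be run as: db.collection.aggregate(pipeline)
```

The `$sort` before the second `$group` matters: `$last` only returns a meaningful value when the incoming documents are in a defined order.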
1

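There is no need for the initial $project pipeline stage; simply start with the $group step, and the following pipeline stages will produce the desired result: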

db.collection.aggregate([ 
    { 
     "$group": { 
      "_id": { 
       "projectNo": "$projectNo", 
       "dataDate": "$dataDate" 
      }, 
      "amount": {"$sum": "$bdgtAppd"}    
     }  
    }, 
    { 
     "$project": { 
      "_id": 0,    
      "projectNo": "$_id.projectNo", 
      "dataDate": "$_id.dataDate", 
      "amount": 1 
     } 
    }, 
    { 
     "$group": { 
      "_id": "$projectNo",       
      "dataDate": {"$first" : "$dataDate"}, 
      "amount": {"$first" : "$amount"}   
     } 
    }, 
    { 
     "$project": { 
      "_id": 0,    
      "projectNo": "$_id", 
      "dataDate": 1, 
      "amount": 1 
     } 
    } 
]); 

With the following sample documents (only the relevant fields are included, as a minimal test case):

db.collection.insert([ 
    /* 0 */ 
    { 
     "projectNo" : "5544A", 
     "bdgtAppd" : 3, 
     "dataDate" : ISODate("2015-01-02T08:00:00.000Z") 
    }, 

    /* 1 */ 
    { 
     "projectNo" : "5544A", 
     "bdgtAppd" : 7, 
     "dataDate" : ISODate("2015-01-28T08:00:00.000Z") 
    }, 

    /* 2 */ 
    { 
     "projectNo" : "5544A", 
     "bdgtAppd" : 5, 
     "dataDate" : ISODate("2015-01-28T08:00:00.000Z") 
    }, 

    /* 3 */ 
    { 
     "projectNo" : "5544B", 
     "bdgtAppd" : 15, 
     "dataDate" : ISODate("2015-02-13T08:00:00.000Z") 
    }, 

    /* 4 */ 
    { 
     "projectNo" : "5544G", 
     "bdgtAppd" : 10, 
     "dataDate" : ISODate("2015-02-27T08:00:00.000Z") 
    }, 

    /* 5 */ 
    { 
     "projectNo" : "5544G", 
     "bdgtAppd" : 25, 
     "dataDate" : ISODate("2015-02-27T08:00:00.000Z") 
    }, 
]); 

The above aggregation produces:

/* 0 */ 
{ 
    "result" : [ 
     { 
      "dataDate" : ISODate("2015-01-28T08:00:00.000Z"), 
      "amount" : 12, 
      "projectNo" : "5544A" 
     }, 
     { 
      "dataDate" : ISODate("2015-02-13T08:00:00.000Z"), 
      "amount" : 15, 
      "projectNo" : "5544B" 
     }, 
     { 
      "dataDate" : ISODate("2015-02-27T08:00:00.000Z"), 
      "amount" : 35, 
      "projectNo" : "5544G" 
     } 
    ], 
    "ok" : 1 
} 
+0

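Thanks for the tip. I thought limiting the keys to only the ones I care about would speed things up a little, but I forgot that $group does the same thing. This aggregation gets me very close: in some cases it is correct, while in others it picks the first dataDate of the month. I feel there should be a way to do this in one pass, but I'm afraid I will have to fetch the dates first and then pass them to $match. – Splitty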

+0

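@Splitty No worries. I tried a few test documents and could not come up with a query with fewer pipeline stages that still meets the requirements you gave. The key point is how you do the initial grouping; above, you want to group on the project and date. – chridam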

+0

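If possible, could you provide a minimal test case (sample documents and the expected result from the sample collection)? Perhaps we can find another approach. – chridam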