MongoDB query to join two collections
I have 2 Mongo collections:
companies: each record is a company with many fields (city, country, etc.) -> 100k rows
{company_id:1, country:"USA", city:"New York",...}
{company_id:2, country:"Spain", city:"Valencia",... }
{company_id:3, country:"France", city:"Paris",... }
scores: there are blocks of dates; each block has one company_id + score per company, e.g. -> 100k rows in each block
{date: 2016-05-29, company_id:1, score:90}
{date: 2016-05-29, company_id:2, score:87}
{date: 2016-05-29, company_id:3, score:75}
...
{date: 2016-05-22, company_id:1, score:88}
{date: 2016-05-22, company_id:2, score:87}
{date: 2016-05-22, company_id:3, score:76}
...
{date: 2016-05-15, company_id:1, score:91}
{date: 2016-05-15, company_id:2, score:82}
{date: 2016-05-15, company_id:3, score:73}
...
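Objective:
I want to retrieve a list of companies, filtered by some fields (country, city, ...), together with each company's most recent score (as of 2016-05-29), ordered by score descending.
That is: filter on one collection, then filter + sort on the other collection.
Note: there is an index on scores.date, so the latest date (2016-05-29 in this example) can be found or precomputed quickly.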
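To make the desired result concrete, here is a minimal in-memory sketch in plain JavaScript using the sample documents above. It is only an illustration of the expected output, not a MongoDB query and not something intended for 100k rows; the helper name `latestScores` is hypothetical, and dates are written as ISO strings for simplicity.

```javascript
// Sample documents copied from the question (dates written as ISO strings).
const companies = [
  { company_id: 1, country: "USA", city: "New York" },
  { company_id: 2, country: "Spain", city: "Valencia" },
  { company_id: 3, country: "France", city: "Paris" },
];
const scores = [
  { date: "2016-05-29", company_id: 1, score: 90 },
  { date: "2016-05-29", company_id: 2, score: 87 },
  { date: "2016-05-29", company_id: 3, score: 75 },
  { date: "2016-05-22", company_id: 1, score: 88 },
  { date: "2016-05-22", company_id: 2, score: 87 },
  { date: "2016-05-22", company_id: 3, score: 76 },
];

// Hypothetical helper (not a MongoDB API): filter companies, attach the score
// from the most recent date block, and sort by that score, highest first.
function latestScores(companies, scores, filter) {
  // ISO date strings compare chronologically, so the max string is the latest date.
  const latestDate = scores.reduce((max, s) => (s.date > max ? s.date : max), "");
  // Index the latest block by company_id for constant-time lookups.
  const scoreById = new Map(
    scores.filter((s) => s.date === latestDate).map((s) => [s.company_id, s.score])
  );
  return companies
    .filter((c) => Object.entries(filter).every(([key, value]) => c[key] === value))
    .filter((c) => scoreById.has(c.company_id))
    .map((c) => ({ ...c, score: scoreById.get(c.company_id), date: latestDate }))
    .sort((a, b) => b.score - a.score);
}

const all = latestScores(companies, scores, {});
const usa = latestScores(companies, scores, { country: "USA" });
console.log(all.map((c) => [c.company_id, c.score]));
console.log(usa.map((c) => [c.company_id, c.score]));
```

With an empty filter every company comes back with its 2016-05-29 score; with `{ country: "USA" }` only company 1 remains.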
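Attempts:
I have been trying an aggregate query using $lookup. When the filter is restrictive (many fields, so few matching companies), the query is fast.
The query is as follows: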
db.companies.aggregate([
{$match: {"status": "running", "country": "USA", "city": "San Francisco",
"categories": { $in: ["Software"]}, dummy: false}},
{$lookup: {from: "scores", localField: "company_id", foreignField: "company_id", as:"scores"}},
{$unwind: "$scores"},
{$project: {_id: "$_id",
"company_id": "$company_id",
"company_name": "$company_name",
"status": "$status",
"city": "$city",
"country": "$country",
"categories": "$categories",
"dummy": "$dummy",
"score": "$scores.score",
"date": "$scores.date"}},
{$match: {"date" : ISODate("2016-05-29T00:00:00Z")}},
{$sort: {"score":-1}}
],{allowDiskUse: true})
However, when the filter is small or empty (so more companies match), the $sort stage takes several seconds.
db.companies.aggregate([
{$match: {"status": "running"}},
{$lookup: {from: "scores", localField: "company_id", foreignField: "company_id", as:"scores"}},
{$unwind: "$scores"},
{$project: {_id: "$_id",
"company_id": "$company_id",
"company_name": "$company_name",
"status": "$status",
"city": "$city",
"country": "$country",
"categories": "$categories",
"dummy": "$dummy",
"score": "$scores.score",
"date": "$scores.date"}},
{$match: {"date" : ISODate("2016-05-29T00:00:00Z")}},
{$sort: {"score":-1}}
],{allowDiskUse: true})
This is probably because of the number of companies the filter matches: 59 rows are much easier to sort than 89k.
> db.companies.count({"status": "running", "country": "USA", "city": "San Francisco", "categories": { $in: ["Software"]}, dummy: false})
59
> db.companies.count({"status": "running"})
89043
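I have also tried a different approach: start from the scores collection, filter by date, and sort by score (an index on date + score is very useful here). Everything is very fast until the final $match, which filters on the company attributes.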
db.scores.aggregate([
{$match:{"date" : ISODate("2016-05-29T00:00:00Z")}},
{$sort:{"score":-1}},
{$lookup:{from: "companies", localField: "company_id", foreignField: "company_id", as:"companies"}},
{$unwind:"$companies"},
{$project: {_id: "$companies._id",
"company_id": "$companies.company_id",
"company_name": "$companies.company_name",
"status": "$companies.status",
"city": "$companies.city",
"country": "$companies.country",
"categories": "$companies.categories",
"dummy": "$companies.dummy",
"score": "$score",
"date": "$date"}},
{$match:{"status": "running", "country":"USA", "city": "San Francisco",
"categories": { $in: ["Software"]}, dummy: false}}
],{allowDiskUse: true})
With this approach, large filters (like the previous example) are very slow, while small filters (only {"status": "running"}) are faster.
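For reference, the indexes mentioned for the scores collection could be declared roughly as follows. This is only a sketch: the question does not show the actual index definitions, so the key order and sort directions are assumptions. `createIndex` is the standard shell call; the `typeof db` guard is just there so the snippet also loads outside the mongo shell.

```javascript
// Assumed index specs; the question only says that an index on scores.date
// exists and that a date + score index helps the scores-first pipeline.
const scoreIndexSpecs = [
  { date: 1 },            // supports looking up the latest date
  { date: 1, score: -1 }, // supports $match on date followed by $sort on score descending
];

// `db` exists only inside the mongo shell; skip index creation elsewhere.
if (typeof db !== "undefined") {
  scoreIndexSpecs.forEach((spec) => db.scores.createIndex(spec));
}
```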
Is there any way to join both collections, filter on both of them, and sort by one field?
Join examples: https://www.mongodb.com/blog/post/joins-and-other-aggregation-enhancements-coming-in-mongodb-3-2-part-1-of-3-introduction – Leo