MongoDB: return subdocuments while keeping track of their parents

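I have a collection of tweets, and I am trying to output the retweets (similar to quoted tweets) at the root level into a new collection, so that I can later merge them with the original collection using dump and restore. The retweeted status is a subdocument inside the tweet document, and several tweets may retweet the same tweet. How can I bring the retweets up to the root level and add an array called 'retweeted_by' containing the IDs of all the tweets that retweeted it?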

Keep in mind that I use the tweet ID as the primary index (_id) to avoid creating duplicates when combining the collections (mongorestore).

The documents in my collection have the following form:

{ 
    "_id" : "123456", 
    "other_fields1" : "values1", 
    "retweeted_status" : { 
          "retweet_id": "159753", 
          "other_fields2" : "values2", 
          } 
}

The desired output would look like:

{ 
    "_id" : "159753", 
    "other_fields2" : "values2",  
    "retweeted_by" : [ "123456", "974631", "121212"] 
}

Edit for clarification:

The subdocument field (other_fields2) actually stands for many fields (~28), and not all of them are present in every tweet.


2017-08-07 Ali Abul Hawa

'db.collection.aggregate([{$group: {_id: "$retweeted_status.retweet_id", retweeted_by: {$push: "$_id"}}}])' – felix

@felix Thanks, but this only outputs the id of retweeted_status, not the whole retweeted_status subdocument (called "other_fields2" in my example)... I think I need to use $replaceRoot with the subdocument as newRoot and somehow add the retweeted_by array to it –

Add 'other_fields2: {$first: "$retweeted_status.other_fields2"}'. See the [MongoDB documentation for $group](https://docs.mongodb.com/manual/reference/operator/aggregation/group/) – felix

OK.. so I finally arrived at a solution to my problem, although I don't know whether it is the best way to do it:
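Since the subdocument really has about 28 fields, felix's suggestion means writing one `$first` accumulator per field. A minimal sketch of generating that `$group` stage in plain JavaScript (which the mongo shell also runs), assuming the field names are known up front; `buildGroupStage` is a hypothetical helper, not part of MongoDB:

```javascript
// Hypothetical helper: builds felix's $group stage for an explicit list of
// subdocument field names, adding one $first accumulator per field.
function buildGroupStage(fieldNames) {
  const group = {
    _id: "$retweeted_status.retweet_id", // one group per retweeted tweet
    retweeted_by: { $push: "$_id" },     // collect the parent tweet IDs
  };
  for (const name of fieldNames) {
    // $first takes the value from the first document in each group
    group[name] = { $first: "$retweeted_status." + name };
  }
  return { $group: group };
}

const stage = buildGroupStage(["other_fields2"]);
console.log(JSON.stringify(stage));
```

The returned object could be placed in the pipeline array passed to `db.tweets.aggregate(...)`. It still requires an explicit list of field names, which is awkward when the fields vary from tweet to tweet.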

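db.tweets.aggregate([ 
{ 
    // keep only tweets that are retweets
    $match: { retweeted_status: {$exists: true}} 
}, 
{ 
    // store the parent tweet's _id inside the subdocument, and use the retweeted
    // tweet's own ID (id_str in the Twitter API; retweet_id in the example above) as its _id
    $addFields: { 'retweeted_status.retweeted_by' : '$_id', 'retweeted_status._id' : '$retweeted_status.id_str'} 
}, 
{ 
    // promote the subdocument to the root
    $replaceRoot: { newRoot: '$retweeted_status'} 
}, 
{ 
    // one group per retweeted tweet: keep its first copy, collect the distinct parent IDs
    $group: { _id: '$_id', doc: { '$first': '$$ROOT' }, retweeted_by: {$addToSet: '$retweeted_by'}} 
}, 
{ 
    // overwrite the single parent ID inside doc with the array of parent IDs
    $addFields: { 'doc.retweeted_by' : '$retweeted_by'} 
}, 
{ 
    $replaceRoot: { newRoot: '$doc'} 
}, 
{ 
    // drop the fields that duplicate _id
    $project: { id: 0 , id_str: 0 } 
}, 
{ 
    // write the result to the 'retweets' collection
    $out: 'retweets' 
} 
], {allowDiskUse: true})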
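The pipeline's logic can be sketched in memory with plain JavaScript, with no MongoDB involved, to see what each stage does to the data. This is a rough illustration only: `simulateRetweetPipeline` is a hypothetical name, it assumes the Twitter field `id_str` (as the pipeline does) rather than `retweet_id` from the simplified example, and it keeps parent IDs in insertion order, whereas MongoDB guarantees no order for `$addToSet` results or `$group` output.

```javascript
// In-memory sketch of the aggregation pipeline (not MongoDB itself).
function simulateRetweetPipeline(tweets) {
  const groups = new Map(); // retweeted tweet ID -> { doc, parents }

  for (const tweet of tweets) {
    // $match: keep only tweets that have a retweeted_status subdocument
    if (tweet.retweeted_status === undefined) continue;

    // $addFields + $replaceRoot: copy the subdocument, tag it with the
    // parent _id, and give it the retweeted tweet's own ID as _id
    const sub = {
      ...tweet.retweeted_status,
      retweeted_by: tweet._id,
      _id: tweet.retweeted_status.id_str,
    };

    // $group: the first copy wins ($first: '$$ROOT'); parent IDs are
    // collected without duplicates ($addToSet)
    if (!groups.has(sub._id)) {
      groups.set(sub._id, { doc: sub, parents: new Set() });
    }
    groups.get(sub._id).parents.add(sub.retweeted_by);
  }

  const out = [];
  for (const { doc, parents } of groups.values()) {
    // $addFields + $replaceRoot: replace the single parent ID with the array
    const result = { ...doc, retweeted_by: [...parents] };
    // $project: drop the fields that duplicate _id
    delete result.id;
    delete result.id_str;
    out.push(result);
  }
  return out;
}

const sample = [
  { _id: "123456", retweeted_status: { id_str: "159753", id: 159753, other_fields2: "values2" } },
  { _id: "974631", retweeted_status: { id_str: "159753", id: 159753, other_fields2: "values2" } },
  { _id: "555555", other_fields1: "values1" }, // not a retweet, filtered out
];
console.log(JSON.stringify(simulateRetweetPipeline(sample)));
```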

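At the start, each document (tweet) has the form:

{parent, subdocument{}}

First I match on the existence of retweeted_status (the subdocument). Then, before grouping by the retweeted_status ID, I add a field holding the parent tweet's id:

{parent, {subdocument, parent_id}}

Then I replace the root with the modified subdocument:

{subdocument, parent_id}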

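Then I group by the new root's _id, take the first document of each group, and add a new group accumulator (retweeted_by). (I use $addToSet rather than $push, because the Twitter API sometimes sends duplicates.)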
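The difference between the two accumulators can be illustrated with a plain-JavaScript sketch of how each treats repeated parent IDs within one group. The function names are hypothetical, the sketch only handles primitive values such as ID strings, and in MongoDB the order of `$addToSet` results is not guaranteed:

```javascript
// $push keeps every value it sees, duplicates included
function pushAccumulator(values) {
  return values.slice();
}

// $addToSet keeps each distinct value once (primitives only in this sketch)
function addToSetAccumulator(values) {
  return [...new Set(values)];
}

const parentIds = ["123456", "974631", "123456"];
console.log(pushAccumulator(parentIds));     // [ '123456', '974631', '123456' ]
console.log(addToSetAccumulator(parentIds)); // [ '123456', '974631' ]
```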

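At this point the root document contains _id, the retweet document embedded in the field "doc", and an array containing the parents:

{doc{subdocument, parent_id}, [parent_ids]}

Next, I add the parents array as a field inside doc (overwriting the previously added retweeted_by field):

{doc{subdocument, [parent_ids]}, [parent_ids]}

Then I replace the parent (root) with the new doc. Finally, I exclude the fields (id and id_str) that hold the same number as the _id field:

{subdocument, [parent_ids]}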


2017-08-08 12:09:17
