Add a new field to 100 million records in MongoDB

What is the fastest and safest strategy for adding a new field to more than 100 million MongoDB documents?

Background

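MongoDB 3.0 running on a 3-node replica set.
We are adding a new field (post_hour) to our existing documents, based on the data in another field (post_time). The post_hour field is post_time truncated to the hour.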

2016-06-23 Matt Saddington

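I ran into a similar situation and wrote a script to update about 25 million documents; updating all of them took a long time. To improve performance, I inserted each updated document, one at a time, into a new collection and then renamed the new collection. This let me insert documents instead of updating them, because inserts are faster than updates.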

Below is a sample script (I have not tested it):

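/* Returns post_time truncated to the hour.
   Assumes post_time is stored as a Date; adjust if it is a string or number. */
function convertPostTimeToPostHour(postTime) {
    var MS_PER_HOUR = 60 * 60 * 1000;
    return new Date(Math.floor(postTime.getTime() / MS_PER_HOUR) * MS_PER_HOUR);
}

var totalCount = db.persons.count();
var chunkSize = 1000;
var chunkCount = Math.ceil(totalCount / chunkSize);
var offset = 0;
for (var index = 0; index < chunkCount; index++) {
    // Sort by _id so skip()/limit() pages are stable between iterations.
    // skip() still walks past `offset` documents, so later chunks get slower.
    var personList = db.persons.find().sort({ _id: 1 }).skip(offset).limit(chunkSize);
    personList.forEach(function (person) {
        person.post_hour = convertPostTimeToPostHour(person.post_time);
        db.personsNew.insert(person); // insert the updated document into the new collection
    });
    offset += chunkSize;
}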

After the script above runs, the new collection 'personsNew' will contain the updated records, each with its 'post_hour' field set.

If the existing collection has any indexes, you need to re-create them on the new collection.
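If the source collection has several indexes, a small helper can turn the output of `getIndexes()` into the matching `createIndex()` calls. The sketch below uses a hypothetical helper, `indexCopyPlan` (not part of MongoDB). It skips the automatic `_id_` index and copies only a handful of common options, so it assumes your indexes use nothing beyond those; extend `optionNames` if they do.

```javascript
// Hypothetical helper: builds the createIndex() arguments needed to copy the
// secondary indexes described by getIndexes() output onto another collection.
// The _id_ index is skipped because every collection gets it automatically.
// Only the options listed in optionNames are copied (assumption: extend the
// list if your indexes use other options).
function indexCopyPlan(indexSpecs) {
  var optionNames = ["name", "unique", "sparse", "expireAfterSeconds", "background"];
  var plan = [];
  indexSpecs.forEach(function (spec) {
    if (spec.name === "_id_") {
      return; // created automatically on the new collection
    }
    var options = {};
    optionNames.forEach(function (opt) {
      if (spec.hasOwnProperty(opt)) {
        options[opt] = spec[opt];
      }
    });
    plan.push({ key: spec.key, options: options });
  });
  return plan;
}

// Possible shell usage (collection names as in the answer):
// indexCopyPlan(db.persons.getIndexes()).forEach(function (p) {
//   db.personsNew.createIndex(p.key, p.options);
// });
```

Building the indexes after the bulk insert, rather than before it, usually avoids maintaining every index on each of the inserts.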

Once the indexes have been created, rename the collection 'persons' to 'personsOld', and then rename 'personsNew' to 'persons'.
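The script above pages with `skip(offset)`, and the server has to walk past every skipped document, so each chunk gets slower as the offset approaches 100 million. A common alternative, swapped in here and not the author's method, is keyset pagination: remember the last `_id` of each page and ask for the next page with `{ _id: { $gt: lastId } }` sorted by `_id`. The sketch below only models the semantics in memory, with a sorted array standing in for the collection; `nextPage` and `forEachPage` are hypothetical names.

```javascript
// In-memory model of keyset ("seek") pagination. In the mongo shell each page
// would come from a query such as:
//   db.persons.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(chunkSize)
// The linear scan below only models that query's result; on the server the
// _id index lets $gt jump straight to the start of the next page.
function nextPage(sortedDocs, lastId, chunkSize) {
  var page = [];
  for (var i = 0; i < sortedDocs.length && page.length < chunkSize; i++) {
    if (lastId === null || sortedDocs[i]._id > lastId) {
      page.push(sortedDocs[i]);
    }
  }
  return page;
}

// Calls callback once per page until no documents remain.
function forEachPage(sortedDocs, chunkSize, callback) {
  var lastId = null; // no page read yet
  var page = nextPage(sortedDocs, lastId, chunkSize);
  while (page.length > 0) {
    callback(page);
    lastId = page[page.length - 1]._id; // resume after the last _id seen
    page = nextPage(sortedDocs, lastId, chunkSize);
  }
}
```

Each page costs roughly the same regardless of how far into the collection it is, which is the point of replacing `skip()`.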


2016-06-23 09:23:13 Manish

I guess running each chunk in a separate shell could improve performance.
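To try the commenter's suggestion, the work first has to be divided so that each shell handles a disjoint part of the collection. A minimal sketch, assuming you split by document position and start one shell per range; `splitIntoRanges` is a hypothetical helper, and how each shell turns its range into a query (for example `_id` boundaries) is left open. Whether parallel shells actually help depends on where the bottleneck is.

```javascript
// Hypothetical helper: splits totalCount documents into `workers` contiguous
// [start, end) ranges whose sizes differ by at most one, one range per shell.
function splitIntoRanges(totalCount, workers) {
  var ranges = [];
  var base = Math.floor(totalCount / workers);
  var remainder = totalCount % workers; // the first `remainder` ranges get one extra
  var start = 0;
  for (var w = 0; w < workers; w++) {
    var size = base + (w < remainder ? 1 : 0);
    ranges.push({ start: start, end: start + size });
    start += size;
  }
  return ranges;
}
```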


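Using snapshot() prevents duplicate query results (a document can be returned twice because it grows, and may move, as we add the field); it can be removed if it causes any problems.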

Below is a mongo shell script, where 'a1' is the collection name:

var documentLimit = 1000;

// Documents that still need the new field.
var query = { post_hour: { $exists: false } };

var docCount = db.a1.find(query).count();
var chunks = Math.ceil(docCount / documentLimit);

for (var i = 0; i < chunks; i++) {
    db.a1.find(query)
        .snapshot() // avoid returning the same document twice if it moves
        .limit(documentLimit)
        .forEach(function (doc) {
            doc.post_hour = 12; // put your transformation here
            // db.a1.save(doc); // uncomment this line to save data; a write concern
            //                  // can be passed too, e.g. { writeConcern: { w: "majority" } }
            printjson(doc); // for testing only; comment out to avoid polluting the shell output
        });
}

You can experiment with the parameters, but in most cases chunks of 1,000 records looked optimal.
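Because every pass re-queries for documents that still lack `post_hour`, finished documents drop out of the filter and the loop needs no offset at all, once the `save` line is uncommented. The in-memory model below illustrates that; `fillInBatches` is a hypothetical name and a plain array stands in for the `a1` collection. At 1,000 documents per pass, 100 million documents take 100,000 passes.

```javascript
// In-memory model of the "$exists: false" batching loop: each pass takes up to
// batchSize documents that still lack post_hour and sets the field, so the
// next pass naturally starts with the remaining documents. Returns the number
// of passes that processed at least one document.
function fillInBatches(docs, batchSize, transform) {
  var passes = 0;
  while (true) {
    var batch = docs
      .filter(function (d) { return !d.hasOwnProperty("post_hour"); })
      .slice(0, batchSize);
    if (batch.length === 0) {
      break; // nothing left without post_hour
    }
    batch.forEach(function (d) { d.post_hour = transform(d); });
    passes++;
  }
  return passes;
}
```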


2016-06-23 08:18:56 profesor79
