
My goal is to have my map-reduce jobs always run on the secondary nodes of my MongoDB cluster's shards. Does MongoDB ignore readPreference when running MapReduce on a sharded cluster?

To achieve this I set readPreference to secondary and the MapReduce command's out parameter to inline. This works fine on a non-sharded replica set: the job runs on the secondary. On a sharded cluster, however, the job runs on the primaries.

Can someone explain why this happens, or point me to any relevant documentation? I could not find anything about it in the relevant documentation.

import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.MongoClientOptions.Builder;
import com.mongodb.MongoClientURI;
import com.mongodb.ReadPreference;
import com.mongodb.client.MapReduceIterable;
import org.bson.Document;
...
// Map and reduce functions: sum txnval per custid.
public static final String mapfunction = "function() { emit(this.custid, this.txnval); }";
public static final String reducefunction = "function(key, values) { return Array.sum(values); }";
...
private void mapReduce() {
    ...
    // The driver returns inline results unless an output collection is set.
    MapReduceIterable<Document> iterable = collection.mapReduce(mapfunction, reducefunction);
    ...
}
...
// Client-level read preference: secondary.
Builder options = MongoClientOptions.builder().readPreference(ReadPreference.secondary());
MongoClientURI uri = new MongoClientURI(MONGO_END_POINT, options);
MongoClient client = new MongoClient(uri);
...
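For reference, the read preference can also be pinned at the collection level via withReadPreference, which should be equivalent for routing purposes; a minimal, self-contained sketch (the hostname is hypothetical):

import com.mongodb.MongoClient;
import com.mongodb.ReadPreference;
import com.mongodb.client.MapReduceIterable;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class SecondaryMapReduce {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("mongos.example.com"); // hypothetical endpoint
        // withReadPreference returns a new collection handle carrying the preference.
        MongoCollection<Document> txns = client.getDatabase("test")
                .getCollection("txns")
                .withReadPreference(ReadPreference.secondary());
        MapReduceIterable<Document> result = txns.mapReduce(
                "function() { emit(this.custid, this.txnval); }",
                "function(key, values) { return Array.sum(values); }");
        for (Document doc : result) {
            System.out.println(doc.toJson());
        }
        client.close();
    }
}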

Log from the secondary when this is executed on the (non-sharded) replica set:

2016-11-23T15:05:26.735+0000 I COMMAND [conn671] command test.txns command: mapReduce { mapreduce: "txns", map: function() { emit(this.custid, this.txnval); }, reduce: function(key, values) { return Array.sum(values); }, out: { inline: 1 }, query: null, sort: null, finalize: null, scope: null, verbose: true } planSummary: COUNT keyUpdates:0 writeConflicts:0 numYields:7 reslen:4331 locks:{ Global: { acquireCount: { r: 44 } }, Database: { acquireCount: { r: 3, R: 19 } }, Collection: { acquireCount: { r: 3 } } } protocol:op_query 124ms

Sharded collection:

mongos> db.txns.getShardDistribution() 

Shard Shard-0 at Shard-0/primary.shard0.example.com:27017,secondary.shard0.example.com:27017 
data : 498KiB docs : 9474 chunks : 3 
estimated data per chunk : 166KiB 
estimated docs per chunk : 3158 

Shard Shard-1 at Shard-1/primary.shard1.example.com:27017,secondary.shard1.example.com:27017 
data : 80KiB docs : 1526 chunks : 3 
estimated data per chunk : 26KiB 
estimated docs per chunk : 508 

Totals 
data : 579KiB docs : 11000 chunks : 6 
Shard Shard-0 contains 86.12% data, 86.12% docs in cluster, avg obj size on shard : 53B 
Shard Shard-1 contains 13.87% data, 13.87% docs in cluster, avg obj size on shard : 53B 

Log from the Shard-0 primary:

2016-11-24T08:46:30.828+0000 I COMMAND [conn357] command test.$cmd command: mapreduce.shardedfinish { mapreduce.shardedfinish: { mapreduce: "txns", map: function() { emit(this.custid, this.txnval); }, reduce: function(key, values) { return Array.sum(values); }, out: { inline: 1 }, query: null, sort: null, finalize: null, scope: null, verbose: true, $queryOptions: { $readPreference: { mode: "secondary" } } }, inputDB: "test", shardedOutputCollection: "tmp.mrs.txns_1479977190_0", shards: { Shard-0/primary.shard0.example.com:27017,secondary.shard0.example.com:27017: { result: "tmp.mrs.txns_1479977190_0", timeMillis: 123, timing: { mapTime: 51, emitLoop: 116, reduceTime: 9, mode: "mixed", total: 123 }, counts: { input: 9474, emit: 9474, reduce: 909, output: 101 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1479977190000|103, electionId: ObjectId('7fffffff0000000000000001') } }, Shard-1/primary.shard1.example.com:27017,secondary.shard1.example.com:27017: { result: "tmp.mrs.txns_1479977190_0", timeMillis: 71, timing: { mapTime: 8, emitLoop: 63, reduceTime: 4, mode: "mixed", total: 71 }, counts: { input: 1526, emit: 1526, reduce: 197, output: 101 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1479977190000|103, electionId: ObjectId('7fffffff0000000000000001') } } }, shardCounts: { Shard-0/primary.shard0.example.com:27017,secondary.shard0.example.com:27017: { input: 9474, emit: 9474, reduce: 909, output: 101 }, Shard-1/primary.shard1.example.com:27017,secondary.shard1.example.com:27017: { input: 1526, emit: 1526, reduce: 197, output: 101 } }, counts: { emit: 11000, input: 11000, output: 202, reduce: 1106 } } keyUpdates:0 writeConflicts:0 numYields:0 reslen:4368 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 115ms
2016-11-24T08:46:30.830+0000 I COMMAND [conn46] CMD: drop test.tmp.mrs.txns_1479977190_0
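In case it helps anyone debug the same thing from the client side: the Java driver's command monitoring API can print which server each command is sent to. A minimal sketch, assuming driver 3.1+ (the class name is mine); note that against a sharded cluster the driver only ever talks to mongos, so the shard-level routing above still has to be confirmed from the shard members' own logs:

import com.mongodb.event.CommandFailedEvent;
import com.mongodb.event.CommandListener;
import com.mongodb.event.CommandStartedEvent;
import com.mongodb.event.CommandSucceededEvent;

// Prints the target server of every command the driver issues.
public class RoutingListener implements CommandListener {
    @Override
    public void commandStarted(CommandStartedEvent event) {
        System.out.printf("%s -> %s%n", event.getCommandName(),
                event.getConnectionDescription().getServerAddress());
    }
    @Override
    public void commandSucceeded(CommandSucceededEvent event) { }
    @Override
    public void commandFailed(CommandFailedEvent event) { }
}

// Registered via the options builder, e.g.:
// MongoClientOptions.builder()
//         .readPreference(ReadPreference.secondary())
//         .addCommandListener(new RoutingListener())
//         .build();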

Any pointers on the expected behavior would be very helpful. Thanks.

Answer


Since I did not get a response here, I filed a JIRA bug with MongoDB and found out that, as of now, it is not possible to run map-reduce jobs on secondaries in a sharded MongoDB cluster. Here is the bug report.
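If the goal is simply to keep this particular workload off the primaries, an aggregation with $group computes the same per-customer totals, and aggregations that return results inline (no $out) can generally be served by secondaries. A minimal sketch of that workaround (hostname hypothetical), not a fix for the map-reduce behavior itself:

import com.mongodb.MongoClient;
import com.mongodb.ReadPreference;
import com.mongodb.client.AggregateIterable;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.Collections;

public class SecondaryAggregation {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("mongos.example.com"); // hypothetical endpoint
        MongoCollection<Document> txns = client.getDatabase("test")
                .getCollection("txns")
                .withReadPreference(ReadPreference.secondary());
        // Same result as the map-reduce: total txnval per custid.
        AggregateIterable<Document> totals = txns.aggregate(Collections.singletonList(
                new Document("$group",
                        new Document("_id", "$custid")
                                .append("total", new Document("$sum", "$txnval")))));
        for (Document doc : totals) {
            System.out.println(doc.toJson());
        }
        client.close();
    }
}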


Wrote a blog post about the reasons behind this; it is an important limitation for anyone looking to run their MR on MongoDB secondaries: https://scalegrid.io/blog/mongodb-performance-running-mongodb-map-reduce-operations-on-secondaries/ – Vaibhaw
