MongoDB: find subdocuments and sort the results

I have a collection in MongoDB with a complex structure and subdocuments. The documents are structured like this:

doc1 = { 
    '_id': '12345678', 
    'url': "http//myurl/...", 
    'nlp':{ 
     "status": "OK", 
     "entities": { 
      "0": { 
       "type" : "Person", 
       "relevance": "0.877245", 
       "text" : "Neelie Kroes" 
      }, 
      "1": { 
       "type": "Company", 
       "relevance": "0.36242", 
       "text": "ICANN" 
      }, 
      "2": { 
       "type": "Company", 
       "relevance": "0.265175", 
       "text": "IANA" 
      } 
     } 
    } 
} 


doc2 = { 
    '_id': '987456321', 
    'url': "http//myurl2/...", 
    'nlp':{ 
     "status": "OK", 
     "entities": { 
      "0": { 
       "type": "Company", 
       "relevance": "0.96", 
       "text": "ICANN" 
      }, 
      "1": { 
       "type" : "Person", 
       "relevance": "0.36242", 
       "text" : "Neelie Kroes" 
      }, 
      "2": { 
       "type": "Company", 
       "relevance": "0.265175", 
       "text": "IANA" 
      } 
     } 
    } 
}

My task is to search the subdocuments by "type" and "text", and then sort by "relevance". With the $elemMatch operator I am able to run the query:

db.resource.find({ 
    'nlp.entities': { 
     '$elemMatch': {'text': 'Neelie Kroes', 'type': 'Person'} 
    } 
});

Perfect. Now I need to sort all the records that have an entity of type "Person" and text "Neelie Kroes" by relevance, descending.

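I tried a normal sort(), but, as the manual says about sort() with $elemMatch, the result may not reflect the sort order, because sort() is applied to the elements of the array before the $elemMatch projection.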

In fact, _id: 987456321 comes first (its relevance is 0.96, but that value belongs to ICANN).
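The ordering described above can be reproduced without a server. The sketch below is plain JavaScript, not MongoDB code; it assumes that a descending sort on an array field compares documents by the largest value found anywhere in the array, and it uses the relevance values from the two sample documents converted to numbers. The `docsWithRelevances` name is hypothetical.

```javascript
// Hypothetical in-memory copy of the two sample documents, keeping only
// the relevance values (converted from strings to numbers for clarity).
const docsWithRelevances = [
  { _id: "12345678", relevances: [0.877245, 0.36242, 0.265175] },
  { _id: "987456321", relevances: [0.96, 0.36242, 0.265175] },
];

// Assumption: a descending sort on an array field ranks each document by
// its largest array element, regardless of which entity that value belongs to.
const naiveOrder = [...docsWithRelevances]
  .sort((a, b) => Math.max(...b.relevances) - Math.max(...a.relevances))
  .map((d) => d._id);

console.log(naiveOrder);
```

The document whose highest value belongs to ICANN is ranked first, even though the search was for the "Neelie Kroes" entity.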

How can I sort my documents by the relevance of the matched subdocument?

P.S.: I cannot change the document structure.

Source

2014-03-30 Marcello Verona

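Is this the output of a dump from some tool, or is this how your documents actually look in the mongo shell? I ask because you show "entities" as a "subdocument" rather than an array. Those cannot be sorted by any standard means.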

As noted in the comment, I hope your documents really do contain an array; if $elemMatch is working for you, then they should.

In any case, you cannot sort by an element inside an array using find. This is, however, a case where you can do it with .aggregate():

db.collection.aggregate([ 

    // Match the documents that you want, containing the array 
    { "$match": { 
     "nlp.entities": { 
      "$elemMatch": { 
       "text": "Neelie Kroes", 
       "type": "Person" 
      } 
     } 
    }}, 

    // Project to "store" the whole document for later, duplicating the array 
    { "$project": { 
     "_id": { 
      "_id": "$_id", 
      "url": "$url", 
      "nlp": "$nlp"   
     }, 
     "entities": "$nlp.entities" 
    }}, 

    // Unwind the array to de-normalize 
    { "$unwind": "$entities" }, 

    // Match "only" the relevant entities 
    { "$match": { 
     "entities.text": "Neelie Kroes", 
     "entities.type": "Person" 
    }}, 

    // Sort on the relevance 
    { "$sort": { "entities.relevance": -1 } }, 

    // Restore the original document form 
    { "$project": { 
     "_id": "$_id._id", 
     "url": "$_id.url", 
     "nlp": "$_id.nlp" 
    }} 
])

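So basically, after the $match condition selects the documents that contain the relevant match, you use $project to "store" the original document in the _id field and $unwind a "copy" of the "entities" array.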

The next $match "filters" the array contents down to only the relevant elements. Then you apply the $sort to the "matched" documents.

Since the "original" document was stored under _id, you use $project to "restore" the structure the document had to begin with.

That is how you "sort" on the matched element of an array.
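To trace what each stage contributes, the pipeline can be imitated in memory. This is a plain JavaScript sketch, not MongoDB code: the sample data is abbreviated from the question with "entities" assumed to be an array, and `sortByMatchedRelevance` is a hypothetical helper. One deliberate difference: the sketch converts "relevance" to a number, whereas the question's documents store it as a string, so a server-side sort on that field would compare strings.

```javascript
// Hypothetical sample data modelled on the question, with "entities" as an array.
const docs = [
  { _id: "12345678", nlp: { entities: [
    { type: "Person", relevance: "0.877245", text: "Neelie Kroes" },
    { type: "Company", relevance: "0.36242", text: "ICANN" },
  ] } },
  { _id: "987456321", nlp: { entities: [
    { type: "Company", relevance: "0.96", text: "ICANN" },
    { type: "Person", relevance: "0.36242", text: "Neelie Kroes" },
  ] } },
];

// In-memory imitation of $unwind -> $match -> $sort -> $project.
function sortByMatchedRelevance(documents, text, type) {
  return documents
    // $unwind plus $match: one row per matching entity, keeping the parent document
    .flatMap((doc) =>
      doc.nlp.entities
        .filter((e) => e.text === text && e.type === type)
        .map((e) => ({ doc, relevance: Number(e.relevance) }))
    )
    // $sort: highest relevance of the matched entity first
    .sort((a, b) => b.relevance - a.relevance)
    // final $project: hand back the original document
    .map((row) => row.doc);
}

const sortedIds = sortByMatchedRelevance(docs, "Neelie Kroes", "Person").map((d) => d._id);
console.log(sortedIds);
```

Here the document in which "Neelie Kroes" has the higher relevance is ranked first, which is the order the question asks for.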

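Note that if a parent document has multiple "matches" within its array, then you would need an additional $group stage to get the $max value of the "relevance" field in order to complete your sort.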
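One way the extra $group stage might look is sketched below. It is an untested variation on the pipeline above, not the answerer's own code: the `pipeline` variable and the output field name `relevance` are assumptions, and `db.resource` is the collection name used in the question.

```javascript
// Sketch: the same stages as above, with a $group inserted before the $sort.
const pipeline = [
  { $match: { "nlp.entities": { $elemMatch: { text: "Neelie Kroes", type: "Person" } } } },
  { $project: { _id: { _id: "$_id", url: "$url", nlp: "$nlp" }, entities: "$nlp.entities" } },
  { $unwind: "$entities" },
  { $match: { "entities.text": "Neelie Kroes", "entities.type": "Person" } },
  // Collapse several matching entities of one parent into a single row,
  // keeping only the highest relevance among them.
  { $group: { _id: "$_id", relevance: { $max: "$entities.relevance" } } },
  { $sort: { relevance: -1 } },
  { $project: { _id: "$_id._id", url: "$_id.url", nlp: "$_id.nlp" } },
];

// In the mongo shell this would be run as: db.resource.aggregate(pipeline)
```

Grouping on the stored `_id` subdocument keeps the whole original document as the group key, so the final $project can restore it exactly as before.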

Source

2014-03-31 00:12:08

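Thank you, it works perfectly. The first run is slow, but after that it is very fast. Is the aggregate function kept in RAM, cached, or sped up by some other mechanism? Thanks again.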
