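I'm using Apache Solr to implement the matching feature of my web application, and I've run into a problem with the following scenario: matching a Solr query against nested/relational data.
I have three programmers. The "skill" field lists their skills, and "weight" indicates how good he/she is at that skill: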
{
name: "John",
skill: [
{name: "java", weight: 90},
{name: "oracle", weight: 90},
{name: "linux", weight: 70}
]
},
{
name: "Sam",
skill: [
{name: "C#", weight: 98},
{name: "java", weight: 75},
{name: "oracle", weight: 70},
{name: "tomcat", weight: 70},
]
},
{
name: "Bob",
skill: [
{name: "oracle", weight: 90},
{name: "java", weight: 85}
]
}
And I have a job that is looking for programmers:
{
name: "webapp development",
skillRequired: [
{name: "java", weight: 85},
{name: "oracle", weight: 85},
]
}
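I want to match those programmers against the job's "skillRequired" (that is, find the people best suited to the job). In this case the result should be John and Bob; Sam is ruled out because his java and oracle skills are not good enough. John should score higher than Bob because his skills are stronger (in the data above, java 90 vs. 85, with oracle equal at 90).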
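To make the expected result concrete, here is a minimal sketch of the matching rule in plain Python, independent of Solr. The scoring rule (sum of the weights on the required skills) is an assumption for illustration; the post only says that John should rank above Bob. The function name `match` is hypothetical.

```python
# Data copied from the post.
programmers = [
    {"name": "John", "skill": [{"name": "java", "weight": 90},
                               {"name": "oracle", "weight": 90},
                               {"name": "linux", "weight": 70}]},
    {"name": "Sam", "skill": [{"name": "C#", "weight": 98},
                              {"name": "java", "weight": 75},
                              {"name": "oracle", "weight": 70},
                              {"name": "tomcat", "weight": 70}]},
    {"name": "Bob", "skill": [{"name": "oracle", "weight": 90},
                              {"name": "java", "weight": 85}]},
]
job = {"name": "webapp development",
       "skillRequired": [{"name": "java", "weight": 85},
                         {"name": "oracle", "weight": 85}]}


def match(job, programmers):
    """Keep programmers who meet every required weight, best first."""
    required = {s["name"]: s["weight"] for s in job["skillRequired"]}
    ranked = []
    for p in programmers:
        have = {s["name"]: s["weight"] for s in p["skill"]}
        # A missing skill counts as weight 0, so it never meets a requirement.
        if all(have.get(name, 0) >= minimum for name, minimum in required.items()):
            # Assumed score: total weight over the required skills only.
            score = sum(have[name] for name in required)
            ranked.append((p["name"], score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


print(match(job, programmers))  # [('John', 180), ('Bob', 175)]
```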
The problem is that Solr cannot index nested objects. I think the best format I can get is:
name: "John",
skill-name: ["java", "oracle", "linux"],
skill-weight: [90, 90, 70]
and so on. So I don't know whether it's possible to construct a query that makes this scenario work.
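The difficulty with the flattened format can be sketched as follows. This is a simplified model, not Solr itself: it assumes that each clause of a query against multi-valued fields is satisfied by any value in that field, with no link between the position of a skill name and the position of its weight. The helper names `flatten` and `flat_match` are hypothetical.

```python
def flatten(programmer):
    """Turn the nested skill list into two parallel multi-valued fields."""
    return {
        "name": programmer["name"],
        "skill-name": [s["name"] for s in programmer["skill"]],
        "skill-weight": [s["weight"] for s in programmer["skill"]],
    }


def flat_match(doc, skill, minimum):
    """Model of `skill-name:<skill> AND skill-weight:[<minimum> TO *]`.

    Each clause is checked against the whole field independently,
    so the name/weight pairing is lost.
    """
    return skill in doc["skill-name"] and any(
        w >= minimum for w in doc["skill-weight"]
    )


sam = flatten({"name": "Sam", "skill": [{"name": "C#", "weight": 98},
                                        {"name": "java", "weight": 75},
                                        {"name": "oracle", "weight": 70},
                                        {"name": "tomcat", "weight": 70}]})

# Sam's java weight is only 75, but his C# weight of 98 satisfies the range clause.
print(flat_match(sam, "java", 85))  # True (a false positive)
```

Under this model Sam would wrongly match the job, which is why the pairing between a skill and its weight has to be preserved in the index.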
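Is there a better schema structure? Or should I use index-time/query-time boosting?
I have read almost all of the Solr wiki and googled without any luck. Any hints or workarounds are welcome.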
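Problem solved. I'm recording the solution here for reference:
1. My data format is JSON, so I need solr-4.8.0, which supports indexing nested data in JSON. If the data is in XML format, solr-4.7.2 still works.
2. solr-4.8.0 needs java7-u55 (officially recommended).
3. Nested documents/objects should be submitted to Solr under the "_childDocuments_" key. To identify the type of parent/child documents, I added a "type" field. So, using the example above, it looks like this: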
{
type: "programmer",
name: "John",
_childDocuments_: [
{type:"skill", name: "java", weight: 90},
{type:"skill", name: "oracle", weight: 90},
{type:"skill", name: "linux", weight: 70}
]
},
{
type: "programmer",
name: "Sam",
_childDocuments_: [
{type:"skill",name: "C#", weight: 98},
{type:"skill", name: "java", weight: 75},
{type:"skill", name: "oracle", weight: 70},
{type:"skill", name: "tomcat", weight: 70},
]
},
{
type: "programmer",
name: "Bob",
_childDocuments_: [
{type:"skill", name: "oracle", weight: 90},
{type:"skill", name: "java", weight: 85}
]
}
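The conversion from the original records to this nested format can be sketched in Python. The function name `to_solr_doc` is hypothetical, and the sketch only builds the JSON payload; it does not send anything to Solr. Like the post, it omits any unique key field the schema might require.

```python
import json


def to_solr_doc(programmer):
    """Wrap a programmer's skills as child documents under _childDocuments_."""
    return {
        "type": "programmer",
        "name": programmer["name"],
        "_childDocuments_": [
            # Each skill becomes its own child document, tagged with type "skill".
            {"type": "skill", "name": s["name"], "weight": s["weight"]}
            for s in programmer["skill"]
        ],
    }


bob = {"name": "Bob", "skill": [{"name": "oracle", "weight": 90},
                                {"name": "java", "weight": 85}]}

# Serialize a list of documents as strict JSON (quoted keys, no trailing commas).
payload = json.dumps([to_solr_doc(bob)], indent=2)
print(payload)
```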
4. After submitting the documents and committing to Solr, I can match the job with block join queries (as filter queries):
fq={!parent which='type:programmer'}type:skill AND name:java AND weight:[85 TO *]&
fq={!parent which='type:programmer'}type:skill AND name:oracle AND weight:[85 TO *]
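These filter queries can be generated from the job's "skillRequired" list. The sketch below is a minimal illustration: the function name `skill_filters` is hypothetical, the main query `q=*:*` is an assumption (the post does not show it), and skill names containing special characters such as "C#" would need escaping that is not handled here.

```python
from urllib.parse import urlencode


def skill_filters(job, parent_filter="type:programmer"):
    """Build one block join parent filter per required skill."""
    return [
        "{!parent which='%s'}type:skill AND name:%s AND weight:[%d TO *]"
        % (parent_filter, s["name"], s["weight"])
        for s in job["skillRequired"]
    ]


job = {"name": "webapp development",
       "skillRequired": [{"name": "java", "weight": 85},
                         {"name": "oracle", "weight": 85}]}

filters = skill_filters(job)
# Assumed main query: match everything, then narrow with the filters.
params = [("q", "*:*")] + [("fq", f) for f in filters]
query_string = urlencode(params)  # URL-encoded, ready to append to a select URL

for f in filters:
    print(f)
```

Note that filter queries only restrict the result set and do not contribute to the relevance score, so ranking John above Bob would need something beyond these filters.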
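Could you please provide the schema.xml for this particular case? – frankie
Did you have to add the _root_ field to your schema? I followed the guide at http://yonik.com/solr-nested-objects/, and before adding nested documents I had to update the schema: $ curl http://localhost:8983/solr/nested_demo/schema -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"_root_", "type":"string", "indexed":true, "stored":false}}' – alisa
Could you provide the schema? How did you declare this field in the schema? –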