Neo4j data model for searching documents, keywords, and stems

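My goal is to use Neo4j to perform two different types of searches on documents. I'll use recipes (documents) for my example. Say I have a list of ingredients (keywords) on hand (milk, butter, flour, salt, sugar, eggs...), and I have some recipes in my database, each with ingredients attached. I'd like to enter my list and get two different results. The first is the recipes that come closest to including all of the ingredients I entered. The second is combinations of recipes that together include all of my ingredients.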

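Given: milk, butter, flour, salt, sugar, eggs

A search result for the first case might be:

1) Sugar cookies

2) Butter cookies

A result for the second might be:

1) Flat bread and Gogel-Mogel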

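I am reading in recipes to insert into Neo4j, pulling ingredients out of the ingredient list at the top of each recipe, but also out of the recipe instructions. I'd like to weight these two sources differently, maybe 60/40 in favor of the ingredient list.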
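The 60/40 split could be applied as a weighted sum of the two per-source weights. The following is a minimal sketch, assuming each ingredient of a recipe carries one weight from the ingredient list and one from the instructions; the function name `combined_weight` and the `list_share` parameter are made up for illustration.

```python
def combined_weight(list_weight, instr_weight, list_share=0.6):
    """Blend the two weights; list_share is the fraction given to the ingredient list."""
    if not 0.0 <= list_share <= 1.0:
        raise ValueError("list_share must be between 0 and 1")
    return list_share * list_weight + (1.0 - list_share) * instr_weight


# Example values: 0.7 from the ingredient list, 0.2 from the instructions.
score = combined_weight(0.7, 0.2)
```

Keeping the two raw weights on separate relationships and blending them at query time leaves the split easy to change later.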

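I'd also like to stem each ingredient, in case people enter similar words.

I'm struggling to come up with a good data model in Neo4j. My plan is that the user enters ingredients in English, and I stem them in the background and use the stems for the search.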
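Normalizing the user's input before the lookup might look like the sketch below. The suffix rule is a deliberately naive stand-in, not a real stemming algorithm, and it does not reduce a phrase such as "salted butter" to "butter"; a production version would likely call an established stemmer instead. `naive_stem` and `normalize_terms` are hypothetical names.

```python
def naive_stem(word):
    """Toy stemmer: lowercase, trim, and strip a plain plural "s"."""
    word = word.strip().lower()
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # "eggs" -> "egg"
    return word


def normalize_terms(raw):
    """Split comma-separated input into de-duplicated stems, keeping input order."""
    stems = []
    for part in raw.split(","):
        stem = naive_stem(part)
        if stem and stem not in stems:
            stems.append(stem)
    return stems


terms = normalize_terms("Milk, butter, flour, salt, sugar, Eggs, eggs")
```

The resulting list can then be passed to the search query as a parameter instead of being embedded in the query string.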

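My first thought is: [image: neo4j data model 1] This is intuitive to me, but it takes a lot of hops to find all the recipes.

Next, maybe something like this: [image: neo4j data model 2]

It gets to the recipes directly from the stems, but I would need to pass recipe ids on the relationships (right?) to get the actual ingredients.

Third, maybe combine them like this? [image: neo4j data model 3] But there is a lot of duplication.

Here are also some Cypher statements to create the first idea: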

//Create 4 recipes 
create (r1:Recipe {rid:'1', title:'Sugar cookies'}), (r2:Recipe {rid:'2', title:'Butter cookies'}), 
(r3:Recipe {rid:'3', title:'Flat bread'}), (r4:Recipe {rid:'4', title:'Gogel-Mogel'}) 

//Adding some ingredients 
merge (i1:Ingredient {ingredient:"salted butter"}) 
merge (i2:Ingredient {ingredient:"white sugar"}) 
merge (i3:Ingredient {ingredient:"brown sugar"}) 
merge (i4:Ingredient {ingredient:"all purpose flour"}) 
merge (i5:Ingredient {ingredient:"iodized salt"}) 
merge (i6:Ingredient {ingredient:"eggs"}) 
merge (i7:Ingredient {ingredient:"milk"}) 
merge (i8:Ingredient {ingredient:"powdered sugar"}) 
merge (i9:Ingredient {ingredient:"wheat flour"}) 
merge (i10:Ingredient {ingredient:"bananas"}) 
merge (i11:Ingredient {ingredient:"chocolate chips"}) 
merge (i12:Ingredient {ingredient:"raisins"}) 
merge (i13:Ingredient {ingredient:"unsalted butter"}) 
merge (i14:Ingredient {ingredient:"wheat flour"}) 
merge (i15:Ingredient {ingredient:"himalayan salt"}) 
merge (i16:Ingredient {ingredient:"chocolate bars"}) 
merge (i17:Ingredient {ingredient:"vanilla flavoring"}) 
merge (i18:Ingredient {ingredient:"vanilla"}) 

//Stems added to each ingredient 
merge (i1)<-[:STEM_OF]-(s1:Stem {stem:"butter"}) 
merge (i2)<-[:STEM_OF]-(s2:Stem {stem:"sugar"}) 
merge (i3)<-[:STEM_OF]-(s2) 
merge (i4)<-[:STEM_OF]-(s4:Stem {stem:"flour"}) 
merge (i5)<-[:STEM_OF]-(s5:Stem {stem:"salt"}) 
merge (i6)<-[:STEM_OF]-(s6:Stem {stem:"egg"}) 
merge (i7)<-[:STEM_OF]-(s7:Stem {stem:"milk"}) 
merge (i8)<-[:STEM_OF]-(s2) 
merge (i9)<-[:STEM_OF]-(s4) 
merge (i10)<-[:STEM_OF]-(s10:Stem {stem:"banana"}) 

merge (i11)<-[:STEM_OF]-(s11:Stem {stem:"chocolate"}) 
merge (i12)<-[:STEM_OF]-(s12:Stem {stem:"raisin"}) 
merge (i13)<-[:STEM_OF]-(s1) 
merge (i14)<-[:STEM_OF]-(s4) 
merge (i15)<-[:STEM_OF]-(s5) 
merge (i16)<-[:STEM_OF]-(s11) 
merge (i17)<-[:STEM_OF]-(s13:Stem {stem:"vanilla"}) 
merge (i18)<-[:STEM_OF]-(s13) 


create (r1)<-[:INGREDIENTS_LIST{weight:.7}]-(i1) 
create (r1)<-[:INGREDIENTS_LIST{weight:.6}]-(i2)  
create (r1)<-[:INGREDIENTS_LIST{weight:.5}]-(i4) 
create (r1)<-[:INGREDIENTS_LIST{weight:.4}]-(i5) 
create (r1)<-[:INGREDIENTS_LIST{weight:.4}]-(i6) 
create (r1)<-[:INGREDIENTS_LIST{weight:.2}]-(i7) 
create (r1)<-[:INGREDIENTS_LIST{weight:.1}]-(i18) 

create (r2)<-[:INGREDIENTS_LIST{weight:.7}]-(i1) 
create (r2)<-[:INGREDIENTS_LIST{weight:.6}]-(i3)  
create (r2)<-[:INGREDIENTS_LIST{weight:.5}]-(i4) 
create (r2)<-[:INGREDIENTS_LIST{weight:.4}]-(i5) 
create (r2)<-[:INGREDIENTS_LIST{weight:.3}]-(i6) 
create (r2)<-[:INGREDIENTS_LIST{weight:.2}]-(i7) 
create (r2)<-[:INGREDIENTS_LIST{weight:.1}]-(i18) 

create (r3)<-[:INGREDIENTS_LIST{weight:.7}]-(i1) 
create (r3)<-[:INGREDIENTS_LIST{weight:.6}]-(i5) 
create (r3)<-[:INGREDIENTS_LIST{weight:.5}]-(i7) 
create (r3)<-[:INGREDIENTS_LIST{weight:.4}]-(i9) 

create (r4)<-[:INGREDIENTS_LIST{weight:.6}]-(i2) 
create (r4)<-[:INGREDIENTS_LIST{weight:.5}]-(i6) 



create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i1) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i2) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i4) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i5) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.1}]-(i6) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.1}]-(i7) 


create (r2)<-[:INGREDIENTS_INSTR{weight:.3}]-(i1) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i3) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i4) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i5) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i6) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.1}]-(i7) 


create (r3)<-[:INGREDIENTS_INSTR{weight:.3}]-(i1) 
create (r3)<-[:INGREDIENTS_INSTR{weight:.3}]-(i5) 
create (r3)<-[:INGREDIENTS_INSTR{weight:.1}]-(i7) 
create (r3)<-[:INGREDIENTS_INSTR{weight:.1}]-(i9) 

create (r4)<-[:INGREDIENTS_INSTR{weight:.3}]-(i2) 
create (r4)<-[:INGREDIENTS_INSTR{weight:.3}]-(i6)

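And here is a link to a Neo4j console with the statements above: http://console.neo4j.org/?id=3o8y44

How much does Neo4j care about multiple relationships? Also, I can query for a single ingredient, but how would I put together a query that returns recipes for multiple ingredients?

EDIT: Thank you, Michael! That got me further. I was able to expand your answer to this: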

WITH split("egg, sugar, chocolate, milk, flour, salt",", ") as terms
UNWIND terms as term
MATCH (stem:Stem {stem:term})-[:STEM_OF]->(ingredient:Ingredient)-[lst:INGREDIENTS_LIST]->(r:Recipe)
WITH r,
     size(terms) - count(distinct stem) as notCovered,
     sum(lst.weight) as weight,
     collect(distinct stem.stem) as matched
RETURN r, notCovered, matched, weight
ORDER BY notCovered ASC, weight DESC

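This gave me a list of ingredients and weights. How would I change the query to also show the weights of the :INGREDIENTS_INSTR relationships, so that I can use both weights in a calculation? [lst:INGREDIENTS_LIST|INGREDIENTS_INSTR] is not what I want.

EDIT:

This seems to work. Is it correct?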

WITH split("egg, sugar, chocolate, milk, flour, salt",", ") as terms
UNWIND terms as term
MATCH (stem:Stem {stem:term})-[:STEM_OF]->(ingredient:Ingredient)-[lstl:INGREDIENTS_LIST]->(r:Recipe)<-[lsti:INGREDIENTS_INSTR]-(ingredient:Ingredient)
WITH r,
     size(terms) - count(distinct stem) as notCovered,
     sum(lsti.weight) as wi,
     sum(lstl.weight) as wl,
     collect(distinct stem.stem) as matched
RETURN r, notCovered, matched, wl+wi
ORDER BY notCovered ASC, wl+wi DESC
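One thing to check in the pattern above: because it requires both relationships, an ingredient that has only one of them cannot match (in the sample data, vanilla has :INGREDIENTS_LIST but no :INGREDIENTS_INSTR). The Python sketch below models the alternative of treating a missing relationship as weight 0 and ranks recipes in memory. It illustrates the scoring logic only and is not a replacement for the Cypher; the dictionary layout and the name `rank` are assumptions, and weights are keyed by stem for brevity.

```python
# stem -> weight per recipe, copied from two of the sample recipes in the question.
LIST_W = {
    "Sugar cookies": {"butter": 0.7, "sugar": 0.6, "flour": 0.5, "salt": 0.4,
                      "egg": 0.4, "milk": 0.2, "vanilla": 0.1},
    "Gogel-Mogel": {"sugar": 0.6, "egg": 0.5},
}
INSTR_W = {
    "Sugar cookies": {"butter": 0.2, "sugar": 0.2, "flour": 0.2, "salt": 0.2,
                      "egg": 0.1, "milk": 0.1},
    "Gogel-Mogel": {"sugar": 0.3, "egg": 0.3},
}


def rank(terms):
    """Order recipes by uncovered terms (ascending), then by summed weight (descending)."""
    rows = []
    for title, list_w in LIST_W.items():
        instr_w = INSTR_W.get(title, {})
        matched = [t for t in terms if t in list_w or t in instr_w]
        # A missing relationship contributes 0 instead of dropping the ingredient.
        weight = sum(list_w.get(t, 0.0) + instr_w.get(t, 0.0) for t in matched)
        rows.append((len(terms) - len(matched), -weight, title, matched))
    rows.sort()
    return [(title, not_covered, matched, round(-neg_w, 2))
            for not_covered, neg_w, title, matched in rows]


result = rank(["egg", "sugar", "vanilla"])
```

Here "Sugar cookies" ranks first because it covers all three terms, with vanilla contributing only its ingredient-list weight.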

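Also, could you help with the second query? Given a list of ingredients, it should return combinations of recipes that together include the given ingredients. Thanks again!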
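Finding recipes that together include every ingredient is an instance of the set cover problem. One common heuristic is a greedy pass that keeps picking the recipe covering the most still-missing terms. The sketch below shows that heuristic in Python, outside the database; it is not taken from the answer, it does not guarantee the smallest combination, and `greedy_cover` is a hypothetical name.

```python
def greedy_cover(terms, recipes):
    """Greedy set cover: repeatedly take the recipe covering the most missing terms.

    recipes maps a title to the set of stems it contains.
    Returns (chosen titles in pick order, terms that no recipe could cover).
    """
    missing = set(terms)
    chosen = []
    while missing and recipes:
        # On a tie, max() keeps the first title in dict order.
        best = max(recipes, key=lambda title: len(recipes[title] & missing))
        gained = recipes[best] & missing
        if not gained:
            break  # no remaining recipe adds anything
        chosen.append(best)
        missing -= gained
    return chosen, missing


# Stems taken from two of the sample recipes in the question.
recipes = {
    "Flat bread": {"butter", "salt", "milk", "flour"},
    "Gogel-Mogel": {"sugar", "egg"},
}
terms = ["milk", "butter", "flour", "salt", "sugar", "egg"]
chosen, missing = greedy_cover(terms, recipes)
```

With the full sample data, either cookie recipe covers all six stems alone, so single-recipe matches would need to be filtered out first if only true combinations are wanted.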

2017-09-13 Oleg

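I would go with your version 1).

Don't worry about the extra hops. You would put information about amount/weight on the relationship between the recipe and the actual ingredient.

You can have multiple relationships.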

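Here is an example query. Requiring all ingredients would not work with your dataset, because you have no recipe that has all of them, so this version ranks recipes by the number of terms left uncovered: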

WITH split("milk, butter, flour, salt, sugar, eggs",", ") as terms 
UNWIND terms as term 
MATCH (stem:Stem {stem:term})-[:STEM_OF]->(ingredient:Ingredient)-->(r:Recipe) 
WITH r, size(terms) - count(distinct stem) as notCovered 
RETURN r ORDER BY notCovered ASC LIMIT 2 

+-----------------------------------------+
| r                                       |
+-----------------------------------------+
| Node[0]{rid:"1",title:"Sugar cookies"}  |
| Node[1]{rid:"2",title:"Butter cookies"} |
+-----------------------------------------+
2 rows

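The following would be optimized for large datasets:

For this query you would first find all the ingredient stems, then the recipes attached to the most selective one (the one with the lowest degree).

Then check the remaining ingredients against each of those recipes.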

WITH split("milk, butter, flour, salt, sugar, eggs",", ") as terms
MATCH (stem:Stem) WHERE stem.stem IN terms
// most selective stem first (fewest attached ingredients)
WITH stem, terms ORDER BY size((stem)-[:STEM_OF]->()) ASC
WITH terms, collect(stem) as stems
WITH head(stems) as first, tail(stems) as rest, terms
MATCH (first)-[:STEM_OF]->(ingredient:Ingredient)-->(r:Recipe)
// count how many of the remaining stems also reach this recipe
WITH DISTINCT r, terms, size([other IN rest WHERE (other)-[:STEM_OF]->(:Ingredient)-->(r)]) as covered
WITH r, size(terms) - 1 - covered as notCovered
RETURN r ORDER BY notCovered ASC LIMIT 2
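The same "most selective first" idea can be sketched in Python over an inverted index, which may make the control flow easier to follow. This is an illustration under assumptions: selectivity is measured here by the number of recipes per stem (the Cypher counts :STEM_OF relationships instead), and `rank_from_most_selective` and `stem_to_recipes` are hypothetical names.

```python
def rank_from_most_selective(terms, stem_to_recipes, limit=2):
    """Start from the stem attached to the fewest recipes, then check the other stems."""
    known = [t for t in terms if t in stem_to_recipes]
    if not known:
        return []
    # sorted() is stable, so ties keep the order of `terms`.
    stems = sorted(known, key=lambda t: len(stem_to_recipes[t]))
    first, rest = stems[0], stems[1:]
    scored = []
    for recipe in stem_to_recipes[first]:
        covered = sum(1 for other in rest if recipe in stem_to_recipes[other])
        # The starting stem itself counts as covered, hence the extra -1.
        scored.append((len(terms) - 1 - covered, recipe))
    return sorted(scored)[:limit]


# Inverted index built by hand from the sample data in the question.
stem_to_recipes = {
    "butter": {"Sugar cookies", "Butter cookies", "Flat bread"},
    "sugar": {"Sugar cookies", "Butter cookies", "Gogel-Mogel"},
    "flour": {"Sugar cookies", "Butter cookies", "Flat bread"},
    "salt": {"Sugar cookies", "Butter cookies", "Flat bread"},
    "egg": {"Sugar cookies", "Butter cookies", "Gogel-Mogel"},
    "milk": {"Sugar cookies", "Butter cookies", "Flat bread"},
}
top = rank_from_most_selective(
    ["milk", "butter", "flour", "salt", "sugar", "egg"], stem_to_recipes
)
```

Note the trade-off: only recipes that contain the starting stem are ever considered, so a recipe that lacks it is skipped even if it covers every other term.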

2017-09-18 23:00:38

Is part of the answer missing at the end? After the colon? – Oleg

Edited for Q1, will do Q2 later. –
