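Strange Neo4j Cypher behavior
I'm new to Neo4j and the Cypher query language.
My node/relationship data set basically looks like this:
- I have about 27,000 User nodes in the database
- I have about 8,000 Question nodes in the database
- Question nodes can be answered by users: (User)-[:ANSWERED]->(Question)
- Some Question nodes trigger properties for a user, so there are relationships like (User)-[:HAS_PROPERTY]->(Property)
- In addition, some Question nodes require certain properties before they can be answered, so there are relationships like (Question)-[:REQUIRES]->(Property)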
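To make the model concrete, a tiny sample graph along these lines could be created as follows. This is only a sketch: the relationship type CONTAINS between (:ActiveQuestions) and the questions, the `name` property on Property, and all the values are assumptions for illustration; the real data set does not necessarily use them.

```cypher
// Hypothetical sample data for illustration only.
// CONTAINS and the `name` property are assumed names, not taken from the real database.
CREATE (u:User {code: 'xyz'}),
       (active:ActiveQuestions),
       (q1:Question {priority: 4}),   // already answered by the user
       (q2:Question {priority: 3}),   // requires a property the user has
       (q3:Question {priority: 2}),   // requires a property the user lacks
       (q4:Question {priority: 1}),   // requires nothing
       (p1:Property {name: 'p1'}),
       (p2:Property {name: 'p2'}),
       (active)-[:CONTAINS]->(q1),
       (active)-[:CONTAINS]->(q2),
       (active)-[:CONTAINS]->(q3),
       (active)-[:CONTAINS]->(q4),
       (u)-[:ANSWERED]->(q1),
       (u)-[:HAS_PROPERTY]->(p1),
       (q2)-[:REQUIRES]->(p1),
       (q3)-[:REQUIRES]->(p2)
```

With data shaped like this, the questions I want back for user 'xyz' would be q2 and q4: q1 is already answered and q3 requires a property the user does not have.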
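Now my query is all about finding questions that a specific user has not answered yet, limited to 50 questions.
After struggling with it for a while, I came up with the following query: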
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
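The query above gives me the rows I expect and is fairly fast (about 150 ms), which is great.
What I don't understand is this:
When I change the second line of the query to use the user variable instead of doing a label lookup, the query becomes extremely slow, especially for users who have already answered many or even all of the questions.
So the following query is much slower: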
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
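Why is that? I really don't understand it. In fact, I thought the query would be cheaper if it reused the already matched user as the base for the second optional match.
While researching Cypher performance, I found many articles telling me to avoid optional matches where possible. So my first query looked like this: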
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user)
WITH q, user
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
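Same problem here. The query above is much slower than the first one, roughly 20 to 30 times slower.
Finally, I'd like to ask whether I am missing something and whether there is a better way to achieve my goal.
Any help would be greatly appreciated.
Regards,
Alex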
EDIT
Here are some profiling details:
Using the following query:
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 26979 total db hits in 169 ms.
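For anyone who wants to reproduce figures like these: db hits can be obtained by prefixing a query with PROFILE. The snippet below is a reduced sketch of that, not the exact query I profiled; it only keeps the "not yet answered" part and, as an assumption on my side, carries `a` through the WITH so the filter can see it.

```cypher
// PROFILE executes the query and reports the plan with db hits per operator.
// Reduced illustration; the full query above has more steps.
PROFILE
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
WITH q, a
WHERE a IS NULL
RETURN count(q)
```

Comparing the per-operator db hits of two variants side by side is usually more telling than the totals alone.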
Using the query suggested by Michael Hunger:
MATCH (user:User {code: 'abc'})
MATCH (:ActiveQuestions)-[]->(q:Question)
WHERE NOT (user)-[:ANSWERED]->(q)
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2622 ms.
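So my current query is faster and more efficient.
What I really don't understand, and the reason I titled this post "Strange Neo4j Cypher behavior", is the following. When I change the second line of my reasonably fast query from: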
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
to:
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
which would seem simpler and more logical to me, I get the following:
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
WHERE NOT (user)-[:ANSWERED]->(q)
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2391 ms.
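So I get exactly the same number of db hits as with the slow query mentioned before.
Does anyone have an explanation for this?
It also makes no difference when I change the first line
from: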
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
to:
MATCH (user:User {code: 'xyz'})
MATCH (:ActiveQuestions)-[]->(q:Question)
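So I basically have two questions:
First, why is reusing the already defined user node variable (user) so much slower than repeating the lookup
(user:User {code: 'xyz'})
in the second line of the query?
Second, the second line of my query is the quasi-equivalent of an outer join. Contrary to all the advice I have come across, it is much faster than using
MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user)
which I assumed was also doing an outer join, but apparently it is not.
EDIT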
After some further profiling I came up with a slightly cheaper query. See the profiling details below:
Using the following Cypher query:
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q)
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(p)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(p)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 21669 total db hits in 120 ms.
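So I basically got rid of the explicit node labels (:Question) and (:Property), which sounds logical to me because the explicit label scans are no longer needed. This saved me about 5,300 db hits.
What else can be tuned in this query?
Hi Michael, I have already tried the WHERE NOT clause, as mentioned in my first post; it actually makes the query about 20 times slower. I did a lot of profiling before arriving at the first query in my first post, and it is the fastest I have found. I will post some details on the profiling information. – n3bul4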