2015-10-04 37 views

Strange Neo4j Cypher behavior

I'm new to Neo4j and the Cypher query language.

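My node/relationship data set basically looks like this:

  1. I have about 27,000 User nodes in the database
  2. I have about 8,000 Question nodes in the database
  3. Question nodes can be answered by users: (User)-[:ANSWERED]->(Question)
  4. Some Question nodes trigger properties for the user, so there are relationships like (User)-[:HAS_PROPERTY]->(Property)
  5. In addition, some Question nodes require certain properties before they can be answered, so there are relationships like (Question)-[:REQUIRES]->(Property)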
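
As a rough illustration of this model, a tiny sample graph could be created as sketched below. Only the labels, the relationship types ANSWERED, HAS_PROPERTY and REQUIRES, and the `code` and `priority` properties come from this post. The `name` property and the IS_ACTIVE relationship type are assumptions made for the example (the queries below match that relationship without a type).

```cypher
// Hypothetical sample data: one user, two active questions, one property.
CREATE (u:User {code: 'xyz'}),
       (aq:ActiveQuestions),
       (q1:Question {priority: 10}),
       (q2:Question {priority: 5}),
       (p:Property {name: 'example'}),   // `name` is an assumed property
       (aq)-[:IS_ACTIVE]->(q1),          // IS_ACTIVE is an assumed rel-type
       (aq)-[:IS_ACTIVE]->(q2),
       (u)-[:ANSWERED]->(q1),            // q1 is already answered
       (u)-[:HAS_PROPERTY]->(p),
       (q2)-[:REQUIRES]->(p)             // q2 requires a property the user has
```

With this data, the intended result for user 'xyz' is the priority 5 question only: the other one is already answered, and the required property of the remaining one is present on the user.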

My query is all about finding questions that a specific user has not answered yet, limited to 50 questions.

After struggling with it for a while, I came up with the following query:

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q) 
WITH q, user 
WHERE a IS NULL 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

The query above gives me the rows I expect and is fairly fast (about 150 ms), which is awesome.

What I don't understand is this:

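When I change the second line of the query to use the user variable instead of doing a label lookup, the query becomes very slow, especially for users who have already answered many or even all of the questions.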

So the following query is much slower:

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
OPTIONAL MATCH (user)-[a:ANSWERED]->(q) 
WITH q, user 
WHERE a IS NULL 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

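Why is that? I really don't understand it. In fact, I expected the query to be cheaper when the already matched user is reused as the basis for the second OPTIONAL MATCH.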

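While researching Cypher performance, I found many posts telling me to avoid OPTIONAL MATCH wherever possible. So my first attempt at the query looked like this: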

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user) 
WITH q, user 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

Same problem here. This query is much slower than the first query in this post, roughly 20 to 30 times slower.

Finally, I'd like to ask whether I'm missing something and whether there is a better way to achieve my goal.

Any help would be appreciated.

Regards,

Alex

EDIT

Here are some profiling details:

Using the following query:

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q) 
WITH q, user 
WHERE a IS NULL 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

Cypher version: CYPHER 2.2, planner: COST. 26979 total db hits in 169 ms. 

Using the query suggested by Michael Hunger:

MATCH (user:User {code: 'abc'}) 
MATCH (:ActiveQuestions)-[]->(q:Question) 
WHERE NOT (user)-[:ANSWERED]->(q) 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2622 ms. 

So my current query is faster and more efficient.

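What I really don't understand, and the reason I titled this post "Strange Neo4j Cypher behavior", is what happens when I change the second line of my fairly fast query from: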

OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q) 

to:

OPTIONAL MATCH (user)-[a:ANSWERED]->(q) 

That would seem a bit simpler and more logical to me, but I get the following:

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
WHERE NOT (user)-[:ANSWERED]->(q) 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2391 ms. 

So I get exactly the same number of db hits as with the slow query mentioned earlier.

Does anyone have an explanation for this?

It also makes no difference when I change the first line

from:

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 

to:

MATCH (user:User {code: 'xyz'}) 
MATCH (:ActiveQuestions)-[]->(q:Question) 

So I basically have two questions:

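  1. Why is reusing the already defined user node variable (user) much slower than repeating the lookup with (:User {code: 'xyz'})?

  2. My second line uses the quasi-equivalent of an outer join. Contrary to all the advice I found, this is much faster than using MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user), which I assumed was also doing an outer join, but apparently it is not.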

EDIT

After some further profiling I came up with a slightly cheaper query. See the profiling details below:

Using the following Cypher query:

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q) 
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q) 
WITH q, user 
WHERE a IS NULL 
OPTIONAL MATCH (q)-[r:REQUIRES]->(p) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(p)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

Cypher version: CYPHER 2.2, planner: COST. 21669 total db hits in 120 ms. 

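So I basically got rid of the explicit node labels (:Question) and (:Property) in the patterns, which sounds logical to me because the explicit label checks are no longer needed. This saved me about 5,300 db hits.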

What else could be tuned in this query?

Answer


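With the second match you span up a lot of rows, which you have to collapse again. So change the first WITH to WITH DISTINCT q, user, or to an aggregation such as WITH q, user, count(*) AS answers. That brings your cardinality back down.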

And I think this already spans a lot of rows: (:ActiveQuestions)-[]->(q:Question)
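
A minimal sketch of the aggregation variant, assuming the rest of the query stays as in the question. It uses count(a) rather than count(*), because count(*) would also count the single row that OPTIONAL MATCH emits with a null `a`. That substitution is an assumption of this sketch, and it has not been measured against the asker's data set.

```cypher
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
// Collapse back to one row per (q, user). count(a) ignores nulls,
// so answers = 0 means the user has not answered q.
WITH q, user, count(a) AS answers
WHERE answers = 0
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) AS rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) AS hCount
WHERE rCount = 0 OR rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
```

Whether this beats the label-lookup version depends on the plan the planner picks, so it is worth comparing the db hits of both.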

If you run the query with PROFILE, you should see how much data is touched.
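
For example, prefixing a query with PROFILE executes it and reports the rows and db hits per operator. The shortened query below is only an illustration that isolates the first two matches and the ANSWERED filter; it is not one of the queries measured in the question.

```cypher
PROFILE
// Counts the active questions that user 'xyz' has not answered yet,
// so the cost of this part can be inspected on its own.
MATCH (user:User {code: 'xyz'})
MATCH (:ActiveQuestions)-[]->(q:Question)
WHERE NOT (user)-[:ANSWERED]->(q)
RETURN count(q)
```

Using EXPLAIN instead of PROFILE shows the planned operators without executing the query, which helps when a query is too slow to run to completion.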

In general, I would try changing your OPTIONAL MATCH into a WHERE condition and see how it goes.

Btw., you could label active questions as :ActiveQuestion; then you don't need the extra relationship. I also added a rel-type.

MATCH (user:User {code: 'xyz'}) 
MATCH (:ActiveQuestions)-[:IS_ACTIVE]->(q:Question) 
WHERE NOT (user)-[:ANSWERED]->(q) 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 
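
The labeling idea mentioned above could look roughly like this. It is a sketch under the assumption that every question currently reachable from the :ActiveQuestions node should receive the :ActiveQuestion label. The two statements are meant to be run separately, and the REQUIRES / HAS_PROPERTY filtering is left out for brevity.

```cypher
// Statement 1 (one-off, hypothetical migration):
// copy the "active" marker onto the questions themselves.
MATCH (:ActiveQuestions)-[]->(q:Question)
SET q:ActiveQuestion;

// Statement 2: the lookup no longer needs the extra hop.
MATCH (user:User {code: 'xyz'})
MATCH (q:Question:ActiveQuestion)
WHERE NOT (user)-[:ANSWERED]->(q)
RETURN q ORDER BY q.priority DESC LIMIT 50
```

The label then has to be added and removed whenever a question becomes active or inactive, which is the price for skipping the relationship traversal.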

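Hi Michael, I have already tried the WHERE NOT clause, as mentioned in my original post; it actually makes the query about 20 times slower. I did a lot of profiling before arriving at the first query in my original post, which is the fastest I have found. I'll post some details from the profiling. – n3bul4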