2017-05-26 548 views
2

Neo4j SDN 4 GraphId performance and index

In my Neo4j/SDN 4 application, all of my Cypher queries are based on internal Neo4j IDs.

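This is a problem because I cannot rely on these IDs in my web application URLs. Neo4j can reuse these IDs, so at some point in the future a completely different node may well be found under the same ID.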
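To make the risk concrete, here is a minimal sketch of id recycling. It is a toy model, not Neo4j's actual storage code: the class `ToyNodeStore` and its methods are hypothetical names, and the only assumption carried over from the text is that an id freed by a delete can be handed out again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only: it mimics id recycling, not Neo4j's real store format.
final class ToyNodeStore {
    private final Map<Long, String> nodes = new HashMap<>();
    private final Deque<Long> freedIds = new ArrayDeque<>();
    private long nextId = 0;

    // Hands out a previously freed id if one exists, otherwise a fresh one.
    long create(String name) {
        long id = freedIds.isEmpty() ? nextId++ : freedIds.pop();
        nodes.put(id, name);
        return id;
    }

    // Deleting a node makes its id available for reuse.
    void delete(long id) {
        if (nodes.remove(id) != null) {
            freedIds.push(id);
        }
    }

    String find(long id) {
        return nodes.get(id);
    }

    public static void main(String[] args) {
        ToyNodeStore store = new ToyNodeStore();
        long bookmarked = store.create("Decision A"); // this id ends up in a URL
        store.create("Decision B");
        store.delete(bookmarked);
        long recycled = store.create("Decision C");   // takes over the freed id
        System.out.println("same id reused: " + (recycled == bookmarked));
        System.out.println("old URL now resolves to: " + store.find(bookmarked));
    }
}
```

In this model a URL that stored the id of "Decision A" silently resolves to "Decision C" after the delete, which is the failure mode described above.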

I tried to reimplement this logic according to the following solution: Using the graph to control unique id generation, but noticed a degradation in query performance.

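From a theoretical point of view, should a Cypher query based on a property annotated with @Index(unique = true, primary = true), for example:

@Index(unique = true, primary = true) 
private Long uid; 

entity.uid = {someId} 

perform as well as a Cypher query based on the internal Neo4j ID?

id(entity) = {someId} 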

UPDATED

This is the :schema output:

Indexes 
    ON :BaseEntity(uid) ONLINE 
    ON :Characteristic(lowerName) ONLINE 
    ON :CharacteristicGroup(lowerName) ONLINE 
    ON :Criterion(lowerName) ONLINE 
    ON :CriterionGroup(lowerName) ONLINE 
    ON :Decision(lowerName) ONLINE 
    ON :FlagType(name) ONLINE (for uniqueness constraint) 
    ON :HAS_VALUE_ON(value) ONLINE 
    ON :HistoryValue(originalValue) ONLINE 
    ON :Permission(code) ONLINE (for uniqueness constraint) 
    ON :Role(name) ONLINE (for uniqueness constraint) 
    ON :User(email) ONLINE (for uniqueness constraint) 
    ON :User(username) ONLINE (for uniqueness constraint) 
    ON :Value(value) ONLINE 

Constraints 
    ON (flagtype:FlagType) ASSERT flagtype.name IS UNIQUE 
    ON (permission:Permission) ASSERT permission.code IS UNIQUE 
    ON (role:Role) ASSERT role.name IS UNIQUE 
    ON (user:User) ASSERT user.email IS UNIQUE 
    ON (user:User) ASSERT user.username IS UNIQUE 

As you can see, I have an index on :BaseEntity(uid).

BaseEntity is the base class in my entity hierarchy, for example:

@NodeEntity 
public abstract class BaseEntity { 

    @GraphId 
    private Long id; 

    @Index(unique = false) 
    private Long uid; 

    private Date createDate; 

    private Date updateDate; 

... 

} 

@NodeEntity 
public class Commentable extends BaseEntity { 
... 
} 

@NodeEntity 
public class Decision extends Commentable { 

    private String name; 

} 

Will the uid index be used when I look up, for example, (d:Decision) WHERE d.uid = {uid}?

PROFILE results - internal IDs vs. indexed property

Query based on internal IDs

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE id(parentD) = 1474333 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE id(filterCharacteristic1475199) = 1475199 
WITH relationshipValueRel1475199, childD 
WHERE ([1, 19][0] <= relationshipValueRel1475199.value <= [1, 19][1]) 
WITH childD 
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE id(filterCharacteristic1474358) = 1474358 
WITH relationshipValueRel1474358, childD 
WHERE (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value)) 
WITH childD 
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE id(filterCharacteristic1475193) = 1475193 
WITH relationshipValueRel1475193, childD 
WHERE (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value)) 
WITH childD 
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE id(c) IN [1474342, 1474343, 1474340, 1474339, 1474336, 1474352, 1474353, 1474350, 1474351, 1474348, 1474346, 1474344] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User) 
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes 
ORDER BY weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD) WHERE NOT ((ch1)<-[:DEPENDS_ON]-()) | {characteristicId: id(ch1), value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics 

PROFILE output:

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 350554 total db hits in 238 ms.

Query based on the indexed property uid

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE parentD.uid = 61 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE filterCharacteristic1475199.uid = 15 
WITH relationshipValueRel1475199, childD 
WHERE ([1, 19][0] <= relationshipValueRel1475199.value <= [1, 19][1]) 
WITH childD 
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE filterCharacteristic1474358.uid = 10 
WITH relationshipValueRel1474358, childD 
WHERE (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value)) 
WITH childD 
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE filterCharacteristic1475193.uid = 14 
WITH relationshipValueRel1475193, childD 
WHERE (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value)) 
WITH childD 
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE c.uid IN [26, 27, 24, 23, 20, 36, 37, 34, 35, 32, 30, 28] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User) 
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes 
ORDER BY weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD) WHERE NOT ((ch1)<-[:DEPENDS_ON]-()) | {characteristicId: id(ch1), value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics 

PROFILE output:

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 671326 total db hits in 426 ms.

Is there any chance to improve the performance of the uid-based query?

Answer

5

You are right not to use Neo4j internal ids in URLs, because they can be reused after nodes are deleted.

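From a performance point of view, internal ids are as fast as it gets: an id is effectively an offset into the file holding the node/relationship records (you may have noticed that these are 2 separate id sequences, so you can have a node with id = x and a relationship with the same id = x).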

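Any use of an index is somewhat slower, because the database first does the index lookup, obtains the internal id, and only then reads the node record.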
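The difference between the two access paths can be sketched with a toy model. This is not how Neo4j is implemented: the class `ToyLookup`, the array standing in for the record file, and the hash map standing in for the index are all illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: an array of records stands in for the node store file.
final class ToyLookup {
    private final String[] records;                              // position = internal id
    private final Map<Long, Integer> uidIndex = new HashMap<>(); // uid -> internal id

    ToyLookup(int capacity) {
        records = new String[capacity];
    }

    void put(int internalId, long uid, String payload) {
        records[internalId] = payload;
        uidIndex.put(uid, internalId);
    }

    // One step: the internal id is the position of the record.
    String byInternalId(int internalId) {
        return records[internalId];
    }

    // Two steps: index lookup first, then the same record read as above.
    String byUid(long uid) {
        Integer internalId = uidIndex.get(uid);
        return internalId == null ? null : records[internalId];
    }

    public static void main(String[] args) {
        ToyLookup store = new ToyLookup(8);
        store.put(3, 61L, "parent decision");
        System.out.println(store.byInternalId(3)); // direct read
        System.out.println(store.byUid(61L));      // index lookup, then read
    }
}
```

Both calls return the same record; the uid path simply pays for one extra lookup before the read.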

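However, for the vast majority of applications the performance difference is negligible, probably far smaller than network latency or general OGM overhead.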

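If you see a significant difference:

  • verify that the index exists in the database (e.g. with :schema in the Neo4j browser)
  • turn on logging and verify that the query has the correct label (set org.neo4j.ogm to info level)
  • if the index exists and the query contains the correct label, use PROFILE to check the query plan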

UPDATED

Yes, the index will be used for a query like:

MATCH (d:Decision) WHERE d.uid = {uid} ... 

which should get generated by

session.load(Decision.class, uid) 

if your index is primary, or by findByUid on DecisionRepository.

Note that the index may not be used when the WHERE clause appears in the middle of a query:

... 
WITH x 
MATCH (x)-[...]-(d) WHERE d.uid = {uid} ... 

This depends on the query plan, which you should investigate using PROFILE.

+0

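Thank you for your answer. I am now trying to find an approach to refactor my system so as to avoid the ID reuse problem, and I see the following pattern: on my website I will use a surrogate uid. Where an id does not need to appear in a web URL, I will use the internal Neo4j id. So this surrogate uuid will be used only in web URLs, and everywhere else on the client I will use the internal Neo4j ID. Does that make sense? – alexanoid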

+0

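Having two ways to access entities by id would probably complicate things unnecessarily. I would just go with the custom uuid. As I said, indexes are *fast*: the difference between an internal id lookup and an index lookup will be orders of magnitude smaller than network latency or general OGM overhead. –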

+0

I use different IDs within a single query (https://stackoverflow.com/questions/43824894/neo4j-cypher-query-structure-and-performance-optimization), so the slowdown from using purely UID-based queries is noticeable to the naked eye. – alexanoid