2017-04-14

Jaccard in k-means clustering

In Cypher, how can I modify k-means to use the Jaccard distance Dj instead of the Euclidean distance?

Here the Jaccard distance is defined as Dj = 1 - |A∩B| / |A∪B|.
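As a quick check of the definition above, here is a minimal Python sketch that evaluates Dj for two sets. The function name `jaccard_distance`, the genre sets, and the convention that two empty sets have distance 0 are assumptions made for illustration; none of them come from the question.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Return Dj = 1 - |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical genre sets, for illustration only.
a = {"Action", "Sci-Fi", "Thriller"}
b = {"Action", "Thriller", "Drama", "Crime"}

# The intersection has 2 elements and the union has 5, so Dj = 1 - 2/5.
print(jaccard_distance(a, b))
```

Identical sets give a distance of 0 and disjoint sets give a distance of 1, which is the range any clustering step built on Dj has to work with.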

Check out this GraphGist: http://neo4j.com/graphgist/49a2b9874b37b4a2da4a/

Answer

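Here is an example of how to compute the Jaccard index with Cypher (from the Neo4j Recommendations Sandbox). Note that the query returns the similarity |A∩B| / |A∪B| in the jaccard column; the distance Dj is 1 minus that value: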

// Count the genres shared by "Inception" and every other movie (|A∩B|).
MATCH (m:Movie {title: "Inception"})-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(other:Movie)
WITH m, other, COUNT(g) AS intersection
// Collect the full genre set of each movie.
MATCH (m)-[:IN_GENRE]->(mg:Genre)
WITH m, other, intersection, COLLECT(mg.name) AS s1
MATCH (other)-[:IN_GENRE]->(og:Genre)
WITH m, other, intersection, s1, COLLECT(og.name) AS s2
// Build the union (A∪B). A list comprehension replaces filter(), which was removed in Neo4j 4.0.
WITH m, other, intersection, s1, s2, s1 + [x IN s2 WHERE NOT x IN s1] AS unionGenres
// jaccard is the similarity; the distance is 1.0 - jaccard.
RETURN m.title, other.title, s1, s2, (1.0 * intersection) / SIZE(unionGenres) AS jaccard
ORDER BY jaccard DESC LIMIT 100

Once you have computed it, you can use it with your k-means algorithm. How are you running k-means? Also in Cypher?
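One caveat when plugging Dj into k-means: the update step averages points, and a mean of sets is not defined under the Jaccard distance. A common substitute is k-medoids, where each cluster centre is an actual member of the cluster. The Python sketch below shows that swapped-in technique, not k-means itself and not something the answer prescribes. The function `k_medoids`, its parameters, and the movie data are all hypothetical.

```python
import random


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def k_medoids(items, k, max_iter=100, seed=0):
    """Cluster a dict {name: set} into k groups; returns {medoid: [names]}."""
    rng = random.Random(seed)
    names = sorted(items)
    medoids = rng.sample(names, k)
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        # A medoid always stays in its own cluster, so no cluster is empty.
        clusters = {m: [] for m in medoids}
        for n in names:
            if n in clusters:
                clusters[n].append(n)
                continue
            nearest = min(medoids, key=lambda m: jaccard_distance(items[n], items[m]))
            clusters[nearest].append(n)
        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to all members of that cluster.
        new_medoids = [
            min(members, key=lambda c: sum(jaccard_distance(items[c], items[o]) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


# Hypothetical movies and genre sets, for illustration only.
movies = {
    "A": {"Action", "Sci-Fi"},
    "B": {"Action", "Sci-Fi", "Thriller"},
    "C": {"Romance", "Drama"},
    "D": {"Romance", "Drama", "Comedy"},
}
for medoid, members in k_medoids(movies, k=2).items():
    print(medoid, members)
```

The genre sets per movie could be exported from a query like the one above; keeping everything inside a single Cypher query would need a different approach than this sketch.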

Thank you very much! Yes, eventually this will all be a single Neo4j query. – ProdBot