2017-05-21 65 views

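Neo4j: creating relationships from a CSV file is very slow with py2neo

I am trying to load a CSV file (25 MB, 150,000 rows, 22 columns) into a Neo4j graph using py2neo, in order to model flights.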

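Everything is done in a single Cypher query, which creates the nodes (airports, cities, flights and planes) and the relationships between them. But the code takes forever to run, even with periodic commit.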

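I am not sure whether the Cypher query I wrote is optimized, and it may well be the source of the slowness. For 10,000 rows it took about 10 minutes to build the graph. Can anyone help me? Here is the code: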

from py2neo import Graph


def importFromCSVtoNeo(graph):
    """Load flights.csv (tab-separated) into Neo4j with a single Cypher query."""
    query = '''
    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS row FIELDTERMINATOR '\t'
    WITH row

    MERGE (c_departure:City {cityName: row.cityName_departure})
    MERGE (a_departure:Airport {airportName: row.airportName_departure})
    MERGE (f_segment1:Flight {airline: row.airline1})
    ON CREATE SET f_segment1.class = row.class1,
        f_segment1.outboundclassgroup = row.outboundclassgroup1

    MERGE (a_departure)-[:IN]->(c_departure)
    MERGE (c_departure)-[:HAS]->(a_departure)
    MERGE (f_segment1)-[:FROM {departAt: row.outbounddeparttime}]->(a_departure)

    MERGE (c_transfer:City {cityName: row.transferCityName})
    MERGE (a_transfer:Airport {airportName: row.airportName_transfer})
    MERGE (f_segment1)-[:TO_TRANSFER {transferArriveAt: row.transferArriveAt}]->(a_transfer)
    MERGE (a_transfer)-[:IN]->(c_transfer)
    MERGE (c_transfer)-[:HAS]->(a_transfer)

    MERGE (c_arrival:City {cityName: row.cityName_arrival})
    MERGE (a_arrival:Airport {airportName: row.airportName_arrival})
    MERGE (f_segment2:Flight {airline: row.airline2})
    ON CREATE SET f_segment2.class = row.class2,
        f_segment2.outboundclassgroup = row.outboundclassgroup2
    MERGE (f_segment2)-[:TO {arrivalAt: row.outboundarrivaltime}]->(a_arrival)
    MERGE (f_segment2)-[:FROM_TRANSFER {transferDepartAt: row.transferDepartAt}]->(a_transfer)
    MERGE (a_arrival)-[:IN]->(c_arrival)
    MERGE (c_arrival)-[:HAS]->(a_arrival)

    MERGE (p:Plane {saleprice: row.saleprice})
    ON CREATE SET p.depart = row.cityName_departure,
        p.destination = row.cityName_arrival,
        p.salechannel = row.salechannel,
        p.planeDuration = row.planeDuration
    MERGE (p)-[:HAS_FLIGHTS]->(f_segment1)
    MERGE (f_segment1)-[:WAIT_FOR {waitingTime: row.waitingTime}]->(f_segment2)
    '''

    graph.run(query)


if __name__ == '__main__':
    graph = Graph()
    importFromCSVtoNeo(graph)

I also tried running the import in batch mode, but performance did not improve. I would appreciate any comments or suggestions. Thanks!

Answers

1

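I would create indexes on the node properties before launching the script, so that Neo4j can look nodes up quickly when it runs MERGE (since it has to match nodes for every row). For example, for the first node property I would use: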

CREATE INDEX ON :City(cityName)

And so on for the other properties. You can create the indexes directly from py2neo, running each one as a single statement.
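As a rough sketch of that idea, the helper below builds one index statement for each label/property pair that appears in a node MERGE in the question and sends each through `graph.run`. The names `MERGE_KEYS`, `index_statements` and `create_indexes` are made up for illustration. Two assumptions to check against your setup: property keys are case-sensitive, so the index has to be on `cityName` exactly as the query spells it, and `CREATE INDEX ON :Label(prop)` is the Neo4j 3.x form (later versions use `CREATE INDEX FOR (n:Label) ON (n.prop)` instead).

```python
# Label/property pairs copied from the node MERGE clauses in the question.
# The spelling must match the query exactly (property keys are case-sensitive).
MERGE_KEYS = [
    ("City", "cityName"),
    ("Airport", "airportName"),
    ("Flight", "airline"),
    ("Plane", "saleprice"),
]


def index_statements(keys=MERGE_KEYS):
    """Build one Neo4j 3.x style CREATE INDEX statement per (label, property)."""
    return ["CREATE INDEX ON :%s(%s)" % (label, prop) for label, prop in keys]


def create_indexes(graph, keys=MERGE_KEYS):
    """Run each statement separately; `graph` is expected to be a py2neo Graph."""
    statements = index_statements(keys)
    for statement in statements:
        graph.run(statement)
    return statements
```

It would be called once, before the import, for example `create_indexes(Graph())` followed by `importFromCSVtoNeo(graph)`. Whether this is enough to fix the slowness depends on the data, so treat it as a first thing to try rather than a guaranteed cure.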


Thanks for your answer! I tried that as well, but it did not change the speed :s – filipyoo


I tried loading the CSV file into a pandas DataFrame and then populating the graph through the py2neo API, but creating the nodes is still just as slow :/ – filipyoo