2016-07-28 36 views

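OrientDB ETL loading CSV with vertices in one file and edges in another

I have some data in two CSV files: one contains the vertices and the other contains the edges. I'm working out how to set this up using ETL and am close but not quite there yet. It mostly works, but my edges have properties and I'm not sure they are being loaded properly. A related question was helpful, but I'm still missing something...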

Here is my data:

vertices.csv

label,data,date 
v01,0.1234,2015-01-01 
v02,0.5678,2015-01-02 
v03,0.9012,2015-01-03 

edges.csv

u,v,weight,date 
v01,v02,12.4,2015-06-17 
v02,v03,17.9,2015-09-14 
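
Because the edge transformer below uses "unresolvedLinkAction": "NOTHING", an edge whose endpoint matches no vertex label would presumably be skipped without any error. A minimal sketch of a pre-load check, assuming the two files look like the samples above; the helper name `dangling_endpoints` and the inline copies of the data are hypothetical and not part of OrientDB:

```python
import csv
import io

# Inline copies of the two sample files from the question.
VERTICES_CSV = """label,data,date
v01,0.1234,2015-01-01
v02,0.5678,2015-01-02
v03,0.9012,2015-01-03
"""
EDGES_CSV = """u,v,weight,date
v01,v02,12.4,2015-06-17
v02,v03,17.9,2015-09-14
"""


def dangling_endpoints(vertices_text, edges_text):
    """Return edge endpoints (u or v values) that match no vertex label."""
    labels = {row["label"] for row in csv.DictReader(io.StringIO(vertices_text))}
    missing = []
    for row in csv.DictReader(io.StringIO(edges_text)):
        for column in ("u", "v"):
            if row[column] not in labels:
                missing.append(row[column])
    return missing


# An empty list means every edge endpoint refers to a known vertex.
print(dangling_endpoints(VERTICES_CSV, EDGES_CSV))
```

For the sample data the list is empty, so both edges should resolve.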

I import my vertices with this:

commonVertices.json

{ 
"begin": [ 
      { "let": { "name":  "$filePath", 
         "expression": "$fileDirectory.append($fileName)" 
         } 
      } 
     ], 
"config": { "log": "info"}, 
"source": { "file": { "path": "$filePath" } }, 
"extractor": { "csv": { "ignoreEmptyLines": true, 
         "nullValue": "N/A", 
         "dateFormat": "yyyy-mm-dd" 
         } 
      }, 
"transformers": [ 
        { "vertex": { "class": "myVertex" } }, 
        { "code": { "language": "Javascript", 
            "code":  "print(' Current record: ' + record); record;" } 
        } 
       ], 
"loader": { "orientdb": { 
      "dbURL": "plocal:my_orientdb", 
      "dbType": "graph", 
      "batchCommit": 1000, 
      "classes": [ { "name": "myVertex", "extends": "V" } 
         ], 
      "indexes": [] 
      } 
      } 
} 

vertices.json

{ "config": { "log":   "info", 
       "fileDirectory": "./", 
       "fileName":  "vertices.csv" 
      } 
} 

commonEdges.json

{ 
    "begin": [ 
     { "let": { "name": "$filePath", 
        "expression": "$fileDirectory.append($fileName)" 
       } 
     } 
    ], 

    "config": { "log": "info" 
       }, 

    "source": { "file": { "path": "$filePath" } }, 

    "extractor": { "csv": { "ignoreEmptyLines": true, 
          "nullValue": "N/A", 
          "dateFormat": "yyyy-mm-dd" 
          } 
       }, 

    "transformers": [ 
      { "merge": { "joinFieldName": "u", "lookup": "myVertex.label" } }, 
      { "edge": { "class":   "myEdge", 
          "joinFieldName": "v", 
          "lookup":  "myVertex.label", 
          "direction":  "out", 
          "unresolvedLinkAction": "NOTHING" 
         } 
      }, 
      { "field": { "fieldNames": ["u", "v"], "operation": "remove" } } 
     ], 

    "loader": { 
     "orientdb": { 
      "dbURL": "plocal:my_orientdb", 
      "dbType": "graph", 
      "batchCommit": 1000, 
      "useLightweightEdges": false, 
      "classes": [ 
       { "name": "myEdge", "extends": "E" } 
      ], 
      "indexes": [] 
     } 
    } 
} 

edges.json

{ 
    "config": { 
     "log": "info", 
     "fileDirectory": "./", 
     "fileName": "edges.csv" 
    } 
} 

I run it with oetl.sh like this:

$ oetl.sh vertices.json commonVertices.json 
$ oetl.sh edges.json commonEdges.json 

Everything runs. I'm new to OrientDB, so maybe I'm just not retrieving edge properties the right way, but when I query the edges I don't see the weight and date fields:

orientdb {db=my_orientdb}> SELECT FROM myEdge 
+----+-----+------+-----+-----+ 
|# |@RID |@CLASS|out |in | 
+----+-----+------+-----+-----+ 
|0 |#33:0|myEdge|#25:0|#26:0| 
|1 |#34:0|myEdge|#26:0|#27:0| 
+----+-----+------+-----+-----+ 

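The vertex table now contains the [weight] field from my edges.csv, and the [date] field is getting clobbered in a weird way. The day of the month is overwritten with the day from the edges.csv file, which is undesired, but oddly the month itself is not changed: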

orientdb {db=my_orientdb}> SELECT FROM myVertex 
+----+-----+--------+------+-------------------+-----+------+----------+---------+ 
|# |@RID |@CLASS |data |date    |label|weight|out_myEdge|in_myEdge| 
+----+-----+--------+------+-------------------+-----+------+----------+---------+ 
|0 |#25:0|myVertex|0.1234|2015-01-17 00:06:00|v01 |12.4 |[#33:0] |   | 
|1 |#26:0|myVertex|0.5678|2015-01-14 00:09:00|v02 |17.9 |[#34:0] |[#33:0] | 
|2 |#27:0|myVertex|0.9012|2015-01-03 00:01:00|v03 |  |   |[#34:0] | 
+----+-----+--------+------+-------------------+-----+------+----------+---------+ 
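
One possible reading of the timestamps above, offered as an assumption rather than a confirmed diagnosis: if the extractor hands "dateFormat" to Java's SimpleDateFormat, then lowercase "mm" means minutes and uppercase "MM" means month, so "yyyy-mm-dd" would store the month digits as minutes and leave the month at its default of January. The sketch below is a Python analog of that behavior (`%M` is minutes, `%m` is month); it does not call OrientDB. The extra weight column and the changed day on the vertices also look consistent with the merge transformer copying the edge row's fields into the looked-up vertex.

```python
from datetime import datetime

edge_date = "2015-06-17"  # date column of the first row in edges.csv

# %M is minutes, the Python counterpart of Java's lowercase "mm".
# No month is parsed, so it falls back to the default, January.
as_minutes = datetime.strptime(edge_date, "%Y-%M-%d")

# %m is month, the counterpart of Java's uppercase "MM".
as_month = datetime.strptime(edge_date, "%Y-%m-%d")

print(as_minutes)  # 2015-01-17 00:06:00
print(as_month)    # 2015-06-17 00:00:00
```

The first printed value matches the date shown for v01 in the table, which is why a pattern of "yyyy-MM-dd" may be worth trying.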

I'm sure this is probably a simple tweak; any help would be great!

Answer

Use edgeFields in the edge transformer to bind properties to the edge. For example:

"transformers": [ 
      { "merge": { "joinFieldName": "u", "lookup": "myVertex.label" } }, 
      { "edge": { "class":   "myEdge", 
          "joinFieldName": "v", 
          "lookup":  "myVertex.label", 
          "edgeFields": { "weight": "${input.weight}", "date": "${input.date}" }, 
          "direction":  "out", 
          "unresolvedLinkAction": "NOTHING" 
         } 
      }, 
      { "field": { "fieldNames": ["u", "v"], "operation": "remove" } } 
     ], 
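
When the edge file has many property columns, writing the edgeFields mapping by hand gets tedious. A minimal sketch that derives it from the CSV header, assuming every column other than the two endpoint columns should become an edge property; the helper name `edge_fields_from_header` is hypothetical, and the `${input.<column>}` form is copied from the example above:

```python
import csv
import io
import json

EDGES_HEADER = "u,v,weight,date\n"  # header line of edges.csv


def edge_fields_from_header(header_line, endpoint_columns=("u", "v")):
    """Map each non-endpoint column to its ${input.<column>} expression."""
    names = [name.strip() for name in next(csv.reader(io.StringIO(header_line)))]
    return {
        name: "${input.%s}" % name
        for name in names
        if name not in endpoint_columns
    }


# Prints a JSON object that can be pasted in as the "edgeFields" value.
print(json.dumps(edge_fields_from_header(EDGES_HEADER)))
```

For the header in the question this yields the same weight and date entries as the hand-written mapping.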

Hope it helps.

Thanks, this solved one of the two problems I was having in this question. – TxAG98

I posted a follow-up about the behavior of the date field specifically in [another question](http://stackoverflow.com/questions/38702959/edge-properties-clobbering-vertex-properties-in-orientdb-from-etl)... – TxAG98
