2013-06-11 15 views
7

我有一堆需要加载到MySQL数据库中的CSV数据。那么,也许是CSV-ISH。 (编辑actually, it looks like the stuff described in RFC 4180从双引号用作转义字符的CSV文件中加载数据

每行是逗号分隔双引号字符串的列表。为了避免出现在列值内的任何双引号,使用双重双引号。反斜杠允许表示自己。

例如,该行:

"", "\wave\", ""hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants""" 

如果解析成JSON应该是:

[ "", "\\wave\\", "\"hello,\" said the vicar", "what are \"scare-quotes\" good for?", "I'm reading \"Bossypants\"" ] 

我试图使用LOAD DATA阅读的CSV,但我跑陷入一些奇怪的行为。


作为一个例子,考虑如果我有一个简单的两列的表

shell% mysql exampledb -e "describe person" 
+-------+-----------+------+-----+---------+-------+ 
| Field | Type  | Null | Key | Default | Extra | 
+-------+-----------+------+-----+---------+-------+ 
| ID | int(11) | YES |  | NULL |  | 
| UID | char(255) | YES |  | NULL |  | 
+-------+-----------+------+-----+---------+-------+ 
shell% 

如果我的输入文件的第一非报头行上""结束:

shell% cat temp-1.csv 
"ID","UID" 
"9","" 
"0","Steve the Pirate" 
"1","\Alpha" 
"2","Hoban ""Wash"" Washburne" 
"3","Pastor Veal" 
"4","Tucker" 
"10","" 
"5","Simon" 
"6","Sonny" 
"7","Wat\" 

我可以加载每个非标题行但第一个:

mysql> DELETE FROM person; 
Query OK, 0 rows affected (0.00 sec) 

mysql> LOAD DATA 
      LOCAL INFILE 'temp-1.csv' 
      INTO TABLE person 
      FIELDS 
      TERMINATED BY ',' 
      ENCLOSED BY '"' 
      ESCAPED BY '"' 
      LINES 
      TERMINATED BY '\n' 
      IGNORE 1 LINES 
     ; 
Query OK, 9 rows affected (0.00 sec) 
Records: 9 Deleted: 0 Skipped: 0 Warnings: 0 

mysql> SELECT * FROM person; 
+------+------------------------+ 
| ID | UID     | 
+------+------------------------+ 
| 0 | Steve the Pirate  | 
| 10 |      | 
| 1 | \Alpha     | 
| 2 | Hoban "Wash" Washburne | 
| 3 | Pastor Veal   | 
| 4 | Tucker     | 
| 5 | Simon     | 
| 6 | Sonny     | 
| 7 | Wat\     | 
+------+------------------------+ 
9 rows in set (0.00 sec) 

或者我可以加载所有线路包括标题:

mysql> DELETE FROM person; 
Query OK, 9 rows affected (0.00 sec) 

mysql> LOAD DATA 
      LOCAL INFILE 'temp-1.csv' 
      INTO TABLE person 
      FIELDS 
      TERMINATED BY ',' 
      ENCLOSED BY '"' 
      ESCAPED BY '"' 
      LINES 
      TERMINATED BY '\n' 
      IGNORE 0 LINES 
     ; 
Query OK, 11 rows affected, 1 warning (0.01 sec) 
Records: 11 Deleted: 0 Skipped: 0 Warnings: 1 

mysql> show warnings; 
+---------+------+--------------------------------------------------------+ 
| Level | Code | Message            | 
+---------+------+--------------------------------------------------------+ 
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 | 
+---------+------+--------------------------------------------------------+ 
1 row in set (0.00 sec) 

mysql> SELECT * FROM person; 
+------+------------------------+ 
| ID | UID     | 
+------+------------------------+ 
| 0 | UID     | 
| 9 |      | 
| 0 | Steve the Pirate  | 
| 10 |      | 
| 1 | \Alpha     | 
| 2 | Hoban "Wash" Washburne | 
| 3 | Pastor Veal   | 
| 4 | Tucker     | 
| 5 | Simon     | 
| 6 | Sonny     | 
| 7 | Wat\     | 
+------+------------------------+ 
11 rows in set (0.00 sec) 

如果我输入文件结束对""没有台词:

shell% cat temp-2.csv 
"ID","UID" 
"0","Steve the Pirate" 
"1","\Alpha" 
"2","Hoban ""Wash"" Washburne" 
"3","Pastor Veal" 
"4","Tucker" 
"5","Simon" 
"6","Sonny" 
"7","Wat\" 

然后我可以加载没有台词:

mysql> DELETE FROM person; 
Query OK, 11 rows affected (0.00 sec) 

mysql> LOAD DATA 
      LOCAL INFILE 'temp-2.csv' 
      INTO TABLE person 
      FIELDS 
      TERMINATED BY ',' 
      ENCLOSED BY '"' 
      ESCAPED BY '"' 
      LINES 
      TERMINATED BY '\n' 
      IGNORE 1 LINES 
     ; 
Query OK, 0 rows affected (0.00 sec) 
Records: 0 Deleted: 0 Skipped: 0 Warnings: 0 

mysql> SELECT * FROM person; 
Empty set (0.00 sec) 

或者我可以加载包括标题在内的所有行:

mysql> DELETE FROM person; 
Query OK, 0 rows affected (0.00 sec) 

mysql> LOAD DATA 
      LOCAL INFILE 'temp-2.csv' 
      INTO TABLE person 
      FIELDS 
      TERMINATED BY ',' 
      ENCLOSED BY '"' 
      ESCAPED BY '"' 
      LINES 
      TERMINATED BY '\n' 
      IGNORE 0 LINES 
     ; 
Query OK, 9 rows affected, 1 warning (0.03 sec) 
Records: 9 Deleted: 0 Skipped: 0 Warnings: 1 

mysql> show warnings; 
+---------+------+--------------------------------------------------------+ 
| Level | Code | Message            | 
+---------+------+--------------------------------------------------------+ 
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 | 
+---------+------+--------------------------------------------------------+ 
1 row in set (0.00 sec) 

mysql> SELECT * FROM person; 
+------+------------------------+ 
| ID | UID     | 
+------+------------------------+ 
| 0 | UID     | 
| 0 | Steve the Pirate  | 
| 1 | \Alpha     | 
| 2 | Hoban "Wash" Washburne | 
| 3 | Pastor Veal   | 
| 4 | Tucker     | 
| 5 | Simon     | 
| 6 | Sonny     | 
| 7 | Wat\     | 
+------+------------------------+ 
9 rows in set (0.00 sec) 

因此,现在我发现了很多错误的方法,我怎样才能使用LOAD DATA将这些文件中的数据导入到我的数据库中?

回答

15

根据the documentation for LOAD DATA, treating doubled double quotes as a double quote is the default

如果该字段使用所附BY字符开始,该字符的情况下被识别为终止仅当随后的场或行终止BY序列的字段值。为了避免歧义,字段值中的ENCLOSED BY字符的出现可以加倍并且被解释为字符的单个实例。例如,如果ENCLOSED BY““”被指定,引号的处理方式如下所示:

"The ""BIG"" boss" -> The "BIG" boss 
The "BIG" boss  -> The "BIG" boss 
The ""BIG"" boss -> The ""BIG"" boss 

因此,所有我需要做的是禁用解释\作为转义字符,使用ESCAPED BY ''

LOAD DATA 
    LOCAL INFILE 'temp-1.csv' 
    INTO TABLE person 
    FIELDS 
    TERMINATED BY ',' 
    ENCLOSED BY '"' 
    ESCAPED BY '' 
    LINES 
    TERMINATED BY '\n' 
    IGNORE 1 LINES 
; 
+0

+1您的建议帮助我解决了另一个问题。我在csv中用双引号括住了所有的字段,并且如果该字段为空,那么csv将只包含空的两个双引号“” - 它假定它是转义字符,而我的数据导入命令不起作用。把ESCAPED BY''完成了这项工作。谢谢。 – Aakash

+0

我的数据完全是rfc 4180,因为没有逃逸角色。如果在一个字段中有一个逗号,那么它必须用双引号括起来。在这种情况下是否使'ESCAPED BY''工作? – CMCDragonkai

相关问题