2013-04-11 42 views
3

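I have some crude MySQL CSV exports from a legacy system that I'm parsing and loading into a new Ruby on Rails application. Hard returns inside fields of the MySQL export are causing errors during Ruby CSV parsing.

Here is an example: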

"1","1","When a ticket is marked as Resolved, revert the assigned_to to the one who started it",,"7","1","1.00","0.00","2",NULL,NULL,"1","2009-06-04 16:40:37","2009-06-04 16:40:37",NULL,"0000-00-00 00:00:00";"2","2","Email notifications when ticket is assigned to someone",,"1","1","1.00","0.00","1",NULL,NULL,"1","2009-06-04 16:41:21","2009-06-04 16:41:21",NULL,"0000-00-00 00:00:00";"3","1","When a ticket is marked as Resolved, revert the assigned_to to the one who started it - and notify",,"7","1","1.00","0.00","2",NULL,NULL,"1","2009-06-09 18:10:47","2009-06-09 18:10:47",NULL,"0000-00-00 00:00:00";"4","3","Change Password Capability","Fix the forgot password capability (and for bonus points, add capability for user to change once logged in.","7","1","0.00","0.00","1",NULL,NULL,"9","2009-06-09 18:13:45","2009-06-09 18:13:45",NULL,"0000-00-00 00:00:00";"5","4","Manager View","Don't need listed: 
    Milestone 
    Status 

Do need listed: 
    Assigned To 
    Position (since we're not assigning case numbers)","7","1","0.00","0.00","1",NULL,NULL,NULL,"2009-06-09 18:16:32","2009-06-09 18:16:32",NULL,"0000-00-00 00:00:00";"6","5","TICKETS: Remove Position/Assign ID","Don't really need to assign a position, instead would be better to automatically assign a ticket number and be able to sort on that. 

Also, when you don't assign a position to a ticket, it breaks the system (or at least it doesn't show up and causes an error in the Manager View)","7","1","0.00","0.00","1",NULL,NULL,"9","2009-06-09 18:19:10","2009-06-09 18:19:10",NULL,"0000-00-00 00:00:00";"7","6","Manager View","Don't need listed: 
- Milestone 
- Status 

Do need listed: 
- Case ID (preferred) 
- Position (until case id implemented)","7","1","0.00","0.00","1",NULL,NULL,"9","2009-06-09 18:24:07","2009-06-09 18:24:07",NULL,"0000-00-00 00:00:00";"8","5","TICKETS: Remove Position/Assign ID","Don't really need to assign a position, instead would be better to automatically assign a ticket number and be able to sort on that. 

Also, when you don't assign a position to a ticket, it breaks the system (or at least it doesn't show up and causes an error in the Manager View)","7","1","0.00","0.00","1",NULL,NULL,NULL,"2009-06-09 18:35:00","2009-06-09 18:35:00",NULL,"0000-00-00 00:00:00";"9","7","Ability to \"assign\" projects to users","Some way, even manual in the database, to indicate which projects a user may access","7","1","0.00","0.00","1",NULL,NULL,"9","2009-06-09 18:45:16","2009-06-09 18:45:16",NULL,"0000-00-00 00:00:00"; 

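The fields are enclosed in double quotes and terminated by commas, and rows are terminated by semicolons. As you can hopefully see, certain fields contain hard returns within the field. That is how they appear in the CSV file: as literal line breaks, not as escaped newline characters.

My Ruby test code to parse the CSV: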

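require 'csv'

csv_file_path = 'data/file.csv'

# Rows in this export end with ";" rather than a newline.
CSV.foreach(csv_file_path, row_sep: ";") do |row|
  puts row[0] # first column (the record id), matching the output shown
end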

When I run it through a rake task, I get this output:

1 
2 
3 
4 
5 
6 
7 
8 
rake aborted! 
Missing or stray quote in line 9 
... 

Why can't it parse fields that contain hard returns? Thanks.

Edit: updated to show more of the CSV.

Answers

2

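The problem in this case is the double-quote escaping, not the line breaks. You have a field containing the string \"assign\", which should instead be escaped as ""assign"". Ruby's CSV parser expects a quote inside a quoted field to be doubled and does not treat a backslash as an escape character, so it fails on that row. With this change, the following runs without error: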

require 'csv' 
CSV.parse(DATA, :row_sep => ";") do |row| 
    puts row 
end 

__END__ 
"1","1","When a ticket is marked as Resolved, revert the assigned_to to the one who started it",,"7","1","1.00","0.00","2",NULL,NULL,"1","2009-06-04 16:40:37","2009-06-04 16:40:37",NULL,"0000-00-00 00:00:00";"2","2","Email notifications when ticket is assigned to someone",,"1","1","1.00","0.00","1",NULL,NULL,"1","2009-06-04 16:41:21","2009-06-04 16:41:21",NULL,"0000-00-00 00:00:00";"3","1","When a ticket is marked as Resolved, revert the assigned_to to the one who started it - and notify",,"7","1","1.00","0.00","2",NULL,NULL,"1","2009-06-09 18:10:47","2009-06-09 18:10:47",NULL,"0000-00-00 00:00:00";"4","3","Change Password Capability","Fix the forgot password capability (and for bonus points, add capability for user to change once logged in.","7","1","0.00","0.00","1",NULL,NULL,"9","2009-06-09 18:13:45","2009-06-09 18:13:45",NULL,"0000-00-00 00:00:00";"5","4","Manager View","Don't need listed: 
    Milestone 
    Status 

Do need listed: 
    Assigned To 
    Position (since we're not assigning case numbers)","7","1","0.00","0.00","1",NULL,NULL,NULL,"2009-06-09 18:16:32","2009-06-09 18:16:32",NULL,"0000-00-00 00:00:00";"6","5","TICKETS: Remove Position/Assign ID","Don't really need to assign a position, instead would be better to automatically assign a ticket number and be able to sort on that. 

Also, when you don't assign a position to a ticket, it breaks the system (or at least it doesn't show up and causes an error in the Manager View)","7","1","0.00","0.00","1",NULL,NULL,"9","2009-06-09 18:19:10","2009-06-09 18:19:10",NULL,"0000-00-00 00:00:00";"7","6","Manager View","Don't need listed: 
- Milestone 
- Status 

Do need listed: 
- Case ID (preferred) 
- Position (until case id implemented)","7","1","0.00","0.00","1",NULL,NULL,"9","2009-06-09 18:24:07","2009-06-09 18:24:07",NULL,"0000-00-00 00:00:00";"8","5","TICKETS: Remove Position/Assign ID","Don't really need to assign a position, instead would be better to automatically assign a ticket number and be able to sort on that. 

Also, when you don't assign a position to a ticket, it breaks the system (or at least it doesn't show up and causes an error in the Manager View)","7","1","0.00","0.00","1",NULL,NULL,NULL,"2009-06-09 18:35:00","2009-06-09 18:35:00",NULL,"0000-00-00 00:00:00";"9","7","Ability to ""assign"" projects to users","Some way, even manual in the database, to indicate which projects a user may access","7","1","0.00","0.00","1",NULL,NULL,"9","2009-06-09 18:45:16","2009-06-09 18:45:16",NULL,"0000-00-00 00:00:00"; 
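If the export cannot be regenerated, one option is to normalize the escaping before parsing. The following is a minimal sketch, not a definitive fix: `normalize_mysql_quotes` is a hypothetical helper name, the sample string is made up for illustration, and the approach assumes the data never contains an escaped backslash immediately before a closing quote (`\\"`), which a plain substitution would mishandle.

```ruby
require 'csv'

# Hypothetical helper (not part of the CSV library): rewrites MySQL-style
# backslash-escaped quotes (\") as the doubled quotes ("") that Ruby's CSV
# parser expects inside a quoted field.
def normalize_mysql_quotes(raw)
  raw.gsub('\\"', '""')
end

# Made-up sample in the same shape as the export: ";" ends each row,
# one field has backslash-escaped quotes, another has a hard return.
raw = %q{"9","7","Ability to \"assign\" projects to users","line one
line two";"10","8","Plain title","No quotes here";}

rows = CSV.parse(normalize_mysql_quotes(raw), row_sep: ";")
rows.each { |row| puts row[2] }
```

For a file with tens of thousands of rows, the same substitution could be applied to the result of `File.read` before handing the string to `CSV.parse`.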

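I'm using 1.9.3 as well. I tried to extract the problem area from a CSV that has tens of thousands of rows. Let me try to update the question with a better example. – dremme 2013-04-11 20:10:19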


I've updated the question with what I think is a better example. – dremme 2013-04-11 20:17:27


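In this example it is choking on the escaped quotes in "Ability to \"assign\" projects to users". I don't have a solution yet, but removing the escaped quotes from that field lets it parse. – 2013-04-11 20:56:57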