2013-11-21 46 views
1

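Splitting fields on commas and quotes in Python?

I want to split this CSV file into a 2D list. The problem with my code at the moment is that it drops several fields in any row whose data contains quotes. The quotes are there to indicate that the comma inside them is not a field separator but part of the field itself. I have posted the code, sample data, and sample output. You can see how the first output line skips several fields because of the quotes. What do I need to change in my regular expression? Thanks in advance for any help.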

A snippet of the code is below:

import sys 
import re 
import time 

# get the date 
date = time.strftime("%x") 


# function for reading in each line of file 
# returns array of each line 
def readIn(file): 
    array = [] 
    for line in file: 
        array.append(line)
    return array 


def main(): 
    data = open(sys.argv[1], "r") 
    template = open(sys.argv[2], "r") 
    output = open(sys.argv[3], "w") 

    finalL = [] 

    dataL = [] 
    dataL = readIn(data) 

    templateL = [] 
    templateL = readIn(template) 

    costY = 0 
    dateStr = "" 

    # split each line in the data by the comma unless there are quotes 
    for i in range(0, len(dataL)):
        if '"' in dataL[i]:
            Pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
            dataL[i] = Pattern.split(dataL[i])[1::2]
            for j in range(0, len(dataL[i])):
                dataL[i][j] = dataL[i][j].strip()
        else:
            temp = dataL[i].strip().split(",")
            dataL[i] = temp

Sample data:

OrgLevel3: ATHLET ,,,,,,,, 
,,,,,,,, 
Name,,,Calls,,Duration,Cost ($),, 
,,,,,,,, 
ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,, 
,,,,,,,, 
Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,, 
,,,,,,,, 
OrgLevel3: BOOK ,,,,,,,, 
,,,,,,,, 
Name,,,Calls,,Duration,Cost ($),, 
,,,,,,,, 
BOOK Direct,,,434 ,,14:59:18,28.09 ,, 
,,,,,,,, 
Grand Total for BOOK:,,,434 ,,14:59:18,28.09 ,, 
,,,,,,,, 
OrgLevel3: CARD ,,,,,,,, 
,,,,,,,, 
Name,,,Calls,,Duration,Cost ($),, 
,,,,,,,, 
CARD Direct,,,253 ,,09:02:54,14.30 ,, 
,,,,,,,, 
Grand Total for CARD:,,,253 ,,09:02:54,14.30 ,, 

Sample output:

['Grand Total for ATHLET:', '"1,312 "', '62:58:18', '130.62', ''] 
['Grand Total for BOOK:', '', '', '434 ', '', '14:59:18', '28.09 ', '', ''] 
['Grand Total for CARD:', '', '', '253 ', '', '09:02:54', '14.30 ', '', ''] 
+5

Have you looked at the [`csv`](http://docs.python.org/2/library/csv.html) module?

+0

Is your "sample data" the input data or the output data?

+0

Yes, the sample data is the input data. I was told not to use the csv module. I tried the current code but changed the pattern to Pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''') – italianmoses

Answers

0

If you want to load the CSV into a list, the entire code to do that is:

import csv
import sys

with open(sys.argv[1]) as data: 
    dataL = list(csv.reader(data)) 

If your sample data is the input data, then it needs some other work beforehand..., e.g.:

# "row and" skips blank lines, for which csv.reader yields an empty list
dataL = [row for row in csv.reader(data) if row and row[0].startswith('Grand Total for')]
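
To see what the filter above produces, here is a small self-contained sketch that runs `csv.reader` over a few lines copied from the question's sample data. Reading from an in-memory `io.StringIO` instead of `sys.argv[1]`, and stripping whitespace from each field, are assumptions made for illustration and are not part of the original answer.

```python
import csv
import io

# A few lines copied from the question's sample data (assumed to be the input).
sample = '''Name,,,Calls,,Duration,Cost ($),,
ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,,
Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,
Grand Total for BOOK:,,,434 ,,14:59:18,28.09 ,,
'''

# csv.reader handles the quoted "1,312 " field and keeps the empty fields.
rows = [
    [field.strip() for field in row]      # trim the trailing spaces in the data
    for row in csv.reader(io.StringIO(sample))
    if row and row[0].startswith('Grand Total for')
]

for row in rows:
    print(row)
```

Each printed row keeps all nine fields, and the quoted value comes out as `'1,312'` with the comma intact.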
+1

Why would the OP provide sample data if it weren't the input data?

+0

@matti For the same reason they hadn't heard of the `csv` module? Also, the update shows filtering the "input"

+0

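I appreciate the csv module option, but I was told not to use csv. I'm trying to get the Pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''') line to work for the first line of the sample output – italianmoses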
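
If `csv` really is off the table, one possible regex-only approach is to split on commas that are outside quotes, rather than trying to match the fields themselves. The pattern in the question matches one or more field characters and then splits on those matches, so a run of adjacent commas such as `,,,` comes back as a single separator and the empty fields in between are lost. The sketch below is not from the thread: the names `COMMA_OUTSIDE_QUOTES` and `split_csv_line` are made up for illustration, and it assumes simple double-quoted fields like those in the sample data (no escaped `""` quotes, no quoted fields spanning several lines).

```python
import re

# Split on a comma only when an even number of double quotes follows it,
# i.e. when the comma is not inside a quoted field.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_csv_line(line):
    """Split one line into fields, keeping empty fields and quoted commas."""
    fields = COMMA_OUTSIDE_QUOTES.split(line.rstrip('\r\n'))
    cleaned = []
    for field in fields:
        field = field.strip()
        # Drop one pair of surrounding double quotes, if present.
        if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
            field = field[1:-1].strip()
        cleaned.append(field)
    return cleaned


print(split_csv_line('Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,'))
```

For that line it prints `['Grand Total for ATHLET:', '', '', '1,312', '', '62:58:18', '130.62', '', '']`, which matches the field count of the unquoted rows. The lookahead rescans the rest of the line at every comma, so this is fine for short report lines but would be slow on very long ones.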