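Separating fields by comma and quotes in Python?
I'm trying to split this CSV file into a 2D list. The problem with my current code is that it drops several fields on lines that contain quotes. The quotes are there to show that the comma inside them is part of the field's value, not a field separator. I've posted the code, sample data, and sample output. You can see how the first output row skips several fields compared to the others because of the quotes. What do I need to change in the regular expression? Thanks in advance for any help.
A snippet of the code is below: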
import sys
import re
import time

# get the date
date = time.strftime("%x")

# function for reading in each line of file
# returns array of each line
def readIn(file):
    array = []
    for line in file:
        array.append(line)
    return array

def main():
    data = open(sys.argv[1], "r")
    template = open(sys.argv[2], "r")
    output = open(sys.argv[3], "w")
    finalL = []
    dataL = []
    dataL = readIn(data)
    templateL = []
    templateL = readIn(template)
    costY = 0
    dateStr = ""
    # split each line in the data by the comma unless there are quotes
    for i in range(0, len(dataL)):
        if '"' in dataL[i]:
            Pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
            dataL[i] = Pattern.split(dataL[i])[1::2]
            for j in range(0, len(dataL[i])):
                dataL[i][j] = dataL[i][j].strip()
        else:
            temp = dataL[i].strip().split(",")
            dataL[i] = temp
Sample data:
OrgLevel3: ATHLET ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,,
,,,,,,,,
Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,
,,,,,,,,
OrgLevel3: BOOK ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
BOOK Direct,,,434 ,,14:59:18,28.09 ,,
,,,,,,,,
Grand Total for BOOK:,,,434 ,,14:59:18,28.09 ,,
,,,,,,,,
OrgLevel3: CARD ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
CARD Direct,,,253 ,,09:02:54,14.30 ,,
,,,,,,,,
Grand Total for CARD:,,,253 ,,09:02:54,14.30 ,,
Sample output:
['Grand Total for ATHLET:', '"1,312 "', '62:58:18', '130.62', '']
['Grand Total for BOOK:', '', '', '434 ', '', '14:59:18', '28.09 ', '', '']
['Grand Total for CARD:', '', '', '253 ', '', '09:02:54', '14.30 ', '', '']
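The shortened first row can be reproduced in isolation. The sketch below applies the same pattern to one quoted line from the sample data (the variable names `pattern`, `line`, and `fields` are only illustrative). Because the pattern requires at least one character per field, empty fields never match, so a run of commas collapses into a single separator and the `[1::2]` slice keeps only the non-empty fields plus the trailing newline.

```python
import re

# Same pattern as in the question; the `+` means an empty field can never match.
pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')

line = 'Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,\n'

# split() with a capturing group alternates separators and captured fields,
# so the odd indices hold the matched (non-empty) fields.
parts = pattern.split(line)
fields = [part.strip() for part in parts[1::2]]

print(parts)   # separators such as ',,,' show where empty fields were swallowed
print(fields)
```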
Have you looked at the [`csv`](http://docs.python.org/2/library/csv.html) module?
Is your "sample data" the input data or the output data?
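For comparison, here is a minimal sketch of how the standard `csv` module handles two lines taken from the sample data. It is written for Python 3, and the `lines` and `rows` names are only illustrative. `csv.reader` accepts any iterable of strings, keeps empty fields, and removes the quotes around a quoted field.

```python
import csv

# Two lines copied from the sample data, one with a quoted field.
lines = [
    'ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,,',
    'BOOK Direct,,,434 ,,14:59:18,28.09 ,,',
]

# Strip surrounding whitespace from every field after parsing.
rows = [[field.strip() for field in row] for row in csv.reader(lines)]

for row in rows:
    print(row)
```

Both rows come out with nine fields, and the quoted value is kept whole as `1,312`.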
Yes, the sample data is the input data. I was told not to use the csv module. I tried using the current code but changing Pattern = re.compile(r'''((?:[^,'']|"[^"]*"|'[^']*')+)''') – italianmoses
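If the `csv` module is off limits, one possible approach is to skip the regular expression and scan each line character by character, tracking whether the scanner is inside quotes. The helper below is a hypothetical sketch (`split_csv_line` is not from the original code). It assumes double quotes are the only quote character and that fields do not contain escaped quotes such as `""`.

```python
def split_csv_line(line, delimiter=",", quote='"'):
    """Split one line on `delimiter`, ignoring delimiters inside quotes.

    Assumption: no escaped quotes ("") inside quoted fields.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in line.rstrip("\r\n"):
        if ch == quote:
            # Toggle quoted state; the quote character itself is dropped.
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            # An unquoted delimiter ends the field, even if the field is empty.
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # The last field has no trailing delimiter.
    fields.append("".join(current).strip())
    return fields


row = split_csv_line('Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,\n')
print(row)
```

With this helper the quoted and unquoted lines go through the same code path, so the `if '"' in dataL[i]` branch in the loop would no longer be needed.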