2014-12-18 347 views

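Python csv module: reading a CSV split on commas, but ignoring commas inside double or single quotes

I have a .csv file in which some column values contain commas. Here is an example: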

Header: ID  Value   Content           Date 
     1  34    "market, business"        12/20/2013 
     2  15    "market, business", yesterday, metric   11/21/2014 
     3  18    "market," business and yesterday     10/20/2014 
     4  19    yesterday, today,        11/22/2014 

This is how the .csv file looks when I open it in Sublime Text:

1, 34, "market, business", 12/20/2013 
2, 15, "market, business", "yesterday, metric, 11/21/2014 
3, 18, "market," business and yesterday, 10/20/2014 
4, 19, yesterday, today, 11/22/2014 

But what I want the Python csv reader to give me is:

[1, 34, "market, business", 12/20/2013] 
[2, 15, "market, business" "yesterday metric, 11/21/2014] 
[3, 18, "market," business and yesterday, 10/20/2014] 
[4, 19, yesterday today, 11/22/2014] 

This is just sample data. The "Content" column is the headache here, because the csv module uses "," as the delimiter. I used:

reader = csv.reader(f, skipinitialspace=True) 

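That works for the first row, where the whole string is inside one pair of double quotes. But it does not work for the second and third rows, where commas appear outside the (single or double) quotes.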
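A minimal sketch of the behaviour described above, assuming the Sublime Text rendering of rows 1, 3 and 4 (trailing spaces removed) is the literal file content. Each line is parsed on its own; row 2 is left out because its unbalanced quote would, in a real file, run on into the following lines (csv.reader allows newlines inside quoted fields).

```python
import csv

# Rows 1, 3 and 4 as they appear in Sublime Text (trailing spaces removed).
rows = [
    '1, 34, "market, business", 12/20/2013',
    '3, 18, "market," business and yesterday, 10/20/2014',
    "4, 19, yesterday, today, 11/22/2014",
]

for line in rows:
    # Parse each line separately; skipinitialspace drops the blank after each comma.
    parsed = next(csv.reader([line], skipinitialspace=True))
    print(len(parsed), parsed)
```

Rows 1 and 3 both come out as four fields, but in row 3 the quotes around `market,` are consumed, giving `'market, business and yesterday'`; row 4 is split into five fields.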

How can I solve this? Right now I'm only using the standard csv module in Python. Can pandas handle this?

Thanks.

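I made some updates. I think what I'm asking for is a way to treat commas differently depending on where they appear... Now that I've posted it, it seems unreasonable: I can't find any way for the csv module to tell a "," that is a separator apart from a "," inside a field. Even Excel can't...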

Any ideas?


Take a look at the "Related" questions listed in the right-hand column. Do any of them answer your question? – kdopen


Please post a sample of your csv and the desired DataFrame. – unutbu


The desired Python lists would raise SyntaxErrors, because they contain unmatched quotes and strings without any quotes. Please fix them. – unutbu

Answer


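If we can assume that

  • every line begins with two comma-separated integers,
  • every line ends with a date, separated from the rest by a comma, and
  • everything remaining (in the middle) belongs to the third column,

then your data can be parsed like this: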

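data = list()
with open('data') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. an empty line at the end of the file
        # split off the first two fields (ID and Value); the rest stays in parts[2]
        parts = line.split(',', 2)
        # split the remainder on its LAST comma only: Content | Date
        parts[2:4] = parts[2].rsplit(',', 1)
        # ID and Value are integers
        parts[:2] = map(int, parts[:2])
        # strip surrounding whitespace (including the newline) from Content and Date
        parts[2:] = map(str.strip, parts[2:])
        data.append(parts)

for row in data:
    print(row)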

yields

[1, 34, '"market, business"', '12/20/2013'] 
[2, 15, '"market, business", "yesterday, metric', '11/21/2014'] 
[3, 18, '"market," business and yesterday', '10/20/2014'] 
[4, 19, 'yesterday, today', '11/22/2014'] 
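An alternative sketch, not part of the original answer, expresses the same three assumptions as a single regular expression, so that a line which does not fit them raises an error instead of being silently mis-split. The pattern, the name `parse_line`, and the assumption that dates are always MM/DD/YYYY are illustrative choices.

```python
import re

# Hypothetical pattern encoding the three assumptions: two leading integers,
# a free-form middle, and a trailing date (assumed to be MM/DD/YYYY).
LINE_RE = re.compile(r"\s*(\d+)\s*,\s*(\d+)\s*,(.*),\s*(\d{1,2}/\d{1,2}/\d{4})\s*")


def parse_line(line):
    """Return [id, value, content, date]; raise ValueError if the line
    does not match the assumed layout."""
    m = LINE_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"unexpected line: {line!r}")
    # The greedy (.*) gives back characters only until ",<date>" can match,
    # so every comma in the middle stays inside the content field.
    return [int(m.group(1)), int(m.group(2)), m.group(3).strip(), m.group(4)]


sample = [
    '1, 34, "market, business", 12/20/2013',
    '2, 15, "market, business", "yesterday, metric, 11/21/2014',
    '3, 18, "market," business and yesterday, 10/20/2014',
    "4, 19, yesterday, today, 11/22/2014",
]
for line in sample:
    print(parse_line(line))
```

On these four rows it produces the same lists as the split-based loop; the trade-off is that the date format is now hard-coded in the pattern.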

Then you can build a DataFrame like this:

import pandas as pd 
df = pd.DataFrame(data, columns=['Id','Value','Content','Date']) 
print(df) 

which yields

   Id  Value                                 Content        Date
0   1     34                      "market, business"  12/20/2013
1   2     15  "market, business", "yesterday, metric  11/21/2014
2   3     18        "market," business and yesterday  10/20/2014
3   4     19                        yesterday, today  11/22/2014
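Once the rows are split this way, one option (a sketch, not from the original answer) is to write them back out with `csv.writer`, which quotes any field containing a comma or a quote character and doubles embedded quotes. The resulting file then reads back as exactly four fields per row with a plain `csv.reader` (or `pandas.read_csv`). The `io.StringIO` buffer here stands in for a real output file.

```python
import csv
import io

# Rows as produced by the split-based parser above.
data = [
    [1, 34, '"market, business"', "12/20/2013"],
    [2, 15, '"market, business", "yesterday, metric', "11/21/2014"],
    [3, 18, '"market," business and yesterday', "10/20/2014"],
    [4, 19, "yesterday, today", "11/22/2014"],
]

# Stand-in for a real file, which should be opened with newline="".
buf = io.StringIO()
writer = csv.writer(buf)  # default QUOTE_MINIMAL: quote only fields that need it
writer.writerow(["ID", "Value", "Content", "Date"])
writer.writerows(data)

# Read the text back: every row now has exactly four fields.
buf.seek(0)
reread = list(csv.reader(buf))
for row in reread:
    print(row)
```

Note that the reader returns every field as a string, so `ID` and `Value` need converting again after reading.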