from io import StringIO
import pandas as pd
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''
df = pd.read_csv(StringIO(s), sep=r',(?!\s)', engine='python')  # raises the ValueError below
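Question: I asked a question here, but now I've run into a new problem. Notice that the last row ends with a comma followed by a space. The regex in sep=r',(?!\s)' is supposed to skip any comma that is followed by whitespace.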
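Question: Is there a way to read the last column literally as launch Wed., (where that comma is not a separator/delimiter but is actually part of the text in the last column) using only pd.read_csv?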
Error:
ValueError: Expected 8 fields in line 5, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
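The extra (9th) field can be reproduced with plain re.split. This sketch assumes that pandas' Python engine strips surrounding whitespace from each line before applying a regex separator; that is an implementation detail of pandas, not documented API. Stripping removes the trailing space, so the final comma is no longer followed by whitespace and the negative lookahead allows a split there, leaving an empty last field.

```python
import re

# The last data row from the question, exactly as it appears in the string.
row = "376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., "

# On the raw row, the trailing ', ' is protected by the lookahead: 8 fields.
raw_fields = re.split(r',(?!\s)', row)

# Assumption: pandas strips each line first, which leaves a bare ',' at the
# very end; nothing follows it, so (?!\s) succeeds and the row splits again.
stripped_fields = re.split(r',(?!\s)', row.strip())

print(len(raw_fields), len(stripped_fields))  # → 8 9
print(repr(stripped_fields[-1]))              # → ''
```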
Expected/desired output:
          ID  Level  QID                                  Text  ResponseID  \
0  375280046      S  D3M               Which is your favorite?        D5M0
1  375280046      S  D3M  How often? (at home, at work, other)        D3M0
2  375280046      M  A78             Do you prefer a, b, or c?        A78C
3  376918925      M  A78           Which ONE (select only one)        A78E

   responseText             date_key          last
0      option 1  2012-08-08 00:00:00          ynot
1          Work  2010-03-31 00:00:00          okkk
2             a  2010-03-31 00:00:00           abc
3          Milk  2004-02-02 00:00:00  launch Wed.,
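Interesting! The docs say \Z "matches only at the end of the string". I guess that means pandas reads one line at a time, which is what allows '\Z' to work. – Jarad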
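Jarad's reading of the docs can be checked against Python's re module directly. In this small sketch (the sample string is taken from the question's last field), $ without re.MULTILINE also matches just before a trailing newline, while \Z matches only at the true end of the string. Once a line has been stripped, the two anchors behave the same, which would explain why '$' can work too.

```python
import re

# '$' matches at the end of the string OR just before a final '\n';
# '\Z' matches only at the very end of the string.
with_newline = 'launch Wed.,\n'
print(re.search(r',$', with_newline) is not None)   # → True
print(re.search(r',\Z', with_newline) is not None)  # → False

# On a line that has already been stripped, the two anchors agree.
stripped = with_newline.strip()
print(re.search(r',$', stripped) is not None)   # → True
print(re.search(r',\Z', stripped) is not None)  # → True
```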
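@Jarad When I wrote this I was only looking at it in the context of the string above, where that comma was the last character, but it looks like you're right. '$' would also work, and '\Z' still works even when the problem line is not the last one (even though lines that end this way still cause the original problem). – EFT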
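Putting the comments together, here is a sketch that adds an end-of-line alternative to the lookahead, so a comma that is the last character of a line is not treated as a delimiter. It assumes pandas' Python engine strips each line before splitting (an implementation detail, not documented behaviour) and passes engine='python' explicitly, since only that engine supports regex separators. The reordered s_middle string is a hypothetical variant of the question's data, used only to check that the guard applies when the problem row is not last.

```python
from io import StringIO

import pandas as pd

# The data from the question; the last row ends with ", ".
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''

# Split on a comma unless it is followed by whitespace or is the last
# character of the (stripped) line.
df_z = pd.read_csv(StringIO(s), sep=r',(?!\s|\Z)', engine='python')

# '$' gives the same result, because a stripped line has no trailing newline.
df_dollar = pd.read_csv(StringIO(s), sep=r',(?!\s|$)', engine='python')

# Hypothetical variant: move the problem row between the other data rows.
lines = s.splitlines()
s_middle = "\n".join([lines[0], lines[1], lines[4], lines[2], lines[3]])
df_middle = pd.read_csv(StringIO(s_middle), sep=r',(?!\s|\Z)', engine='python')

print(df_z['last'].tolist())
print(df_middle['last'].tolist())
```

Note that this only protects a comma at the end of a line; a comma elsewhere in a field that is not followed by whitespace would still be treated as a delimiter.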