2016-11-07 58 views
5

Python pandas: convert a list of comma-separated values to a DataFrame

I have a list of strings that looks like this:

["Name: Alice, Department: HR, Salary: 60000", "Name: Bob, Department: Engineering, Salary: 45000"] 

I want to convert this list into a DataFrame that looks like this:

Name  | Department  | Salary
----------------------------
Alice | HR          | 60000
Bob   | Engineering | 45000

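What is the simplest way to do this? My gut says to dump the data into a CSV and separate out the titles with the regex "^ *:", but there must be a simpler way.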

+0

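This is very simple. So, before we give you the answer, what have you done to find it yourself? *Hint:* this is an array of strings, each holding comma-separated k => v pairs (separated by ':') – Fallenreaper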

Answers

8

With a little string processing you can build a list of dictionaries and pass it to the DataFrame constructor:

import pandas as pd

lst = ["Name: Alice, Department: HR, Salary: 60000",
       "Name: Bob, Department: Engineering, Salary: 45000"]
# Split each record into "key: value" fields, then each field into a [key, value] pair.
pd.DataFrame([dict([kv.split(': ') for kv in record.split(', ')]) for record in lst])
Out:
    Department   Name Salary
0           HR  Alice  60000
1  Engineering    Bob  45000
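
The one-liner above relies on every field being separated by exactly `', '` and `': '`. A slightly more forgiving sketch is shown here, assuming that whitespace around keys and values is not significant and that `Salary` should end up numeric. The helper name `parse_record` is made up for illustration and is not part of pandas.

```python
import pandas as pd

lst = ["Name: Alice, Department: HR, Salary: 60000",
       "Name: Bob, Department: Engineering, Salary: 45000"]


def parse_record(record):
    """Turn 'k1: v1, k2: v2' into {'k1': 'v1', 'k2': 'v2'} (hypothetical helper)."""
    # Split on the first ':' only, so a value containing ':' stays intact.
    pairs = (field.split(":", 1) for field in record.split(","))
    return {key.strip(): value.strip() for key, value in pairs}


df = pd.DataFrame([parse_record(record) for record in lst])

# Every parsed value is a string, so numeric columns need an explicit conversion.
df["Salary"] = pd.to_numeric(df["Salary"])
```

This still assumes that values never contain a comma. If they can, a real CSV-style quoting scheme would be needed instead.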
3

You can do it like this:

In [271]: s 
Out[271]: 
['Name: Alice, Department: HR, Salary: 60000', 
'Name: Bob, Department: Engineering, Salary: 45000'] 

In [272]: pd.read_csv(io.StringIO(re.sub(r'\s*(Name|Department|Salary):\s*', r'', '~'.join(s))), 
    ...:    names=['Name','Department','Salary'], 
    ...:    header=None, 
    ...:    lineterminator=r'~' 
    ...:) 
    ...: 
Out[272]:
    Name   Department  Salary
0  Alice           HR   60000
1    Bob  Engineering   45000
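
The same idea can be written as a plain script. This is a variation on the session above rather than the answerer's exact code: it strips any `Key:` label with a generic `\w+:` pattern and joins the rows with newlines, so no custom `lineterminator` is needed. It assumes every record lists the same keys in the same order, because the column names are supplied by hand.

```python
import io
import re

import pandas as pd

s = ["Name: Alice, Department: HR, Salary: 60000",
     "Name: Bob, Department: Engineering, Salary: 45000"]

# Remove each "Key:" label together with the whitespace around it,
# leaving plain CSV rows such as "Alice,HR,60000".
rows = [re.sub(r"\s*\w+:\s*", "", record) for record in s]

df = pd.read_csv(
    io.StringIO("\n".join(rows)),
    names=["Name", "Department", "Salary"],  # assumed fixed key order
    header=None,
)
```

Because the text goes through `read_csv`, the `Salary` column is parsed as integers without a separate conversion step.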
3

Getting a little creative:

# \s* around the key and value keeps stray spaces out of the column names and cells
s.str.extractall(r'\s*(?P<key>[^,:]+?)\s*:\s*(?P<value>[^,]+)') \
    .reset_index('match', drop=True) \
    .set_index('key', append=True).value.unstack()


Setup

import pandas as pd

l = ["Name: Alice, Department: HR, Salary: 60000",
     "Name: Bob, Department: Engineering, Salary: 45000"]
s = pd.Series(l)
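
For reference, here is a self-contained sketch of the `extractall` approach, split into steps. The pattern is an assumption: it is written to tolerate whitespace around keys and values and may differ from the one in the answer. The intermediate names `pairs` and `wide` are illustrative only.

```python
import pandas as pd

l = ["Name: Alice, Department: HR, Salary: 60000",
     "Name: Bob, Department: Engineering, Salary: 45000"]
s = pd.Series(l)

# One output row per key/value pair, indexed by (original row, match number).
pattern = r"\s*(?P<key>[^,:]+?)\s*:\s*(?P<value>[^,]+)"
pairs = s.str.extractall(pattern)

# Drop the match number, move 'key' into the index, then pivot keys into columns.
wide = (
    pairs.reset_index("match", drop=True)
         .set_index("key", append=True)["value"]
         .unstack()
)

# Cosmetic cleanup: remove the 'key' label on the columns and restore the input order.
wide.columns.name = None
wide = wide[["Name", "Department", "Salary"]]
```

All values in `wide` are strings at this point. Something like `pd.to_numeric(wide["Salary"])` would be needed if the salaries are to be used as numbers.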