2017-06-01

Pandas DataFrame: split a list inside a column into separate columns

Feature 
Cricket:82379, Kabaddi:255, Reality:4751 
Cricket:15640, Wildlife:730 
LiveTV:13, Football:4129 
TalkShow:658, Cricket:7690 
Drama:5503, Cricket:3283, Reality:1345 

For my "Feature" column, I want to create a Cricket column and put the value 82379 in it.

Similar to the following link: Splitting dictionary/list inside a Pandas Column into Separate Columns


And where is your code? –


Check https://stackoverflow.com/questions/44298024/python-split-a-string-field-into-3-separate-fields-using-lambda – Sanju

Answer


Assuming your case looks like this:

import pandas as pd 
df = pd.DataFrame.from_dict({'Freature':[{"Cricket":82379, "Kabaddi":255, "Reality":4751},{"Cricket":15640, "Wildlife":730},{"LiveTV":13, "Football":4129},{"TalkShow":658, "Cricket":7690},{"Drama":5503, "Cricket":3283, "Reality":1345}]}) 
df 

    Freature 
0 {u'Cricket': 82379, u'Kabaddi': 255, u'Reality... 
1 {u'Cricket': 15640, u'Wildlife': 730} 
2 {u'LiveTV': 13, u'Football': 4129} 
3 {u'TalkShow': 658, u'Cricket': 7690} 
4 {u'Drama': 5503, u'Cricket': 3283, u'Reality':... 

Then try:

df['Freature'].apply(pd.Series) 

The output will be:

   Cricket   Drama  Football  Kabaddi  LiveTV  Reality  TalkShow  Wildlife
0  82379.0     NaN       NaN    255.0     NaN   4751.0       NaN       NaN
1  15640.0     NaN       NaN      NaN     NaN      NaN       NaN     730.0
2      NaN     NaN    4129.0      NaN    13.0      NaN       NaN       NaN
3   7690.0     NaN       NaN      NaN     NaN      NaN     658.0       NaN
4   3283.0  5503.0       NaN      NaN     NaN   1345.0       NaN       NaN
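With about 200,000 rows, a commonly used alternative to `apply(pd.Series)` is to hand the list of dicts straight to the `DataFrame` constructor, which avoids building one `Series` object per row. This is a minimal sketch assuming the `Freature` column holds real Python dicts, as in the example above. Column order can differ between pandas versions, so pick columns by name rather than by position.

```python
import pandas as pd

# Same example data as above: one dict per row in the 'Freature' column.
df = pd.DataFrame({'Freature': [
    {"Cricket": 82379, "Kabaddi": 255, "Reality": 4751},
    {"Cricket": 15640, "Wildlife": 730},
    {"LiveTV": 13, "Football": 4129},
    {"TalkShow": 658, "Cricket": 7690},
    {"Drama": 5503, "Cricket": 3283, "Reality": 1345},
]})

# Each dict becomes one row; keys missing from a row are filled with NaN.
# Passing index=df.index keeps the new rows aligned with the original frame.
wide = pd.DataFrame(df['Freature'].tolist(), index=df.index)
print(wide['Cricket'])
```

To keep the original column next to the new ones, `df.join(wide)` attaches them by index.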

Update:

Convert it to a dict:

new_df = df['Freature'].apply(pd.Series) 
result = dict((column, list(new_df[column].dropna())) for column in new_df.columns) 
result 

The result will be a dictionary:

{'Cricket': [82379.0, 15640.0, 7690.0, 3283.0], 
'Drama': [5503.0], 
'Football': [4129.0], 
'Kabaddi': [255.0], 
'LiveTV': [13.0], 
'Reality': [4751.0, 1345.0], 
'TalkShow': [658.0], 
'Wildlife': [730.0]} 
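If the dictionary of lists is the only goal, you can also build it without the intermediate wide frame by walking the dicts directly. This also keeps the values as `int`, because no NaN padding forces them to float. The sketch below assumes the same dict-valued `Freature` column as above.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'Freature': [
    {"Cricket": 82379, "Kabaddi": 255, "Reality": 4751},
    {"Cricket": 15640, "Wildlife": 730},
    {"LiveTV": 13, "Football": 4129},
    {"TalkShow": 658, "Cricket": 7690},
    {"Drama": 5503, "Cricket": 3283, "Reality": 1345},
]})

# Append every value under its key, in row order.
result = defaultdict(list)
for row in df['Freature']:
    for key, value in row.items():
        result[key].append(value)

result = dict(result)  # convert to a plain dict for printing
print(result)
```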

If the Freature column contains strings instead:

import pandas as pd 
df = pd.DataFrame.from_dict({'Freature':["Cricket:82379, Kabaddi:255, Reality:4751","Cricket:15640, Wildlife:730","LiveTV:13, Football:4129","TalkShow:658, Cricket:7690","Drama:5503, Cricket:3283, Reality:1345"]}) 
df 

    Freature 
0 Cricket:82379, Kabaddi:255, Reality:4751 
1 Cricket:15640, Wildlife:730 
2 LiveTV:13, Football:4129 
3 TalkShow:658, Cricket:7690 
4 Drama:5503, Cricket:3283, Reality:1345 

Then you can convert them to dictionaries like this:

for text in df['Freature']: 
    # split on "," into "Name:Number" pieces, then split each piece on ":" and cast the number to int
    print({name.strip(): int(value) for name, value in (item.split(":") for item in text.split(","))}) 

It will print all the converted dictionaries:

{'Cricket': 82379, 'Kabaddi': 255, 'Reality': 4751} 
{'Cricket': 15640, 'Wildlife': 730} 
{'LiveTV': 13, 'Football': 4129} 
{'TalkShow': 658, 'Cricket': 7690} 
{'Drama': 5503, 'Cricket': 3283, 'Reality': 1345} 
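To go straight from the strings to separate columns without building Python dicts row by row, you can use pandas' vectorized `str.extractall`. The sketch makes two assumptions. Names are word characters (`\w+`) and values are non-negative integers (`\d+`), and no name appears twice in the same row (a duplicate would make `unstack` fail).

```python
import pandas as pd

df = pd.DataFrame({'Freature': [
    "Cricket:82379, Kabaddi:255, Reality:4751",
    "Cricket:15640, Wildlife:730",
    "LiveTV:13, Football:4129",
    "TalkShow:658, Cricket:7690",
    "Drama:5503, Cricket:3283, Reality:1345",
]})

# One row per "Name:Number" match; the index is (original row, match number).
pairs = df['Freature'].str.extractall(r'(?P<name>\w+):(?P<value>\d+)')

# Drop the match level, move the name into the index, then pivot names into columns.
wide = (pairs.droplevel('match')
             .set_index('name', append=True)['value']
             .astype(int)
             .unstack())
wide.columns.name = None  # remove the leftover 'name' label on the column axis
print(wide['Cricket'])
```

A row with no matches at all would be missing from `wide`. In that case, `wide.reindex(df.index)` restores it as an all-NaN row.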

But how do I convert the items in that column into a single dictionary? I have 200,000 rows in that Feature column. –


@RajeevKumarSahu check the updated answer –


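I'm asking about the first step, and you're explaining the later steps. My question is: given Cricket:82379, Kabaddi:255, Reality:4751, how do I write code that converts this string into a dictionary? –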
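For exactly that first step (one string in, one dict out), here is a small helper. `parse_feature` is a hypothetical name chosen for illustration. It assumes items are separated by commas and each item is `Name:Number` with an integer value. Once it works on one string, `Series.map` applies it to every row of the column.

```python
import pandas as pd


def parse_feature(text):
    """Turn 'Cricket:82379, Kabaddi:255' into {'Cricket': 82379, 'Kabaddi': 255}."""
    result = {}
    for item in text.split(","):
        name, _, value = item.partition(":")  # split each item at its first ':'
        result[name.strip()] = int(value)     # int() tolerates surrounding spaces
    return result


print(parse_feature("Cricket:82379, Kabaddi:255, Reality:4751"))
# {'Cricket': 82379, 'Kabaddi': 255, 'Reality': 4751}

# Apply the same helper to every row of a string-valued column.
df = pd.DataFrame({'Freature': ["Cricket:82379, Kabaddi:255, Reality:4751",
                                "Cricket:15640, Wildlife:730"]})
dicts = df['Freature'].map(parse_feature)
```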