2017-07-25

How can I build lists of records grouped by index in pandas?

I have a CSV of records:

name,credits,email 
bob,,[email protected] 
bob,6.0,[email protected] 
bill,3.0,[email protected] 
bill,4.0,[email protected] 
tammy,5.0,[email protected] 

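where name is the index. Because there are multiple records with the same name, I want to roll each entire row (minus the name) into a list, producing JSON of the form: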

{ 
    "bob": [ 
     { "credits": null, "email": "[email protected]"}, 
     { "credits": 6.0, "email": "[email protected]" } 
    ], 
    // ... 
} 

My current solution is a bit kludgey, since it seems to use pandas only as a tool for reading the CSV, but it still produces the JSON-ish output I expect:

#!/usr/bin/env python3 

import io 
import pandas as pd 
from pprint import pprint 
from collections import defaultdict 

def read_data(): 
    s = """name,credits,email 
bob,,[email protected] 
bob,6.0,[email protected] 
bill,3.0,[email protected] 
bill,4.0,[email protected] 
tammy,5.0,[email protected] 
""" 

    data = io.StringIO(s) 
    return pd.read_csv(data) 

if __name__ == "__main__": 
    df = read_data() 
    columns = df.columns 
    index_name = "name" 
    print(df.head()) 

    records = defaultdict(list) 

    name_index = list(columns.values).index(index_name) 
    columns_without_index = [column for i, column in enumerate(columns) if i != name_index] 

    for record in df.values:
        name = record[name_index]
        record_without_index = [field for i, field in enumerate(record) if i != name_index]
        remaining_record = {k: v for k, v in zip(columns_without_index, record_without_index)}
        records[name].append(remaining_record)
    pprint(dict(records)) 

Is there a way to do the same thing natively in pandas (and NumPy)?

Answer


Is this what you want?

cols = df.columns.drop('name').tolist() 

or, as @jezrael suggested:

cols = df.columns.difference(['name']) 
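The two ways of building cols are not quite interchangeable: Index.drop keeps the remaining columns in their original order, while Index.difference is a set operation that, by default (sort=None), sorts its result when the labels are sortable. With this CSV both give ['credits', 'email'] only because the columns are already alphabetical; the sketch uses a hypothetical reordered column list to show where they diverge.

```python
import pandas as pd

# Hypothetical column order that is NOT alphabetical, to make the contrast visible.
columns = pd.Index(["name", "email", "credits"])

# drop() removes the label and keeps the remaining columns in their original order.
kept_order = columns.drop("name").tolist()

# difference() is a set operation; with the default sort=None it sorts the result.
sorted_order = columns.difference(["name"]).tolist()

print(kept_order)    # ['email', 'credits']
print(sorted_order)  # ['credits', 'email']
```

The order of cols determines the key order inside each JSON record, so drop is the safer choice when that order matters.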

Then:

s = df.groupby('name')[cols].apply(lambda x: x.to_dict('records')).to_json() 
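For reference, a minimal self-contained version of that approach that can be run end to end. It assumes a current pandas release; newer pandas versions dropped the short orient alias 'r', so the sketch spells out 'records'. The e-mail addresses are hypothetical placeholders, because the originals are obfuscated in the question.

```python
import io
import json

import pandas as pd

# Same rows as the question; the e-mail addresses are hypothetical placeholders.
CSV = """name,credits,email
bob,,bob1@example.com
bob,6.0,bob2@example.com
bill,3.0,bill1@example.com
bill,4.0,bill2@example.com
tammy,5.0,tammy@example.com
"""

df = pd.read_csv(io.StringIO(CSV))

# Every column except the grouping key, in original order.
cols = df.columns.drop("name").tolist()

# One list of row dicts per name; the group key becomes the index of the result.
grouped = df.groupby("name")[cols].apply(lambda g: g.to_dict("records"))

# Series.to_json writes the missing credit (NaN) as null.
s = grouped.to_json()
print(json.dumps(json.loads(s), indent=2))
```

Because the selection [cols] leaves out the grouping column, each group handed to the lambda contains only credits and email, which is why name appears only as the outer key.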

To pretty-print it (after import json):

In [45]: print(json.dumps(json.loads(s), indent=2)) 
{
  "bill": [
    {
      "credits": 3.0,
      "email": "[email protected]"
    },
    {
      "credits": 4.0,
      "email": "[email protected]"
    }
  ],
  "bob": [
    {
      "credits": null,
      "email": "[email protected]"
    },
    {
      "credits": 6.0,
      "email": "[email protected]"
    }
  ],
  "tammy": [
    {
      "credits": 5.0,
      "email": "[email protected]"
    }
  ]
}
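If the installed pandas is 1.0 or newer, to_json itself accepts an indent argument, so the json.loads/json.dumps round trip is optional. A minimal sketch on a hypothetical three-row subset of the data with placeholder e-mails; the exact whitespace pandas emits may differ slightly from json.dumps.

```python
import pandas as pd

# Hypothetical subset of the question's rows; e-mail values are placeholders.
df = pd.DataFrame(
    {
        "name": ["bob", "bob", "tammy"],
        "credits": [None, 6.0, 5.0],  # None becomes NaN in a float column
        "email": ["bob1@example.com", "bob2@example.com", "tammy@example.com"],
    }
)

cols = df.columns.drop("name").tolist()
grouped = df.groupby("name")[cols].apply(lambda g: g.to_dict("records"))

# indent= pretty-prints directly; NaN is still written as null.
pretty = grouped.to_json(indent=2)
print(pretty)
```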

Close! It would be nice if I didn't have to list the columns explicitly after the groupby, but I think that's simple enough. – erip


@erip, I've updated my post, please check... – MaxU


Perfect! Thanks so much for your help! – erip