Flattening an entity-attribute-value (EAV) schema

I have a CSV file in something like entity-attribute-value format (i.e., my event_id is non-unique and repeats k times for its k associated attributes):

event_id, attribute_id, value 
    1, 1, a 
    1, 2, b 
    1, 3, c 
    2, 1, a 
    2, 2, b 
    2, 3, c 
    2, 4, d

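Is there a handy trick for turning a variable number of attributes (i.e., rows) into columns? The key here is that the output should be a structured data table with m = max(k) attribute columns; filling in missing attributes with NULL would be optimal: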

event_id, 1, 2, 3, 4 
    1, a, b, c, null 
    2, a, b, c, d

My plan was to (1) convert the CSV to a JSON object that looks like this:

data = [{'value': 'a', 'id': '1', 'event_id': '1', 'attribute_id': '1'}, 
    {'value': 'b', 'id': '2', 'event_id': '1', 'attribute_id': '2'}, 
    {'value': 'a', 'id': '3', 'event_id': '2', 'attribute_id': '1'}, 
    {'value': 'b', 'id': '4', 'event_id': '2', 'attribute_id': '2'}, 
    {'value': 'c', 'id': '5', 'event_id': '2', 'attribute_id': '3'}, 
    {'value': 'd', 'id': '6', 'event_id': '2', 'attribute_id': '4'}]

(2) extract the unique event IDs:

events = set()
for item in data:
    events.add(item['event_id'])

(3) create a list of lists, where each inner list is the list of attributes for the corresponding parent event:

from itertools import groupby
attributes = [[k['value'] for k in j] for i, j in groupby(data, key=lambda x: x['event_id'])]

(4) create a dictionary that brings the events and attributes together:

event_dict = dict(zip(events, attributes))

which looks like this:

{'1': ['a', 'b'], '2': ['a', 'b', 'c', 'd']}
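One caveat with steps (2) through (4): a set has no guaranteed iteration order, so zipping `events` with `attributes` can pair an event with another event's values. A minimal sketch that builds the dictionary directly from `groupby` instead, assuming `data` is the list of dicts from step (1) and that sorting it by `event_id` first is acceptable (`groupby` only merges adjacent items):

```python
from itertools import groupby

data = [
    {'value': 'a', 'id': '1', 'event_id': '1', 'attribute_id': '1'},
    {'value': 'b', 'id': '2', 'event_id': '1', 'attribute_id': '2'},
    {'value': 'a', 'id': '3', 'event_id': '2', 'attribute_id': '1'},
    {'value': 'b', 'id': '4', 'event_id': '2', 'attribute_id': '2'},
    {'value': 'c', 'id': '5', 'event_id': '2', 'attribute_id': '3'},
    {'value': 'd', 'id': '6', 'event_id': '2', 'attribute_id': '4'},
]


def by_event(row):
    """Grouping key: the event a row belongs to."""
    return row['event_id']


# sorted() is stable, so values keep their original order within each event
event_dict = {
    event_id: [row['value'] for row in rows]
    for event_id, rows in groupby(sorted(data, key=by_event), key=by_event)
}
print(event_dict)  # {'1': ['a', 'b'], '2': ['a', 'b', 'c', 'd']}
```

Because each key comes from the same `groupby` pass as its values, the pairing no longer depends on set ordering.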

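I can't figure out how to get all the inner lists populated with NULL values where necessary. It seems like something that needs to be done in step (3). Creating n lists full of m NULL values has also crossed my mind, then iterating through each list and filling in values using attribute_id as the list position; but that seems clunky.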
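The "pre-filled lists" idea is less clunky than it sounds. A rough sketch, assuming attribute ids are positive integers numbered from 1 and that rows are dicts shaped like those in step (1); `pivot` is a hypothetical helper name, not something from a library:

```python
def pivot(rows, fill=None):
    """Return {event_id: [value of attribute 1, ..., value of attribute m]}.

    Slots with no matching row keep `fill`. If an (event, attribute) pair
    appears more than once, the last row wins.
    """
    # m is the largest attribute id seen anywhere in the input
    width = max(int(r['attribute_id']) for r in rows)
    table = {}
    for r in rows:
        # each event gets its own fresh list of `width` placeholders
        slots = table.setdefault(r['event_id'], [fill] * width)
        slots[int(r['attribute_id']) - 1] = r['value']
    return table


sample = [
    {'event_id': '1', 'attribute_id': '1', 'value': 'a'},
    {'event_id': '1', 'attribute_id': '3', 'value': 'c'},
    {'event_id': '2', 'attribute_id': '2', 'value': 'b'},
]
print(pivot(sample))  # {'1': ['a', None, 'c'], '2': [None, 'b', None]}
```

Placing each value by its attribute_id means a gap in the middle (event 1 has no attribute 2 here) is padded in the right column, not just at the end.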

Source

2015-04-15 Scott Hoover

Your basic idea seems right, but I would implement it as follows:

import csv

events = {}  # maps each event id to the list of values read for it
with open('path/to/input', newline='') as infile:
    reader = csv.reader(infile, skipinitialspace=True)
    next(reader)  # skip the header row; assumes the file starts with one, as in the question
    for event, _att, val in reader:
        # track all the values for this event, keyed consistently by int
        events.setdefault(int(event), []).append(val.strip())

# the maximum number of attributes for any event
maxAtts = max(len(v) for v in events.values())
with open('path/to/output', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["event_id"] + list(range(1, maxAtts + 1)))  # write out the header row
    for k in sorted(events):  # let's look at the events in sorted order
        # write out the event id, all the values for that event, and pad with
        # "null" for any attributes without values; the padding goes at the end,
        # so this assumes each event's attributes are numbered 1..k without gaps
        writer.writerow([k] + events[k] + ['null'] * (maxAtts - len(events[k])))
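If an event can be missing an attribute in the middle (say attributes 1 and 3 but not 2), padding at the end shifts values into the wrong columns. A possible variant that keys each value by its attribute_id and lets `csv.DictWriter` fill the gaps through its `restval` argument is sketched below. The function name `flatten_eav` is hypothetical, and the sketch assumes integer ids and a header row like the one in the question; `io.StringIO` stands in for real files only to keep the example self-contained:

```python
import csv
import io


def flatten_eav(infile, outfile, fill='null'):
    """Pivot event_id/attribute_id/value rows into one row per event."""
    reader = csv.DictReader(infile, skipinitialspace=True)
    rows = {}            # event id -> {'event_id': id, attribute id: value, ...}
    attributes = set()   # every attribute id seen in the input
    for rec in reader:
        event = int(rec['event_id'])
        att = int(rec['attribute_id'])
        rows.setdefault(event, {'event_id': event})[att] = rec['value'].strip()
        attributes.add(att)

    # restval is written for any column an event has no value for
    writer = csv.DictWriter(
        outfile, fieldnames=['event_id'] + sorted(attributes), restval=fill)
    writer.writeheader()
    for event in sorted(rows):
        writer.writerow(rows[event])


src = io.StringIO(
    "event_id, attribute_id, value\n"
    "1, 1, a\n1, 2, b\n1, 3, c\n"
    "2, 1, a\n2, 2, b\n2, 3, c\n2, 4, d\n"
)
out = io.StringIO()
flatten_eav(src, out)
result = out.getvalue().splitlines()
print(result)  # ['event_id,1,2,3,4', '1,a,b,c,null', '2,a,b,c,d']
```

With real files, the same function can be called with handles opened using `newline=''`, as the `csv` module documentation recommends.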

Source

2015-04-16 01:37:34 inspectorG4dget
