2017-08-03
1

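Sorting and removing from a list in Python

I have a list of dictionaries. Each dictionary in the list has a key and a timestamp in string format, and a particular key can appear multiple times in the list. I want to keep only the dictionary with the latest timestamp for each key and remove all the others from the list. One way I have implemented a solution is to use another variable, loop through all the items, and compare each key with the ones already seen.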

Is there a better way to solve this, using a list comprehension, itertools, or any other approach?

Here is sample input data:

data = [ 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:21.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'}, 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:23.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:19.762278'}, 
    {'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:11.762278'}, 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'}, 
    {'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'} 
] 

Here is the expected output:

data = [ 
    {'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'}, 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'}, 
    {'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'} 
] 

My implementation in Python is as follows:

from dateutil.parser import parse

def sort_and_eliminate(data):
    processed_data = {}
    for cur_item in data:
        key = cur_item.get('key')
        if key not in processed_data:
            processed_data[key] = cur_item
        else:
            ex_item = processed_data.get(key)
            ex_ts = parse(ex_item.get("timestamp"))
            cur_ts = parse(cur_item.get("timestamp"))
            if cur_ts > ex_ts:
                processed_data[key] = cur_item
    # dict.values() is a view in Python 3; wrap it in list() to return a list
    return list(processed_data.values())

Is there a better way to solve this problem, using a list comprehension, itertools, or any other approach?

Answers

0
from datetime import datetime 
from operator import itemgetter 
from itertools import groupby 
from dateutil.parser import parse 

expected = [ 
    {'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'}, 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'}, 
    {'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'} 
] 

data = [ 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:21.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'}, 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:23.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:19.762278'}, 
    {'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:11.762278'}, 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'}, 
    {'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'} 
] 


# alternative without dateutil: pass dtconv instead of parse in the sort key below
def dtconv(s): 
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f") 

ds = sorted(data, key=lambda x: (x['key'], parse(x['timestamp'])), reverse=True) 

result = [] 
for grouper, group in groupby(ds, key=itemgetter('key')): 
    result.append(next(group)) 

print("result:") 
for r in result: 
    print(r) 

print("expected") 
for e in expected: 
    print(e) 

# demonstrate it's equal to expected value 
print(sorted(result, key=itemgetter('key')) == sorted(expected, key=itemgetter('key'))) 

Try sorting the list by key and timestamp. Then you can do a groupby and take the first element of each group; that is the one you want to keep.
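Since the question asks about list comprehensions and itertools, here is a variant of the groupby idea (a sketch, not the answer's code): sort by key only, group, and take `max` of each group by parsed timestamp, so no reverse sort is needed. It reuses the `dtconv` format string from the answer and assumes every timestamp has microseconds.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

data = [
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:21.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'},
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:23.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:19.762278'},
    {'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:11.762278'},
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'},
    {'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'},
]

def dtconv(s):
    # Assumes the exact layout of the sample data (microseconds, no timezone).
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f")

by_key = itemgetter('key')

# groupby only merges adjacent items, so the list must be sorted by key first;
# within each group, max picks the entry with the latest parsed timestamp.
result = [
    max(group, key=lambda d: dtconv(d['timestamp']))
    for _, group in groupby(sorted(data, key=by_key), key=by_key)
]

for r in result:
    print(r)
```

The result comes out ordered by key (key1 to key4), which differs from the order in the question's expected output but contains the same dictionaries.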

+0

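Even if this is correct, it will take more time than the implementation provided in the question. – akashdeep

+0

@akashdeep The reasoning is much clearer and easier to understand. The OP asked for a better solution, but that does not necessarily mean it has to be faster. Hardly a reason to downvote, but that's your privilege. –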

+0

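Also note that the last two print sections are only there for demonstration purposes. I hope you didn't include them in your timing? –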
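To check the speed claim on your own machine, a rough sketch with the standard `timeit` module could compare the two approaches while leaving out the demonstration prints. To avoid the `dateutil` dependency it uses a strptime-based `dtconv` in both functions; the function names are hypothetical, and no timings are shown because they depend on the machine and the data size.

```python
import timeit
from datetime import datetime
from itertools import groupby
from operator import itemgetter

data = [
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:21.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'},
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:23.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:19.762278'},
    {'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:11.762278'},
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'},
    {'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'},
]

def dtconv(s):
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f")

def single_pass(data):
    # Same idea as the question's dict-based loop, with strptime instead of dateutil.
    latest = {}
    for item in data:
        k = item['key']
        if k not in latest or dtconv(item['timestamp']) > dtconv(latest[k]['timestamp']):
            latest[k] = item
    return list(latest.values())

def sort_then_groupby(data):
    # Same idea as the answer: reverse sort by (key, time), keep the first of each group.
    ds = sorted(data, key=lambda x: (x['key'], dtconv(x['timestamp'])), reverse=True)
    return [next(group) for _, group in groupby(ds, key=itemgetter('key'))]

for fn in (single_pass, sort_then_groupby):
    # timeit runs immediately, so the lambda always sees the current fn.
    seconds = timeit.timeit(lambda: fn(data), number=1000)
    print(f"{fn.__name__}: {seconds:.4f} s for 1000 calls")
```

The single pass is O(n) while sorting is O(n log n), so differences should grow with larger inputs; on eight items, timing noise may dominate.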

1

Here is one approach.

Sort the dictionaries by key and timestamp.

x=sorted(data, key=lambda k: (k['key'],k['timestamp']), reverse=True) 
print(x) 

[{'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'}, 
{'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'}, 
{'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'}, 
{'key': 'key2', 'timestamp': '2017-08-03T10:24:19.762278'}, 
{'key': 'key2', 'timestamp': '2017-08-03T10:24:11.762278'}, 
{'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'}, 
{'key': 'key1', 'timestamp': '2017-08-03T10:24:23.762278'}, 
{'key': 'key1', 'timestamp': '2017-08-03T10:24:21.762278'}] 

Create a new list and insert only the first occurrence of each key:

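new_list = []
temp = None  # key of the previous item in the sorted list
for values in x:
    # x is sorted by key, so a new key marks the first (latest) entry of its group
    if values['key'] != temp:
        new_list.append(values)
        temp = values['key']
print(new_list)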

[{'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'}, 
{'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'}, 
{'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'}, 
{'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'}] 

Hope this helps!

0
from dateutil.parser import parse 

data = [ 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:21.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'}, 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:23.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:19.762278'}, 
    {'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'}, 
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:11.762278'}, 
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'}, 
    {'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'}] 


all_keys = [k['key'] for k in data]

all_keys_unique = set(all_keys)

new_dict = {}

for k in all_keys_unique:

    # find all entries for that key
    items_of_key = [j for j in data if k == j['key']]

    # use max with a parsing key to find the entry with the latest timestamp
    # (datetimes compare chronologically) and add it to the dictionary
    new_dict[k] = max(items_of_key, key=lambda j: parse(j['timestamp']))

print(list(new_dict.values()))
0

Sort the data in reverse order of the timestamp string; then the first occurrence of each unique key is the one you want to keep.

data = sorted(data, key=lambda x: x["timestamp"], reverse=True)
used_keys, cleaned_data = [], []
for item in data:
    if item['key'] not in used_keys:
        # if a key that we encounter in the list isn't used yet,
        # add its corresponding item to cleaned_data and add it to
        # used_keys so we know not to use it again.
        cleaned_data.append(item)
        used_keys.append(item['key'])
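Sorting the raw strings only works because every timestamp in the sample has the same fixed-width ISO 8601 layout, so lexicographic order matches chronological order. A quick check of that assumption on the sample timestamps, reusing the `dtconv` format string from the first answer:

```python
from datetime import datetime

timestamps = [
    '2017-08-03T10:24:21.762278',
    '2017-08-03T10:24:22.762278',
    '2017-08-03T10:24:23.762278',
    '2017-08-03T10:24:19.762278',
    '2017-08-03T10:24:25.762278',
    '2017-08-03T10:24:11.762278',
    '2017-08-03T10:24:45.762278',
    '2017-08-03T10:24:39.762278',
]

def dtconv(s):
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f")

# Plain string order versus chronological order of the parsed datetimes.
by_string = sorted(timestamps, reverse=True)
by_datetime = sorted(timestamps, key=dtconv, reverse=True)
print(by_string == by_datetime)  # True for this sample
```

If the real data mixes formats (for example different UTC offsets), string order can diverge from chronological order and parsing becomes necessary.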
+0

Just noticed someone posted basically exactly this. Oh well. – DragonBobZ

+0

This doesn't solve the problem :) –

+0

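Fixed. The question doesn't say anything about preserving the original order of the remaining keys, so I assumed sorting by timestamp was fine. – DragonBobZ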
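If the original order does matter, one possible sketch (not from the thread) finds the latest timestamp per key in one pass and then filters the original list with a list comprehension, so survivors stay in their original positions. It compares timestamp strings directly, which assumes the fixed-width format of the sample data; if two entries share both key and latest timestamp, both are kept.

```python
data = [
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:21.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'},
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:23.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:19.762278'},
    {'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:11.762278'},
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'},
    {'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'},
]

# Latest timestamp string per key; '' compares lower than any timestamp.
latest_ts = {}
for item in data:
    k = item['key']
    latest_ts[k] = max(latest_ts.get(k, ''), item['timestamp'])

# Keep entries whose timestamp is the latest for their key, in original order.
kept = [item for item in data if item['timestamp'] == latest_ts[item['key']]]

for item in kept:
    print(item)
```

This yields key2, key3, key1, key4, i.e. the order in which the surviving dictionaries appear in the input.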

0

Just create another dict keyed by the key, compare the timestamps, and store the entry with the latest timestamp as the value.
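A minimal sketch of that suggestion, assuming the timestamps share the fixed-width format of the sample data so they can be compared as strings without parsing (swap in a datetime parser otherwise):

```python
data = [
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:21.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:22.762278'},
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:23.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:19.762278'},
    {'key': 'key3', 'timestamp': '2017-08-03T10:24:25.762278'},
    {'key': 'key2', 'timestamp': '2017-08-03T10:24:11.762278'},
    {'key': 'key1', 'timestamp': '2017-08-03T10:24:45.762278'},
    {'key': 'key4', 'timestamp': '2017-08-03T10:24:39.762278'},
]

latest = {}  # key -> dict with the newest timestamp seen so far
for item in data:
    current = latest.get(item['key'])
    # Fixed-width ISO 8601 strings compare in chronological order.
    if current is None or item['timestamp'] > current['timestamp']:
        latest[item['key']] = item

result = list(latest.values())
for r in result:
    print(r)
```

This is a single O(n) pass. Because Python 3.7+ dicts keep insertion order and replacing a value does not move its key, the result lists keys in order of first appearance (key1, key2, key3, key4).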