List intersection and partial string matching in Python

I have two lists. The first comes from my dataset and contains datetimes in the format 'yyyy-mm-dd hh:mm'; it is named times. For example:

'2010-01-01 00:00', '2010-01-01 00:15', '2010-01-01 00:30', ...,

The other is a list of all unique year-month combinations, named year_and_month. For example:

'2010-01', '2010-02', '2010-03', '2010-04',

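I am trying to extract, for each year-month combination, all the matching indices in the original dataset. I do this in the worst possible way (I am new to Python), namely: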

each_member_indices = []
for i in range(len(year_and_month)):
    item_ind = []
    for j in range(times.shape[0]):
        if year_and_month[i] in times[j]:
            item_ind.append(j)
    # store the indices for this year-month before moving on to the next one
    each_member_indices.append(item_ind)

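This takes a very long time to run, so I would like to optimize it. I have been looking at some existing approaches, such as "Find intersection of two lists?" and "Python: Intersection of full string from list with partial string". The problem is that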

res_1 = [val for val in year_and_month if val in times]

yields an empty list, whereas

res_1 = [val for val in year_and_month if val in times[0]]

yields at least the first member.

Any ideas?
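For what it's worth, the difference between the two comprehensions comes down to what `in` means for each container: on a list it tests whether some element equals the value, while on a string it tests for a substring. A minimal sketch, using a shortened `times` list made up for illustration rather than the original dataset:

```python
# Sample data assumed for illustration (not the original dataset).
times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30']
year_and_month = ['2010-01', '2010-02']

# `in` on a list compares whole elements, so no 'yyyy-mm' string ever matches.
whole_element = [val for val in year_and_month if val in times]

# `in` on a string is a substring test, so the prefix of times[0] matches.
substring_first = [val for val in year_and_month if val in times[0]]

print(whole_element)    # []
print(substring_first)  # ['2010-01']
```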

Edit:

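I only need the indices of the elements of the original dataset, named times, that correspond to each unique year-month pair in the year_and_month list. So, as requested, the sample output would be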

[[0, 1, 2, 3,...],[925, 926, ...],...]

The first sublist contains the indices for January 2010, the second those for February 2010, and so on.

Source

2017-06-30 Kots

Can you show a sample of the desired output for your input? –

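You're right! While looking at the solutions, I found that I get what I want with the for loop, but the list comprehension does not achieve the same thing. To answer your question, I am getting a list, namely 'each_member_indices', which is '[[0, 1, 2, ...], [924, 925, ...], ...]', where each sublist corresponds to a unique year-month. For example, the first sublist holds all the indices during January 2010. – Kots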

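To do this in linear time, you can build a lookup dictionary that maps each year-and-month combination to its indices. You can also use collections.defaultdict to make it a bit easier: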

from collections import defaultdict 

d = defaultdict(list) 
for i, v in enumerate(times): 
    d[v[:7]].append(i)

Then you can build the result list with a list comprehension:

result = [d[x] for x in year_and_month]

Demo:

>>> from collections import defaultdict 
>>> times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30', '2010-03-01 00:00'] 
>>> year_and_month = ['2010-01', '2010-02', '2010-03', '2010-04'] 
>>> d = defaultdict(list) 
>>> for i, v in enumerate(times): 
...  d[v[:7]].append(i) 
...  
>>> dict(d) 
{'2010-01': [0, 1], '2010-02': [2], '2010-03': [3]} 
>>> [d[x] for x in year_and_month] 
[[0, 1], [2], [3], []]
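The same idea can be wrapped in a small function and checked against the nested-loop version from the question. This is only a sketch: the name `indices_by_month` is made up here, and it assumes every entry in `times` starts with a 'yyyy-mm' prefix.

```python
from collections import defaultdict

def indices_by_month(times, year_and_month):
    """Return one list of indices into `times` per entry of `year_and_month`."""
    lookup = defaultdict(list)
    for i, stamp in enumerate(times):
        lookup[stamp[:7]].append(i)  # first 7 characters are 'yyyy-mm'
    # .get avoids adding empty entries to the lookup for months with no data
    return [lookup.get(key, []) for key in year_and_month]

times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30', '2010-03-01 00:00']
year_and_month = ['2010-01', '2010-02', '2010-03', '2010-04']

# Nested-loop reference, as in the question (rescans `times` for every key).
expected = [[j for j, t in enumerate(times) if ym in t] for ym in year_and_month]

result = indices_by_month(times, year_and_month)
print(result)  # [[0, 1], [2], [3], []]
```

The dictionary is built in a single pass over `times`, so the work grows with `len(times) + len(year_and_month)` rather than with their product.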

Source

2017-06-30 10:07:16

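So if I want to extract '2010-01', I should be able to write 'd['2010-01']'. But when I do 'result = [d[x] for x in year_and_month]', this gives me a list where 'len(result) == len(times)'. I would rather have a 'result' list whose length equals the number of unique year-month combinations, i.e. the same as the result in the demo. Could this be a problem caused by the fact that I am using Python 3? – Kots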

Perhaps every element in 'times' has a unique year-month? A [list comprehension](https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions) creates a new list the same size as its input. –
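One possible explanation, offered as a guess since the actual data is not shown: the comprehension returns exactly one sublist per element of `year_and_month`, so if that list was built from `times` without removing duplicates, the result will be as long as `times`. A sketch of deriving the unique year-month keys in first-seen order, with sample data assumed for illustration:

```python
# Sample data assumed for illustration (not the original dataset).
times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30']

# Without de-duplication there is one key per timestamp.
all_keys = [t[:7] for t in times]

# dict.fromkeys keeps the first occurrence of each key;
# dicts preserve insertion order in Python 3.7+.
year_and_month = list(dict.fromkeys(all_keys))

print(len(all_keys))   # 3
print(year_and_month)  # ['2010-01', '2010-02']
```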

Maybe try using any?

[val for val in year_and_month if any(val in t for t in times)]

Source

2017-06-30 10:04:18 NightHallow

**Note** I have not tried your original code, and I don't know what output you are looking for – NightHallow

That's quite a few caveats ;) Perhaps a comment asking to clarify the question would have been better –
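To illustrate one caveat with the `any` suggestion: it returns the year-month strings that occur somewhere in `times`, not the indices the question asks for, and it rescans `times` for every key. A small sketch with made-up sample data:

```python
# Sample data assumed for illustration (not the original dataset).
times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30']
year_and_month = ['2010-01', '2010-02', '2010-03']

# Keeps each key that is a substring of at least one timestamp.
present = [val for val in year_and_month if any(val in t for t in times)]

print(present)  # ['2010-01', '2010-02']
```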

Why not build a new structure with a dictionary, and then walk it in year_and_month order?

result = {}
for i, v in enumerate(times):
    result.setdefault(v[:7], []).append(i)
for i in year_and_month:
    # print each year_month with all of its indices;
    # .get avoids a KeyError for a year_month that has no data
    print(i, result.get(i, []))

Source

2017-06-30 10:04:28 mke21

Well, this gives the common elements:

ls = str(times)  # one long string containing every timestamp
r = [x for x in year_and_month if x in ls]
print(r)

Source

2017-06-30 10:08:39
