Load a list of lists from a file based on a grouping variable?

If I have a file:

A pgm1 
A pgm2 
A pgm3 
Z pgm4 
Z pgm5 
C pgm6 
C pgm7 
C pgm8 
C pgm9

How can I create this list:

[['pgm1','pgm2','pgm3'],['pgm4','pgm5'],['pgm6','pgm7','pgm8','pgm9']]

I need to preserve the original order from the load file, so [pgm4, pgm5] must be the second sublist.

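My preference is to start a new sublist whenever the grouping variable changes from the previous line, e.g. "A, Z, C". But I can accept it if the grouping variable has to be sequential, i.e. "1, 2, 3".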
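Starting a new sublist whenever the grouping value differs from the previous line is exactly what `itertools.groupby` does (it is available in Python 2.6): it groups runs of consecutive items that share a key, and keeps the file order. A minimal sketch, assuming whitespace-separated `KEY program` lines as in the sample file; the function name `group_consecutive` is hypothetical.

```python
from itertools import groupby


def group_consecutive(lines):
    """Return a list of program lists, starting a new sublist each time
    the first field (the grouping value) changes from the previous line."""
    # Split each non-blank line into [key, program]; maxsplit=1 keeps any
    # spaces that are part of the program path.
    pairs = [line.split(None, 1) for line in lines if line.strip()]
    return [[pgm.strip() for _, pgm in group]
            for _, group in groupby(pairs, key=lambda pair: pair[0])]


sample = ["A pgm1", "A pgm2", "A pgm3", "Z pgm4", "Z pgm5",
          "C pgm6", "C pgm7", "C pgm8", "C pgm9"]
print(group_consecutive(sample))
# [['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', 'pgm9']]
```

Because a file object yields lines, the same function can be called as `with open('loadfile.txt') as f: groups = group_consecutive(f)`; the trailing newline is removed by `strip()`.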

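(This is to support running the programs in each sublist concurrently, while waiting for all upstream programs to finish before proceeding to the next list.)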
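The "run each sublist concurrently, then wait before the next one" scheme can be sketched with the standard `threading` module (also present in Python 2.6): start one thread per program in a stage, then `join()` all of them before moving on. This is only an illustration; `run_in_stages` is a hypothetical name, and `run_program` is an assumed caller-supplied callable that blocks until one program has finished (for example a wrapper around `subprocess.call([path])`).

```python
import threading


def run_in_stages(groups, run_program):
    """Run every program in a sublist concurrently, and wait for the whole
    sublist to finish before starting the next sublist."""
    for group in groups:
        threads = [threading.Thread(target=run_program, args=(pgm,))
                   for pgm in group]
        for t in threads:
            t.start()
        # Barrier between stages: block until every program in this group is done.
        for t in threads:
            t.join()
```

Note that an exception raised inside `run_program` is not propagated to the caller by `Thread`, so a real runner would need to record failures itself (for example by checking each program's exit status) before deciding whether to start the next stage.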

I am using Python 2.6.6 on RHEL 2.6.32.


2015-11-16 Scott

Could you please show what you have tried so far? – styvane

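I did web searches for "python file list of lists" for over an hour before posting. What stumped me was how to detect when the group changes. That said, in the future I will do my best to include sample code of what I have already tried in all my SO posts. – Scott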

After my OP, another web search turned up this: How do I use Python's itertools.groupby()?
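One detail of `itertools.groupby()` that matters here: it yields `(key, group)` pairs in which each `group` is a lazy iterator sharing the underlying input with `groupby` itself. Once `groupby` advances to the next key, the previous group can no longer be read, so each group should be consumed (for example turned into a list) as soon as it is produced. The data below is made up for illustration.

```python
from itertools import groupby

# Made-up (key, program) pairs for illustration.
data = [("A", "pgm1"), ("A", "pgm2"), ("Z", "pgm4"), ("C", "pgm6")]

# Correct: consume each group immediately, while groupby is still on it.
good = [[pgm for _, pgm in grp]
        for _, grp in groupby(data, key=lambda pair: pair[0])]
print(good)  # [['pgm1', 'pgm2'], ['pgm4'], ['pgm6']]

# Pitfall: keeping the group iterators and reading them afterwards.
# By then groupby has moved on, so the saved groups are empty or incomplete.
saved = [grp for _, grp in groupby(data, key=lambda pair: pair[0])]
late = [list(grp) for grp in saved]
```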

This is my current approach. Please let me know if I can make it more Pythonic.

loadfile1.txt (no grouping variable; same output as loadfile4.txt):

pgm1 
pgm2 
pgm3 

pgm4 
pgm5 

pgm6 
pgm7 
pgm8 
/a/path/with spaces/pgm9

loadfile2.txt (arbitrary grouping values):

10, pgm1 
10, pgm2 
10, pgm3 

ZZ, pgm4 
ZZ, pgm5 

-5, pgm6 
-5, pgm7 
-5, pgm8 
-5, /a/path/with spaces/pgm9

loadfile3.txt (the same grouping value on every line; no dependencies; multithreaded):

,pgm1 
,pgm2 
,pgm3 

,pgm4 
,pgm5 

,pgm6 
,pgm7 
,pgm8 
,/a/path/with spaces/pgm9

loadfile4.txt (a different grouping value on every line; dependencies; single-threaded):

1, pgm1 
2, pgm2 
3, pgm3 

4, pgm4 
5, pgm5 

6, pgm6 
7, pgm7 
8, pgm8 
9, /a/path/with spaces/pgm9

My Python script:

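#!/usr/bin/python

# See https://stackoverflow.com/questions/4842057/python-easiest-way-to-ignore-blank-lines-when-reading-a-file
from itertools import groupby

# convert file to list of lines, ignoring any blank lines
filename = 'loadfile2.txt'

with open(filename) as f_in:
    # A list comprehension (rather than filter()) reads the whole file while it
    # is still open, and returns a list under both Python 2 and Python 3.
    lines = [line.rstrip() for line in f_in if line.strip()]

print(lines)

# convert list to a list of lists, splitting on the first comma only
# (so a comma inside a program path does not create an extra field)
lines = [i.split(',', 1) for i in lines]
print(lines)

# create list of lists based on the key value (first item in sub-lists)
listofpgms = []
for key, group in groupby(lines, lambda x: x[0]):
    pgms = []
    for pgm in group:
        try:
            # "key, program" line: keep the program part
            pgms.append(pgm[1].strip())
        except IndexError:
            # no comma (loadfile1.txt style): the whole line is the program
            pgms.append(pgm[0].strip())

    listofpgms.append(pgms)

print(listofpgms)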

Output when using loadfile2.txt:

['10, pgm1', '10, pgm2', '10, pgm3', 'ZZ, pgm4', 'ZZ, pgm5', '-5, pgm6', '-5, pgm7', '-5, pgm8', '-5, /a/path/with spaces/pgm9'] 
[['10', ' pgm1'], ['10', ' pgm2'], ['10', ' pgm3'], ['ZZ', ' pgm4'], ['ZZ', ' pgm5'], ['-5', ' pgm6'], ['-5', ' pgm7'], ['-5', ' pgm8'], ['-5', ' /a/path/with spaces/pgm9']] 
[['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', '/a/path/with spaces/pgm9']]
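On the "more Pythonic" question, one possible tidier variant (a sketch, not the only way) wraps the logic in a function, skips blank lines in the same pass, and replaces the `try`/`except IndexError` with an explicit check for a comma. The name `load_groups` is hypothetical; the behavior is intended to match the script above for all four load files.

```python
from itertools import groupby


def load_groups(lines):
    """Group program paths by the value before the first comma.

    Lines without a comma (loadfile1.txt style) use the whole line as the key,
    so every program ends up in its own sublist. Blank lines are skipped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank separator lines
        if "," in line:
            key, pgm = line.split(",", 1)  # split only on the first comma
        else:
            key = pgm = line
        rows.append((key.strip(), pgm.strip()))
    # One sublist per run of consecutive rows that share the same key.
    return [[pgm for _, pgm in grp]
            for _, grp in groupby(rows, key=lambda row: row[0])]
```

Usage: `with open('loadfile2.txt') as f: print(load_groups(f))`.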


2015-11-17 00:39:26 Scott

Just use collections.defaultdict().

Code:

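import collections
d = collections.defaultdict(list)  # maps each key to a list of programs

infile = 'filename'
with open(infile) as f:
    a = [i.strip() for i in f]

# split "A pgm1" into ['A', 'pgm1']
a = [i.split() for i in a]

for key, value in a:
    d[key].append(value)

# note: the values come out in dict order, which is not necessarily the file order
l = list(d.values())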
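As the demo shows, `d.values()` follows dict order rather than file order: dicts only guarantee insertion order from Python 3.7 onward, and on Python 2.6 the order is arbitrary (`OrderedDict` was added in 2.7). A sketch that keeps first-seen order without `OrderedDict`, using a plain list to remember the order in which keys first appear; `group_first_seen` is a hypothetical name.

```python
import collections


def group_first_seen(pairs):
    """Group values by key, returning the groups in first-seen key order.

    Works without OrderedDict: a plain list records the order in which
    keys first appear, and the defaultdict collects the values.
    """
    groups = collections.defaultdict(list)
    order = []
    for key, value in pairs:
        if key not in groups:  # "in" does not create an entry in a defaultdict
            order.append(key)
        groups[key].append(value)
    return [groups[key] for key in order]


rows = [line.split() for line in ["A pgm1", "Z pgm4", "C pgm6", "C pgm7"]]
print(group_first_seen(rows))  # [['pgm1'], ['pgm4'], ['pgm6', 'pgm7']]
```

This is a single pass over the data, so it stays linear in the number of lines.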

Demo:

>>> import collections 
>>> d = collections.defaultdict(list) 

>>> infile = 'filename' 
>>> with open(infile) as f: 
...  a = [i.strip() for i in f] 

>>> a = [i.split() for i in a] 
>>> a 
[['A', 'pgm1'], ['A', 'pgm2'], ['A', 'pgm3'], ['Z', 'pgm4'], ['Z', 'pgm5'], ['C', 'pgm6'], ['C', 'pgm7'], ['C', 'pgm8'], ['C', 'pgm9']] 

>>> for key, value in a: 
...  d[key].append(value) 

>>> d 
defaultdict(<class 'list'>, {'A': ['pgm1', 'pgm2', 'pgm3'], 'C': ['pgm6', 'pgm7', 'pgm8', 'pgm9'], 'Z': ['pgm4', 'pgm5']}) 

>>> d.values() 
dict_values([['pgm1', 'pgm2', 'pgm3'], ['pgm6', 'pgm7', 'pgm8', 'pgm9'], ['pgm4', 'pgm5']]) 

>>> list(d.values()) 
[['pgm1', 'pgm2', 'pgm3'], ['pgm6', 'pgm7', 'pgm8', 'pgm9'], ['pgm4', 'pgm5']] 
>>>

The code below does the same thing as the code above, but preserves the order:

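infile = 'filename'
with open(infile) as f:
    a = [i.strip() for i in f]

a = [i.split() for i in a]

def orderset(seq):
    """Remove duplicates from seq, keeping the first occurrence of each item."""
    seen = set()
    seen_add = seen.add
    # seen_add() returns None, so the condition is true only for unseen items
    return [x for x in seq if not (x in seen or seen_add(x))]

l = []
for key in orderset([row[0] for row in a]):
    # collect the programs for this key, in file order
    l.append([row[1] for row in a if row[0] == key])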
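The order-preserving answer above merges every line that shares a key, even when that key reappears later in the file, whereas `itertools.groupby` starts a new sublist each time the key changes. Since the question uses the sublists as stage boundaries, the two only differ if a key can reappear non-consecutively. A small comparison on made-up input:

```python
from itertools import groupby

rows = [["A", "pgm1"], ["Z", "pgm2"], ["A", "pgm3"]]

# Merge by key, in first-seen order (what the orderset-based answer does).
keys = []
for key, _ in rows:
    if key not in keys:
        keys.append(key)
merged = [[pgm for k, pgm in rows if k == key] for key in keys]

# One sublist per consecutive run of the same key (itertools.groupby).
runs = [[pgm for _, pgm in grp] for _, grp in groupby(rows, key=lambda r: r[0])]

print(merged)  # [['pgm1', 'pgm3'], ['pgm2']]
print(runs)    # [['pgm1'], ['pgm2'], ['pgm3']]
```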


2015-11-16 06:03:10

I need to preserve the original order of the load file, so pgm4, pgm5 needs to be the second sublist. I'm using Python 2.6.6 on RHEL 2.6.32, so I don't have OrderedDict. – Scott

@Scott: Hmm... let me edit my answer, one moment... –

@Scott: OK, done. Hope this helps :) –
