2014-07-23

How do I group lines by their first column? I want to write a function that takes input like this:

1405684432,  d8:c7:c8:5e:7c:2d,   SUTD_GLAB, 72
1405684432,  d8:c7:c8:5e:7c:2c,   SUTD_BOT, 72
1405684432,  d8:c7:c8:5e:7c:2b,  SUTD_Student, 72
1405684432,  d8:c7:c8:5e:7c:2a,   SUTD_Staff, 72
1405684433,  d8:c7:c8:5e:7c:29,   SUTD_ILP2, 71
1405684433,  d8:c7:c8:5e:7d:eb,  SUTD_Student, 57
1405684433,  d8:c7:c8:5e:7d:ea,   SUTD_Staff, 57

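The output should give me the lines split into lists by the first column: all lines whose first-column value is the same go into one list. The result should look like this: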

List one:

1405684432,  d8:c7:c8:5e:7c:2d,   SUTD_GLAB, 72
1405684432,  d8:c7:c8:5e:7c:2c,   SUTD_BOT, 72
1405684432,  d8:c7:c8:5e:7c:2b,  SUTD_Student, 72
1405684432,  d8:c7:c8:5e:7c:2a,   SUTD_Staff, 72

List two:

1405684433,  d8:c7:c8:5e:7c:29,   SUTD_ILP2, 71
1405684433,  d8:c7:c8:5e:7d:eb,  SUTD_Student, 57
1405684433,  d8:c7:c8:5e:7d:ea,   SUTD_Staff, 57

I don't know which approach I should use.


Does the first column only ever contain "1405684432" or "1405684433", or are there other values? –


@Tichodroma There are other values :) – user3843433


Where does the input come from? –

Answers


I would use a dictionary keyed on the first column. One solution is something like:

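def split_on_first_column(data):
    """Group lines by their first comma-separated column."""
    result = dict()
    for line in data:
        l = line.split(',')
        key = l[0].strip()  # ignore leading/trailing whitespace in the key
        if key not in result:
            result[key] = [line]  # first line seen for this key
        else:
            result[key].append(line)

    return result.values()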

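In Python 2 this gives you a list of lists; in Python 3 it returns a dict view over the lists, which you can iterate over or turn into a list with list().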

Note that each line is stored as the complete string, not split further into a list of fields.
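If you do want each line split into fields, the same dictionary idea can be sketched with collections.defaultdict, which removes the explicit membership check. This is a minimal sketch, not the answer's code: the function name group_by_first_column and the sample lines are illustrative, and it assumes fields should have surrounding whitespace stripped.

```python
from collections import defaultdict


def group_by_first_column(lines):
    """Group lines by the value of their first comma-separated field.

    Returns a list of groups, each a list of field lists. Groups appear in
    the order their key is first seen (dicts keep insertion order in 3.7+).
    """
    groups = defaultdict(list)  # missing keys start out as an empty list
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        groups[fields[0]].append(fields)
    return list(groups.values())  # a real list, not a dict view


# Hypothetical sample input taken from the question's data.
sample = [
    "1405684432,  d8:c7:c8:5e:7c:2d,   SUTD_GLAB, 72",
    "1405684432,  d8:c7:c8:5e:7c:2c,   SUTD_BOT, 72",
    "1405684433,  d8:c7:c8:5e:7c:29,   SUTD_ILP2, 71",
    "",
    "1405684433,  d8:c7:c8:5e:7d:eb,  SUTD_Student, 57",
]

for group in group_by_first_column(sample):
    print(group)
```

Unlike itertools.groupby, this approach does not require the input to be sorted, since every line is looked up by its key regardless of position.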

  1. Read the input as a CSV file
  2. Build a dictionary keyed on the first column
  3. Print the dictionary

Python code:

import csv

groups = {}

with open("data.csv") as data:
    reader = csv.reader(data)
    for row in reader:
        if len(row) > 0:  # skip blank lines
            col1 = row[0].strip()
            # Append the row to the list for this first-column value.
            groups.setdefault(col1, []).append(row)

for key in groups:
    print("=== {0} ===".format(key))
    print("\n".join(",".join(row) for row in groups[key]))

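Output (group order follows dict iteration order, which is arbitrary before Python 3.7 and insertion order from 3.7 on, so a current Python prints the groups in the order they first appear in the file):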

=== 1405684433 === 
1405684433,  d8:c7:c8:5e:7c:29,   SUTD_ILP2, 71 
1405684433,  d8:c7:c8:5e:7d:eb,  SUTD_Student, 57 
1405684433,  d8:c7:c8:5e:7d:ea,   SUTD_Staff, 57 
=== 1405684432 === 
1405684432,  d8:c7:c8:5e:7c:2d,   SUTD_GLAB, 72 
1405684432,  d8:c7:c8:5e:7c:2c,   SUTD_BOT, 72 
1405684432,  d8:c7:c8:5e:7c:2b,  SUTD_Student, 72 
1405684432,  d8:c7:c8:5e:7c:2a,   SUTD_Staff, 72 
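A self-contained variation of the same three steps, sketched under a few assumptions: an in-memory io.StringIO buffer stands in for data.csv, csv.reader's skipinitialspace=True option drops the spaces after each comma, and the keys are sorted before printing so the group order does not depend on the Python version.

```python
import csv
import io

# Hypothetical in-memory stand-in for data.csv, so the sketch runs on its own.
raw = io.StringIO(
    "1405684432,  d8:c7:c8:5e:7c:2d,   SUTD_GLAB, 72\n"
    "1405684433,  d8:c7:c8:5e:7c:29,   SUTD_ILP2, 71\n"
    "1405684432,  d8:c7:c8:5e:7c:2c,   SUTD_BOT, 72\n"
)

groups = {}
# skipinitialspace=True ignores whitespace immediately after each delimiter.
reader = csv.reader(raw, skipinitialspace=True)
for row in reader:
    if row:  # ignore empty lines
        groups.setdefault(row[0], []).append(row)

# Sort the keys so the group order is deterministic on every Python version.
for key in sorted(groups):
    print("=== {0} ===".format(key))
    for row in groups[key]:
        print(",".join(row))
```

Note that the two 1405684432 rows end up in one group even though they are not adjacent in the input, because the dictionary collects rows by key.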

You can use itertools.groupby(). (This assumes the input is sorted by the first column.)

Example:

import itertools

data = """\
    1405684432,  d8:c7:c8:5e:7c:2d,   SUTD_GLAB, 72
    1405684432,  d8:c7:c8:5e:7c:2c,   SUTD_BOT, 72
    1405684432,  d8:c7:c8:5e:7c:2b,  SUTD_Student, 72
    1405684432,  d8:c7:c8:5e:7c:2a,   SUTD_Staff, 72
    1405684433,  d8:c7:c8:5e:7c:29,   SUTD_ILP2, 71
    1405684433,  d8:c7:c8:5e:7d:eb,  SUTD_Student, 57
    1405684433,  d8:c7:c8:5e:7d:ea,   SUTD_Staff, 57
"""

data = data.splitlines()
# Key on the first column, ignoring the indentation in front of it.
keyfunc = lambda x: x.split(',')[0].strip()
#data.sort(key=keyfunc) # if input is not sorted by first column

for k, l in itertools.groupby(data, key=keyfunc):
    print("group:", k)
    for x in l:
        print(x)

Output:

group: 1405684432 
    1405684432,  d8:c7:c8:5e:7c:2d,   SUTD_GLAB, 72 
    1405684432,  d8:c7:c8:5e:7c:2c,   SUTD_BOT, 72 
    1405684432,  d8:c7:c8:5e:7c:2b,  SUTD_Student, 72 
    1405684432,  d8:c7:c8:5e:7c:2a,   SUTD_Staff, 72 
group: 1405684433 
    1405684433,  d8:c7:c8:5e:7c:29,   SUTD_ILP2, 71 
    1405684433,  d8:c7:c8:5e:7d:eb,  SUTD_Student, 57 
    1405684433,  d8:c7:c8:5e:7d:ea,   SUTD_Staff, 57 
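Because groupby only merges consecutive items with equal keys, unsorted input splits one key into several groups. The short sketch below, using a hypothetical three-line input and a helper named first_column (neither is from the answer), shows the problem and the sort-first fix that the commented-out data.sort line refers to.

```python
import itertools

# Hypothetical unsorted input: the 1405684432 key appears in two separate runs.
lines = [
    "1405684432, d8:c7:c8:5e:7c:2d, SUTD_GLAB, 72",
    "1405684433, d8:c7:c8:5e:7c:29, SUTD_ILP2, 71",
    "1405684432, d8:c7:c8:5e:7c:2c, SUTD_BOT, 72",
]


def first_column(line):
    """Return the first comma-separated field without surrounding spaces."""
    return line.split(",")[0].strip()


# groupby only merges *consecutive* lines that share a key.
unsorted_keys = [k for k, _ in itertools.groupby(lines, key=first_column)]
print(unsorted_keys)  # ['1405684432', '1405684433', '1405684432']

# Sorting by the same key first makes each key one contiguous run.
# sorted() is stable, so lines keep their original order within a group.
grouped = {
    k: list(g)
    for k, g in itertools.groupby(sorted(lines, key=first_column), key=first_column)
}
print(sorted(grouped))  # ['1405684432', '1405684433']
```

Sorting costs O(n log n); if the input is large and unsorted, the dictionary approach from the other answers groups in a single pass instead.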

Reference: