python: Merge the values in one column based on the ID in another column

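I need to merge the values in the second column of a tab-delimited file based on the ID in the first column. An example is given below. What is the fastest way to do this? I could use a for loop and iterate over every line, but I'm sure there is some clever way to do it that I don't know about.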

596230 Other postop infection 
596230 Disseminated candidiasis 
596230 Int inf clstrdium dfcile 
596230 Pressure ulcer, site NOS 
2846079 Schizophrenia NOS-unspec 
7800713 CHF NOS 
7800713 Chr airway obstruct NEC 
7800713 Polymyalgia rheumatica 
7800713 DMII wo cmp nt st uncntr

into

596230 Other postop infection, Disseminated candidiasis, Int inf clstrdium dfcile, Pressure ulcer, site NOS 
2846079 Schizophrenia NOS-unspec 
7800713 CHF NOS, Chr airway obstruct NEC, Polymyalgia rheumatica, DMII wo cmp nt st uncntr


2012-10-05 Curious

Assuming your text is in a file:

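from collections import defaultdict

items = defaultdict(list)  # maps each ID to the list of its texts
with open("myfile.txt") as infile:
    for line in infile:  # iterate over the open file object
        if not line.strip():  # skip blank lines
            continue
        key, text = line.rstrip().split("\t", 1)  # split on the first tab only
        items[key].append(text)
for key, texts in items.items():
    print(key + "\t" + ", ".join(texts))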

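Before Python 3.7 this does not keep the IDs in their original order, because dicts were unordered; from Python 3.7 on, dicts preserve insertion order, so the IDs come out in the order they first appear. Either way, the texts within each ID keep their original order.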
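A minimal sketch of the same defaultdict idea wrapped in a reusable function, assuming Python 3.7 or later and well-formed "ID<TAB>text" lines. The function name merge_by_id and the sample lines are hypothetical. Taking an iterable of lines instead of a file name makes it easy to test, and you can still pass an open file object to it.

```python
from collections import defaultdict


def merge_by_id(lines):
    """Group tab-separated 'ID<TAB>text' lines by ID (hypothetical helper).

    Returns a list of (id, merged_text) pairs. On Python 3.7+ dicts, and
    therefore defaultdict, preserve insertion order, so IDs come out in
    the order they were first seen.
    """
    items = defaultdict(list)
    for line in lines:
        line = line.rstrip()
        if not line:  # skip blank lines
            continue
        key, text = line.split("\t", 1)  # split on the first tab only
        items[key].append(text)
    return [(key, ", ".join(texts)) for key, texts in items.items()]


# Hypothetical sample input, mirroring the question's data.
sample = [
    "596230\tOther postop infection\n",
    "596230\tDisseminated candidiasis\n",
    "2846079\tSchizophrenia NOS-unspec\n",
    "7800713\tCHF NOS\n",
    "7800713\tPolymyalgia rheumatica\n",
]
for key, merged in merge_by_id(sample):
    print(key + "\t" + merged)
```

With a real file, `with open("myfile.txt") as f: merge_by_id(f)` works the same way, since iterating over a file yields its lines.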


2012-10-05 13:14:30

If the lines are already sorted by ID, you can split them and group them with itertools.groupby(). If they are not sorted, sort them first.
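A minimal sketch of the sort-then-groupby approach, assuming well-formed "ID<TAB>text" lines; the function name merge_sorted and the sample data are hypothetical. groupby() only groups consecutive items with equal keys, which is why the rows are sorted first. Python's sort is stable, so texts keep their original relative order within each ID.

```python
from itertools import groupby
from operator import itemgetter


def merge_sorted(lines):
    """Merge 'ID<TAB>text' lines with itertools.groupby (hypothetical helper)."""
    # Split each non-blank line into [id, text] on the first tab only.
    rows = [line.rstrip().split("\t", 1) for line in lines if line.strip()]
    # groupby needs equal keys to be adjacent, so sort by ID first.
    # IDs are compared as strings here; use key=lambda r: int(r[0])
    # if you need numeric order.
    rows.sort(key=itemgetter(0))
    return [
        (key, ", ".join(text for _, text in group))
        for key, group in groupby(rows, key=itemgetter(0))
    ]


# Hypothetical unsorted input.
sample = [
    "7800713\tCHF NOS\n",
    "596230\tOther postop infection\n",
    "7800713\tPolymyalgia rheumatica\n",
    "596230\tDisseminated candidiasis\n",
    "2846079\tSchizophrenia NOS-unspec\n",
]
for key, merged in merge_sorted(sample):
    print(key + "\t" + merged)
```

Sorting costs O(n log n), whereas the defaultdict approach is a single pass, so groupby is most attractive when the file is already sorted by ID.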


2012-10-05 13:11:29

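You might also consider using the Python csv module to parse your file, since it can be configured to use a delimiter other than a comma (for example a tab, \t). A basic example looks like this: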

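import csv

# Python 3: open in text mode with newline='' (Python 2 used 'rb' here)
with open('myfile', newline='') as f:
    reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)  # each row is a list of fields, e.g. ['596230', 'CHF NOS']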

From there, you can use one of the options already suggested to group together all the items that share the same ID.
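A minimal sketch combining csv.reader with the defaultdict grouping and writing the result back out as tab-separated text with csv.writer. It assumes every non-blank row has at least two fields; the function name merge_tsv is hypothetical, and in-memory io.StringIO objects stand in for real files so the example is self-contained.

```python
import csv
import io
from collections import defaultdict


def merge_tsv(infile, outfile):
    """Read 'ID<TAB>text' rows and write one merged row per ID (hypothetical helper)."""
    reader = csv.reader(infile, delimiter="\t", quoting=csv.QUOTE_NONE)
    items = defaultdict(list)  # ID -> texts, in first-seen order (Python 3.7+)
    for row in reader:
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        items[row[0]].append(row[1].strip())  # drop stray trailing spaces
    # Default quoting is used for output; a field is only quoted if it
    # contains a tab, a double quote, or a line break.
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for key, texts in items.items():
        writer.writerow([key, ", ".join(texts)])


src = io.StringIO(
    "596230\tOther postop infection \n"
    "596230\tDisseminated candidiasis\n"
    "2846079\tSchizophrenia NOS-unspec\n"
)
dst = io.StringIO()
merge_tsv(src, dst)
print(dst.getvalue(), end="")
```

For real files, pass objects opened with `newline=""`, for example `open("myfile.txt", newline="")` for input and `open("merged.txt", "w", newline="")` for output (both file names are placeholders).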


2012-10-05 14:11:28 Qanthelas
