For a Python version, take a look at this:
import io
from collections import defaultdict

# In-memory stand-ins for the real input and output files.
fobj_in = io.StringIO("""Name1, Surname1 Team1
Team2
Team3
Name2, Surname2 Team2
Team4
Name3, Surname3 Team1
Team5""")
fobj_out = io.StringIO()

# Map each team to the list of names that belong to it.
teams = defaultdict(list)
for line in fobj_in:
    items = line.split()
    if len(items) == 3:
        # "Name, Surname Team": remember the name for the lines that follow.
        name = items[:2]
        team = items[2]
    else:
        # A line with only a team belongs to the most recent name.
        team = items[0]
    teams[team].append(name)

# Write one line per team, sorted by team name.
for team_name in sorted(teams.keys()):
    fobj_out.write(team_name + ', ')
    for name in teams[team_name][:-1]:
        fobj_out.write('{} {}, '.format(name[0], name[1]))
    name = teams[team_name][-1]
    fobj_out.write('{} {}\n'.format(name[0], name[1]))

fobj_out.seek(0)
print(fobj_out.read())
Output:
Team1, Name1, Surname1, Name3, Surname3
Team2, Name1, Surname1, Name2, Surname2
Team3, Name1, Surname1
Team4, Name2, Surname2
Team5, Name3, Surname3
To read from and write to actual files, just do this:
fobj_in = open('in_file.txt')
fobj_out = open('out_file.txt', 'w')
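The two open() calls above leave closing the files to the caller. A minimal sketch of the same grouping wrapped in `with` blocks, which close both files even if an error occurs, could look like this. The function name `group_by_team`, the `encoding` parameter and the `ValueError` check are assumptions added for illustration; the input is assumed to be whitespace-separated with no spaces inside names or teams, as in the example data.

```python
from collections import defaultdict


def group_by_team(path_in, path_out, encoding='utf-8'):
    """Read 'Name, Surname Team' lines and write one line per team.

    Hypothetical helper, not part of the original answer.
    """
    teams = defaultdict(list)
    name = None
    with open(path_in, encoding=encoding) as fobj_in:
        for line in fobj_in:
            items = line.split()
            if not items:
                continue  # skip blank lines
            if len(items) == 3:
                # "Name, Surname Team": remember the name for later lines.
                name = ' '.join(items[:2])
                team = items[2]
            else:
                # Team-only line: belongs to the most recent name.
                team = items[0]
            if name is None:
                raise ValueError('team line found before any name line')
            teams[team].append(name)
    with open(path_out, 'w', encoding=encoding) as fobj_out:
        for team_name in sorted(teams):
            # Team first, then every name, all separated by ', '.
            fobj_out.write(', '.join([team_name] + teams[team_name]) + '\n')
```

Calling `group_by_team('in_file.txt', 'out_file.txt')` would then replace the manual open() calls and the two loops.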
EDIT
Note: the sample data does not seem to contain a case that would lead to multiple names on one output line.
With this input data, we need to change the code:
from collections import defaultdict

teams = defaultdict(list)
for line in fobj_in:
    # Skip blank lines.
    if not line.strip():
        continue
    # Split on tabs and drop the empty fields left by leading or repeated tabs.
    items = [entry.strip() for entry in line.split('\t') if entry.strip()]
    if len(items) == 2:
        name = items[0]
        team = items[1]
    else:
        team = items[0]
    teams[team].append(name)

for team_name in sorted(teams.keys()):
    fobj_out.write(team_name + ', ')
    for name in teams[team_name][:-1]:
        fobj_out.write('{}, '.format(name))
    name = teams[team_name][-1]
    fobj_out.write('{}\n'.format(name))
The resulting file content looks like this:
"Décore ta vie" (2003), Boilard, Naggy
"Mouki" (2010), Boileau, Sonia
A chacun sa place (2011), Boinem, Victor Emmanuel
Absence (2009) (V), Boillat, Patricia
C.A.L.L.E. (2005), Boillat, Patricia
Comment devenir un trou de cul et enfin plaire aux femmes (2004), Boire, Roger
Couleur de peau: Miel (2012), Boileau, Laurent
Hergé:Les aventures de Tintin (2004), Boillot, Olivier
Isola, là dove si parla la lingua di Bacco (2011) (co-director), Boillat, Patricia
L'île (2011), Boillot, Olivier
La beauté fatale et féroce... (1996), Boire, Roger
Last Call Indian (2010), Boileau, Sonia
Le Temple Oublié (2005), Boillot, Olivier
Le pied tendre (1988), Boire, Roger
Legit (2006), Boinski, James W.
Nubes (2010), Boira, Francisco
Questions nationales (2009), Boire, Roger
Reconciling Rwanda (2007), Boiko, Patricia
Soviet Gymnasts (1955), Boikov, Vladimir
The Corporal's Diary (2008) (V) (head director), Boiko, Patricia
Un gars ben chanceux (1977), Boire, Roger
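For experimenting without the full input file, the tab-separated grouping can be sketched as two small functions that build each output line with str.join. The names `parse_tab_lines` and `format_teams` are hypothetical, and the layout of the `sample` lines (name, a tab, then the title, with continuation lines starting with tabs) is an assumption about the input; only the names and titles themselves are taken from the output above.

```python
from collections import defaultdict


def parse_tab_lines(lines):
    """Group names by the tab-separated entry that follows them."""
    teams = defaultdict(list)
    name = None
    for line in lines:
        # Keep only non-empty fields, stripped of surrounding whitespace.
        items = [entry.strip() for entry in line.split('\t') if entry.strip()]
        if not items:
            continue  # blank line
        if len(items) == 2:
            name, team = items
        else:
            team = items[0]  # continuation line for the most recent name
        if name is None:
            raise ValueError('entry found before any name')
        teams[team].append(name)
    return teams


def format_teams(teams):
    """Return one 'team, name1, name2, ...' string per team, sorted by team."""
    return ['{}, {}'.format(team, ', '.join(teams[team]))
            for team in sorted(teams)]


# Assumed input layout, for illustration only.
sample = [
    'Boire, Roger\tLe pied tendre (1988)\n',
    '\t\tUn gars ben chanceux (1977)\n',
    '\n',
    'Boiko, Patricia\tReconciling Rwanda (2007)\n',
]
for row in format_teams(parse_tab_lines(sample)):
    print(row)
```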
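So what is the structure of your input now? One line, multiple lines, and where are the line breaks? – Johannes
Are there spaces in the surnames and/or team names? Are there tabs in between, or are the team names in a fixed column? –
@Johannes: The input is very messy. The only "structured" part is "Name1, Surname1", which always has a comma and one space. As for the teams, they are usually placed in a fixed column; however, the first team reported (the one on the name-surname line) is often not aligned with the team column, depending on the "Name, Surname" entry on that line. – user2447387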