2016-09-10 240 views
2

I have a DataFrame whose rows hold comma-separated, list-like strings, with a structure similar to this:

import pandas as pd 

df=pd.DataFrame({'Name':['Stooge, Nick','Dick, Tracy','Rick, Nike','Maw','El','Paw, Maw, Haw','Caw', 'Greep'], 
'key':[2,2,2,1,1,3,1,1,], 
'Lastname':['Smith, Foo','Johnson, Macy','Johnson, Sike','Simpson','Diablo','Simpson, Sampson, Simmons','Simpson', 'Mortimer'] 
}) 


df.loc[df['key'] == 2, 'Full'] = df['Name'] + ', ' + df['Lastname']
df.loc[df['key'] == 1, 'Full'] = df['Name'] + ' ' + df['Lastname']
print(df) 

Output:

                    Lastname           Name  key                        Full
0                 Smith, Foo   Stooge, Nick    2    Stooge, Nick, Smith, Foo
1              Johnson, Macy    Dick, Tracy    2  Dick, Tracy, Johnson, Macy
2              Johnson, Sike     Rick, Nike    2   Rick, Nike, Johnson, Sike
3                    Simpson            Maw    1                 Maw Simpson
4                     Diablo             El    1                   El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                         NaN
6                    Simpson            Caw    1                 Caw Simpson
7                   Mortimer          Greep    1              Greep Mortimer

Is there a way to manipulate or split the strings inside the DataFrame on the commas so that it produces a result like this:

                    Lastname           Name  key                         Full
0                 Smith, Foo   Stooge, Nick    2    Stooge Smith and Nick Foo
1              Johnson, Macy    Dick, Tracy    2  Dick Johnson and Tracy Macy
2              Johnson, Sike     Rick, Nike    2   Rick Johnson and Nike Sike
3                    Simpson            Maw    1                  Maw Simpson
4                     Diablo             El    1                    El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                          NaN
6                    Simpson            Caw    1                  Caw Simpson
7                   Mortimer          Greep    1               Greep Mortimer
+0

This might help: http://pandas.pydata.org/pandas-docs/stable/text.html – cel

Answers

2
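# Split each comma-separated cell into parts, then stack them into a Series indexed by (row, position)
ln = df.Lastname.str.split(r',\s*', expand=True).stack()
fn = df.Name.str.split(r',\s*', expand=True).stack()
# Pair first and last names by position, then join each row's pairs with ' and '
df['full'] = fn.add(' ').add(ln).groupby(level=0).apply(tuple).str.join(' and ')
df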
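
To unpack what the one-liner above does, here is a minimal sketch of the same split/stack idea on a two-row frame, broken into steps. The names `parts`, `pairs`, and `full` are illustrative only, and the explicit `dropna()` is an added safeguard (an assumption, not part of the answer) so that the padding created by `expand=True` is discarded whether or not `stack()` drops it.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Stooge, Nick', 'Maw'],
    'Lastname': ['Smith, Foo', 'Simpson'],
})

# Split on a comma plus optional whitespace; expand=True yields one column per part.
parts = df['Name'].str.split(r',\s*', expand=True)

# stack() turns the columns into a Series indexed by (row, position);
# dropna() discards the padding for rows that have fewer parts.
fn = parts.stack().dropna()
ln = df['Lastname'].str.split(r',\s*', expand=True).stack().dropna()

# Both Series share the same (row, position) index, so concatenation pairs them up.
pairs = fn + ' ' + ln

# Collapse back to one string per original row.
full = pairs.groupby(level=0).agg(' and '.join)
print(full.tolist())
```

Note that this approach pairs names in every row, including the `key == 3` row, so rows that should stay `NaN` would need to be masked separately.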


0

You can use apply():

In [63]: df 
Out[63]: 
                    Lastname           Name  key                        Full
0                 Smith, Foo   Stooge, Nick    2    Stooge, Nick, Smith, Foo
1              Johnson, Macy    Dick, Tracy    2  Dick, Tracy, Johnson, Macy
2              Johnson, Sike     Rick, Nike    2   Rick, Nike, Johnson, Sike
3                    Simpson            Maw    1                 Maw Simpson
4                     Diablo             El    1                   El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                         NaN
6                    Simpson            Caw    1                 Caw Simpson
7                   Mortimer          Greep    1              Greep Mortimer

In [64]: def get_full_name(row):
    ...:     if ',' in str(row.Full):
    ...:         z = row.Full.split(',')
    ...:         x = z[::2]
    ...:         y = z[1::2]
    ...:         # Python 2 only: tuple-parameter lambdas were removed in Python 3 (PEP 3113)
    ...:         return ' and '.join(map(lambda(first, last): ' '.join([first, last]), zip(z, y)))
    ...:     return row.Full
    ...:

In [65]: df['Full'] = df.apply(get_full_name, axis = 1) 

In [66]: df 
Out[66]: 
                    Lastname           Name  key                       Full
0                 Smith, Foo   Stooge, Nick    2   Stooge Nick and Nick Foo
1              Johnson, Macy    Dick, Tracy    2  Dick Tracy and Tracy Macy
2              Johnson, Sike     Rick, Nike    2    Rick Nike and Nike Sike
3                    Simpson            Maw    1                Maw Simpson
4                     Diablo             El    1                  El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                        NaN
6                    Simpson            Caw    1                Caw Simpson
7                   Mortimer          Greep    1             Greep Mortimer
+0

'return ' and '.join(map(lambda(first, last): ' '.join([first, last]), zip(z, y)))' has a syntax error here – ccsv

+0

@ccsv No, there isn't: http://imgur.com/a/rnpFD –
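
The disagreement above most likely comes down to Python versions: `lambda(first, last): ...` is accepted by Python 2 but is a syntax error in Python 3. Separately, the apply() answer's output pairs 'Stooge' with 'Nick' rather than with 'Smith', which is not what the question asked for. The following is a minimal Python 3 sketch that works from the `Name` and `Lastname` values directly instead of parsing `Full`; `pair_names` is a hypothetical helper name, not something from pandas or from the answers.

```python
def pair_names(first_names, last_names):
    """Pair comma-separated first and last names by position.

    `pair_names` is a hypothetical helper, not part of pandas.
    """
    firsts = [part.strip() for part in first_names.split(',')]
    lasts = [part.strip() for part in last_names.split(',')]
    # A generator expression replaces the Python 2-only `lambda(first, last): ...`.
    return ' and '.join(f'{first} {last}' for first, last in zip(firsts, lasts))


print(pair_names('Stooge, Nick', 'Smith, Foo'))  # Stooge Smith and Nick Foo
print(pair_names('Maw', 'Simpson'))              # Maw Simpson
```

With the question's DataFrame this could be applied row-wise, for example `df.apply(lambda row: pair_names(row['Name'], row['Lastname']), axis=1)`, restricting the assignment to the rows with `key` 1 or 2 if the `key == 3` row should remain `NaN`.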