Pandas DataFrame: return the first word of a string for a column

import pandas as pd

df = pd.DataFrame({'id': ['abarth 1.4 a', 'abarth 1 a', 'land rover 1.3 r', 'land rover 2',
                          'land rover 5 g', 'mazda 4.55 bl'],
                   'series': ['a', 'a', 'r', '', 'g', 'bl']})

I want to remove the 'series' string from the corresponding id, so the final result should be:

'id': ['abarth 1.4', 'abarth 1', 'land rover 1.3', 'land rover 2', 'land rover 5', 'mazda 4.55']

Currently I am using df.apply:

df.id = df.apply(lambda x: x['id'].replace(x['series'], ''), axis=1)

But this removes every instance of the string, even inside other words, like this: 'id': ['brth 1.4', 'brth 1', 'land ove 1.3', 'land rover 2', 'land rover 5', 'mazda 4.55']

Should I somehow mix a regular expression into the df.apply lambda, like this?

df.id = df.apply(lambda x: x['id'].replace(r'\b' + x['series'], ''), axis=1)
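For reference, a minimal sketch of the regex idea. Python's str.replace matches its argument literally, so in the attempt above r'\b' is searched for as the two characters backslash and b, and nothing gets removed. The sketch swaps in re.sub instead and assumes that the series value, when present, is always the last space-separated token of id; the helper name strip_series is hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame({'id': ['abarth 1.4 a', 'abarth 1 a', 'land rover 1.3 r', 'land rover 2',
                          'land rover 5 g', 'mazda 4.55 bl'],
                   'series': ['a', 'a', 'r', '', 'g', 'bl']})


def strip_series(row):
    """Hypothetical helper: drop the series token only when it is the trailing word."""
    series = row['series']
    if not series:  # empty series: nothing to strip
        return row['id']
    # re.escape guards against regex metacharacters in the series value;
    # r'\s+...$' anchors the match to a space-separated token at the end.
    return re.sub(r'\s+' + re.escape(series) + r'$', '', row['id'])


df['id'] = df.apply(strip_series, axis=1)
print(df['id'].tolist())
# ['abarth 1.4', 'abarth 1', 'land rover 1.3', 'land rover 2', 'land rover 5', 'mazda 4.55']
```

Anchoring with $ is what keeps the 'a' inside 'abarth' and the 'r' inside 'rover' untouched.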


2016-05-28 Testy8

Use str.split and str.get, and assign with loc only where df.make == '':

df.loc[df.make == '', 'make'] = df.id.str.split().str.get(0)

print(df)

               id    make
0      abarth 1.4  abarth
1        abarth 1  abarth
2  land rover 1.3   rover
3    land rover 2   rover
4    land rover 5   rover
5      mazda 4.55   mazda
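A small sketch of the split/get approach on hypothetical data: the question never shows the original make column, so the blank entries below are assumed. It also shows a limitation of taking token 0: a multi-word make such as 'land rover' comes back as just 'land'.

```python
import pandas as pd

# Hypothetical data: which 'make' entries are blank is an assumption.
df = pd.DataFrame({'id': ['abarth 1.4', 'abarth 1', 'land rover 1.3', 'land rover 2',
                          'land rover 5', 'mazda 4.55'],
                   'make': ['abarth', '', 'land rover', '', '', 'mazda']})

blank = df['make'] == ''
# str.split() splits on whitespace; str.get(0) takes the first token of each list.
# The right-hand side aligns on the index, so only the blank rows are filled.
df.loc[blank, 'make'] = df.loc[blank, 'id'].str.split().str.get(0)
print(df['make'].tolist())
# ['abarth', 'abarth', 'land rover', 'land', 'land', 'mazda']
```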


2016-05-29 00:33:46 piRSquared

If I understand your question correctly, you can just use the replace function:

df.make = df.make.replace("", df.id)
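A sketch of a different technique for filling only the blank entries row by row: Series.mask with an aligned Series of first words. This is swapped in here, not taken from the answer, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical data with two blank 'make' entries.
df = pd.DataFrame({'id': ['abarth 1.4', 'land rover 2', 'mazda 4.55'],
                   'make': ['', 'land rover', '']})

first_word = df['id'].str.split().str[0]
# Series.mask replaces values where the condition is True with the
# index-aligned value from first_word; other rows keep their 'make'.
df['make'] = df['make'].mask(df['make'] == '', first_word)
print(df['make'].tolist())
# ['abarth', 'land rover', 'mazda']
```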


2016-05-28 23:59:36

The OP needs the first word of the 'id' column. – Parfait

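Consider a regex solution with loc, which extracts everything before the last space (the greedy .* runs as far as it can, so it stops only at the final space):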

df.loc[df['make']=='', 'make'] = df['id'].str.extract('(.*) ', expand=False)
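Whether the pattern stops at the first or the last space matters for multi-word makes. A quick comparison on assumed ids; the pattern r'^(\S+)' is an alternative added here for contrast, not part of the answer.

```python
import pandas as pd

ids = pd.Series(['abarth 1.4', 'land rover 1.3', 'mazda 4.55'])

# Greedy '(.*) ': captures everything up to the LAST space.
greedy = ids.str.extract('(.*) ', expand=False)
# Anchored run of non-space characters: only the text before the FIRST space.
first = ids.str.extract(r'^(\S+)', expand=False)

print(greedy.tolist())  # ['abarth', 'land rover', 'mazda']
print(first.tolist())   # ['abarth', 'land', 'mazda']
```

With '(.*) ', an id containing no space at all does not match and yields NaN.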

Alternatively, use numpy's where, which allows if/then/else conditional logic:

import numpy as np

df['make'] = np.where(df['make'] == '',
                      df['id'].str.extract('(.*) ', expand=False),
                      df['make'])
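A self-contained run of the np.where variant on hypothetical data. np.where evaluates both branches for every row and returns a NumPy array, which is then assigned back as the column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two blank 'make' entries.
df = pd.DataFrame({'id': ['abarth 1.4', 'land rover 1.3', 'mazda 4.55'],
                   'make': ['', '', 'mazda']})

# np.where(cond, a, b) takes a where cond is True and b otherwise, element by element.
df['make'] = np.where(df['make'] == '',
                      df['id'].str.extract('(.*) ', expand=False),
                      df['make'])
print(df['make'].tolist())
# ['abarth', 'land rover', 'mazda']
```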


2016-05-29 00:07:37 Parfait
