2017-02-10

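Replace multiple values with a single value using Pandas

I have a list of various values that I need to replace with a single value (Drive-by). I've done my research, but the closest post I could find is the one linked below, which doesn't use pandas. What is the most practical way to do this?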

Python replace multiple strings

import pandas as pd

fourth = pd.read_csv('C:/infocentertracker.csv')
fourth = fourth.rename(columns={'Phone Number: ': 'Phone Number:'})
fourth['Source:'] = fourth['Source:'].replace('......', 'Drive-by')

fourth.to_csv(.............)

Replace all of the following with Drive-by:

Drive By
Drive-By
Drive-by; Return Visitor
Drive/LTX.com/Internes Srch
Driving By/Lantana Website
Drive by
Driving By/Return Visitor
Drive by/Resident Referral
Driving by
Drive- by
Driving by/LTX Website
Driving By
Driving by/Return Visitor
Drive By/Return Visitor
Drive By/LTX Website
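Since the question already calls Series.replace, it is worth noting that replace also accepts a list as its first argument, so an explicit list of variants can be mapped to one value in a single call. The sketch below assumes a small hypothetical `variants` list copied from some of the values above and a made-up `'Walk-in'` row; replace only matches whole cell values exactly, so any spelling missing from the list is left unchanged.

```python
import pandas as pd

# Hypothetical subset of the variants listed in the question
variants = ['Drive By', 'Drive-By', 'Drive by', 'Driving by', 'Drive- by']

# Made-up sample data; 'Walk-in' is not in the list and should survive untouched
fourth = pd.DataFrame({'Source:': ['Drive By', 'Drive- by', 'Driving by', 'Walk-in']})

# replace() with a list maps every exact match to the single value 'Drive-by'
fourth['Source:'] = fourth['Source:'].replace(variants, 'Drive-by')
print(fourth['Source:'].tolist())  # ['Drive-by', 'Drive-by', 'Drive-by', 'Walk-in']
```

This works well when the set of spellings is fixed and known; for open-ended spellings, the prefix-based answers are more robust.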
Is it safe to assume that only the target values start with "Driv"? – Marat

Yes, that's a safe assumption. –

Answers

One option is the pandas approach you asked for:

fourth.loc[fourth['column name with values'].str.contains('driv', case=False, na=False), 'column name with values'] = 'Drive-by'
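A self-contained version of that idea, using .loc (the .ix indexer seen in older answers has since been removed from pandas). The column name `'Source:'` is taken from the question, but the rows, including the missing value, are made up for illustration. Keep in mind that str.contains matches anywhere in the string, so a value such as "Test drive" would also be replaced.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last entry is missing (NaN)
fourth = pd.DataFrame({'Source:': ['Drive By', 'Driving by/LTX Website', 'Website', np.nan]})

# case=False also matches 'Drive', 'DRIVE', 'driving', ...
# na=False makes missing values count as "no match" instead of NaN in the mask
mask = fourth['Source:'].str.contains('driv', case=False, na=False)
fourth.loc[mask, 'Source:'] = 'Drive-by'
print(fourth['Source:'].tolist())  # ['Drive-by', 'Drive-by', 'Website', nan]
```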

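I'd rather use a regular expression, which doesn't necessarily require pandas:

import re

# Anchor at the start so the whole value is replaced; non-strings such as NaN are left as-is
fourth['column name'] = [re.sub(r'^Driv.*', 'Drive-by', i) if isinstance(i, str) else i
                         for i in fourth['column name']]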
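The same anchored regex can also stay inside pandas via Series.str.replace with `regex=True`. Unlike a plain `re.sub` over every cell, which raises a TypeError on a float NaN, the vectorized version simply passes missing values through. The rows below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a missing value
fourth = pd.DataFrame({'Source:': ['Drive By/LTX Website', 'Driving By', 'Website', np.nan]})

# ^Driv.* matches from the start of the string to its end, so the whole value is replaced;
# NaN entries are skipped and stay NaN
fourth['Source:'] = fourth['Source:'].str.replace(r'^Driv.*', 'Drive-by', regex=True)
print(fourth['Source:'].tolist())  # ['Drive-by', 'Drive-by', 'Website', nan]
```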
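
Thanks, I get an error... ValueError: cannot index with vector containing NA / NaN values –

@Pythoner I added an extra parameter to str.contains: na=False. These are all native pandas functions; I'm just not sure what your data looks like. –

Thanks a lot, A.Kot. –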

You can use a boolean mask built with str.startswith to replace all values that start with Driv; the idea comes from Marat's comment:

df.loc[df.col.str.startswith('Driv', na=False), 'col'] = 'Drive-by'
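One thing to watch: str.startswith is case-sensitive, so a lowercase "drive-by" would not match "Driv". If that matters, Series.str.match also anchors the pattern at the start of the string but accepts `case=False`. A minimal sketch with made-up rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
fourth = pd.DataFrame({'Source:': ['Drive By', 'drive-by', 'Test drive', np.nan]})

# startswith is case-sensitive: the lowercase 'drive-by' does not start with 'Driv'
print(fourth['Source:'].str.startswith('Driv', na=False).tolist())  # [True, False, False, False]

# str.match anchors at the beginning of the string (unlike contains) and supports case=False
mask = fourth['Source:'].str.match('driv', case=False, na=False)
fourth.loc[mask, 'Source:'] = 'Drive-by'
print(fourth['Source:'].tolist())  # ['Drive-by', 'Drive-by', 'Test drive', nan]
```

Because the match is anchored, "Test drive" is left alone, which a case-insensitive str.contains would not do.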

Sample:

print(fourth)
                        Source:
0                      Drive By
1                      Drive-By
2      Drive-by; Return Visitor
3   Drive/LTX.com/Internes Srch
4    Driving By/Lantana Website
5                      Drive by
6     Driving By/Return Visitor
7    Drive by/Resident Referral
8                    Driving by
9                     Drive- by
10       Driving by/LTX Website
11                   Driving By
12    Driving by/Return Visitor
13      Drive By/Return Visitor
14         Drive By/LTX Website
15                          aaa
fourth.loc[fourth['Source:'].str.startswith('Driv', na=False), 'Source:'] = 'Drive-by'
print(fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa

Another solution, using Series.mask:

fourth['Source:'] = fourth['Source:'].mask(fourth['Source:'].str.startswith('Driv', na=False),
                                           'Drive-by')
print(fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa
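For readers more used to Series.where: mask replaces values where the condition is True, which makes `mask(cond, other)` equivalent to `where(~cond, other)`. A short check on a made-up Series, also showing that a missing value is kept as NaN when `na=False` is used in the condition:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
s = pd.Series(['Drive By/Return Visitor', 'aaa', np.nan])
is_drive = s.str.startswith('Driv', na=False)

# mask() replaces entries where the condition is True ...
masked = s.mask(is_drive, 'Drive-by')
# ... which matches where() with the condition inverted
where_inv = s.where(~is_drive, 'Drive-by')

print(masked.tolist())            # ['Drive-by', 'aaa', nan]
print(masked.equals(where_inv))   # True
```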
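
Thanks, and sorry if this sounds silly: I tried fourth.loc['driv'), 'Source:'] = 'Drive-by', but it threw an error... 'DataFrame' object has no attribute 'col' –

col was the column name; I changed it to your column name, 'Source:'. – jezrael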
