Use split with the pandas string functions:
df = pd.DataFrame({'a':['[email protected]','[email protected]','[email protected]']})
print (df)
a
0 [email protected]
1 [email protected]
2 [email protected]
print ('@' + df.a.str.split('@').str[1].str.split('.', n=1).str[0])
0 @yahoo
1 @aol
2 @aol
Name: a, dtype: object
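To see what each step of that chain does, here is a minimal sketch with the intermediate results kept in separate variables. The addresses are made-up placeholders (not the ones from the question), and the third one has a subdomain to show that only the first label after the @ is kept:

```python
import pandas as pd

# Made-up sample addresses, used only for illustration.
s = pd.Series(['alice@yahoo.com', 'bob@aol.com', 'carol@mail.example.gov'])

# Step 1: split on '@' and keep the part after it (the full domain).
after_at = s.str.split('@').str[1]

# Step 2: split the domain on the first '.' and keep the first label.
first_label = after_at.str.split('.', n=1).str[0]

# Step 3: put the '@' back in front.
result = '@' + first_label
print(result)
```

So for an address with a subdomain, this approach returns the subdomain label rather than the registered domain name.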
But it is faster to use apply, provided the column contains no NaN values:
df = pd.concat([df]*10000).reset_index(drop=True)
print ('@' + df.a.str.split('@').str[1].str.split('.', n=1).str[0])
print (df.a.apply(lambda x: '@' + x.split('@')[1].split('.')[0]))
In [363]: %timeit ('@' + df.a.str.split('@').str[1].str.split('.', n=1).str[0])
10 loops, best of 3: 79.1 ms per loop
In [364]: %timeit (df.a.apply(lambda x: '@' + x.split('@')[1].split('.')[0]))
10 loops, best of 3: 27.7 ms per loop
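The reason for the NaN caveat: a missing value is a float, so the lambda ends up calling .split on a float and raises, while the .str chain just passes NaN through. A small sketch of that difference, again with made-up addresses; domain_or_nan is a hypothetical helper name, not something from pandas:

```python
import numpy as np
import pandas as pd

# Made-up sample addresses; the NaN row stands in for a missing value.
s = pd.Series(['alice@yahoo.com', np.nan, 'bob@aol.com'])

# The .str chain propagates NaN instead of raising.
via_str = '@' + s.str.split('@').str[1].str.split('.', n=1).str[0]

# The plain lambda fails on the NaN row, because a float has no .split.
try:
    s.apply(lambda x: '@' + x.split('@')[1].split('.')[0])
    apply_failed = False
except AttributeError:
    apply_failed = True

# Hypothetical guarded helper: only touch real strings, keep NaN otherwise.
def domain_or_nan(x):
    if isinstance(x, str):
        return '@' + x.split('@')[1].split('.')[0]
    return np.nan

via_apply = s.apply(domain_or_nan)
print(via_str)
print(apply_failed)
print(via_apply)
```

The guard adds a type check per row, so it will likely give back some of the speed advantage; it is worth timing on your own data.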
Another solution, with extract, is faster than split and can be used even if there are NaN values in the column:
# not sure this pattern covers all characters that are valid in an email address
print ('@' + df.a.str.extract(r"\@([A-Za-z0-9_]+)\.", expand=False))
In [365]: %timeit ('@' + df.a.str.extract(r"\@([A-Za-z0-9_]+)\.", expand=False))
10 loops, best of 3: 39.7 ms per loop
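A short sketch of how extract behaves on rows it cannot handle, using made-up addresses. Missing values and non-matching rows both come back as NaN, and a domain with a hyphen does not match the character class above. The looser pattern at the end is only an assumption about what you might want (anything up to the first dot), not a full email validator:

```python
import numpy as np
import pandas as pd

# Made-up sample addresses: a normal one, a missing value, a hyphenated domain.
s = pd.Series(['alice@yahoo.com', np.nan, 'carol@my-domain.org'])

# Pattern from the answer: letters, digits and underscore only.
strict = '@' + s.str.extract(r"@([A-Za-z0-9_]+)\.", expand=False)

# Looser pattern (an assumption): any run of characters other than '@' and '.'.
loose = '@' + s.str.extract(r"@([^@.]+)\.", expand=False)

print(strict)
print(loose)
```

With the strict pattern the hyphenated row becomes NaN; with the looser one it becomes @my-domain.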
Thanks, this is the perfect solution to my problem. – Kalimantan
What would happen with [email protected]c.nasa.gov or [email protected]? –