2017-08-25 11 views

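How can I combine conditional formatting and str.contains to create new columns in a pandas DataFrame?

I am trying to add new columns to a pandas DataFrame based on the text in an existing column. For example, this is my data: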

>>> data 

No Description 
1  Extention Slack 1 Month 
2  Extention Slack 1 Year 
3  Slack 6 Month 
4  Slack 3 Month 

What I need is:

No Description             M M+1 M+2 M+3 M+4 M+5 M+6 ... M+11
1  Extention Slack 1 Month 1 0   0   0   0   0   0   ... 0
2  Extention Slack 1 Year  1 1   1   1   1   1   1   ... 1
3  Slack 6 Month           1 1   1   1   1   1   0   ... 0
4  Slack 3 Month           1 1   1   0   0   0   0   ... 0

What I have done so far:

import numpy as np 
data['M'] = np.where(data['Description'].str.contains('1 Year'), 1, 0) 

How should I go about this?

+4

What is tahun?

+1

Tahun is the Indonesian word for year. Sorry, my bad.

Answers

1

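From the Description column, you want to use the {time} {time_label} part (such as 1 Year or 1 Month) to decide where to fill in ones and zeros across a 12-month window.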

Here is one way to do what you want:

# create two temporary columns from the last two words of Description 
# time: holds the numeric value associated with time_label (Month or Year) 
df['time'], df['time_label'] = df.Description.str.split().apply(lambda x: pd.Series(x[-2:])).values.T 

# define the numeric equivalent of Month and Year, in months 
mapping = {"Month": 1, "Year": 12} 

for month in range(12): 
    # the if is only here so the columns are named M, M+1, M+2, ... 
    # you can remove it if you accept M+0, M+1, ... 
    if month == 0: 
        df["M"] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0) 
    else: 
        df["M+"+str(month)] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0) 
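As an alternative to the loop, here is a minimal sketch that builds all twelve columns at once. It swaps in str.extract and NumPy broadcasting, and it assumes every Description ends with a whole number followed by Month or Year. The names parts, duration, flags, columns and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": [
        "Extention Slack 1 Month",
        "Extention Slack 1 Year",
        "Slack 6 Month",
        "Slack 3 Month",
    ],
})

# Assumption: each Description ends with "<integer> Month" or "<integer> Year".
parts = df["Description"].str.extract(r"(?P<time>\d+)\s+(?P<time_label>Month|Year)\s*$")

# Duration of each row expressed in months.
duration = parts["time"].astype(int) * parts["time_label"].map({"Month": 1, "Year": 12})

# Compare every duration with 1..12 in one step; the result has shape (n_rows, 12).
flags = (duration.to_numpy()[:, None] >= np.arange(1, 13)).astype(int)

# Column names M, M+1, ..., M+11, matching the layout asked for in the question.
columns = ["M"] + ["M+" + str(i) for i in range(1, 12)]
result = df.join(pd.DataFrame(flags, columns=columns, index=df.index))
print(result)
```

This avoids the temporary time and time_label columns, but a row that does not match the pattern would produce missing values and make astype(int) fail, so it is only suitable when the descriptions are known to be well formed.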

A fully reproducible example:

import pandas as pd 
import numpy as np 
from io import StringIO 

data = """ 
No Description 
1  "Extention Slack 1 Month" 
2  "Extention Slack 1 Year" 
3  "Slack 6 Month" 
4  "Slack 3 Month" 
""" 
# StringIO(data): simulates reading the data from a file 
# replace df with your own DataFrame 
df = pd.read_table(StringIO(data), sep=r"\s+") 

# create two temporary columns 
# time: holds the numeric value associated with time_label (month or year) 
df['time'], df['time_label'] = df.Description.str.split().apply(lambda x: pd.Series(x[-2:])).values.T 

# define the numeric equivalent of Month and Year 
mapping = {"Month":1, "Year":12} 

for month in range(12): 
    # the if is only here so the columns are named M, M+1, M+2, ... 
    if month == 0: 
        df["M"] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0) 
    else: 
        df["M"+"+"+str(month)] = np.where(df.time.astype(int)*df.time_label.map(mapping) >= month+1, 1, 0) 

# remove temporary columns 
df.drop(['time','time_label'], axis=1, inplace=True) 

print(df) 

Output:

No    Description M M+1 M+2 M+3 M+4 M+5 M+6 M+7 M+8 \ 
0 1 Extention Slack 1 Month 1 0 0 0 0 0 0 0 0 
1 2 Extention Slack 1 Year 1 1 1 1 1 1 1 1 1 
2 3   Slack 6 Month 1 1 1 1 1 1 0 0 0 
3 4   Slack 3 Month 1 1 1 0 0 0 0 0 0 

    M+9 M+10 M+11 
0 0  0  0 
1 1  1  1 
2 0  0  0 
3 0  0  0 
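If you would rather stay close to the str.contains attempt from the question, the same table can be sketched with np.select. This is only an illustration under a strong assumption: the list of substrings and their lengths in months (conditions and months below) is a hypothetical lookup that you would have to maintain by hand for every duration that appears in your data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": [
        "Extention Slack 1 Month",
        "Extention Slack 1 Year",
        "Slack 6 Month",
        "Slack 3 Month",
    ],
})

desc = df["Description"]

# Hypothetical lookup: one str.contains condition per known duration.
conditions = [
    desc.str.contains("1 Year", regex=False),
    desc.str.contains("6 Month", regex=False),
    desc.str.contains("3 Month", regex=False),
    desc.str.contains("1 Month", regex=False),
]
months = [12, 6, 3, 1]

# The first matching condition wins; rows that match nothing get a duration of 0.
duration = np.select(conditions, months, default=0)

for i in range(12):
    name = "M" if i == 0 else "M+" + str(i)
    # Column i is 1 while the duration still covers month i+1.
    df[name] = np.where(duration > i, 1, 0)

print(df)
```

Because the conditions are plain substring checks, a description such as 11 Month would also match 1 Month, so parsing the number as in the answer above is the more robust choice.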
+0

Thanks for the idea.

+0

@NabihIbrahimBawazir You're welcome. – MedAli
