Pandas: select multiple columns conditionally

Suppose I have a DataFrame:

C1  V1  C2  V2  Cond
1   2   3   4   X
5   6   7   8   Y
9   10  11  12  X
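
For anyone who wants to reproduce the table above, it can be rebuilt along these lines. This is only a sketch: the variable name df matches what the answers use, and the integer dtypes are an assumption.

```python
import pandas as pd

# Rebuild the example table from the question (dtypes are assumed to be int).
df = pd.DataFrame({
    "C1": [1, 5, 9],
    "V1": [2, 6, 10],
    "C2": [3, 7, 11],
    "V2": [4, 8, 12],
    "Cond": ["X", "Y", "X"],
})
print(df)
```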

The statement should return: if Cond == X, pick C1 and V1, else pick C2 and V2.

The output DataFrame should look something like this:

C  V
1  2
7  8
9  10

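**Edit:** one more requirement: the number of columns can change, but the names follow a pattern. In this case, select all columns with "1" in the name, otherwise select those with "2". I think a hard-coded solution might not work.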

Source: iwbabn, 2017-01-02

Possible duplicate of [Create column with ELIF in Pandas](http://stackoverflow.com/questions/18194404/create-column-with-elif-in-pandas) – e4c5

`indexer = {'X': ['C1', 'V1'], 'Y': ['C2', 'V2']}; pd.concat([pd.DataFrame(df.loc[df.Cond == k, v].values, columns=['C', 'V']) for k, v in indexer.items()])` is one way to do it, but it does not preserve the order of the rows. – Abdou

I tried to build a more general solution with filter and numpy.where, using extract for the new column names:

import numpy as np
import pandas as pd

# if necessary, sort the columns
df = df.sort_index(axis=1)

# split df into the columns containing '1' and those containing '2'
df1 = df.filter(like='1')
df2 = df.filter(like='2')
print(df1)
   C1  V1
0   1   2
1   5   6
2   9  10

print(df2)
   C2  V2
0   3   4
1   7   8
2  11  12

# np.where needs a mask with the same shape as df1 and df2
mask = pd.concat([df.Cond == 'X'] * len(df1.columns), axis=1)
print(mask)
    Cond   Cond
0   True   True
1  False  False
2   True   True

# keep only the letter part of each column name for the output
cols = df1.columns.str.extract('([A-Za-z])', expand=False)
print(cols)
Index(['C', 'V'], dtype='object')

print(np.where(mask, df1, df2))
[[ 1  2]
 [ 7  8]
 [ 9 10]]

print(pd.DataFrame(np.where(mask, df1, df2), index=df.index, columns=cols))
   C   V
0  1   2
1  7   8
2  9  10
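
The steps above could be wrapped into one reusable function, roughly as below. This is a sketch, not part of the original answer: the name select_by_suffix and its parameters are hypothetical, and it assumes that the "1" and "2" columns pair up once sorted by name and that no other column name contains those characters. A column-shaped boolean mask is used instead of the concatenated one, since NumPy broadcasts it across the selected columns.

```python
import numpy as np
import pandas as pd


def select_by_suffix(df, cond_col="Cond", value="X", if_true="1", if_false="2"):
    """Hypothetical helper: take the '*1' columns where cond_col == value,
    otherwise the '*2' columns."""
    # Sort both halves by name so that C1 pairs with C2, V1 with V2, and so on.
    true_part = df.filter(like=if_true).sort_index(axis=1)
    false_part = df.filter(like=if_false).sort_index(axis=1)
    # An (n, 1) boolean column broadcasts across all selected columns.
    mask = (df[cond_col] == value).to_numpy()[:, None]
    # Strip the marker from the names to label the output columns.
    cols = true_part.columns.str.replace(if_true, "", regex=False)
    return pd.DataFrame(np.where(mask, true_part, false_part),
                        index=df.index, columns=cols)


df = pd.DataFrame({
    "C1": [1, 5, 9], "V1": [2, 6, 10],
    "C2": [3, 7, 11], "V2": [4, 8, 12],
    "Cond": ["X", "Y", "X"],
})
result = select_by_suffix(df)
print(result)
```

Because the column lists come from filter, the function should keep working when more "1"/"2" pairs are added, as long as the naming pattern holds.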

Source: jezrael, 2017-01-02 08:31:11

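Drop Cond to focus on the values I'm choosing from.
Reshape the numpy array so that I can differentiate with a boolean value.
Index the first dimension with

np.arange(len(df))

Index the second dimension with df.Cond.ne('X').mul(1), which is 0 where Cond equals X.
Construct the final DataFrame: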

pd.DataFrame(
    # columns must be ordered C1, V1, C2, V2 so each row splits into 2 groups of 2
    df.drop(columns='Cond').values.reshape(3, 2, 2)[
        np.arange(len(df)),
        df.Cond.ne('X').mul(1)
    ], df.index, ['C', 'V'])

   C   V
0  1   2
1  7   8
2  9  10
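
The hard-coded reshape(3, 2, 2) ties the snippet to this exact table. A variation that derives the shape from the data might look like the following. It is a sketch under the assumption that the columns are laid out as all group-1 columns followed by all group-2 columns; n_groups, cube, and pick are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "C1": [1, 5, 9], "V1": [2, 6, 10],
    "C2": [3, 7, 11], "V2": [4, 8, 12],
    "Cond": ["X", "Y", "X"],
})

values = df.drop(columns="Cond").to_numpy()
n_groups = 2  # assumption: the '1' columns come first, then the '2' columns
# Shape becomes (rows, groups, columns per group); -1 lets NumPy infer the last axis.
cube = values.reshape(len(df), n_groups, -1)
# 0 selects the first group (Cond == 'X'), 1 selects the second group.
pick = df["Cond"].ne("X").astype(int).to_numpy()
result = pd.DataFrame(cube[np.arange(len(df)), pick],
                      index=df.index, columns=["C", "V"])
print(result)
```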

Source: piRSquared, 2017-01-02 00:56:12

You can try an approach similar to the one in this post.

First, define a couple of functions:

def cond(row): 
    return row['Cond'] == 'X' 

def helper(row, col_if, col_ifnot): 
    return row[col_if] if cond(row) else row[col_ifnot]

Then, assuming your DataFrame is called df,

df_new = pd.DataFrame(index=df.index) 
for col in ['C', 'V']: 
    col_1 = col + '1' 
    col_2 = col + '2' 
    df_new[col] = df.apply(lambda row: helper(row, col_1, col_2), axis=1)

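Keep in mind that this approach may be slow for large DataFrames, because apply does not take advantage of vectorization. However, it should work even with arbitrary column names (just replace ['C', 'V'] with your actual column names).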
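
If speed matters, the same per-column loop can probably be vectorized with numpy.where, and the prefixes can be read from the column names instead of being typed in. The following is a sketch rather than part of the original answer; it assumes every column ending in "1" has a partner ending in "2", and the names is_x and prefixes are introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "C1": [1, 5, 9], "V1": [2, 6, 10],
    "C2": [3, 7, 11], "V2": [4, 8, 12],
    "Cond": ["X", "Y", "X"],
})

is_x = df["Cond"] == "X"
# Assumption: each column ending in '1' has a matching column ending in '2'.
prefixes = [c[:-1] for c in df.columns if c.endswith("1")]

df_new = pd.DataFrame(index=df.index)
for p in prefixes:
    # Whole-column selection instead of a row-by-row apply.
    df_new[p] = np.where(is_x, df[p + "1"], df[p + "2"])
print(df_new)
```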

Source: vbox, 2017-01-02 01:02:19

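If the order of the rows doesn't matter, you can use df.loc and df.append (DataFrame.append was removed in pandas 2.0, so the code below uses pd.concat, which gives the same result).

ndf1 = df.loc[df['Cond'] == 'X', ['C1', 'V1']]
ndf2 = df.loc[df['Cond'] == 'Y', ['C2', 'V2']]
ndf1.columns = ['C', 'V']
ndf2.columns = ['C', 'V']

# originally ndf1.append(ndf2); pd.concat works on both old and new pandas
result = pd.concat([ndf1, ndf2]).reset_index(drop=True)
print(result)
   C   V
0  1   2
1  9  10
2  7   8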
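
If the row order does matter, one possible variation is to keep the original index through the concatenation and sort on it afterwards, instead of resetting it. This is a sketch, not part of the original answer, and it assumes the index of df is unique and sortable.

```python
import pandas as pd

df = pd.DataFrame({
    "C1": [1, 5, 9], "V1": [2, 6, 10],
    "C2": [3, 7, 11], "V2": [4, 8, 12],
    "Cond": ["X", "Y", "X"],
})

is_x = df["Cond"] == "X"
ndf1 = df.loc[is_x, ["C1", "V1"]].rename(columns={"C1": "C", "V1": "V"})
ndf2 = df.loc[~is_x, ["C2", "V2"]].rename(columns={"C2": "C", "V2": "V"})

# Each piece still carries its original row labels, so sorting restores the order.
result = pd.concat([ndf1, ndf2]).sort_index()
print(result)
```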

Source: 2017-01-02 01:22:23

Another option, with DataFrame.where():

df[['C1', 'V1']].where(df.Cond == "X", df[['C2', 'V2']].values)

#    C1  V1
# 0   1   2
# 1   7   8
# 2   9  10
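
To cover the case where the number of columns varies, the hard-coded column lists could be swapped for DataFrame.filter. This is a sketch, not part of the original answer: it assumes the "1" and "2" columns appear in matching order and that no other column name contains those characters; ones and twos are names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "C1": [1, 5, 9], "V1": [2, 6, 10],
    "C2": [3, 7, 11], "V2": [4, 8, 12],
    "Cond": ["X", "Y", "X"],
})

ones = df.filter(like="1")
twos = df.filter(like="2")
# Keep the '1' values where Cond == 'X'; elsewhere take the '2' values by position.
result = ones.where(df["Cond"] == "X", twos.to_numpy())
# Drop the '1' marker so the output columns are named C and V.
result.columns = result.columns.str.replace("1", "", regex=False)
print(result)
```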

Source: Psidom, 2017-01-02 01:47:32

Awesome! Very elegant solution, +1 from me! Why didn't I think of that –
