I think you need set_index with unstack, and then create the column names from the resulting MultiIndex with map:
df = df.set_index(['request_id','crash_id','counter']).unstack()
df.columns = df.columns.map(lambda x: '{}_{}'.format(x[0], x[1]))
df = df.reset_index()
print (df)
request_id crash_id num_acc_x_0 num_acc_x_1 num_acc_x_2 \
0 745109.0 670140638.0 0.01 0.016 0.016
num_acc_y_0 num_acc_y_1 num_acc_y_2 num_acc_z_0 num_acc_z_1 \
0 0.0 -0.006 -0.006 -0.045 -0.034
num_acc_z_2
0 -0.034
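To try the steps above end to end, here is a minimal self-contained sketch. The input frame is an assumption: it is reconstructed from the printed output, with one row per counter value, and the result is stored in a new name (wide) purely for illustration.

```python
import pandas as pd

# Assumed long-format input, reconstructed from the printed output above:
# one row per (request_id, crash_id, counter) combination.
df = pd.DataFrame({
    'request_id': [745109.0] * 3,
    'crash_id': [670140638.0] * 3,
    'counter': [0, 1, 2],
    'num_acc_x': [0.01, 0.016, 0.016],
    'num_acc_y': [0.0, -0.006, -0.006],
    'num_acc_z': [-0.045, -0.034, -0.034],
})

# Move the identifying columns into the index, then pivot the innermost
# level (counter) into the columns. The columns become a MultiIndex of
# (measurement, counter) pairs.
wide = df.set_index(['request_id', 'crash_id', 'counter']).unstack()

# Flatten each (measurement, counter) pair into a single string.
wide.columns = wide.columns.map(lambda x: '{}_{}'.format(x[0], x[1]))

# Bring request_id and crash_id back as ordinary leading columns.
wide = wide.reset_index()
print(wide.columns.tolist())
```

Note that this route only works when each (request_id, crash_id, counter) combination appears once.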
Another solution, which aggregates duplicates, uses pivot_table:
df = df.pivot_table(index=['request_id','crash_id'], columns='counter', aggfunc='mean')
df.columns = df.columns.map(lambda x: '{}_{}'.format(x[0], x[1]))
df = df.reset_index()
print (df)
request_id crash_id num_acc_x_0 num_acc_x_1 num_acc_x_2 \
0 745109.0 670140638.0 0.01 0.016 0.016
num_acc_y_0 num_acc_y_1 num_acc_y_2 num_acc_z_0 num_acc_z_1 \
0 0.0 -0.006 -0.006 -0.045 -0.034
num_acc_z_2
0 -0.034
Or use groupby with mean, then unstack:
df = df.groupby(['request_id','crash_id','counter']).mean().unstack()
df.columns = df.columns.map(lambda x: '{}_{}'.format(x[0], x[1]))
df = df.reset_index()
print (df)
request_id crash_id num_acc_x_0 num_acc_x_1 num_acc_x_2 \
0 745109.0 670140638.0 0.01 0.016 0.016
num_acc_y_0 num_acc_y_1 num_acc_y_2 num_acc_z_0 num_acc_z_1 \
0 0.0 -0.006 -0.006 -0.045 -0.034
num_acc_z_2
0 -0.034
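The aggregating variants matter when the same (request_id, crash_id, counter) combination occurs more than once. The sketch below uses a small made-up frame (the values are hypothetical, not from the question) to show that a plain set_index plus unstack raises on duplicates, while pivot_table averages them.

```python
import pandas as pd

# Hypothetical data: counter 0 appears twice for the same request/crash pair.
dup = pd.DataFrame({
    'request_id': [1, 1, 1],
    'crash_id': [10, 10, 10],
    'counter': [0, 0, 1],
    'num_acc_x': [1.0, 3.0, 5.0],
})

# unstack cannot reshape an index that contains duplicate entries.
try:
    dup.set_index(['request_id', 'crash_id', 'counter']).unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table resolves the duplicates by aggregating them (here: the mean).
wide = dup.pivot_table(index=['request_id', 'crash_id'],
                       columns='counter', aggfunc='mean')
wide.columns = wide.columns.map(lambda x: '{}_{}'.format(x[0], x[1]))
wide = wide.reset_index()

print(unstack_failed)
print(wide)
```

Whether the mean is the right way to combine duplicates depends on the data; another aggfunc such as 'first' or 'max' may fit better.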
Almost, because the output does not add the counter I want to the column names. I need columns named num_acc_x_1, num_acc_x_2, ... and likewise for num_acc_y and num_acc_z, while keeping request_id and crash_id as the initial columns. –
Hmm, then use df['mycounter'] = df.groupby(['request_id','crash_id']).cumcount() + 1 to build the counter, and change df = df.set_index(['request_id','crash_id','counter']).unstack() to df = df.set_index(['request_id','crash_id','mycounter']).unstack() – jezrael
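The cumcount suggestion from the comment can be sketched as follows. The input frame here is an assumption: it reuses the values from the printed output but has no counter column of its own, so that only mycounter ends up in the column names.

```python
import pandas as pd

# Assumed input without a pre-existing counter column.
df = pd.DataFrame({
    'request_id': [745109.0] * 3,
    'crash_id': [670140638.0] * 3,
    'num_acc_x': [0.01, 0.016, 0.016],
    'num_acc_y': [0.0, -0.006, -0.006],
    'num_acc_z': [-0.045, -0.034, -0.034],
})

# cumcount numbers the rows within each group starting at 0,
# so adding 1 makes the suffixes start at 1.
df['mycounter'] = df.groupby(['request_id', 'crash_id']).cumcount() + 1

wide = df.set_index(['request_id', 'crash_id', 'mycounter']).unstack()
wide.columns = wide.columns.map(lambda x: '{}_{}'.format(x[0], x[1]))
wide = wide.reset_index()
print(wide.columns.tolist())
```

If the frame still contains the original counter column, it would be unstacked along with the measurements and produce counter_1, counter_2, ... columns, so it is probably worth dropping it first.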