Add new rows to a subset of a dataframe using pandas

I have the following dataframe:

Customer  ProductID  Count
John      1          25
John      6          50
Mary      2          15
Mary      3          35
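To make the example reproducible, the frame above can be rebuilt directly in code. This is a minimal sketch that only assumes the column names and values shown; the variable name `df` matches the one used in the rest of the question.

```python
import pandas as pd

# Sample data from the question, rebuilt as a DataFrame
df = pd.DataFrame({
    'Customer': ['John', 'John', 'Mary', 'Mary'],
    'ProductID': [1, 6, 2, 3],
    'Count': [25, 50, 15, 35],
})
print(df)
```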

I want my output to look like this:

Customer  ProductID  Count
John      1          25
John      2          0
John      3          0
John      6          50
Mary      1          0
Mary      2          15
Mary      3          35
Mary      6          0

What I have tried so far is to get the unique ProductID values from the dataframe:

unique_ID = pd.unique(df['ProductID'])
print(unique_ID)
# [1 6 2 3]
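A note on ordering, sketched with the same sample data: `pd.unique` (like `Series.unique`) returns values in order of first appearance rather than sorted, which is why the result reads 1, 6, 2, 3. If the filled-in rows should follow ProductID order, sorting first (as the second answer below does with `sort_values`) gives 1, 2, 3, 6. The name `sorted_ID` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['John', 'John', 'Mary', 'Mary'],
    'ProductID': [1, 6, 2, 3],
    'Count': [25, 50, 15, 35],
})

# Order of first appearance, no sorting
unique_ID = pd.unique(df['ProductID'])
print(unique_ID)  # [1 6 2 3]

# Sort the column first so the unique IDs come out in ascending order
sorted_ID = df['ProductID'].sort_values().unique()
print(sorted_ID)  # [1 2 3 6]
```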

Since ProductIDs 2 and 3 do not exist for customer John, I split the dataframe by customer name:

df1 = df[df['Customer']=='John'] 
df2 = df[df['Customer']=='Mary']
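Writing one filter per customer by hand only works for a couple of names. A hedged alternative, assuming any number of customers, is to let `groupby` produce the same per-customer subsets; the dict name `subsets` is illustrative, not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['John', 'John', 'Mary', 'Mary'],
    'ProductID': [1, 6, 2, 3],
    'Count': [25, 50, 15, 35],
})

# One sub-DataFrame per customer, keyed by the customer name
subsets = {name: group for name, group in df.groupby('Customer')}

for name, group in subsets.items():
    print(name)
    print(group)
```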

print(df1)

Customer  ProductID  Count
John      1          25
John      6          50

print(df2)

Customer  ProductID  Count
Mary      2          15
Mary      3          35

I want to add ProductIDs 2 and 3 for John and ProductIDs 1 and 6 for Mary, with Count set to 0 for those ProductIDs, as shown in my expected output above.
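One way to finish the per-customer approach from the question is to index each customer's subset by ProductID and reindex it against the full list of IDs with `fill_value=0`. This is a sketch, not the asker's code; `all_ids`, `parts` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['John', 'John', 'Mary', 'Mary'],
    'ProductID': [1, 6, 2, 3],
    'Count': [25, 50, 15, 35],
})

# Every ProductID that occurs anywhere, sorted
all_ids = df['ProductID'].sort_values().unique()

parts = []
for name, group in df.groupby('Customer'):
    # Index this customer's counts by ProductID, then reindex to the full
    # ID list; IDs this customer has no row for get Count = 0
    part = (group.set_index('ProductID')['Count']
                 .reindex(all_ids, fill_value=0)
                 .rename_axis('ProductID')
                 .reset_index())
    part.insert(0, 'Customer', name)  # put the customer name back as a column
    parts.append(part)

result = pd.concat(parts, ignore_index=True)
print(result)
```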


2016-06-14 KeyboardWarrior

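I think you can use pivot, which leaves NaN values for the missing Customer/ProductID pairs; replace them with 0 using fillna. Finally, to get back the original shape of df, use stack with reset_index: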

print (df.pivot(index='Customer',columns='ProductID', values='Count') 
     .fillna(0) 
     .stack() 
     .reset_index(name='Count')) 
    Customer ProductID Count 
0  John   1 25.0 
1  John   2 0.0 
2  John   3 0.0 
3  John   6 50.0 
4  Mary   1 0.0 
5  Mary   2 15.0 
6  Mary   3 35.0 
7  Mary   6 0.0
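The pivot step can be inspected on its own. The sketch below builds the wide table after `fillna(0)`, casts Count back to int (the NaN values from pivot turn the column into float, hence 25.0 in the output above), and then uses `melt` instead of `stack` to return to long format. Swapping in `melt` is an assumption of this sketch, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['John', 'John', 'Mary', 'Mary'],
    'ProductID': [1, 6, 2, 3],
    'Count': [25, 50, 15, 35],
})

# Wide table: one row per Customer, one column per ProductID.
# Missing pairs are NaN after pivot, so fill with 0 and cast back to int.
wide = (df.pivot(index='Customer', columns='ProductID', values='Count')
          .fillna(0)
          .astype(int))
print(wide)

# Back to long format with melt (used here instead of stack)
long = (wide.reset_index()
            .melt(id_vars='Customer', var_name='ProductID', value_name='Count'))
# The melted ProductID labels may come out as object dtype; make them ints
long['ProductID'] = long['ProductID'].astype(int)
long = long.sort_values(['Customer', 'ProductID']).reset_index(drop=True)
print(long)
```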

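Another solution: first get the unique values of each column (for column ProductID, sort with sort_values first), then build a MultiIndex with MultiIndex.from_product and reindex df by this MultiIndex: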

a = df.Customer.unique() 
b = df.ProductID.sort_values().unique() 

print (a) 
['John' 'Mary'] 
print (b) 
[1 2 3 6] 

m = pd.MultiIndex.from_product([a,b]) 
print (m) 
MultiIndex(levels=[['John', 'Mary'], [1, 2, 3, 6]], 
      labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]]) 

df1 = df.set_index(['Customer','ProductID']).reindex(m, fill_value=0).reset_index() 
df1.columns = ['Customer','ProductID','Count'] 
print (df1) 
    Customer ProductID Count 
0  John   1  25 
1  John   2  0 
2  John   3  0 
3  John   6  50 
4  Mary   1  0 
5  Mary   2  15 
6  Mary   3  35 
7  Mary   6  0
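A small variation on the second solution: if the levels are named when the MultiIndex is built, `reset_index` already produces the Customer and ProductID column names, so the `df1.columns = ...` line is no longer needed. A sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['John', 'John', 'Mary', 'Mary'],
    'ProductID': [1, 6, 2, 3],
    'Count': [25, 50, 15, 35],
})

a = df['Customer'].unique()
b = df['ProductID'].sort_values().unique()

# Name the levels so they match the columns used in set_index
m = pd.MultiIndex.from_product([a, b], names=['Customer', 'ProductID'])

# Missing (Customer, ProductID) pairs are filled with Count = 0
df1 = (df.set_index(['Customer', 'ProductID'])
         .reindex(m, fill_value=0)
         .reset_index())
print(df1)
```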


2016-06-14 20:57:38 jezrael
