2017-07-31 68 views
Pandas: setting values using partial slicing on a MultiIndex

I have this code, which produces the following empty DataFrame:

>>> import itertools
>>> import numpy as np
>>> import pandas as pd
>>>
>>> first = ['foo', 'bar']
>>> second = ['baz', 'can']
>>> third = ['ok', 'ko']
>>> colours = ['blue', 'yellow', 'green']

>>> idx = pd.IndexSlice
>>> ix = pd.MultiIndex.from_arrays(np.array([i for i in itertools.product(first, second, third)]).transpose().tolist(),
...             names=('first', 'second', 'third'))
>>> df1 = pd.DataFrame(index=ix, columns=colours).sort_index()
>>> print(df1)

                   blue yellow green
first second third
bar   baz    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
      can    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
foo   baz    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
      can    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN

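What I want to do is fill this empty MultiIndex-based DataFrame from another DataFrame that is given to me and is column-based, like the following (columns truncated for clarity):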

     baz_ok_blue  baz_ko_blue  can_ok_blue  can_ko_blue  baz_ok_yellow
foo    -1.385111    -1.014812    -1.419643     1.540341       0.663933
bar     0.445372    -0.226087     0.450982    -1.114169       0.896522

So far, this is what I have been doing:

idx = pd.IndexSlice
for s in second:
    for t in third:
        for c in colours:
            column_name = '{s}_{t}_{c}'.format(s=s, c=c, t=t)
            values = df2[column_name]
            df1.loc[idx[:, s, t], c] = values

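On each iteration, the values Series is determined correctly, but pandas does not align the index of values with the first level of df1's MultiIndex. As a result, all values in df1 remain NaN, because pandas tries to match the MultiIndex against a single-level index. Is there a way around this?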

Basically, to give a higher-level view, I am just trying to rearrange df2 (based on string columns) into the shape of df1 (based on a MultiIndex).

Answer

You can first create a MultiIndex in the columns with str.split, then reshape with stack, and finally reindex:

df.columns = df.columns.str.split('_', expand=True) 
print (df) 
          baz                 can                 baz
           ok        ko        ok        ko        ok
         blue      blue      blue      blue    yellow
foo -1.385111 -1.014812 -1.419643  1.540341  0.663933
bar  0.445372 -0.226087  0.450982 -1.114169  0.896522

df = df.stack([0,1]).reindex(index=df1.index, columns=df1.columns) 
print (df) 
                        blue    yellow green
first second third
bar   baz    ko    -0.226087       NaN   NaN
             ok     0.445372  0.896522   NaN
      can    ko    -1.114169       NaN   NaN
             ok     0.450982       NaN   NaN
foo   baz    ko    -1.014812       NaN   NaN
             ok    -1.385111  0.663933   NaN
      can    ko     1.540341       NaN   NaN
             ok    -1.419643       NaN   NaN
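
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same split / stack / reindex approach. The contents of df2 are an assumption: the question only shows a truncated frame, so the sketch fills one column per second_third_colour combination with random numbers. The names flat_cols, wide and result are made up for illustration.

```python
import itertools

import numpy as np
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
colours = ['blue', 'yellow', 'green']

# Target layout: the empty MultiIndex frame from the question.
ix = pd.MultiIndex.from_product([first, second, third],
                                names=('first', 'second', 'third'))
df1 = pd.DataFrame(index=ix, columns=colours).sort_index()

# Assumed stand-in for df2: one flat column per (second, third, colour),
# filled with random values because the real data is not shown in full.
rng = np.random.default_rng(0)
flat_cols = ['{}_{}_{}'.format(s, t, c)
             for s, t, c in itertools.product(second, third, colours)]
df2 = pd.DataFrame(rng.standard_normal((len(first), len(flat_cols))),
                   index=first, columns=flat_cols)

# 1) Split 'baz_ok_blue' style names into a three-level column MultiIndex.
wide = df2.copy()
wide.columns = wide.columns.str.split('_', expand=True)

# 2) Move the first two column levels (second, third) into the row index,
#    leaving the colour as the only column level.
# 3) Reindex so rows and columns follow df1's order.
result = wide.stack([0, 1]).reindex(index=df1.index, columns=df1.columns)
print(result)
```

Working on a copy (wide) keeps df2 untouched, so individual cells of the result can be checked against the original flat columns.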

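Thanks, brilliant. However, it seems that some values get lost in the final step (they are still there after stacking, but disappear after reindexing and end up as 'NaN') – Jivan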

Is it possible to reproduce it with sample data? – jezrael

This may be due to the fact that 'second' and 'color' can have the same labels. I'll try changing this – Jivan
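
Going back to the loop in the question: if you would rather keep it than reshape, one possible workaround is to bypass index alignment by assigning a plain NumPy array, after ordering the values to match the rows selected by the slice. This is only a sketch under the same assumption as before (df2 is a made-up stand-in with one column per second_third_colour combination); rows and target_first are illustrative names.

```python
import itertools

import numpy as np
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
colours = ['blue', 'yellow', 'green']

ix = pd.MultiIndex.from_product([first, second, third],
                                names=('first', 'second', 'third'))
# Float dtype so the assigned numbers are stored as floats; the index is
# sorted so that partial slicing with IndexSlice works.
df1 = pd.DataFrame(index=ix, columns=colours, dtype=float).sort_index()

# Assumed stand-in for df2 (random values, hypothetical data).
rng = np.random.default_rng(0)
flat_cols = ['{}_{}_{}'.format(s, t, c)
             for s, t, c in itertools.product(second, third, colours)]
df2 = pd.DataFrame(rng.standard_normal((len(first), len(flat_cols))),
                   index=first, columns=flat_cols)

idx = pd.IndexSlice
for s, t, c in itertools.product(second, third, colours):
    rows = idx[:, s, t]
    # Labels of the 'first' level for the selected rows, in df1's order.
    target_first = df1.loc[rows, :].index.get_level_values('first')
    values = df2['{}_{}_{}'.format(s, t, c)]
    # Order the values like the target rows, then drop the index so that
    # pandas assigns by position instead of trying to align labels.
    df1.loc[rows, c] = values.reindex(target_first).to_numpy()

print(df1)
```

The reindex step matters because df1 is sorted (bar before foo) while df2 lists foo first; without it the positional assignment would swap the two rows.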