Pandas aggregate on a DataFrame returns only one column

I have a pandas DataFrame (df) that looks like this:

   foo  id1   bar  id2
0  8.0    1  NULL    1
1  5.0    1  NULL    1
2  3.0    1  NULL    1
3  4.0    1     1    2
4  7.0    1     3    2
5  9.0    1     4    3
6  5.0    1     2    3
7  7.0    1     3    1
...

I want to group by id1 and id2 and get the mean of foo and bar.

My code:

res = df.groupby(["id1","id2"])["foo","bar"].mean()

What I get is almost what I expect:

              foo
id1 id2
1   1    5.750000
    2    7.000000
2   1    3.500000
    2    1.500000
3   1    6.000000
    2    5.333333

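The values in column "foo" are exactly the averages (means) I'm looking for, but where is my column "bar"?

In SQL terms, I'm looking for a result like "select avg(foo), avg(bar) from dataframe group by id1, id2;" (sorry for this, but I'm more of an SQL person and new to pandas, and I need it now).

As an alternative, I tried: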

groupedFrame = res.groupby(["id1","id2"]) 
aggrFrame = groupedFrame.aggregate(numpy.mean)

This gives me exactly the same result; the column "bar" is still missing.

Sites I read:

http://wesmckinney.com/blog/groupby-fu-improvements-in-grouping-and-aggregating-data-in-pandas/
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.aggregate.html
and the groupby documentation, but I can't post the link here.

What am I doing wrong? Thanks in advance.

Source

2017-06-15 dlg_

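The problem is that your column bar is not numeric, so the aggregate function omits it.

You can check the dtype of the omitted column; it is not numeric: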

print(df['bar'].dtype)
object

See "automatic exclusion of nuisance columns" in the pandas groupby documentation.
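To see this on the sample data, here is a minimal sketch. It rebuilds only the eight visible rows and assumes bar was read in as strings, since the question does not show how the frame was created. `select_dtypes` lists the non-numeric columns, and `numeric_only=True` makes the exclusion explicit instead of silent. As far as I know, pandas 2.0 and later no longer drop such columns silently and raise a TypeError instead, so being explicit may matter there.

```python
import pandas as pd

# Assumption: only the eight rows shown in the question, with bar stored as strings.
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": ["NULL", "NULL", "NULL", "1", "3", "4", "2", "3"],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

# List the columns that are not numeric and therefore cannot be averaged as-is.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Explicitly restrict the aggregation to numeric columns; bar is left out.
res = df.groupby(["id1", "id2"]).mean(numeric_only=True)
print(res)
```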

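The solution is to convert the string values to numeric before aggregating. Where conversion is not possible, to_numeric with the parameter errors='coerce' inserts NaNs: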

df['bar'] = pd.to_numeric(df['bar'], errors='coerce')
# select several columns with a list (double brackets)
res = df.groupby(["id1", "id2"])[["foo", "bar"]].mean()
print(res)
          foo  bar
id1 id2
1   1    5.75  3.0
    2    5.50  2.0
    3    7.00  3.0
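A small sketch of what errors='coerce' does in isolation may help. The sample values ("2.5", "abc") are made up for illustration and are not from the question. Anything that cannot be parsed becomes NaN, whereas the default behaviour raises an error on the first bad value.

```python
import pandas as pd

# Hypothetical sample values, chosen only to show both outcomes.
raw = pd.Series(["1", "2.5", "NULL", "abc"])

# Unparseable entries ("NULL", "abc") are turned into NaN.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced)

# With the default errors="raise", the same input is rejected.
try:
    pd.to_numeric(raw)
    strict_failed = False
except ValueError:
    strict_failed = True
print(strict_failed)
```

Because mean() skips NaN by default, the coerced column can be aggregated directly.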

But if the data is mixed (numbers together with strings), you can use replace:

df['bar'] = df['bar'].replace("NULL", np.nan)
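A minimal sketch of the mixed case, assuming bar holds real numbers alongside the string "NULL". Whether replace alone leaves the column as object or converts it to float seems to depend on the pandas version, so the explicit astype(float) here is my addition to make the result predictable.

```python
import numpy as np
import pandas as pd

# Assumption: a mixed object column, numbers plus the placeholder string "NULL".
bar = pd.Series(["NULL", "NULL", 1, 3, 4], dtype=object)

# Swap the placeholder for NaN, then force a float dtype explicitly.
cleaned = bar.replace("NULL", np.nan).astype(float)
print(cleaned.dtype)
print(cleaned.mean())  # NaN entries are skipped by mean()
```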

Source

2017-06-15 12:12:17 jezrael

Thanks. It works fine now. Maybe I forgot that NULL here is not the same as in SQL.

Glad I could help ;) – jezrael

As mentioned, you should replace your NULL values before taking the mean:

df.replace("NULL", -1).groupby(["id1", "id2"])[["foo", "bar"]].mean()

Output

id1  id2  foo   bar
1    1    5.75  3.0
1    2    5.5   2.0
1    3    7.0   3.0
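One thing to keep in mind with a placeholder such as -1: unlike NaN, it is an ordinary number and takes part in the mean. The sketch below uses the bar values of the group id1=1, id2=1 from the sample rows (three missing values and one 3), assuming they have already been converted to floats.

```python
import numpy as np
import pandas as pd

# bar values for group (id1=1, id2=1) in the sample rows, after conversion to float.
bar = pd.Series([np.nan, np.nan, np.nan, 3.0])

mean_with_nan = bar.mean()                  # NaN is skipped: 3.0 / 1
mean_with_sentinel = bar.fillna(-1).mean()  # (-1 - 1 - 1 + 3) / 4

print(mean_with_nan)
print(mean_with_sentinel)
```

If the missing values should simply be ignored, keeping them as NaN is usually the safer choice.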

Source

2017-06-15 12:16:25 Tbaki
