2015-05-02 59 views
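Both return a DataFrame containing the first row of each group. The API reference for first() says "Compute first of group values", but when I look at the two outputs side by side I don't see any significant difference. What is the difference between groupby.first() and groupby.head(1)?

Am I missing something?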

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': ["first","second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

First API

Answer

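The main difference is that first() skips ahead to the first non-null value in each group, whereas head(1) does not.

If I drop an np.nan into your example: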

import numpy as np

df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': [np.nan,"second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

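Then we have the following. (Also, as you can see, the two results are indexed differently: head(1) keeps the original row labels, while first() is indexed by id.)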

>>> df.groupby('id').head(1)
    id   value
0    1     NaN  # NaN is included
3    2   first
5    3   first
9    4  second
11   5   first
12   6   first
15   7  fourth

>>> df.groupby('id').first()
     value
id
1   second  # NaN is skipped
2    first
3    first
4   second
5    first
6    first
7   fourth
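
To check both differences programmatically rather than by eye, here is a minimal sketch. It uses a smaller, made-up frame (not the one from the question) in which group 1 starts with a NaN; the names head_rows and first_vals are just illustrative.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (an assumption, not the question's data):
# group 1 starts with NaN, group 2 does not.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "value": [np.nan, "second", "first", "second"],
})

head_rows = df.groupby("id").head(1)    # first row of each group, NaN kept
first_vals = df.groupby("id").first()   # first non-null value per column

# head(1) keeps the original row labels and still contains the NaN.
print(head_rows.index.tolist())            # [0, 2]
print(head_rows["value"].isna().tolist())  # [True, False]

# first() is indexed by the group key and skips the NaN.
print(first_vals.index.tolist())           # [1, 2]
print(first_vals["value"].tolist())        # ['second', 'first']
```

So head(1) behaves like a row filter on the original frame, while first() behaves like an aggregation that returns one value per column per group.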

Thanks a lot – canyon289