2015-09-25 33 views
1

How can I count the empty fields in a DataFrame?

I have a DataFrame:

| city | field2 | field3 | field4 | field5 | 
| 1 | a |  | b | b | 
| 2 |  |  | c |  | 
| 3 |  | a |  |  | 
| 4 | a |  |  |  | 
| 1 |  | a |  | b | 
| 2 | b |  | c |  | 
| 4 |  | a |  |  | 
| 3 |  |  | a |  | 
| 2 | b |  |  |  | 
| 1 |  | a |  | b | 
| 2 |  |  | a |  | 
| 3 | a |  |  | b | 
| 1 |  |  | b |  | 
| 1 | b | a |  |  | 
| 2 |  |  | b | b | 
| 1 | b | a |  | b | 

What I need is a table that counts the blank cells in each field, grouped by the "city" field.

| city | field2 | field3 | field4 | field5 | 
| 1 | 3 | 2 | 4 | 2 | 
| 2 | 3 | 5 | 1 | 4 | 
| 3 | 2 | 2 | 2 | 2 | 
| 4 | 1 | 1 | 2 | 2 | 

How can I do this with Python pandas?

+0

How did you determine the values to fill in? – rurp

+0

@rurp It is the number of blank cells in each field for that "city". For example, city 1 has 3 blank cells in field2, 2 blank cells in field3, and so on. – NCNecros

+0

What do you mean by a blank cell? Does that mean NaN or something else? – rurp

Answers

3
import pandas as pd 
import numpy as np 

df = pd.DataFrame({ 
    "city": [1,2,1,2,1,2], 
    "field2": [np.nan, "a", np.nan, np.nan, "b", np.nan], 
    "field3": [np.nan, np.nan, np.nan, "b", "a", "b"], 
    }) 
df 

This is my sample data:

city field2 field3 
0 1 NaN NaN 
1 2 a NaN 
2 1 NaN NaN 
3 2 NaN b 
4 1 b a 
5 2 NaN b 

Now the logic:

# define a function that counts the number of `nan` in a series. 
def count_nan(col): 
    return col.isnull().sum() 

# group by city and count the number of `nan` per city 
df.groupby("city").agg({"field2": count_nan, "field3": count_nan}) 

This is the output:

field2 field3 
city   
1 2 2 
2 2 1 
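
If there are many fields, listing each one in the `agg` dictionary gets tedious. As a rough sketch (not part of the original answer, and reusing the same sample data), the missing-value mask can be built for every non-key column at once and then summed per city:

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "city": [1, 2, 1, 2, 1, 2],
    "field2": [np.nan, "a", np.nan, np.nan, "b", np.nan],
    "field3": [np.nan, np.nan, np.nan, "b", "a", "b"],
})

# Boolean mask of missing values for every column except the grouping key.
is_missing = df.drop(columns="city").isnull()

# Summing booleans per city counts the missing cells in each field.
nan_counts = is_missing.groupby(df["city"]).sum()
print(nan_counts)
```

This should give the same counts as the `agg` version, without naming the fields one by one.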
+1

I prefer to use 'col.isnull().sum()'. –

+0

Thanks. What would the condition be if the blanks are not null? For example '== ""' – NCNecros

+0

@AndyHayden, your version is more readable and even faster. – cel
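
On the question of blanks that are empty strings rather than NaN: a minimal sketch, assuming a blank cell is either `""` or NaN. The small frame below is made up for illustration and is not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: blanks appear as empty strings, plus one NaN.
df = pd.DataFrame({
    "city": [1, 2, 1, 2],
    "field2": ["a", "", "", "b"],
    "field3": ["", "", "a", np.nan],
})

cells = df.drop(columns="city")

# A cell counts as blank if it is an empty string or a missing value.
is_blank = cells.eq("") | cells.isnull()

# Sum the boolean mask per city to get blank counts for each field.
blank_counts = is_blank.groupby(df["city"]).sum()
print(blank_counts)
```

If only empty strings should count, dropping the `| cells.isnull()` part is enough. Inside the answer's `agg` approach, the equivalent per-column condition would be `(col == "").sum()`.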