2017-12-27

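Apache Spark: counting records per group when the grouping column contains nulls

When I count the number of records in each group, the group whose key is null is reported as having no records, which is incorrect.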

Input DataFrame:

+--------+
|    Name|
+--------+
|  Andrei|
|  Andrei|
|    null|
|    null|
|Grigorii|
+--------+

Code:

Dataset<Row> df = inputDf.groupBy("Name") 
      .agg(functions.count("Name").as("Name_count")); 

Actual DataFrame:

+--------+----------+
|    Name|Name_count|
+--------+----------+
|    null|         0|
|  Andrei|         2|
|Grigorii|         1|
+--------+----------+

Expected DataFrame:

+--------+----------+
|    Name|Name_count|
+--------+----------+
|    null|         2|
|  Andrei|         2|
|Grigorii|         1|
+--------+----------+

Answer


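Counting a specific column only counts the rows where that column is non-null, so the null group ends up with 0. Counting all rows instead works: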

Dataset<Row> storageFrame = leftDataset.groupBy("Name") 
      .agg(functions.count("*").as("Name_count"));
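
To see why the two aggregations differ, here is a minimal plain-Java sketch that models the two counting rules on the question's data. It does not use Spark at all; the class name NullGroupCount and the methods countNonNull and countRows are hypothetical names made up for this illustration, and the map only imitates what a grouped count would return.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullGroupCount {

    // Models count("Name"): within each group, only non-null values are counted.
    static Map<String, Long> countNonNull(List<String> names) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String name : names) {
            // The null group still gets an entry, but each of its rows adds 0.
            counts.merge(name, name == null ? 0L : 1L, Long::sum);
        }
        return counts;
    }

    // Models count("*"): every row in the group is counted, null or not.
    static Map<String, Long> countRows(List<String> names) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String name : names) {
            counts.merge(name, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same values as the Name column in the question.
        List<String> names = Arrays.asList("Andrei", "Andrei", null, null, "Grigorii");
        System.out.println(countNonNull(names)); // {Andrei=2, null=0, Grigorii=1}
        System.out.println(countRows(names));    // {Andrei=2, null=2, Grigorii=1}
    }
}
```

In Spark itself, functions.count(functions.lit(1)) should give the same per-group row count as count("*"), since a literal is never null.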