Spark Hive - UDFArgumentTypeException with window function?

I have the following DF:

+------------+----------------------+-------------------+
|increment_id|base_subtotal_incl_tax|          eventdate|
+------------+----------------------+-------------------+
|        1086|            14470.0000|2016-06-14 09:54:12|
|        1086|            14470.0000|2016-06-14 09:54:12|
|        1086|            14470.0000|2015-07-14 09:54:12|
|        1086|            14470.0000|2015-07-14 09:54:12|
|        1086|            14470.0000|2015-07-14 09:54:12|
|        1086|            14470.0000|2015-07-14 09:54:12|
|        1086|             1570.0000|2015-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
+------------+----------------------+-------------------+

I want to run a window function:

WindowSpec window = Window.partitionBy(df.col("id")).orderBy(df.col("eventdate").desc()); 
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line 
     .filter("rank <= 2") 
     .show();

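What I want to get is the last two entries for each user ("last" meaning the most recent by date, which, because the window is ordered descending, are the first two rows):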

+------------+----------------------+-------------------+
|increment_id|base_subtotal_incl_tax|          eventdate|
+------------+----------------------+-------------------+
|        1086|            14470.0000|2016-06-14 09:54:12|
|        1086|            14470.0000|2016-06-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
+------------+----------------------+-------------------+

But I get this:

+------------+----------------------+-------------------+----+
|increment_id|base_subtotal_incl_tax|          eventdate|rank|
+------------+----------------------+-------------------+----+
|        5555|            14470.0000|2014-07-14 09:54:12|   1|
|        5555|            14470.0000|2014-07-14 09:54:12|   1|
|        5555|            14470.0000|2014-07-14 09:54:12|   1|
|        5555|            14470.0000|2014-07-14 09:54:12|   1|
|        1086|            14470.0000|2016-06-14 09:54:12|   1|
|        1086|            14470.0000|2016-06-14 09:54:12|   1|
+------------+----------------------+-------------------+----+

What am I missing?

[OLD] - Originally I had an error, which is now resolved:

WindowSpec window = Window.partitionBy(df.col("id")); 
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line 
     .filter("rank <= 2") 
     .show();

However, this returns the following error for the line marked with a comment above:

Exception in thread "main" org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected.

What am I missing? What does this error mean? Thanks!

2016-07-20 lte__

The rank window function requires a window with an orderBy clause, for example:

WindowSpec window = Window.partitionBy(df.col("id")).orderBy(df.col("payment"));

Without an ordering, a rank is meaningless, hence the error.
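Once the window is ordered, keep in mind that rank gives tied rows the same rank, which would explain why a filter like rank <= 2 keeps all four rows of increment_id 5555 in the question: they share one eventdate, so each of them has rank 1. Below is a minimal sketch in plain Java (no Spark involved) that contrasts rank semantics with row-number semantics on the eventdate values from the question. The class and method names are hypothetical and only illustrate the numbering rules; this is not Spark's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RankVsRowNumber {

    // rank semantics: equal values share a rank, and the next distinct
    // value jumps to its 1-based position (1, 1, 3, ...).
    // The input must already be sorted the way the window's orderBy sorts it.
    public static List<Integer> rank(List<String> sorted) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            if (i > 0 && sorted.get(i).equals(sorted.get(i - 1))) {
                out.add(out.get(i - 1)); // tie: reuse the previous rank
            } else {
                out.add(i + 1);          // new value: rank is the position
            }
        }
        return out;
    }

    // row-number semantics: every row gets its own consecutive number,
    // whether or not it ties with its neighbour.
    public static List<Integer> rowNumber(List<String> sorted) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            out.add(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // eventdate values of increment_id 5555 from the question (all equal)
        List<String> user5555 = Arrays.asList(
            "2014-07-14 09:54:12", "2014-07-14 09:54:12",
            "2014-07-14 09:54:12", "2014-07-14 09:54:12");

        // eventdate values of increment_id 1086, newest first
        List<String> user1086 = Arrays.asList(
            "2016-06-14 09:54:12", "2016-06-14 09:54:12",
            "2015-07-14 09:54:12", "2015-07-14 09:54:12",
            "2015-07-14 09:54:12", "2015-07-14 09:54:12",
            "2015-07-14 09:54:12");

        System.out.println(rank(user5555));      // [1, 1, 1, 1]
        System.out.println(rowNumber(user5555)); // [1, 2, 3, 4]
        System.out.println(rank(user1086));      // [1, 1, 3, 3, 3, 3, 3]
        System.out.println(rowNumber(user1086)); // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

If exactly two rows per partition are wanted, the row-number variant is probably the better fit. In Spark that is row_number() in org.apache.spark.sql.functions (some 1.x releases name it rowNumber(), so check your version). The order among tied rows is then arbitrary unless a tie-breaking column is added to orderBy.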

2016-07-20 08:17:18 zero323

Thanks! I'll accept your answer, but I have updated my question. If you could help me with that as well, I would really appreciate it.
