2016-11-14 64 views

I have a DataFrame with the columns "CUSTOMER_MAILID", "OFFER_NAME", and "OFFER_ISAPPLIED". How do I update a column in PySpark based on the value of another column?

Sample data:

+--------------------+--------------------+---------------+
|     CUSTOMER_MAILID|          OFFER_NAME|OFFER_ISAPPLIED|
+--------------------+--------------------+---------------+
|pushpendrakaushik...|Jaipur Pink Panth...|              N|
|pushpendrakaushik...|Jaipur Pink Panth...|              N|
|   [email protected]|                    |              N|
|spdadhichassociat...|                    |              N|
|   [email protected]|Jaipur Pink Panth...|              N|
|   [email protected]|                    |              N|
|   [email protected]|                    |              N|
|   [email protected]|                    |              N|
|   [email protected]|Jaipur Pink Panth...|              N|
+--------------------+--------------------+---------------+

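I want to set "OFFER_ISAPPLIED" to "Y" for every row where "OFFER_NAME" has a value, and leave it as "N" where "OFFER_NAME" is empty.

How can I achieve this?

The output should look like this: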

+--------------------+--------------------+---------------+
|     CUSTOMER_MAILID|          OFFER_NAME|OFFER_ISAPPLIED|
+--------------------+--------------------+---------------+
|pushpendrakaushik...|Jaipur Pink Panth...|              Y|
|pushpendrakaushik...|Jaipur Pink Panth...|              Y|
|   [email protected]|                    |              N|
|spdadhichassociat...|                    |              N|
|   [email protected]|Jaipur Pink Panth...|              Y|
|   [email protected]|                    |              N|
|   [email protected]|                    |              N|
|   [email protected]|                    |              N|
|   [email protected]|Jaipur Pink Panth...|              Y|
+--------------------+--------------------+---------------+

Answer


Use:

from pyspark.sql.functions import col, when

# withColumn returns a new DataFrame, so assign the result.
df = df.withColumn(
    "OFFER_ISAPPLIED",
    when(col("OFFER_NAME").isNull(), "N").otherwise("Y"))

It works. Thanks :)