Writing a custom Spark function for a Spark column/DataFrame

Input: orčpžsíáýd 
Output: orcpzsiayd

The code below does this for a plain string, but I can't figure out how to do it with a Spark function whose input is a DataFrame column.

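import org.apache.commons.lang.StringUtils
import org.apache.spark.sql.Column

// Compiles, but c.toString yields the column's expression text (e.g. "authors"),
// not the values in each row, so this cannot normalize the data itself.
def stringNormalizer(c: Column): String =
  StringUtils.stripAccents(c.toString)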
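For reference, the accent stripping itself is ordinary string work. The sketch below does it with only the JDK's `java.text.Normalizer`, assuming that decomposing to NFD and deleting combining diacritical marks is close enough to what `StringUtils.stripAccents` does for input like the example above. The object name `AccentStripper` is hypothetical.

```scala
import java.text.Normalizer

object AccentStripper {
  // Decompose precomposed letters (e.g. "č" -> "c" + combining caron),
  // then delete every character in the Combining Diacritical Marks block.
  def stripAccents(s: String): String =
    if (s == null) null // keep nulls as nulls
    else
      Normalizer
        .normalize(s, Normalizer.Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
}
```

This function takes a `String`, not a `Column`, which is exactly why it has to be wrapped before Spark can apply it row by row.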

Then I should be able to use it like this:

val normalizedAuthor = flat_author.withColumn("NormalizedAuthor",  
stringNormalizer(df_article("authors")))

I've only just started learning Spark, so please let me know if there is a better way to do this without UDFs.

This requires a UDF:

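import org.apache.commons.lang.StringUtils
import org.apache.spark.sql.functions.{col, udf}
// Spark calls the lambda once per row with that row's String value.
val stringNormalizer = udf((s: String) => StringUtils.stripAccents(s))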

df_article.select(stringNormalizer(col("authors")))
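A minimal end-to-end sketch of the UDF approach, assuming Spark 3.x with `org.apache.commons.lang3.StringUtils` on the classpath (swapped in for the `commons.lang` version above). The object name, the `normalize` helper and the sample data are hypothetical. On the "without UDFs" question: the built-in `functions.translate` needs no UDF but only replaces the characters you list, one for one by position, so it is not a general accent stripper.

```scala
import org.apache.commons.lang3.StringUtils
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, translate, udf}

object NormalizeAuthors {
  // UDF: receives each row's String; stripAccents(null) returns null, so nulls pass through.
  val stringNormalizer = udf((s: String) => StringUtils.stripAccents(s))

  // Hypothetical helper that adds a UDF-based column and a built-in (no UDF) column.
  def normalize(df: DataFrame): DataFrame =
    df.withColumn("NormalizedAuthor", stringNormalizer(col("authors")))
      // Built-in alternative: maps only the listed characters, so unlisted
      // accented letters (e.g. "ř") are left unchanged.
      .withColumn("TranslatedAuthor", translate(col("authors"), "čžšíáý", "czsiay"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("normalize-authors").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for df_article.
    val dfArticle = Seq("orčpžsíáýd", "Dvořák").toDF("authors")
    normalize(dfArticle).show(false)
    spark.stop()
  }
}
```

The UDF handles any decomposable accent, while `translate` stays inside Spark's optimizer but only covers a fixed character list; which trade-off fits depends on how varied the author names are.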

2016-03-16 21:53:27
