How to get the name of a Column or change an existing one?

I have to write a function, removePunctuation, that strips punctuation, and its result has to pass this test:

# TEST Capitalization and punctuation (4b)
from pyspark.sql.functions import col
testPunctDF = sqlContext.createDataFrame([(" The Elephant's 4 cats. ",)])
testPunctDF.show() 
Test.assertEquals(testPunctDF.select(removePunctuation(col('_1'))).first()[0], 
        'the elephants 4 cats', 
        'incorrect definition for removePunctuation function')

This is what I managed to write:

from pyspark.sql.functions import lower, regexp_replace, trim

def removePunctuation(column):
    """Removes punctuation, changes to lower case, and strips leading and trailing spaces.

    Note:
     Only spaces, letters, and numbers should be retained. Other characters should be
     eliminated (e.g. it's becomes its). Leading and trailing spaces should be removed after
     punctuation is removed.

    Args:
     column (Column): A Column containing a sentence.

    Returns:
     Column: A Column named 'sentence' with clean-up operations applied.
    """

    return lower(trim(regexp_replace("column_name", r"[\W_]+", " "))).alias("sentence")

But I still can't get regexp_replace to work together with the alias "sentence". I get this error:

AnalysisException: u"cannot resolve 'sentence' given input columns: [_1];"


2016-09-03 Dmitrij Kostyushko

I would try:

stringWithPunctuation.translate(None, string.punctuation)

It is implemented in C under the hood, so it is about as efficient as it gets!
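The one-liner above uses the Python 2 signature `str.translate(None, deletechars)`. In Python 3, `str.translate` takes a single translation table, which `str.maketrans` can build with a set of characters to delete. A minimal sketch, assuming a plain Python string (not a Spark Column) and that `string.punctuation` (ASCII punctuation only) is the character set you want removed; the name `remove_punctuation` is hypothetical:

```python
import string

# Translation table that deletes every ASCII punctuation character
# listed in string.punctuation (this includes the apostrophe and the period).
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Hypothetical helper: delete punctuation, lower-case, then trim a Python string."""
    return text.translate(_STRIP_PUNCT).lower().strip()


print(remove_punctuation(" The Elephant's 4 cats. "))  # the elephants 4 cats
```

Because this operates on Python strings, applying it to a DataFrame column would require wrapping it in a UDF, which moves every row through the Python worker; the built-in `regexp_replace` avoids that.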

You tried:

return lower(trim(regexp_replace(, "[\W_]+"," "))).alias("sentence");

It seems you don't use the parameter column anywhere, which may explain the error.


2016-09-03 17:47:50 gsamaras

Oh sorry, there was a mistake in the code I posted: the first argument of regexp_replace() should have been "column_name". Anyway, I've solved it, but thanks. –

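@DmitrijKostyushko Glad you solved it! If I had known that the code in your question was not the code you were actually running, I might have posted a better answer. Remember to accept an answer later. ;) – gsamaras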

Surprisingly, I could only pass the Column object to regexp_replace(), not the column name.


2016-09-03 17:48:31
