2017-03-12 75 views

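Function to subset and adjust a vector

The function should take a vector and winsorize its values at the 1st and 99th percentiles (replace values larger than the 99th percentile with the 99th percentile, and values smaller than the 1st percentile with the 1st percentile). I can run the function without any errors, but it does not change the vector given as an argument. When I run the same code outside the function it works fine, but I have to do this for many columns in a data.frame, so I would like to be able to pass the function through an apply function.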

wins <- function(vect, prob = c(0.01, 0.99)){ 
    #vect is a vector with values to be winsorized 
    #prob contains top and bottom percentiles at which to winsorize data in vect 

    low_quantile <- quantile(vect, probs = prob[1], na.rm = TRUE) 
    high_quantile <- quantile(vect, probs = prob[2], na.rm = TRUE) 

    vect[vect < low_quantile] <- low_quantile 
    vect[vect > high_quantile] <- high_quantile 
} 

Any suggestions?


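You seem to think that things happening inside a function magically affect objects outside the function. They don't. You need to explicitly return vect and assign the result of the function to a new or existing object. – joran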
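To make joran's point concrete, here is a minimal R sketch; the helpers clamp_local() and clamp_return() are hypothetical and not part of the question. An R function works on its own copy of the argument, so a change only survives if the function returns the modified vector and the caller assigns that result.

```r
# Hypothetical helper: modifies only its own local copy of v
clamp_local <- function(v) {
  v[v > 5] <- 5    # changes the copy inside the function
  invisible(NULL)  # the modified v is thrown away
}

x <- 1:10
clamp_local(x)
print(x)           # x is unchanged: 1 2 3 4 5 6 7 8 9 10

# Same logic, but the modified vector is the last expression, so it is returned
clamp_return <- function(v) {
  v[v > 5] <- 5
  v
}

x_clamped <- clamp_return(x)  # assign the result to keep the change
print(x_clamped)              # 1 2 3 4 5 5 5 5 5 5
```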

Answer


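Add vect as the last line of your function, so that it is the last expression evaluated and therefore the value the function returns.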

wins <- function(vect, prob = c(0.01, 0.99)){
    #vect is a vector with values to be winsorized
    #prob contains top and bottom percentiles at which to winsorize data in vect

    low_quantile <- quantile(vect, probs = prob[1], na.rm = TRUE)
    high_quantile <- quantile(vect, probs = prob[2], na.rm = TRUE)

    vect[vect < low_quantile] <- low_quantile
    vect[vect > high_quantile] <- high_quantile
    vect
}

wins(1:100) 
[1] 1.99 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 16.00 17.00 18.00
[19] 19.00 20.00 21.00 22.00 23.00 24.00 25.00 26.00 27.00 28.00 29.00 30.00 31.00 32.00 33.00 34.00 35.00 36.00 
[37] 37.00 38.00 39.00 40.00 41.00 42.00 43.00 44.00 45.00 46.00 47.00 48.00 49.00 50.00 51.00 52.00 53.00 54.00 
[55] 55.00 56.00 57.00 58.00 59.00 60.00 61.00 62.00 63.00 64.00 65.00 66.00 67.00 68.00 69.00 70.00 71.00 72.00 
[73] 73.00 74.00 75.00 76.00 77.00 78.00 79.00 80.00 81.00 82.00 83.00 84.00 85.00 86.00 87.00 88.00 89.00 90.00 
[91] 91.00 92.00 93.00 94.00 95.00 96.00 97.00 98.00 99.00 99.01 
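As a hedged alternative (the name wins_pm() is hypothetical, not from the answer), the same clamping can be written with base R's pmin() and pmax(), computing both cut-offs in a single quantile() call and returning the result explicitly. pmax() raises everything below the lower cut-off, pmin() then caps everything above the upper one, and NA values pass through as NA.

```r
# Hypothetical alternative to wins(): same clamping via pmin()/pmax()
wins_pm <- function(vect, prob = c(0.01, 0.99)) {
  # both cut-offs in one call; na.rm = TRUE ignores NAs when computing them
  q <- quantile(vect, probs = prob, na.rm = TRUE, names = FALSE)
  # raise values below q[1] to q[1], then cap values above q[2] at q[2];
  # NAs in vect stay NA
  result <- pmin(pmax(vect, q[1]), q[2])
  return(result)  # explicit return makes the returned value obvious
}

w <- wins_pm(1:100)
range(w)  # 1.99 99.01, matching the wins(1:100) output above
```

Whether this reads better than the two subset assignments in wins() is a matter of taste; both return the winsorized vector rather than modifying the argument in place.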

Edit, for the follow-up question on how to apply it to a data.frame:

df1 <- data.frame(matrix(1:200,ncol=2))
apply(df1,2,wins) # apply by column
         X1     X2
  [1,] 1.99 101.99
  [2,] 2.00 102.00
  [3,] 3.00 103.00
  [4,] 4.00 104.00
  [5,] 5.00 105.00
...
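The [1,] row labels in this output hint that the result is a matrix: apply() converts a data.frame to a matrix with as.matrix() before looping over its columns. A minimal sketch, reusing wins() from the answer, that checks this and converts the result back to a data.frame (the names res and df1_w are illustrative only):

```r
# wins() as in the answer: returns the modified vector as its last expression
wins <- function(vect, prob = c(0.01, 0.99)) {
  low_quantile <- quantile(vect, probs = prob[1], na.rm = TRUE)
  high_quantile <- quantile(vect, probs = prob[2], na.rm = TRUE)
  vect[vect < low_quantile] <- low_quantile
  vect[vect > high_quantile] <- high_quantile
  vect
}

df1 <- data.frame(matrix(1:200, ncol = 2))
res <- apply(df1, 2, wins)   # apply() turns df1 into a matrix first
is.matrix(res)               # TRUE: the result is a matrix, not a data.frame
df1_w <- as.data.frame(res)  # convert back if a data.frame is needed
head(df1_w, 3)
```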

As for your follow-up, it also works on a single column:

wins(df1$X1) 
[1] 1.99 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 16.00 17.00 18.00 
[19] 19.00 20.00 21.00 22.00 23.00 24.00 25.00 26.00 27.00 28.00 29.00 30.00 31.00 32.00 33.00 34.00 35.00 36.00 
[37] 37.00 38.00 39.00 40.00 41.00 42.00 43.00 44.00 45.00 46.00 47.00 48.00 49.00 50.00 51.00 52.00 53.00 54.00 
[55] 55.00 56.00 57.00 58.00 59.00 60.00 61.00 62.00 63.00 64.00 65.00 66.00 67.00 68.00 69.00 70.00 71.00 72.00 
[73] 73.00 74.00 75.00 76.00 77.00 78.00 79.00 80.00 81.00 82.00 83.00 84.00 85.00 86.00 87.00 88.00 89.00 90.00 
[91] 91.00 92.00 93.00 94.00 95.00 96.00 97.00 98.00 99.00 99.01 

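Thanks for the reply. For some reason, this only works when I define a sequence of values and pass it in directly. When I pass a column vector from the data frame, it still doesn't work. I have a data frame with 20 columns, and when I pass wins(dataframe$rowname) it just prints the original row without any of the expected changes. – claushojmark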
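One possible explanation, offered as a guess rather than a diagnosis: calling wins() on a column returns a winsorized copy but leaves the data frame itself untouched, and with the default prob = c(0.01, 0.99) only the most extreme values change, so printed output can look almost identical to the original. A sketch with a hypothetical data frame dat and column x:

```r
# wins() as in the answer: returns the modified vector as its last expression
wins <- function(vect, prob = c(0.01, 0.99)) {
  low_quantile <- quantile(vect, probs = prob[1], na.rm = TRUE)
  high_quantile <- quantile(vect, probs = prob[2], na.rm = TRUE)
  vect[vect < low_quantile] <- low_quantile
  vect[vect > high_quantile] <- high_quantile
  vect
}

# Hypothetical data frame with a single column named x
dat <- data.frame(x = 1:1000)

w <- wins(dat$x)            # winsorized copy of the column
identical(dat$x, 1:1000)    # TRUE: calling wins() did not modify dat
sum(w != dat$x)             # 20: only the 10 lowest and 10 highest values changed

dat$x <- w                  # assign the result back to keep the change
range(dat$x)                # 10.99 990.01
```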


它适用于我使用'data.frame'和'apply'。看我的编辑。 –


There is almost never a need to use 'apply(df,2,FUN)' on a data.frame; use '[lsv]apply' instead. – thelatemail
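Following thelatemail's suggestion, here is a sketch that uses vapply() to find the numeric columns and lapply() to winsorize them in place. This matters for a data frame with many columns: if any column is non-numeric, apply() would turn the whole data frame into a character matrix. The data frame dat is hypothetical, and prob = c(0.25, 0.75) is chosen only so the effect is visible on four rows.

```r
# wins() as in the answer: returns the modified vector as its last expression
wins <- function(vect, prob = c(0.01, 0.99)) {
  low_quantile <- quantile(vect, probs = prob[1], na.rm = TRUE)
  high_quantile <- quantile(vect, probs = prob[2], na.rm = TRUE)
  vect[vect < low_quantile] <- low_quantile
  vect[vect > high_quantile] <- high_quantile
  vect
}

# Hypothetical data frame mixing a character column with numeric ones
dat <- data.frame(id = c("a", "b", "c", "d"),
                  v1 = c(1, 2, 3, 100),
                  v2 = c(-50, 0, 1, 2),
                  stringsAsFactors = FALSE)

num_cols <- vapply(dat, is.numeric, logical(1))  # TRUE for v1 and v2 only

# lapply() passes extra arguments (here prob) on to wins();
# assigning into dat[num_cols] keeps dat a data.frame and leaves id untouched
dat[num_cols] <- lapply(dat[num_cols], wins, prob = c(0.25, 0.75))

dat$v1  # 1.75 2.00 3.00 27.25
dat$v2  # -12.50 0.00 1.00 1.25
```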