2016-01-29 44 views

Apply a function with an external condition

I have a data frame df of random values:

df <- data.frame(x1=runif(20,1,200),x2=runif(20,1,18),x3=runif(20,1,7),x4=runif(20,1,3),x5=runif(20,1,25),x6=runif(20,1,220),x7=runif(20,1,10),x8=runif(20,1,8),x9=runif(20,1,20),x10=runif(20,1,32)) 
df 

      x1  x2  x3  x4  x5   x6  x7  x8  x9  x10 
1 43.942462 14.983885 4.267664 2.591210 19.650770 95.710478 8.830253 7.089017 5.341859 3.574852 
2 185.965077 8.099796 3.592361 1.953196 8.837645 111.846707 8.180938 3.355258 13.889081 26.878697 
3 83.532083 2.782204 3.160955 1.892041 23.216698 80.521986 3.864614 6.799805 17.493065 9.246177 
4 48.416861 17.019713 5.182366 2.501890 8.108828 219.419766 4.687034 6.785789 2.525997 7.145447 
5 66.766778 11.716819 1.649946 2.136352 2.957554 126.164722 9.980739 1.919323 16.556541 5.447096 
6 78.305312 12.148354 6.408544 2.644811 10.362618 53.112153 1.092853 1.360766 6.693875 17.108564 
7 64.995759 13.385556 3.375907 1.923173 19.732286 219.780082 4.074889 4.609356 7.098822 25.412262 
8 196.463100 17.491693 2.317492 2.573539 24.350820 36.696244 6.277854 6.247473 5.535765 12.121822 
9 48.467431 11.659182 4.324854 1.380067 15.269617 102.453557 2.724937 1.481521 14.916894 3.451188 
10 134.913063 8.927522 2.637946 1.526043 17.956797 49.671752 5.014152 4.737910 4.241197 28.916885 
11 190.841615 2.639374 5.038702 2.806088 15.127840 8.841983 2.155842 7.589245 13.799412 28.025792 
12 46.963826 11.212431 4.944327 2.937039 16.410549 25.048928 6.330826 5.006221 2.986566 17.005088 
13 97.258821 17.847892 6.202023 2.228292 19.804482 159.922462 2.587568 4.175234 5.360039 15.812061 
14 123.439971 15.415940 5.785273 2.075161 11.496406 12.449913 6.484951 7.911373 11.578242 22.398292 
15 4.225315 11.775122 6.908108 2.980960 22.768381 109.853774 2.535843 7.293656 13.290552 29.302949 
16 49.927327 4.086780 3.941200 1.129892 18.200466 164.281496 6.881178 6.199219 4.091858 29.963647 
17 105.716881 12.421335 6.527660 2.767754 22.055987 208.188895 8.125112 7.702927 3.027778 20.080756 
18 195.205248 5.749007 6.204989 1.815563 3.875226 200.608675 1.500572 7.116924 1.608354 13.292293 
19 27.564433 16.788191 1.648707 2.360290 22.539064 192.914543 1.327605 6.096303 7.105979 22.650040 
20 122.620812 11.475314 5.588179 1.884028 3.692936 200.056348 3.248232 1.562624 18.998767 29.424066 

and a vector ind with one value for each column of df. The values in ind are indicators for a normalization procedure.

ind 
x1  x2  x3  x4  x5  x6  x7  x8  x9 x10 
0.800 1.000 0.400 0.010 6.000 0.100 0.180 0.006 10.000 1.000 

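Now I need to write code that applies a desired function to every value in a column of df if that column's corresponding value in ind is equal to or above some threshold.

For example, if the threshold is 0.8, the affected columns in df would be x1, x2, x5, x9, and x10.

I tried something like apply(df,2,function(x)... but I don't have enough skill to insert the ifelse that is apparently needed.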

Try 'lapply(df[ind >= thresh], myfun)' –

Answers

0
apply(df[,ind>=threshold],2,function(x) {... 

should do the job.

This is fine. But don't use 'apply' over margin 2 of a data.frame –

Thank you all for the comments. Could you clarify why 'apply' over margin 2 should not be used on a data frame? –

@OlliJ, because that converts your data to a matrix. Generally use 'lapply' instead. –
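
To illustrate the point about matrix conversion, here is a minimal sketch. The data frame d and its column names are hypothetical and chosen only to make the coercion visible; with all-numeric columns, as in the question, the conversion is less harmful but still happens.

```r
# Hypothetical mixed-type data frame, used only for illustration
d <- data.frame(num = c(1.5, 2.5, 3.5),
                chr = c("a", "b", "c"),
                stringsAsFactors = FALSE)

# apply() converts the data frame with as.matrix() first, so every
# column is seen as character once one column is character
via_apply <- apply(d, 2, class)

# sapply()/lapply() iterate over the columns as they are stored
via_sapply <- sapply(d, class)

print(via_apply)
print(via_sapply)
```

Here via_apply reports "character" for both columns, while via_sapply reports "numeric" for num and "character" for chr.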

0

Just subset the data frame to select the columns that meet your threshold criterion: df[,ind >= threshold]

> df <- data.frame(x1=runif(20,1,200),x2=runif(20,1,18),x3=runif(20,1,7), 
+     x4=runif(20,1,3),x5=runif(20,1,25),x6=runif(20,1,220), 
+     x7=runif(20,1,10),x8=runif(20,1,8),x9=runif(20,1,20), 
+     x10=runif(20,1,32)) 
> df 
      x1  x2  x3  x4  x5  x6  x7  x8  x9  x10 
1 144.616823 5.066181 6.577798 1.941608 19.250274 79.88517 3.487795 5.397938 19.226113 9.469675 
2 143.563809 1.608130 6.446411 2.071802 12.636476 193.12108 5.685940 1.643825 11.111157 5.676330 
3 124.396884 3.693522 3.660122 1.346020 21.605446 98.05443 1.965067 5.332797 7.879099 2.252806 
4 75.936173 9.596695 1.130494 2.014904 19.460231 195.26396 1.132060 6.338672 4.077532 31.391598 
5 8.913065 1.170144 3.197571 1.011599 3.970510 211.02147 6.483770 5.654871 12.669959 16.107771 
6 177.159043 11.625298 6.282233 1.576242 13.997114 119.77652 9.507075 4.430359 13.564249 1.256496 
7 70.383858 14.545778 2.018208 2.990232 3.391777 83.47019 2.232830 7.433207 1.809452 18.440641 
8 48.883343 8.747942 4.473287 1.163179 13.949834 77.34972 3.959059 1.320038 10.385028 2.291721 
9 85.618694 5.421825 4.675017 1.935956 10.877031 185.46998 7.548788 4.160425 16.304787 23.373557 
10 152.615778 12.088414 2.154604 2.266516 20.823971 159.11784 7.047805 3.570086 18.286411 22.731629 
11 83.139696 1.909547 3.124565 2.580109 4.726824 100.42819 1.994356 2.910579 2.034623 26.973796 
12 85.488980 4.193826 2.051200 1.063903 18.012469 210.97311 5.783519 5.846847 9.931950 17.261856 
13 172.446057 14.226508 3.080864 2.153755 6.844384 201.36755 1.593935 4.389736 10.549154 12.728925 
14 23.892525 13.907691 2.494084 1.658334 11.922202 159.96523 1.605302 4.113502 7.151511 11.186883 
15 24.836826 16.390015 2.989483 2.327674 17.067639 44.66071 5.275591 2.970786 6.068440 1.898431 
16 84.552408 6.670091 3.059626 1.693665 6.243420 175.88141 9.638818 2.090328 17.085817 23.759445 
17 29.615649 12.239127 5.728309 1.034658 3.793404 17.34458 2.211930 7.648141 13.080505 21.024933 
18 106.492512 13.543715 3.244059 2.167515 21.803114 204.25419 7.807202 1.519835 1.117334 9.732187 
19 156.503788 16.186274 4.825950 2.019083 6.594384 61.66293 9.693650 5.181686 10.884431 23.105221 
20 196.592843 6.461601 4.183722 1.742368 21.129107 175.12238 9.239206 6.657412 8.371315 15.648119 
> ind <- c(0.800, 1.000, 0.400, 0.010, 6.000, 0.100, 0.180, 0.006, 10.000, 1.000) 
> ind 
[1] 0.800 1.000 0.400 0.010 6.000 0.100 0.180 0.006 10.000 1.000 
> sapply(df[,ind>=0.8], function(x){max(x)}) 
     x1  x2  x5  x9  x10 
196.59284 16.39002 21.80311 19.22611 31.39160 

[Edited to avoid using apply over dimension 2]

0

I think you could take an approach like this:

df <- data.frame(x1=runif(20,1,200),x2=runif(20,1,18),x3=runif(20,1,7),x4=runif(20,1,3),x5=runif(20,1,25),x6=runif(20,1,220),x7=runif(20,1,10),x8=runif(20,1,8),x9=runif(20,1,20),x10=runif(20,1,32)) 
ind <- c(0.8,1.0,0.4,0.01,6.0,0.1,0.18, 0.006, 10.0, 1.0) 
threshold <- 0.8 

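m <- ind >= threshold        # TRUE for columns whose ind value meets the threshold
index <- m %in% c(TRUE)      # treats any NA in m as FALSE
df2 <- df[, index]           # keep only the selected columns
df3 <- apply(df2, 2, scale)  # scale the selected columns (returns a matrix)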

The normalization function can be chosen as you like.
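
If the goal is to transform only the qualifying columns and leave the rest of df unchanged, one possible sketch follows. It uses a smaller four-column df for brevity. The function normalize, the names on ind, the result variable, and the fixed seed are assumptions made for illustration and do not come from the question.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible

df <- data.frame(x1 = runif(20, 1, 200), x2 = runif(20, 1, 18),
                 x3 = runif(20, 1, 7),   x4 = runif(20, 1, 3))
ind <- c(x1 = 0.8, x2 = 1.0, x3 = 0.4, x4 = 0.01)
threshold <- 0.8

# Hypothetical normalization: centre to mean 0, scale to sd 1
normalize <- function(x) (x - mean(x)) / sd(x)

# Positions of the columns whose indicator meets the threshold;
# which() silently drops any NA comparisons
sel <- unname(which(ind >= threshold))

# Copy df and overwrite only the selected columns; lapply() keeps
# the columns as vectors instead of converting to a matrix
result <- df
result[sel] <- lapply(df[sel], normalize)

print(names(df)[sel])
```

With these values, x1 and x2 are normalized and x3 and x4 are carried over untouched, so result stays a data frame with the same shape as df.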