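Grouping factor, data frame and tapply problem
I am very new to R and to statistics, and I cannot get tapply() to work. I have a data frame with 15 columns and several thousand rows. I created a bunch of logical vectors with things like y1<-((x>0)&(x<=5)), where x is a column name in the data frame. I then combined these logical vectors and converted them into a grouping factor with factor(). All of that appears to work fine.
The problem is that when I try tapply(dataframe, group, sample, size=20), where group is the grouping factor, I get the error 'arguments must have same length'. When I try length(dataframe), I get the number of columns in the data frame (only 15), whereas length(group) returns the number of rows (several thousand). Have I made a mistake in creating the logical vectors and the grouping factor?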
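For reference, here is a minimal sketch of the grouping step described above. The question does not show how the logical vectors were combined, so the weighted-sum approach, the interval boundaries, and the names x, y2, y3, group_manual and group_cut are all assumptions made for illustration. It also shows cut() as one possible shortcut, which is not what the question used.

```r
# Hypothetical column standing in for one column of the data frame
x <- c(1, 4, 6, 9, 12, 15)

# Logical vectors in the style of the question, one per interval (boundaries assumed)
y1 <- (x > 0) & (x <= 5)
y2 <- (x > 5) & (x <= 10)
y3 <- (x > 10) & (x <= 15)

# One assumed way to combine them: each row gets the number of the interval it falls in
group_manual <- factor(1 * y1 + 2 * y2 + 3 * y3)

# cut() builds the same grouping in one call; intervals are (0,5], (5,10], (10,15]
group_cut <- cut(x, breaks = c(0, 5, 10, 15), labels = c("1", "2", "3"))

# Either way, the factor has one entry per element of x (one per row of the data frame)
length(group_manual) == length(x)
```

Built this way, the factor has as many entries as the column has rows, so the factor itself is unlikely to be the source of a length mismatch.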
Here is the output from dput(), as Maxim.K suggested (sorry, it is not very tidy):
structure(list(Lat = c(-90L, -90L, -90L, -90L, -90L, -90L, -90L,
-90L, -90L, -90L, -90L, -90L, -90L, -90L, -90L), Lon = -180:-166,
Jan = c(2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79,
2.79, 2.79, 2.79, 2.79, 2.79, 2.79), Feb = c(2.35, 2.35,
2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35,
2.35, 2.35, 2.35), Mar = c(0.49, 0.49, 0.49, 0.49, 0.49,
0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49
), Apr = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
May = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Jun = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Jul = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Aug = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Sep = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Oct = c(1.75, 1.75, 1.75,
1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75,
1.75, 1.75), Nov = c(2.77, 2.77, 2.77, 2.77, 2.77, 2.77,
2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77), Dec = c(2.65,
2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65,
2.65, 2.65, 2.65, 2.65), Ann = c(1.07, 1.07, 1.07, 1.07,
1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07,
1.07)), .Names = c("Lat", "Lon", "Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Ann"
), row.names = c(NA, 15L), class = "data.frame")
And for group, the first 15 values from the head (from dput()):
structure(c(8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")
...and from the tail:
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")
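I am trying to use tapply() to draw a random sample (of size 20) from each of the 8 categories.
Not at all surprisingly, the problem was not with the assignment or its requirements but with my understanding of it. I misread the task: I was only supposed to sample from a single column, not from the entire data frame.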
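A minimal sketch of what sampling from a single column per group might look like. The data here are made up (200 rows, 8 groups of 25), the column name Ann is borrowed from the dput() output only as an example, and set.seed() is there purely to make the sketch reproducible.

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical stand-in data: 200 rows, one numeric column, 8 groups of 25 rows each
df <- data.frame(Ann = runif(200))
group <- factor(rep(1:8, each = 25))

# X is a single column, i.e. a vector with one value per row,
# so its length matches length(group); size = 20 is passed on to sample()
samples <- tapply(df$Ann, group, sample, size = 20)

length(samples)   # one element per group
lengths(samples)  # number of sampled values in each group
```

Because sample() returns 20 values rather than a single number, tapply() gives back a list-like array with one element per group, so samples[["1"]] would hold the values drawn from group 1. Each group needs at least 20 rows for sampling without replacement to succeed.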
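The question would be easier to answer if you provided some sample data. Using 'dput(head(yourdata,15))' or something along those lines might help. – 2013-04-22 10:59:45
For the comparison, you probably want 'nrow(dataframe)', which gives the number of rows, rather than 'length(dataframe)', which gives the number of columns. – Roland 2013-04-22 11:04:02
Thanks, I just tried that and it returns the correct number of rows (that is, the number of rows in the data frame is the same as the number of entries in the grouping factor). – 2013-04-22 11:08:56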
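To illustrate the point about length() versus nrow(), here is a small sketch with a made-up data frame (5 rows, 3 columns; the column names are borrowed from the dput() output, the values and the group factor are hypothetical).

```r
# Hypothetical small data frame: 5 rows, 3 columns
df <- data.frame(Lat = rep(-90L, 5), Lon = -180:-176, Ann = rep(1.07, 5))
group <- factor(c(1, 1, 2, 2, 2))

length(df)      # 3: a data frame is a list of columns
nrow(df)        # 5: number of rows
length(group)   # 5: matches nrow(df), not length(df)
length(df$Ann)  # 5: a single column has one value per row
```

This is consistent with the error in the question: the whole data frame has length 15 (its columns), while the grouping factor has one entry per row, so the two lengths differ unless a single column is passed instead.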