2017-02-04 43 views
2

Given a dataset, I would like to split it into 4 bins using both equal-frequency binning and equal-width binning as described here, but I want to do it in R.

Dataset:

0, 4, 12, 16, 16, 18, 24, 26, 28 

I tried to write some code for equal-width binning, but it just produces a histogram.

bins<-4; 
minimumVal<-min(dataset) 
maximumVal<-max(dataset) 
width=(maximumVal-minimumVal)/bins; 
edges = minimumVal:width:maximumVal; 
hist(dataset, breaks = "Sturges", freq = TRUE, xlim = range(edges)) 

I am new to R, so a little help with producing these two kinds of binning in R would be much appreciated.

Answers

3

For equal-width binning, I suggest using the classInt package:

dataset <- c(0, 4, 12, 16, 16, 18, 24, 26, 28) 

library(classInt) 
classIntervals(dataset, 4) 
x <- classIntervals(dataset, 4, style = 'equal') 

To use the breaks, you can inspect x$brks.
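As a minimal sketch of what using the breaks could look like, the equal-width breakpoints can also be computed in base R and passed to cut(). The names breaks and binned are illustrative only, and include.lowest = TRUE is an assumption added here so that the minimum value (0) lands in the first bin. For this dataset the breakpoints should match what style = 'equal' stores in x$brks, but it is worth comparing the two rather than taking that on trust.

```r
dataset <- c(0, 4, 12, 16, 16, 18, 24, 26, 28)
bins <- 4

# bins + 1 evenly spaced breakpoints from min to max: 0, 7, 14, 21, 28
breaks <- seq(min(dataset), max(dataset), length.out = bins + 1)

# include.lowest = TRUE closes the first interval on the left,
# so the minimum is not dropped as NA
binned <- cut(dataset, breaks = breaks, include.lowest = TRUE)

table(binned)
```

With this data the four bins hold 2, 1, 3 and 3 values respectively.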

As for equal-frequency binning, you can use the same package with the option style = 'quantile':

classIntervals(dataset, 4, style = 'quantile') 

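It will not produce bins of exactly equal size. The repeated value in dataset (16) cannot be split across bins, and since the dataset has 9 elements it cannot be divided into 4 bins with strictly the same number of elements anyway. I am not sure this is a problem, since the reference you provided says:

"...each group contains approximately the same number of values."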
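To see the "approximately equal" behaviour concretely, here is a rough base R sketch that uses quantile() for the breakpoints. This is not how classIntervals is implemented, and its bin counts may differ from classInt's because of how interval endpoints are closed; the names q_breaks and freq_binned are hypothetical.

```r
dataset <- c(0, 4, 12, 16, 16, 18, 24, 26, 28)
bins <- 4

# Breakpoints at the 0%, 25%, 50%, 75% and 100% quantiles
# (default quantile type): 0, 12, 16, 24, 28
q_breaks <- quantile(dataset, probs = seq(0, 1, length.out = bins + 1))

# unique() guards against duplicated breakpoints, which cut() rejects
freq_binned <- cut(dataset, breaks = unique(q_breaks), include.lowest = TRUE)

table(freq_binned)
```

For this dataset the bins hold 3, 2, 2 and 2 values, which is as close to equal as 9 elements in 4 bins can get.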

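Since you did not specify the exact method you are looking for, I suggest looking at this post for another approach, which for your example would be: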

library(Hmisc) 
table(cut2(dataset, m = length(dataset)/4)) 

The post linked above also offers other options and some relevant discussion of these methods.

+0

classIntervals works perfectly for both types of binning. Thank you!

0

You can try the following for equal-width binning:

set.seed(1) 
dataset <- runif(100, 0, 10) # some random data 
bins<-4 
minimumVal<-min(dataset) 
maximumVal<-max(dataset) 
width=(maximumVal-minimumVal)/bins; 
cut(dataset, breaks=seq(minimumVal, maximumVal, width)) 

#[1] (2.58,5.03] (2.58,5.03] (5.03,7.47] (7.47,9.92] (0.134,2.58] (7.47,9.92] (7.47,9.92] (5.03,7.47] (5.03,7.47] (0.134,2.58] (0.134,2.58] (0.134,2.58] 
#[13] (5.03,7.47] (2.58,5.03] (7.47,9.92] (2.58,5.03] (5.03,7.47] (7.47,9.92] (2.58,5.03] (7.47,9.92] (7.47,9.92] (0.134,2.58] (5.03,7.47] (0.134,2.58] 
#[25] (2.58,5.03] (2.58,5.03] <NA>   (2.58,5.03] (7.47,9.92] (2.58,5.03] (2.58,5.03] (5.03,7.47] (2.58,5.03] (0.134,2.58] (7.47,9.92] (5.03,7.47] 
#[37] (7.47,9.92] (0.134,2.58] (5.03,7.47] (2.58,5.03] (7.47,9.92] (5.03,7.47] (7.47,9.92] (5.03,7.47] (5.03,7.47] (7.47,9.92] (0.134,2.58] (2.58,5.03] 
#[49] (5.03,7.47] (5.03,7.47] (2.58,5.03] (7.47,9.92] (2.58,5.03] (0.134,2.58] (0.134,2.58] (0.134,2.58] (2.58,5.03] (5.03,7.47] (5.03,7.47] (2.58,5.03] 
#[61] (7.47,9.92] (2.58,5.03] (2.58,5.03] (2.58,5.03] (5.03,7.47] (0.134,2.58] (2.58,5.03] (7.47,9.92] (0.134,2.58] (7.47,9.92] (2.58,5.03] (7.47,9.92] 
#[73] (2.58,5.03] (2.58,5.03] (2.58,5.03] (7.47,9.92] (7.47,9.92] (2.58,5.03] (7.47,9.92] (7.47,9.92] (2.58,5.03] (5.03,7.47] (2.58,5.03] (2.58,5.03] 
#[85] (7.47,9.92] (0.134,2.58] (5.03,7.47] (0.134,2.58] (0.134,2.58] (0.134,2.58] (0.134,2.58] (0.134,2.58] (5.03,7.47] (7.47,9.92] (7.47,9.92] (7.47,9.92] 
#[97] (2.58,5.03] (2.58,5.03] (7.47,9.92] (5.03,7.47] 
#Levels: (0.134,2.58] (2.58,5.03] (5.03,7.47] (7.47,9.92] 

#plot frequencies in the bins 
barplot(table(cut(dataset, breaks=seq(minimumVal, maximumVal, width)))) 
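
Note the <NA> in the output above: cut() builds intervals that are open on the left by default, so the minimum of the data falls outside the first bin. One possible fix, sketched with the same variables, is to pass include.lowest = TRUE. Building the breaks with length.out instead of a step size is an extra assumption here, made so that the last breakpoint sits exactly on the maximum.

```r
set.seed(1)
dataset <- runif(100, 0, 10)  # same random data as above
bins <- 4

# bins + 1 evenly spaced breakpoints; first and last equal min and max exactly
breaks <- seq(min(dataset), max(dataset), length.out = bins + 1)

# Default: first interval is open on the left, so the minimum becomes NA
binned_open <- cut(dataset, breaks = breaks)

# include.lowest = TRUE keeps the minimum in the first bin
binned_closed <- cut(dataset, breaks = breaks, include.lowest = TRUE)

sum(is.na(binned_open))  # number of values dropped by the default
anyNA(binned_closed)     # FALSE
```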
