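Format labels produced by cut() as percentages

2013-01-22

I need to apply `cut` to a continuous variable so that I can show it with a Brewer color scale in ggplot2, as in Setting breakpoints for data with scale_fill_brewer() function in ggplot2. The continuous variable is a relative difference, and I'd like the data to be formatted as "18.2%" rather than "0.182". Is there an easy way to achieve this?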

x <- runif(100) 
levels(cut(x, breaks=10)) 

[1] "(0.0223,0.12]" "(0.12,0.218]" "(0.218,0.315]" "(0.315,0.413]" 
[5] "(0.413,0.511]" "(0.511,0.608]" "(0.608,0.706]" "(0.706,0.804]" 
[9] "(0.804,0.901]" "(0.901,0.999]" 

For example, I'd like the first level to be shown as `(2.23 %, 12 %]`. Is there a better alternative to `cut`?


+1 for the clear question title, reproducible code and well-defined goal. People can learn from this post.

Answers


I've implemented `cut_format()` in version 0.2-3 of the kimisc package; version 0.3 is now on CRAN.

# devtools::install_github("krlmlr/kimisc")
library(kimisc)
x <- seq(0.1, 0.9, by = 0.2) 

breaks <- seq(0, 1, by = 0.25) 

cut(x, breaks) 
## [1] (0,0.25] (0.25,0.5] (0.25,0.5] (0.5,0.75] (0.75,1] 
## Levels: (0,0.25] (0.25,0.5] (0.5,0.75] (0.75,1] 

cut_format(x, breaks, format_fun = scales::percent) 
## [1] (0%, 25%] (25%, 50%] (25%, 50%] (50%, 75%] (75%, 100%] 
## Levels: (0%, 25%] (25%, 50%] (50%, 75%] (75%, 100%] 

It's still not perfect: passing the number of breaks (as in the original example) doesn't work.
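For readers without kimisc, here is a minimal base-R sketch of the same idea. It is not kimisc's implementation: the name `cut_format_sketch`, its `format_fun` default (3 significant digits plus "%") and its `sep` argument are all hypothetical. When `breaks` is a single number, it computes the breakpoints the way `cut.default` does (equal-width bins over the range of `x`, outer limits widened by 0.1% of the range), which covers the break-count case mentioned above.

```r
# Hypothetical helper, NOT kimisc's cut_format(): cut() with formatted interval labels.
# format_fun receives the sorted break points and must return one string per break.
cut_format_sketch <- function(x, breaks,
                              format_fun = function(v) paste0(signif(100 * v, 3), "%"),
                              right = TRUE, sep = ", ") {
  if (length(breaks) == 1L) {
    # Mimic cut.default for a break count: equal-width bins over range(x),
    # with the outer limits widened by 0.1% of the range
    rx <- range(x, na.rm = TRUE)
    dx <- diff(rx)
    if (dx == 0) stop("this sketch needs a non-constant 'x' when 'breaks' is a count")
    breaks <- seq(rx[1], rx[2], length.out = breaks + 1)
    breaks[c(1, length(breaks))] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)
  }
  breaks <- sort(breaks)
  nb <- length(breaks)
  ch <- format_fun(breaks)
  # Build "(lo, hi]" or "[lo, hi)" from neighbouring formatted breaks
  labels <- if (right) {
    paste0("(", ch[-nb], sep, ch[-1L], "]")
  } else {
    paste0("[", ch[-nb], sep, ch[-1L], ")")
  }
  cut(x, breaks, labels = labels, right = right)
}

x <- seq(0.1, 0.9, by = 0.2)
levels(cut_format_sketch(x, seq(0, 1, by = 0.25)))
# "(0%, 25%]" "(25%, 50%]" "(50%, 75%]" "(75%, 100%]"
levels(cut_format_sketch(runif(100), breaks = 10))  # a break count also works here
```

Unlike `cut.default`, this sketch does not raise the number of digits when two neighbouring breaks round to the same label, so very narrow bins can end up with colliding labels.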


This is amazing! – novice


Some regex with `gsub`, after multiplying the original data by 100:

gsub("([0-9.]+)","\\1%",levels(cut(x*100,breaks=10))) 
[1] "(0.449%,10.4%]" "(10.4%,20.3%]" "(20.3%,30.2%]" "(30.2%,40.2%]" "(40.2%,50.1%]" "(50.1%,60%]" "(60%,69.9%]" "(69.9%,79.9%]" "(79.9%,89.8%]" "(89.8%,99.7%]" 
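The same trick can be wrapped in a small helper so that the factor itself, not just its levels, carries the percentage labels. A sketch; `pct_cut` is a hypothetical name. Because the data are multiplied by 100 before cutting, an explicit `breaks` vector must also be given on the 0 to 100 scale, and the regex does not handle scientific notation such as `1e+03`.

```r
# Hypothetical helper: cut on the percentage scale, then append "%" to each number
pct_cut <- function(x, ...) {
  f <- cut(x * 100, ...)  # levels now read e.g. "(10.4,20.3]"
  levels(f) <- gsub("([0-9.]+)", "\\1%", levels(f))
  f
}

f <- pct_cut(c(0.1, 0.3, 0.6), breaks = c(0, 25, 50, 100))
levels(f)
# "(0%,25%]" "(25%,50%]" "(50%,100%]"
```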

I just hacked together something similar too. I think there must be a better way :-) – krlmlr


I think this is probably the simplest way; once the text labels have been created, the multiply-by-100 step would be hard to do. – James


I was thinking of `cut(x, labels = function(lo, hi) paste0(...))`... – krlmlr
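Base R's `cut()` does not accept a function for `labels` (it expects a character vector or `FALSE`), so the idea above needs a small wrapper. A minimal sketch; `cut_lohi` and `pct` are hypothetical names. The wrapper passes the lower and upper break of every interval to `label_fun`, which then builds the whole label, brackets included.

```r
# Hypothetical wrapper: label_fun(lo, hi) gets the lower and upper breaks of each interval
cut_lohi <- function(x, breaks, label_fun, right = TRUE) {
  breaks <- sort(breaks)
  nb <- length(breaks)
  cut(x, breaks, labels = label_fun(breaks[-nb], breaks[-1L]), right = right)
}

# Format a proportion as a percentage with 3 significant digits
pct <- function(v) paste0(signif(100 * v, 3), "%")

f <- cut_lohi(c(0.05, 0.5), c(0, 0.1, 1),
              label_fun = function(lo, hi) paste0("(", pct(lo), ", ", pct(hi), "]"))
levels(f)
# "(0%, 10%]" "(10%, 100%]"
```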


Why not copy the code of `cut.default` and modify it to make your own version? See this gist.

Two lines were changed:

Line 22: `ch.br <- formatC(breaks, digits = dig, width = 1)` was changed to `ch.br <- formatC(breaks*100, digits = dig, width = 1)`

Line 29: `else "[", ch.br[-nb], ",", ch.br[-1L], if (right)` was changed to `else "[", ch.br[-nb], "%, ", ch.br[-1L], "%", if (right)`

The rest is identical. Here it is in action:

library(devtools) 
source_gist(4593967) 

set.seed(1) 
x <- runif(100) 
levels(cut2(x, breaks=10)) 
# [1] "(1.24%, 11%]" "(11%, 20.9%]" "(20.9%, 30.7%]" "(30.7%, 40.5%]" "(40.5%, 50.3%]" 
# [6] "(50.3%, 60.1%]" "(60.1%, 69.9%]" "(69.9%, 79.7%]" "(79.7%, 89.5%]" "(89.5%, 99.3%]" 
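To see what the two modified lines do without sourcing the gist, the label-building step can be run on its own. This is a sketch that mirrors only that step, not the gist itself; the name `percent_break_labels` is hypothetical, and `dig.lab = 3` matches `cut()`'s default. Like `cut.default`, it raises the number of digits until neighbouring labels differ.

```r
# Hypothetical stand-alone version of the label step of the modified cut.default
percent_break_labels <- function(breaks, dig.lab = 3L, right = TRUE) {
  breaks <- sort(breaks)
  nb <- length(breaks)
  # As in cut.default: increase the digits until adjacent labels are distinct
  for (dig in dig.lab:max(12L, dig.lab)) {
    ch.br <- formatC(0 + breaks * 100, digits = dig, width = 1L)  # changed line 1
    if (all(ch.br[-1L] != ch.br[-nb])) break
  }
  paste0(if (right) "(" else "[", ch.br[-nb], "%, ", ch.br[-1L], "%",  # changed line 2
         if (right) "]" else ")")
}

percent_break_labels(c(0, 0.25, 0.5, 1))
# "(0%, 25%]" "(25%, 50%]" "(50%, 100%]"
```

The result can be handed to the stock function, e.g. `cut(x, breaks, labels = percent_break_labels(breaks))`, which avoids maintaining a full copy of `cut.default` when the breaks are given explicitly.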

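I've managed to do it with a single line of code: https://gist.github.com/4594243. However, this loses the space after the `,`, which would need a second argument to `cut`. If nothing better comes up, I'll suggest extending R's `cut.default`. – krlmlr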


A new answer to an old question.

You can pass a function to the scale's `labels` argument to format the labels. I'll use `gsubfn` and `scales::percent`:

library(ggplot2)
library(gsubfn)
library(scales)
# Replace each decimal number in a label (e.g. "0.12") with its percentage form
pcut <- function(x) gsubfn('\\d\\.\\d+', function(x) percent(as.numeric(x)), x)
d <- data.frame(x = runif(100))

ggplot(d, aes(x = x, y = seq_along(x))) +
  geom_point(aes(colour = cut(x, breaks = 10))) +
  scale_colour_brewer(name = 'x', palette = 'Spectral', labels = pcut)
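If you would rather not depend on gsubfn, a similar label function can be written with base R's `regmatches<-`. A sketch, assuming the labels contain plain decimal numbers (no scientific notation); `pct_labels` is a hypothetical name, and `signif(..., 3)` stands in for `scales::percent`'s own formatting. Unlike the `\\d\\.\\d+` pattern, it also converts integer endpoints such as the `1` in `(0.9,1]`.

```r
# Hypothetical base-R label formatter: rewrite every number in each label as a percentage
pct_labels <- function(labels) {
  m <- gregexpr("[0-9]*\\.?[0-9]+", labels)
  regmatches(labels, m) <- lapply(regmatches(labels, m), function(num) {
    # Labels without numbers are left unchanged
    if (length(num)) paste0(signif(100 * as.numeric(num), 3), "%") else num
  })
  labels
}

pct_labels(c("(0.0223,0.12]", "(0.9,1]"))
# "(2.23%,12%]" "(90%,100%]"
```

It plugs into the plot the same way, e.g. `scale_colour_brewer(name = 'x', palette = 'Spectral', labels = pct_labels)`.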



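Thanks for the input; this is indeed another nice option. I've just posted some thoughts on the complex issue of binning data here: http://stackoverflow.com/a/17438591/946850. The main problem is that the data, the cutting algorithm, the number of breaks and the color palette all interact, yet they are spread across several different function calls. Shouldn't this be part of an R package? What do you think? – krlmlr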


@krlmlr Yes, but I usually go down the `scale_colour_gradientn` road, e.g. `ggplot(d, aes(x = x, y = seq_along(x))) + geom_point(aes(colour = x)) + scale_colour_gradientn(colours = brewer.pal(n = 10, name = 'Spectral'), labels = percent)`, which will kill some kittens. – mnel


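Yes, of course; there is even a [pull request](https://github.com/hadley/ggplot2/pull/439) that addresses this, so it must be a kitten-killing machine. I was considering using it, but by now I find the discretizing/cutting/binning approach clearer. It's just that, at the moment, there doesn't seem to be a complete notation for it. – krlmlr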