2011-03-23

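Remove a few text marks from tick marks in ggplot bar plot

I am doing some research on non-defaulters and defaulters in banking. In that context I plot their distributions against a score in a bar plot. The higher the score, the better the credit rating.

Because the number of defaults is very small compared to the number of non-defaults, plotting both on the same bar plot is not very informative: you can hardly see the defaults. I therefore make a second bar plot based on the defaulters' scores only, but on the same interval scale as the full bar plot of both defaulters and non-defaulters. I then add vertical lines to the first bar plot marking where the highest and the lowest defaulter scores are located. The aim is to see where the distribution of the defaulters fits into the overall distribution of defaulters and non-defaulters.

The x-axis easily gets very "crowded". I would like to remove the text for some of the ticks, but keep all the tick marks.

Below is the code I am using, with the real data replaced by (seeded) random data.

The first bar plot has the tick text I want, but it is missing all the tick marks that the second bar plot has. The second bar plot shows the "crowding"!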

library(ggplot2) 
library(ggExtra) 

#NDS represents non-defaults and DS defaults on the same scale 
#although here being just some random normals for the sake of simplicity. 
set.seed(10) 
NDS<-rnorm(10000,sd=1)-2 
DS<-rnorm(100,sd=2)-5 

#Cutoffs are constructed such that intervals of size 0.3 contain all values 
#of NDS & DS 
minCutoff<--9.3 
maxCutoff<-2.1 

#Generate the actual interval "bins" 
NDS_CUT<-cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)) 
DS_CUT<-cut(DS,breaks=seq(minCutoff, maxCutoff, by = 0.3)) 

#Manually generate where to put the vertical lines for min(DS) and max(DS) 
minDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[1] 
maxDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[32] 

#Generate data frame - seems stupid, but makes sense 
#when the "real" data is used :-) 
NDSdataframe<-cbind(as.data.frame(NDS_CUT),rep(factor("State-1"),length(NDS_CUT))) 
colnames(NDSdataframe)<-c("Score","Action") 
DSdataframe<-cbind(as.data.frame(DS_CUT),rep(factor("State-2"),length(DS_CUT))) 
colnames(DSdataframe)<-c("Score","Action") 
fulldataframe<-rbind(NDSdataframe,DSdataframe) 
attach(fulldataframe) 

#Plot the full distribution of NDS & DS with geom_vlines 

#Get the tick texts I want to show (every third level) 
myLevels<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3))) 
lengthMyLevels<-length(myLevels) 
myBreaks<-seq(1,lengthMyLevels,3) 
chosenbreaks<-myLevels[myBreaks] 


#Generate the plot of both NDS & DS 
fullplot<-ggplot(fulldataframe, aes(Score, fill=factor(Action,levels=c("State-2","State-1")))) + 
    geom_bar(position="stack") + 
    opts(axis.text.x = theme_text(angle = 45,size=8)) + 
    opts(legend.position = "none") + 
    xlab("Scoreinterval") + ylab("Antal pr. interval") + 
    geom_vline(aes(xintercept = minDS_bar, colour="red")) + 
    geom_vline(aes(xintercept = maxDS_bar, colour="red")) + 
    scale_x_discrete("test",breaks=chosenbreaks) 

#Generate second dataframe for the plot of DS only 
DSdataframe2<-cbind(na.omit(as.data.frame(DS_CUT)),rep(factor("Fallit"),nrow(na.omit(as.data.frame(DS_CUT))))) 
colnames(DSdataframe2)<-c("theScore","theAction") 

#Calculate max value for the DS 
myMax<-max(table(DSdataframe2))+1 

attach(DSdataframe2) 

#Generate plot for the DS only 
subplot<-ggplot(DSdataframe2, aes(theScore, fill=factor(theAction))) + 
    geom_bar(position="stack") + 
    opts(axis.text.x = theme_text(angle = 45)) + 
    opts(legend.position = "none") + 
    ylim(0, myMax) + 
    xlab("Scoreinterval") + ylab("Antal pr. interval") 

#Using the ggExtra package the two plots are aligned 
align.plots(fullplot, subplot) 

detach(DSdataframe2) 
detach(fulldataframe) 

Any help is greatly appreciated!

Thanks,

Christian

Answers


If I understand correctly, you could simply specify an empty text label for every other label:

library(ggplot2) 

interleave <- function(x,y){ 
    lx <- length(x) 
    ly <- length(y) 
    n <- max(lx,ly) 
    as.vector(rbind(rep(x, length.out=n), rep(y, length.out=n))) 
} 

d <- data.frame(x=1:10, y=rnorm(10)) 

my_breaks <- seq(1,10,by=1) 
my_labs <- interleave(seq(1,10,by=2), "") 

qplot(x,y,data=d)+ 
    scale_x_continuous(breaks=my_breaks, labels=my_labs) 
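
The example above uses a continuous axis, whereas the question bins the scores with `cut()`, which gives a discrete axis. A minimal sketch of the same idea for that case follows, assuming every third interval label should be kept. The names `scores`, `bins` and `tick_labels` are made up for illustration, and the plot uses the current `theme()`/`element_text()` interface rather than `opts()`.

```r
# Assumption: scores are binned with cut(), as in the question,
# so the x axis is discrete and its breaks are the factor levels.
set.seed(10)
scores <- rnorm(1000, sd = 1) - 2
bins <- cut(scores, breaks = seq(-9.3, 2.1, by = 0.3))

# Keep the text of every third level and blank out the rest.
lev <- levels(bins)
keep <- seq_along(lev) %% 3 == 1
tick_labels <- ifelse(keep, lev, "")
names(tick_labels) <- lev  # named vector: level -> label to show

# The plot itself is only built when ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(Score = bins), aes(Score)) +
    geom_bar() +
    # all levels stay as breaks (so every tick mark is drawn),
    # but only some of them carry text
    scale_x_discrete(labels = tick_labels, drop = FALSE) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  print(p)
}
```

Because the labels are passed through `labels` and the breaks are left alone, every tick mark should still be drawn. Restricting `breaks`, as the question does, removes the tick marks together with their text.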


Sweet workaround! Thank you very much! – 2011-03-24 08:44:25


`ggplot2::interleave` cannot be found in the current ggplot2 library (version 2.0.0). What is the alternative to this function? – user890739 2016-01-11 21:34:38


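@user890739 You need a triple colon to access the interleave function: `ggplot2:::interleave`. Keep in mind, though, that you use the triple colon at your own risk, because such functions are kept internal to the package; see: https://stat.ethz.ch/R-manual/R-devel/library/base/html/ns-dblcolon.html – aseagram 2016-07-04 06:03:10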
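
One way to avoid depending on a package-internal function is a small base R helper. The sketch below is only an illustration: `every_nth_label` is a hypothetical name, not something ggplot2 provides. It blanks all but every `by`-th label.

```r
# Hypothetical helper (not part of ggplot2): keep every `by`-th label,
# starting with the first, and replace the others with "".
every_nth_label <- function(x, by = 2) {
  x <- as.character(x)
  x[(seq_along(x) - 1) %% by != 0] <- ""
  x
}

my_breaks <- 1:10
my_labs <- every_nth_label(my_breaks)
my_labs
# "1" "" "3" "" "5" "" "7" "" "9" ""
```

The result can be passed as `labels = my_labs` together with `breaks = my_breaks`, just like the interleaved vector in the answer.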


Here is another version, based on @baptiste's, that makes it easy to keep the even or odd labels, every third label, and so on.

library(ggplot2) 
library(gridExtra) 

## helper function 
## periodically replace orig with .fill 
label_fill <- function(orig, .offset=0, .mod=2, .fill=""){ 
    ## replace 
    ii <- as.logical(
     ## offset==0 keeps first 
     (1:length(orig)-1+.offset) %% .mod 
    ) 
    orig[ii] <- .fill 
    orig 
} 

## data, labels 
nn <- 10 
my_dat <- data.frame(x=1:nn, y=rnorm(nn)) 
my_breaks <- my_dat$x 

my_plot <- (
    ggplot(my_dat, aes(x,y)) 
    + geom_line() 
    ## default: every other, start from 1 
    ## by default, function takes breaks 
    + scale_x_continuous(
     breaks=my_breaks, labels=label_fill 
    ) 
    + theme_bw() 
) 

## another form 
## manually pass breaks 
every_third <- scale_x_continuous(
    breaks=my_breaks, 
    labels=label_fill(my_breaks, .mod=3) 
) 

## side-by-side 
grid.arrange(ncol=2, 
    my_plot, 
    ## every third with offset 
    my_plot + every_third 
) 
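
To check what `.mod` and `.offset` do without drawing anything, the helper can be called on a plain vector. In the sketch below, `label_fill` is restated only so the snippet runs on its own, with `seq_along()` in place of `1:length()`. The values in the comments are what the modulo arithmetic should give.

```r
# Restated from the answer so the snippet is self-contained.
label_fill <- function(orig, .offset = 0, .mod = 2, .fill = "") {
  # positions where (index - 1 + .offset) %% .mod is non-zero get blanked
  ii <- as.logical((seq_along(orig) - 1 + .offset) %% .mod)
  orig[ii] <- .fill
  orig
}

label_fill(1:10)
# "1" "" "3" "" "5" "" "7" "" "9" ""

label_fill(1:10, .mod = 3)
# "1" "" "" "4" "" "" "7" "" "" "10"

label_fill(1:10, .offset = 1, .mod = 3)
# "" "" "3" "" "" "6" "" "" "9" ""
```

So `.mod` sets the spacing between kept labels, and `.offset` shifts which position within each group of `.mod` labels is kept.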
