2017-08-04

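ggplot2: reordering a heatmap based on hierarchical clustering

Although I found quite similar questions, I am still struggling with ggplot2 and have not managed to make this work. I would like to reorder a heatmap by both row and column according to a hierarchical clustering.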

Here is my actual code:

# import 
library("ggplot2") 
library("scales") 
library("reshape2") 

# data loading 
data_frame = read.csv(file=input_file, header=TRUE, row.names=1, sep='\t') 

# clustering with hclust: dist() compares rows, so transpose to cluster the columns
dd.row <- as.dendrogram(hclust(dist(data_frame)))
dd.col <- as.dendrogram(hclust(dist(t(data_frame))))

# ordering based on clustering
row.ord <- order.dendrogram(dd.row)
col.ord <- order.dendrogram(dd.col)

# making a new, reordered data frame
new_df = as.data.frame(data_frame[row.ord, col.ord])
print(new_df) # when inspected manually, new_df looks correctly ordered

# get the row names
name = as.factor(row.names(new_df))

# reshape
melte_df = melt(cbind(name, new_df))

# the solution is here: reorder the factor levels of the name column
melte_df$name = factor(melte_df$name, levels = row.names(data_frame)[row.ord])

# ggplot2 dark magic 
(p <- ggplot(melte_df, aes(variable, name)) + geom_tile(aes(fill = value), 
colour = "white") + scale_fill_gradient(low = "white", 
high = "steelblue") + theme(text=element_text(size=12), 
axis.text.y=element_text(size=3))) 

# save fig 
ggsave(filename = "test.pdf", plot = p)

# the result is ordered only by column; what have I missed?

I am quite new to R, so a detailed answer would be very welcome.

Answer


Without a reproducible example dataset I am not 100% sure this is the cause, but my guess is that your problem lies in this line:

name = as.factor(row.names(new_df)) 

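When you plot a factor, the ordering is based on the order of the factor's levels. You can reorder the rows of your data frame however you like, but the order used in the plot will still be the order of the levels.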

Here is an example:

data_frame <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))
data_frame
       x  y
1  apple 50
2 banana 30
3  peach 70

data_frame$x <- as.factor(data_frame$x) # Make x column a factor

levels(data_frame$x) # This shows the levels of your factor
[1] "apple"  "banana" "peach"

data_frame <- data_frame[order(data_frame$y),] # Order by value of y
data_frame
       x  y
2 banana 30
1  apple 50
3  peach 70

# Now let's plot it: 
p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y)) 
p 

Here is the result:

[Figure: example-result, a bar chart with the bars in level order: apple, banana, peach]

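See? It is not ordered by the y values as we wanted. It is ordered by the levels of the factor. Now, if that is indeed where your problem lies, there are solutions here: R - Order a factor based on value in one or more other columns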

Applying the dplyr solution to the example:

library(dplyr)
data_frame <- data_frame %>%
  arrange(y) %>%           # sort your dataframe
  mutate(x = factor(x, x)) # reset your factor column based on that order

data_frame
       x  y
1 banana 30
2  apple 50
3  peach 70

levels(data_frame$x) # Levels of the factor are reordered!
[1] "banana" "apple"  "peach"

p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y)) 
p 

This is the result now:

[Figure: bar chart with the bars in the new level order: banana, apple, peach]

I hope this helps. Otherwise, you may want to post an example of your original dataset!
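To tie this back to the heatmap, here is a minimal sketch of the same idea applied to both axes. The original data was not posted, so a scaled slice of the built-in `mtcars` dataset stands in for it (an assumption), and the names `m`, `row_ord`, `col_ord` and `long` are made up for illustration. The key point is that the factor levels of both the row and the column variable are set to the clustering order before plotting.

```r
# Sketch only: mtcars stands in for the asker's data, which was not posted.
m <- scale(as.matrix(mtcars[1:10, 1:5]))

# Cluster rows and columns separately; hclust()$order is the leaf order.
row_ord <- hclust(dist(m))$order      # dist() compares rows
col_ord <- hclust(dist(t(m)))$order   # transpose to compare columns

# Long format, one row per heatmap cell (column-major, like as.vector(m)).
long <- data.frame(
  name     = rownames(m)[row(m)],
  variable = colnames(m)[col(m)],
  value    = as.vector(m)
)

# The key step: set the factor levels of BOTH axes to the clustering order.
long$name     <- factor(long$name,     levels = rownames(m)[row_ord])
long$variable <- factor(long$variable, levels = colnames(m)[col_ord])

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(variable, name)) +
    geom_tile(aes(fill = value), colour = "white") +
    scale_fill_gradient(low = "white", high = "steelblue")
  print(p)
}
```

Because the order lives in the factor levels, the rows of `long` never need to be sorted; ggplot2 lays out both axes from the levels alone.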


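Your answer was really useful for pinpointing the problem. In the end, though, I found a more convenient way: reordering the factor levels directly. I will edit my question to add what made it work, but thanks again for your help.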
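For readers who want to see what reordering the levels directly can look like without dplyr, here is a small base R sketch on the toy data from the answer. It is an illustration, not necessarily the asker's exact code; the names `df` and `alt` are made up for this example.

```r
# Toy data from the answer above.
df <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))

# Leave the rows where they are; only the level order changes.
df$x <- factor(df$x, levels = df$x[order(df$y)])
levels(df$x)
# [1] "banana" "apple"  "peach"

# Alternative: stats::reorder() orders the levels by a summary of y (mean by default).
alt <- reorder(c("apple", "banana", "peach"), c(50, 30, 70))
levels(alt)
```

Either form can be used inside `aes()` or applied to the melted data frame before plotting.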