2014-01-28

I have two data tables, each listing pairs of diseases and a value for each pair. Below is disease_table1, which I want to plot in R:

| d1 | d2 | Value |
|----------|----------|-----|
| Disease1 | Disease2 | 3.5 |
| Disease3 | Disease4 | 5 |
| Disease5 | Disease6 | 1.1 |
| Disease1 | Disease3 | 2.4 |
| Disease6 | Disease2 | 6.7 |

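The table above is sample data. An excerpt of the real dataset 1 (disease_table1) is below:

| d1 | d2 | Value |
|----|----|-------|
| Bladder cancer | X-linked ichthyosis (XLI) | 3.5 |
| Leukocyte adhesion deficiency (LAD) | Aldosterone synthase Deficiency | 1.8 |
| Leukocyte adhesion deficiency (LAD) | Brain Cancer | 1.5 |
| Tangier disease | Pancreatic cancer | 0.66 |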

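I want to show the difference between the two data tables by plotting the disease pairs together with their values from both tables. I used the plot and lines functions, but the result is too simple and does not separate the two tables well. I would also like the names of the disease pairs to appear on the plot.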

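    # Overlay the value densities of the two tables
    # (assumes the second table is named disease_table2)
    plot(density(disease_table1$value))
    lines(density(disease_table2$value))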

Thanks

+3

Could you provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Jaap

+0

I have added the real dataset and the code as an example. – Rgeek

+0

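With 400,000+ disease pairs you will probably need some kind of clustering approach. Could you post a link to your data, or to a more representative subset of, say, a few thousand records? – jlhoward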

Answers

2

Some example code:

    # Create the data frames (the second one is made up)
    df1 <- read.table(text = "d1 d2 x
    Disease1 Disease2 3.5
    Disease3 Disease4 5
    Disease5 Disease6 1.1
    Disease1 Disease3 2.4
    Disease6 Disease2 6.7", header = TRUE, strip.white = TRUE)

    df2 <- read.table(text = "d1 d2 y
    Disease1 Disease2 4.5
    Disease3 Disease4 2
    Disease5 Disease6 3.1
    Disease1 Disease3 1.4
    Disease6 Disease2 5.7", header = TRUE, strip.white = TRUE)

    # Required libraries
    library(reshape2)
    library(ggplot2)

    # Merge on the disease pair (keeps only pairs present in both tables)
    # and build a single identifier per pair
    data <- merge(df1, df2, by = c("d1", "d2"))
    data$diseasepair <- paste0(data$d1, "-", data$d2)

    # Reshape to long format: one row per (disease pair, table) combination
    data.long <- melt(data, id.vars = "diseasepair", measure.vars = c("x", "y"),
                      variable.name = "group")

    # Grouped bar chart: two bars (x and y) side by side for each pair
    ggplot(data.long) +
        geom_bar(aes(x = diseasepair, y = value, fill = group),
                 stat = "identity", position = "dodge", width = 0.7) +
        scale_fill_manual("Group\n", values = c("red", "blue"),
                          labels = c(" X", " Y")) +
        labs(x = "\nDisease pair", y = "Value\n") +
        theme_bw()

Result:

[Image: grouped bar chart showing the X and Y values side by side for each disease pair]

Is this what you are looking for?
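If the aim is to compare the overall distribution of values in the two tables rather than individual pairs, the density overlay attempted in the question can be made easier to read with shared axis limits, colours and a legend. The following is only a sketch in base R; the vectors `x` and `y` hold the made-up sample values from above and are stand-ins for the value columns of the two real tables.

```r
# Made-up values standing in for the two tables (not the real data)
x <- c(3.5, 5, 1.1, 2.4, 6.7)   # values from the first table
y <- c(4.5, 2, 3.1, 1.4, 5.7)   # values from the second table

# Kernel density estimate for each table
dx <- density(x)
dy <- density(y)

# Shared axis limits so that neither curve is clipped
xlim <- range(dx$x, dy$x)
ylim <- range(dx$y, dy$y)

# Draw the first curve, add the second, and label both
plot(dx, xlim = xlim, ylim = ylim, col = "red", lwd = 2,
     main = "Value distribution per table", xlab = "Value")
lines(dy, col = "blue", lwd = 2)
legend("topright", legend = c("Table 1", "Table 2"),
       col = c("red", "blue"), lwd = 2, bty = "n")
```

This shows how the two sets of values differ as a whole, but it cannot show disease pair names, since a density curve has no per-pair positions.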

+0

I have 400,000 pairs of this kind, so I don't think this will work. It would work well for smaller datasets, though. Could a curve or a heatmap work? – Rgeek

+0

A heatmap won't work for 400k pairs, IMHO. Do you want to compare the values for every pair, or only for specific pairs? – Jaap

+0

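Basically, I want to show the enrichment of disease pairs using the values from one dataset against the values from the other. So I want to compare the values for every pair. – Rgeek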
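One possible way to compare every pair at this scale, offered only as a sketch, is to plot the value from one table against the value from the other as a binned scatter plot, and to name only the pairs that differ most. Everything below is an assumption for illustration: the data frame `sim` is simulated (it is not the real data), `n` is a hypothetical size, and using a log2 ratio as the measure of "enrichment" is one choice among several.

```r
library(ggplot2)

# Simulated stand-in for the merged table: one row per disease pair,
# with x (value in table 1) and y (value in table 2). Not real data.
set.seed(1)
n <- 10000  # hypothetical size; the real data has 400,000+ pairs
sim <- data.frame(
  diseasepair = paste0("Disease", seq_len(n), "-Disease", seq_len(n) + n),
  x = rexp(n, rate = 0.5),
  y = rexp(n, rate = 0.5)
)

# Assumed enrichment measure: log2 ratio of the two values.
# A value above 0 means the pair scores higher in table 2.
sim$log2ratio <- log2(sim$y / sim$x)

# Binned scatter plot: fill colour shows how many pairs fall in each cell,
# the red line marks pairs with equal values in both tables.
p <- ggplot(sim, aes(x = x, y = y)) +
  geom_bin2d(bins = 60) +
  geom_abline(slope = 1, intercept = 0, colour = "red") +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Value in table 1", y = "Value in table 2") +
  theme_bw()
print(p)

# Only the most extreme pairs can sensibly be named; list the top 10.
top <- sim[order(abs(sim$log2ratio), decreasing = TRUE)[1:10], ]
print(top[, c("diseasepair", "x", "y", "log2ratio")])
```

Pairs far from the red line are the ones whose values differ most between the tables. The `top` data frame could be passed to `geom_text()` to label just those points instead of all 400,000.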