2017-10-04

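Histogram in R from a four-column CSV file

I have a CSV file with four columns: AGE, DIASTOLIC, BMI, EVER.PREGNANT. I want to plot a histogram comparing AGE on the x-axis with DIASTOLIC on the y-axis. How can I do this? The code I wrote is: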

Sheet = read.csv("/home/prajnan/Downloads/1739230_1284354330_PIMA.csv - 1739230_1284354330_PIMA.csv.csv", sep = ",", header = T)
hist(Sheet[2],Sheet[3]$AGE$DIASTOLIC)

The error I get is:

Error in hist.default(Sheet[2], Sheet[3]$AGE$DIASTOLIC) : 'x' must be numeric

Where did I go wrong? Thanks in advance.

Note: the output of dput(head(Sheet, 10)) is:

structure(list(X = c(NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), 
X.1 = structure(c(1L, 53L, 31L, 12L, 13L, 2L, 14L, 11L, 7L, 
34L), .Label = c("", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", 
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", 
"57", "58", "59", "60", "61", "62", "63", "64", "65", "66", 
"67", "68", "69", "70", "81", "AGE"), class = "factor"), 
X.2 = structure(c(1L, 48L, 31L, 28L, 26L, 28L, 13L, 32L, 
17L, 30L), .Label = c("", "100", "102", "104", "106", "108", 
"110", "114", "122", "24", "30", "38", "40", "44", "46", 
"48", "50", "52", "54", "55", "56", "58", "60", "61", "62", 
"64", "65", "66", "68", "70", "72", "74", "75", "76", "78", 
"80", "82", "84", "85", "86", "88", "90", "92", "94", "95", 
"96", "98", "DIASTOLIC"), class = "factor"), X.3 = structure(c(1L, 
248L, 124L, 63L, 31L, 78L, 210L, 54L, 104L, 100L), .Label = c("", 
"18.2", "18.4", "19.1", "19.3", "19.4", "19.5", "19.6", "19.9", 
"20", "20.1", "20.4", "20.8", "21", "21.1", "21.2", "21.7", 
"21.8", "21.9", "22.1", "22.2", "22.3", "22.4", "22.5", "22.6", 
"22.7", "22.9", "23", "23.1", "23.2", "23.3", "23.4", "23.5", 
"23.6", "23.7", "23.8", "23.9", "24", "24.1", "24.2", "24.3", 
"24.4", "24.5", "24.6", "24.7", "24.8", "24.9", "25", "25.1", 
"25.2", "25.3", "25.4", "25.5", "25.6", "25.8", "25.9", "26", 
"26.1", "26.2", "26.3", "26.4", "26.5", "26.6", "26.7", "26.8", 
"26.9", "27", "27.1", "27.2", "27.3", "27.4", "27.5", "27.6", 
"27.7", "27.8", "27.9", "28", "28.1", "28.2", "28.3", "28.4", 
"28.5", "28.6", "28.7", "28.8", "28.9", "29", "29.2", "29.3", 
"29.5", "29.6", "29.7", "29.8", "29.9", "30", "30.1", "30.2", 
"30.3", "30.4", "30.5", "30.7", "30.8", "30.9", "31", "31.1", 
"31.2", "31.3", "31.6", "31.9", "32", "32.1", "32.2", "32.3", 
"32.4", "32.5", "32.6", "32.7", "32.8", "32.9", "33.1", "33.2", 
"33.3", "33.5", "33.6", "33.7", "33.8", "33.9", "34", "34.1", 
"34.2", "34.3", "34.4", "34.5", "34.6", "34.7", "34.8", "34.9", 
"35", "35.1", "35.2", "35.3", "35.4", "35.5", "35.6", "35.7", 
"35.8", "35.9", "36", "36.1", "36.2", "36.3", "36.4", "36.5", 
"36.6", "36.7", "36.8", "36.9", "37", "37.1", "37.2", "37.3", 
"37.4", "37.5", "37.6", "37.7", "37.8", "37.9", "38", "38.1", 
"38.2", "38.3", "38.4", "38.5", "38.6", "38.7", "38.8", "38.9", 
"39", "39.1", "39.2", "39.3", "39.4", "39.5", "39.6", "39.7", 
"39.8", "39.9", "40", "40.1", "40.2", "40.5", "40.6", "40.7", 
"40.8", "40.9", "41", "41.2", "41.3", "41.5", "41.8", "42", 
"42.1", "42.2", "42.3", "42.4", "42.6", "42.7", "42.8", "42.9", 
"43.1", "43.3", "43.4", "43.5", "43.6", "44", "44.1", "44.2", 
"44.5", "44.6", "45", "45.2", "45.3", "45.4", "45.5", "45.6", 
"45.7", "45.8", "46.1", "46.2", "46.3", "46.5", "46.7", "46.8", 
"47.9", "48.3", "48.8", "49.3", "49.6", "49.7", "50", "52.3", 
"52.9", "53.2", "55", "57.3", "59.4", "67.1", "BMI"), class = "factor"), 
X.4 = structure(c(1L, 2L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L), .Label = c("", 
"EVER-PREGNANT", "\"no\"", "\"yes\""), class = "factor")), .Names = c("X", 

"X.1", "X.2", "X.3", "X.4"), row.names = c(NA, 10L), class = "data.frame")

Answer

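First, a histogram is a plot that shows the frequency of values within a single distribution. You cannot use it to compare two variables. To look at a single distribution in your dataset, you can do something like this: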

hist(Sheet$AGE)

and similarly:

hist(Sheet$DIASTOLIC)

If you want the two distributions plotted together for comparison, you can do this:

par(mfrow = c(2, 1))
hist(Sheet$AGE)
hist(Sheet$DIASTOLIC)

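However, if you want to compare the two variables directly, a histogram is probably not what you want. You could start with a simple scatter plot like this: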

plot(Sheet$AGE, Sheet$DIASTOLIC)

When I type 'hist(Sheet$AGE)' I get the error: 'x' must be numeric. How should I proceed? – vidyarthi

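If I had to guess, I would say the column was read in as a factor rather than as numeric. Try 'hist(as.numeric(Sheet$AGE))'. I could say for sure if you pasted the output of 'dput(Sheet)' into your question. – tbradley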
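
One caveat on the factor guess above: calling as.numeric() directly on a factor returns the internal level codes, not the numbers shown as labels, so the usual idiom is to convert through character first. A minimal sketch, assuming a made-up toy factor named age (not the asker's data):

```r
# Toy factor standing in for a column that read.csv() returned as a factor.
# The values are made up for illustration.
age <- factor(c("50", "31", "32", "21"))

# as.numeric() on a factor gives the integer level codes, not the labels.
codes <- as.numeric(age)

# Going through as.character() first recovers the actual numbers.
values <- as.numeric(as.character(age))

print(codes)
print(values)
```

A histogram of codes would show the distribution of level positions, which is rarely what is wanted; values is the vector to pass to hist().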

Trying 'hist(as.numeric(Sheet$AGE))' gives: Error in hist.default(as.numeric(Sheet$AGE)) : invalid number of 'breaks'. The 'dput(Sheet)' output is so long that indenting it as code takes a while. Should I post a picture? – vidyarthi
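
The dput output in the question hints at a likely cause for both errors: the columns are named X, X.1, X.2, and so on, and the real header (AGE, DIASTOLIC, BMI, EVER-PREGNANT) shows up as a data row. In that case Sheet$AGE does not exist, and passing an empty vector to hist() would be consistent with the "invalid number of 'breaks'" message. The following is only a sketch under that assumption. The data frame raw is a made-up, two-column stand-in for the real file, and the names header_row and clean are illustrative.

```r
# Toy data frame mimicking the structure seen in the dput output:
# a blank row, then the real header as data, then the values.
# The numbers are illustrative, not the asker's data.
raw <- data.frame(
  X   = c(NA, NA, 1L, 2L, 3L),
  X.1 = c("", "AGE", "50", "31", "32"),
  X.2 = c("", "DIASTOLIC", "72", "66", "64"),
  stringsAsFactors = TRUE
)

# Find the row that holds the real column names.
header_row <- which(raw$X.1 == "AGE")[1]

# Keep only the rows below the header and drop the index column X.
clean <- raw[-seq_len(header_row), -1, drop = FALSE]
names(clean) <- vapply(raw[header_row, -1], as.character,
                       character(1), USE.NAMES = FALSE)

# Convert each factor column to numeric via character,
# so the labels are used rather than the level codes.
clean[] <- lapply(clean, function(col) as.numeric(as.character(col)))

# One histogram per variable, then a scatter plot to compare the two.
hist(clean$AGE)
hist(clean$DIASTOLIC)
plot(clean$AGE, clean$DIASTOLIC)
```

A non-numeric column such as EVER-PREGNANT would need to be left out of the numeric conversion. Re-reading the file with the skip argument of read.csv is another option, though how many lines to skip depends on the raw file, which is not shown here.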