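When reading data from a text file into a data frame, you can use the colClasses
argument to specify the type of each column. Take a look at the following file, which I have on my computer: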
> head(read.csv("R/Data/ZipcodeCount.csv"))
X zipcode stateabb countyno countyname
1 1 401 NY 119 WESTCHESTER
2 391 501 NY 103 SUFFOLK
3 392 544 NY 103 SUFFOLK
4 393 601 PR 1 ADJUNTAS
5 630 602 PR 3 AGUADA
6 957 603 PR 5 AGUADILLA
> head(read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5))))
X zipcode stateabb countyno countyname
1 1 00401 NY 119 WESTCHESTER
2 391 00501 NY 103 SUFFOLK
3 392 00544 NY 103 SUFFOLK
4 393 00601 PR 001 ADJUNTAS
5 630 00602 PR 003 AGUADA
6 957 00603 PR 005 AGUADILLA
> zip<-read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5)))
> str(zip)
'data.frame': 53424 obs. of 5 variables:
$ X : Factor w/ 53424 levels "1","10000081",..: 1 36316 36333 36346 43638 52311 19581 23775 26481 26858 ...
$ zipcode : Factor w/ 41174 levels "00401","00501",..: 1 2 3 4 5 6 6 7 8 9 ...
$ stateabb : Factor w/ 60 levels ""," ","AK","AL",..: 41 41 41 46 46 46 46 46 46 46 ...
$ countyno : Factor w/ 380 levels "","000","001",..: 106 95 95 3 5 7 5 7 7 9 ...
$ countyname: Factor w/ 1925 levels "","ABBEVILLE",..: 1844 1662 1662 9 10 11 10 11 11 12 ...
> head(table(zip[,"zipcode"]))
00401 00501 00544 00601 00602 00603
1 1 1 1 1 2
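As you can see, R no longer treats the zip codes as numbers but as factors, so the leading zeros are preserved. In your case, you need to specify the classes of the first 6 columns and then choose factor
for the seventh column. So if the first 6 columns are numeric, it should look like this: colClasses = c(rep("numeric",6),"factor").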
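To make the effect easy to reproduce without the file above, here is a minimal sketch that uses a small made-up CSV written to a temporary file. The column names (id, zipcode, state) and the values are assumptions chosen only for illustration; they are not taken from ZipcodeCount.csv.

```r
# Hypothetical demo data written to a temporary file (not the real ZipcodeCount.csv).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,zipcode,state",
             "1,00401,NY",
             "2,00601,PR"), tmp)

# Default type detection: zipcode is parsed as an integer, so leading zeros are lost.
default_read <- read.csv(tmp)
str(default_read$zipcode)

# Explicit classes, one per column: first column numeric, the other two as factors.
typed_read <- read.csv(tmp, colClasses = c("numeric", rep("factor", 2)))
levels(typed_read$zipcode)

# A named vector sets the class for selected columns only; the rest are auto-detected.
named_read <- read.csv(tmp, colClasses = c(zipcode = "factor"))
levels(named_read$zipcode)

unlink(tmp)
```

The named form may be more convenient when only one or two columns need a non-default type, since you do not have to count columns or list a class for every one of them.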
Formatted and added 'r'; 'R' can be found. – 2013-02-27 18:35:54
@Julius So are 'r' and 'R' the same? – 2013-02-27 19:31:23
@GrijeshChauhan, I would say it is more commonly called R, but the 'r' tag is the correct one here. – Julius 2013-02-27 19:41:50