Trouble with the cbc.read.table function in R

Possible Duplicate:
Some issues trying to read a file with cbc.read.table function in R + using filter while reading files

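a) I am trying to read a fairly large .txt file with the cbc.read.table function from the colbycol package. From what I have read, this package makes the job easier when we have large files (more than a GB to read into R) and we do not need all the columns/variables for our analysis. I also read that cbc.read.table supports the same arguments as read.table. However, if I pass the nrows argument (to get a preview of my file in R), I get the following error: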

#My line code. I'm just reading columns 5,6,7,8 out of 27 
i.can <- cbc.read.table("xxx.txt", header = T, sep = "\t",just.read=5:8, nrows=20) 
#error message 
Error in read.table(file, nrows = 50, sep = sep, header = header, ...) : 
formal argument "nrows" matched by multiple actual arguments

So, my question is: could you tell me how I can solve this problem?

b) After that, I tried to read all the instances with the following code:

i.can.b <- cbc.read.table("xxx.txt", header = T, sep = "\t",just.read=4:8) #done perfectly 
my.df <- as.data.frame(i.can.b) #getting error in this line 
Error in readSingleKey(con, map, key) : unable to obtain value for key 'Company' #Company is a string column in my data set

So, my question again is: how can I solve this problem?

c) Do you know a way to filter (by a condition on the instances) while reading the file?

Source

2012-05-18 Nestorghh

In answer to a):

cbc.read.table() reads the data in blocks of 50 rows:

tmp.data <- read.table(file, nrows = 50, sep = sep, header = header, 
     ...)

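Because the function already sets the nrows argument to 50, when it also passes along the nrows argument you specified, two nrows arguments reach read.table(), which causes the error. This looks like a bug to me. To get around it, you could modify the cbc.read.table() function to handle a user-specified nrows argument, or to accept something like a max.rows argument (and perhaps send that to the maintainer as a potential patch). Alternatively, you can specify the sample.pct argument, which sets the proportion of rows to read. So, if the file contains 100 rows and you only want 50: sample.pct = 0.5.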
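The clash described above can be reproduced in base R without colbycol. The following is only a minimal sketch: read_first_rows() and read_first_rows_fixed() are hypothetical stand-ins, not functions from the package, and the demo file is made-up data. The first wrapper hard-codes nrows = 50 while also forwarding `...`, which mirrors the call shown in the error message; the second shows one possible shape for a patch.

```r
# Hypothetical stand-in for the call shown in the error message:
# nrows is hard-coded AND the caller's extra arguments are forwarded via `...`.
read_first_rows <- function(file, sep = "", header = FALSE, ...) {
  read.table(file, nrows = 50, sep = sep, header = header, ...)
}

# Possible patch (an assumption, not the maintainer's code): make nrows a
# named argument with 50 as its default, so a caller-supplied value replaces
# the default instead of colliding with it.
read_first_rows_fixed <- function(file, sep = "", header = FALSE,
                                  nrows = 50, ...) {
  read.table(file, nrows = nrows, sep = sep, header = header, ...)
}

# Small tab-separated demo file (made-up data, 100 rows).
demo_file <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:100, value = (1:100) * 2), demo_file,
            sep = "\t", row.names = FALSE, quote = FALSE)

# Passing nrows to the original-style wrapper fails; capture the message.
msg <- tryCatch({
  read_first_rows(demo_file, sep = "\t", header = TRUE, nrows = 20)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# The patched wrapper accepts the caller's nrows.
preview <- read_first_rows_fixed(demo_file, sep = "\t", header = TRUE,
                                 nrows = 20)
print(dim(preview))
```

Whether the real function should expose nrows directly or a separate max.rows argument is a design choice for the maintainer; the sketch only shows why the duplicate argument arises and that naming the argument removes the collision.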

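In answer to b):

I am not sure what this error means. It is hard to diagnose without a reproducible example. Do you get the same error if you read a smaller file?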
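One way to get such a smaller file is to copy the header and the first few data lines of the original into a temporary file. This is a base R sketch; make_small_copy() is a hypothetical helper name, and the demo file below merely stands in for the real "xxx.txt".

```r
# Hypothetical helper: copy the header line plus the first n data lines
# of a text file into a new, smaller file and return its path.
make_small_copy <- function(path, n = 1000,
                            out = tempfile(fileext = ".txt")) {
  input <- file(path, open = "r")
  on.exit(close(input))
  writeLines(readLines(input, n = n + 1), out)  # +1 keeps the header line
  out
}

# Made-up demo file standing in for the large original.
big_file <- tempfile(fileext = ".txt")
write.table(data.frame(Company = paste0("C", 1:50), Sales = 1:50), big_file,
            sep = "\t", row.names = FALSE, quote = FALSE)

small_file <- make_small_copy(big_file, n = 5)
small <- read.table(small_file, header = TRUE, sep = "\t")
print(small)
```

The smaller file can then be passed to cbc.read.table() and as.data.frame() in the same way as the original, to check whether the error still appears.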

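In answer to c):

I generally prefer to store very large character data in a relational database, such as MySQL. In your case, it might be easier to use the RSQLite package, which embeds an SQLite engine in R. SQL SELECT queries can then be used to retrieve conditional subsets of the data. Other packages for larger-than-memory data are listed under "Large memory and out-of-memory data" here: http://cran.r-project.org/web/views/HighPerformanceComputing.html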
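As a rough sketch of the SQLite approach, the code below loads a tab-separated file into an in-memory SQLite database in small chunks and then pulls back only the rows that match a condition. It assumes the DBI and RSQLite packages are installed; the table name, the column names, the chunk size, and the demo data are all made up for illustration.

```r
library(DBI)

# Made-up demo file standing in for the large original.
demo_file <- tempfile(fileext = ".txt")
write.table(data.frame(Company = c("A", "B", "C", "D"),
                       Sales = c(10, 250, 30, 400)),
            demo_file, sep = "\t", row.names = FALSE, quote = FALSE)

# In-memory database; pass a file path instead of ":memory:" to keep it on disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Read the file a few lines at a time so it never has to fit in memory at once.
input <- file(demo_file, open = "r")
col_names <- strsplit(readLines(input, n = 1), "\t")[[1]]  # header line
chunk_size <- 2  # tiny on purpose; a real file would use a much larger value
repeat {
  lines <- readLines(input, n = chunk_size)
  if (length(lines) == 0) break
  chunk <- read.table(text = lines, sep = "\t", col.names = col_names,
                      stringsAsFactors = FALSE)
  dbWriteTable(con, "sales", chunk, append = TRUE)
}
close(input)

# Filter inside the database and return only the matching rows to R.
big_sales <- dbGetQuery(
  con, "SELECT Company, Sales FROM sales WHERE Sales > 100 ORDER BY Company"
)
dbDisconnect(con)
print(big_sales)
```

Once the data is in SQLite, the filter condition lives in the WHERE clause, so only the subset of interest is ever materialized as a data frame.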

Source

2012-05-18 16:16:21 jthetzel
