Reading big data in R with read.big.matrix

I am using read.big.matrix to read data with dimensions 3131875 × 5. My data contain both character and numeric columns, including a date variable. This is the command I am using:

as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt", 
         header=TRUE, 
         backingfile="session.bin", 
         descriptorfile="session.desc", 
         type = NA)

But type = NA is not accepted here, and I get an error:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type, : 
    Problem creating filebacked matrix. 
In addition: Warning messages: 
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion 
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion 
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt", : 
    Because type was not specified, we chose double based on the first line of data.

I need to know which type I should use here. I tried options such as double, but that throws the same error.

Please help.


2012-10-04 user1702490

From ?read.big.matrix:

Files must contain only one atomic type (all integer, for example).

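So you will not be able to read a file whose data mix character, numeric, integer, date, and other types. You can work around this, for example by using a different program to convert the character variables to an integer representation (similar to converting them to a factor in R).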
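A minimal sketch of the factor-style workaround in base R, assuming a small hypothetical data frame in place of sample2.txt (the column names `id`, `region`, `date`, `amount` and the `lookups` list are made up for illustration). Character columns are replaced by their factor codes, dates by integer day counts since 1970-01-01, and the all-numeric result is written to a temporary file that read.big.matrix can read with an explicit `type`:

```r
# Hypothetical mixed-type data (not the asker's actual columns)
df <- data.frame(
  id     = c(101L, 102L, 103L),
  region = c("north", "south", "north"),
  date   = as.Date(c("2012-01-15", "2012-02-01", "2012-03-10")),
  amount = c(10.5, 20, 7.25),
  stringsAsFactors = FALSE
)

# Replace non-numeric columns with integer codes, keeping lookup tables
lookups <- list()
for (col in names(df)) {
  if (is.character(df[[col]])) {
    f <- factor(df[[col]])
    lookups[[col]] <- levels(f)      # code i corresponds to lookups[[col]][i]
    df[[col]] <- as.integer(f)
  } else if (inherits(df[[col]], "Date")) {
    df[[col]] <- as.integer(df[[col]])  # days since 1970-01-01
  }
}

# Every column is now numeric, so the file holds a single atomic type
out <- tempfile(fileext = ".csv")
write.table(df, out, sep = ",", row.names = FALSE, quote = FALSE)

# Read it back with an explicit type (only if bigmemory is installed)
if (requireNamespace("bigmemory", quietly = TRUE)) {
  bm <- bigmemory::read.big.matrix(out, sep = ",", header = TRUE,
                                   type = "double")
  print(dim(bm))
}
```

Keeping `lookups` makes it possible to map the codes back to the original strings later. `type = "double"` is used because `amount` is fractional; an all-integer file could use `type = "integer"` instead. For a file of 3131875 rows, loading everything into a data frame first may defeat the purpose, so the conversion may need to be done in chunks or outside R.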

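Edit:

The bigmemory website has an example that preprocesses the data with a Python script to convert character information to integers. The script was written for a specific dataset, but you may be able to use it as a guide for your own data.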


2012-10-04 11:50:11 BenBarnes

@user1702490, this may not be the reason you are getting the error message. Can you create a non-file-backed 'big.matrix' from your data? – BenBarnes
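One way to try this suggestion, sketched under assumptions: a tiny hypothetical numeric file in a temporary directory stands in for a subset of sample2.txt. The first call omits `backingfile` and `descriptorfile`, so the big.matrix is held in memory rather than on disk; the second repeats the read as a file-backed matrix, using the `backingpath` argument to point at a directory that is known to be writable.

```r
# Hypothetical all-numeric test file standing in for part of sample2.txt
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(a = 1:3, b = c(0.5, 1.5, 2.5)), tmp,
            sep = ",", row.names = FALSE, quote = FALSE)

if (requireNamespace("bigmemory", quietly = TRUE)) {
  # Non-file-backed: no backingfile/descriptorfile arguments
  bm <- bigmemory::read.big.matrix(tmp, sep = ",", header = TRUE,
                                   type = "double")
  print(bm[, ])

  # File-backed: same read, with backing files in a writable directory
  bm_fb <- bigmemory::read.big.matrix(tmp, sep = ",", header = TRUE,
                                      type = "double",
                                      backingfile = "session.bin",
                                      descriptorfile = "session.desc",
                                      backingpath = tempdir())
  print(dim(bm_fb))
}
```

If the in-memory read succeeds on the real data but the file-backed one fails, the backing-file setup (location or permissions) is a more likely suspect than the `type` argument.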
