2014-02-28

autostart, skip and fread

I use the following code to read a file with the data.table library:

fread(myfile, header=FALSE, sep=",", skip=100, colClasses=c("character","numeric","NULL","numeric")) 

but I get the following error:

The supplied 'sep' was not found on line 80. To read the file as a single character column set sep='\n'. 

It says it did not find sep on line 80, but I set skip = 100, so it should not be looking at the first 100 lines at all.

UPDATE: I tried skip = 101 and it works, but then it skips the first row where the data starts.

I'm using version 1.9.2 of the data.table package and R version 3.0.2, 64-bit 7.


Have you tried not passing 'sep' and 'colClasses'? 'fread' should be able to determine these automatically. – Roland


Yes, I tried that. – Enrique


Please state your software versions up front so we don't need to ask. v1.9.2 was just released. Have you upgraded? –

Answer

We don't know which version you're using, but I can make a guess in this case.

Try setting autostart=101.

Note the first paragraph of Details in ?fread:

Once the separator is found on line autostart , the number of columns is determined. Then the file is searched backwards from autostart until a row is found that doesn't have that number of columns. Thus, the first data row is found and any human readable banners are automatically skipped. This feature can be particularly useful for loading a set of files which may not all have consistently sized banners. Setting skip>0 overrides this feature by setting autostart=skip+1 and turning off the search upwards step.
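The backward search described in that paragraph can be illustrated in plain R. This is a hypothetical helper (find_first_data_row is not a data.table function, and this is not data.table's C implementation); it is a simplified sketch that ignores quoting and the short-file rule, and only mirrors the documented idea: count the fields on line autostart, walk upward while the line above has the same count, and let skip >= 0 bypass the search entirely.

```r
# Hypothetical helper illustrating the documented autostart search.
# NOT part of data.table; a simplified sketch (no quote handling).
find_first_data_row <- function(lines, autostart = 30L, sep = ",", skip = -1L) {
  # skip >= 0 disables the search: line skip + 1 is taken as the first row.
  if (skip >= 0) return(skip + 1L)
  # Simplification: clamp autostart to the last line for short inputs.
  autostart <- min(autostart, length(lines))
  # Count fields by splitting on sep (strsplit drops trailing empty fields).
  n_fields <- function(line) length(strsplit(line, sep, fixed = TRUE)[[1]])
  target <- n_fields(lines[autostart])
  row <- autostart
  # Walk upwards while the line above has the same number of fields.
  while (row > 1L && n_fields(lines[row - 1L]) == target) {
    row <- row - 1L
  }
  row
}

# A 40-line banner without commas, followed by three 4-field data rows.
lines <- c(sprintf("banner line %d", 1:40), "a,1,2,3", "b,4,5,6", "c,7,8,9")
find_first_data_row(lines, autostart = 42L)  # 41
find_first_data_row(lines)                   # 1: default autostart 30 lands in the banner
find_first_data_row(lines, skip = 100)       # 101: skip bypasses the search
```

With a banner longer than 30 lines, the default autostart points into the banner itself, so the search never reaches the data; that is the situation the answer suspects here.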

The skip argument:

If -1 (default) use the procedure described below starting on line autostart to find the first data row. skip>=0 means ignore autostart and take line skip+1 as the first data row (or column names according to header="auto"|TRUE|FALSE as usual). skip="string" searches for "string" in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata).
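A minimal sketch of the two explicit forms of skip, assuming a recent data.table release (the thread concerns 1.9.2, whose handling evidently differed). The file layout and the column names id, x, flag and y are hypothetical: a 100-line banner, a names row on line 101, then three data rows.

```r
library(data.table)

# Hypothetical file: 100 banner lines (no commas), a names row on line 101,
# and three data rows on lines 102-104.
tf <- tempfile(fileext = ".csv")
writeLines(c(sprintf("banner line %d", 1:100),
             "id,x,flag,y",
             "a,1.5,p,2",
             "b,2.5,q,3",
             "c,3.5,r,4"), tf)

# Numeric skip: lines 1..100 are ignored, so reading starts on line 101,
# which is used as the names row because header = TRUE.
by_number <- fread(tf, skip = 100, header = TRUE, sep = ",",
                   colClasses = c("character", "numeric", "character", "numeric"))

# String skip: start on the first line that contains "id,x".
by_string <- fread(tf, skip = "id,x", header = TRUE, sep = ",")

print(by_number)
print(by_string)
```

The string form avoids counting banner lines by hand, which is the off-by-one trap described in the question.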

The autostart argument:

Any line number within the region of machine readable delimited text, by default 30. If the file is shorter or this line is empty (e.g. short files with trailing blank lines) then the last non empty line (with a non empty line above that) is used. This line and the lines above it are used to auto detect sep, sep2 and the number of fields. It's extremely unlikely that autostart should ever need to be changed, we hope.

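In your case the human-readable banner is probably much longer than 30 lines, which is why I guess setting autostart=101 might work. There is no need to use skip.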

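One motivation is convenience when a file contains several tables. By setting autostart to any line inside the table you want to extract, fread automatically finds that table's first data row and header row for you, then reads just that table. You don't have to work out the exact line number where the data starts, as you do with skip. fread can currently only read one table. It could reliably return a list of tables from one file, but that is a bit more involved and nobody has asked for it.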
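A related way to pull one table out of a multi-table file is skip="string". This is a minimal sketch assuming a recent data.table release, where, as far as I know, ?fread lists autostart as deprecated and ignored with a warning, pointing users to skip instead. The file layout and column names are hypothetical.

```r
library(data.table)

# Hypothetical file holding two tables one after the other.
tf <- tempfile(fileext = ".csv")
writeLines(c("Table 1: sales",
             "region,units",
             "north,10",
             "south,20",
             "",
             "Table 2: prices",
             "item,price,currency",
             "apple,0.5,EUR",
             "pear,0.7,EUR"), tf)

# Start on the names row of the second table; fread reads to end of file.
prices <- fread(tf, skip = "item,price", header = TRUE)
print(prices)
```

This picks out the last table only; reading an earlier one would also require stopping at that table's end, which this sketch does not cover.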


Thanks for your reply. I tried autostart = 102 and it worked. – Enrique


Hi Matt, I've run into a similar problem with 'fread'; however, this answer doesn't seem to apply. If you have time, I'd appreciate your help here: http://stackoverflow.com/questions/24759346/fread-skip-and-autostart-issue – user1477388


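@Matt Dowle Is there a way to skip some rows (e.g. the first 1000) while also changing column classes via 'colClasses'? I need this because I'm trying to read a file in chunks. – EDC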
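This question is not answered in the thread. One possible approach, sketched under the assumption of a recent data.table release: combine a numeric skip with nrows and an unnamed colClasses vector on each call. The file, the chunk size and the names n_data, chunk_size and col_classes are hypothetical.

```r
library(data.table)

# Hypothetical file: one names row plus 10 data rows.
tf <- tempfile(fileext = ".csv")
writeLines(c("id,value", sprintf("r%d,%d", 1:10, (1:10) * 10L)), tf)

n_data <- length(readLines(tf)) - 1L      # data rows, excluding the names row
chunk_size <- 4L
col_classes <- c("character", "numeric")  # forces 'value' to double

chunks <- list()
for (start in seq(0L, n_data - 1L, by = chunk_size)) {
  # Skip the names row plus the rows already read, then read at most chunk_size rows.
  chunks[[length(chunks) + 1L]] <- fread(tf, skip = 1L + start, nrows = chunk_size,
                                         header = FALSE, sep = ",",
                                         colClasses = col_classes)
}
result <- rbindlist(chunks)
print(result)
```

The loop bound is computed up front because asking fread to skip past the end of the file is an error rather than an empty result. Since header = FALSE, every chunk gets the default names V1 and V2, which keeps rbindlist straightforward.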