Skip lines/rows that produce an error in fread (R)

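I am trying to read a large file into R. The error below occurs while reading it. It does not go away even when I skip the first 800607 lines. I also tried deleting the line from the terminal with this command: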

sed '800608d' filename.csv

That did not solve my problem. I would appreciate any help.

The original error I get from R is:

> data<-fread("filename.csv") 
Read 2.0% of 34143409 rows 
Error in fread("filename.csv") : 
Field 16 on line 800607 starts with quote (") but then has a problem. It can contain balanced unescaped quoted subregions but if it does it can't contain embedded \n as well. Check for unbalanced unescaped quotes: """The attorney for Martin's family, Benjamin Crump, says the evidence is ""irrelevant\"""" """".","NULL","NULL","NULL","NULL","NULL","NULL","NULL","Negative" 
In addition: Warning message: 
In fread("filename.csv") : 
Starting data input on line 8 and discarded previous non-empty line: done
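
One possible reason the sed attempt had no effect: as written, `sed '800608d' filename.csv` prints the edited text to standard output and leaves filename.csv itself unchanged unless the output is redirected to a new file or an in-place option is used. The same idea can be sketched in R. The helper name `drop_lines` and the output file name are made up for illustration, and because `readLines` loads the whole file into memory this may be too slow for a file of this size.

```r
# Hypothetical helper: copy `infile` to `outfile`, leaving out the given
# line numbers. The input file is not modified.
drop_lines <- function(infile, outfile, line_numbers) {
  lines <- readLines(infile, warn = FALSE)
  # setdiff() also behaves when line_numbers is empty (nothing is dropped)
  kept <- setdiff(seq_along(lines), line_numbers)
  writeLines(lines[kept], outfile)
  invisible(length(lines) - length(kept))  # number of lines removed
}

# Usage (file names are placeholders):
# drop_lines("filename.csv", "filename_clean.csv", 800607)
# data <- data.table::fread("filename_clean.csv")
```

If the broken field spans several physical lines, removing a single line is unlikely to be enough, so the line numbers passed in may need to cover the whole record.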


2015-09-07 Carlo

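This is a very tricky problem. The issue is that one column in your file uses the same special characters as the file structure itself (" for quoting, , as the separator, and so on), so it completely confuses the file format. The ideal fix is to change the file format, if you have access to the source, for example by setting the default quote character to ' instead of ". Otherwise, providing the actual file would be very helpful so that we can take a look as well. –

Unfortunately, I am not allowed to give access to it, and changing the file format would take a very long time. – Carlo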

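I am currently in the middle of solving this kind of problem myself. I am not sure this works in every case, let alone for all the files I am working with. But for now I seem to be getting some results with: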

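library(data.table)

skip.list <- c()            # files that still fail after one retry
files <- dir(input.dir)     # input.dir must end with a path separator

for (i in seq_along(files)) {
    file <- files[i]
    ingested.file <- try(fread(paste0(input.dir, file), header = TRUE, stringsAsFactors = FALSE), silent = TRUE)
    if (inherits(ingested.file, "try-error")) {
        # Pull the line number out of an error of the form "... but line N ...";
        # adjust this pattern to the error text you actually get.
        error.line <- as.integer(sub(" .*", "", sub(".*but line ", "", as.character(ingested.file))))
        # Retry once, skipping everything up to and including the reported line.
        ingested.file <- try(fread(paste0(input.dir, file), header = TRUE, stringsAsFactors = FALSE, skip = error.line), silent = TRUE)
        if (inherits(ingested.file, "try-error")) {
            skip.list <- c(skip.list, file)   # give up on this file
            next
        }
    }
}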

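I am currently working with about 750 files of 1000 lines each, and about 50 of them have this same problem. With this approach I can read 30 of those; the remaining 20 seem to have errors on more than one line, and I cannot specify more than one skip value.

If it were possible to specify more than one skip, you could try a while statement, i.e.

while (class(ingested.file)=="try-error") ... and then update error.list automatically as needed.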
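
One way to approximate "several skips" is to retry in a loop and drop the reported line each time, rather than relying on fread's single skip value. The following is only a sketch under a few assumptions: `read_skipping_bad_lines`, `reader` and `max_tries` are made-up names, the error message is assumed to contain the offending line as "line N", and the whole file is held in memory via `readLines`, which suits 1000-line files far better than very large ones. It relies on fread's `text` argument, which accepts a character vector of lines.

```r
# Sketch only: names and defaults here are hypothetical, not part of data.table.
read_skipping_bad_lines <- function(path, reader = data.table::fread,
                                    max_tries = 50L) {
  lines <- readLines(path, warn = FALSE)
  keep <- seq_along(lines)   # original line numbers still being read
  dropped <- integer(0)      # original line numbers removed so far

  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(reader(text = lines[keep]), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(data = result, dropped = dropped))
    }
    # Assumption: the message names the failing line as "line <N>",
    # counted within the text that was just passed to the reader.
    msg <- conditionMessage(result)
    hit <- regmatches(msg, regexpr("line [0-9]+", msg))
    if (length(hit) == 0) stop("no line number found in error: ", msg)
    pos <- as.integer(sub("line ", "", hit, fixed = TRUE))
    if (pos < 1 || pos > length(keep)) stop("reported line out of range: ", msg)
    dropped <- c(dropped, keep[pos])  # map back to the original numbering
    keep <- keep[-pos]
  }
  stop("still failing after ", max_tries, " attempts")
}
```

The returned `dropped` vector gives the original line numbers that were removed, so the lost rows can be inspected afterwards. Whether fread reports the line you actually want to remove depends on the data.table version and the kind of error, so the pattern may need adjusting.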

I hope this helps!


2016-09-19 20:52:03 mjfred

Sorry, one last addendum: you may need to change how the error.line value is extracted, depending on the error you get. – mjfred
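
To illustrate why the extraction may need changing: the pattern in the answer looks for the text "but line ", while the error in the question reads "Field 16 on line 800607 ...", so that pattern would not find a number there. A slightly more general sketch is to take the first "line N" that appears in the message. The helper name `extract_error_line` is made up, and the second example message is illustrative rather than real fread output.

```r
# Hypothetical helper: return the first number that follows the word "line"
# in an error message, or NA if there is none.
extract_error_line <- function(msg) {
  msg <- paste(as.character(msg), collapse = " ")
  hit <- regmatches(msg, regexpr("line [0-9]+", msg))
  if (length(hit) == 0) return(NA_integer_)
  as.integer(sub("line ", "", hit, fixed = TRUE))
}

# The error text from the question:
extract_error_line("Field 16 on line 800607 starts with quote (\") but then has a problem.")
# 800607

# An illustrative message in the "but line N" form the answer's pattern expects:
extract_error_line("Expected 5 columns but line 12 contains something else")
# 12
```

If a message mentions more than one line number, this picks the first one, which may or may not be the line that needs skipping.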

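Rather than adding this addendum as a comment, it would be better to edit it into the answer. Comments are not guaranteed to be preserved, and some people do not read them. Thanks! –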
