Reading lines around comments with read.table

I am reading a text file that has a few header lines of information about the data above the data lines themselves, like this:

Test file 
# 
File information 
1 2 3 4 
# 
a 2 
b 4 
c 6 
d 8

I want to read the various pieces of information from this file separately. I can do that like this:

file <- read.table(txt, nrows = 1)
name <- read.table(txt, nrows = 1, skip = 2)
vals <- read.table(txt, nrows = 1, skip = 3)
data <- read.table(txt, skip = 5)

Because of the two blank comment lines, I can also read the data like this:

file <- read.table(txt, nrows = 1)
name <- read.table(txt, nrows = 1, skip = 1) # Skip changed from 2
vals <- read.table(txt, nrows = 1, skip = 3)
data <- read.table(txt, skip = 4) # Skip changed from 5
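Both variants work because of read.table's defaults comment.char = "#" and blank.lines.skip = TRUE: once reading starts, a "#" line is discarded, while skip itself counts every raw line, comment lines included. A quick check, assuming the example content is passed as a string through read.table's text = argument instead of a file:

```r
txt <- "Test file\n#\nFile information\n1 2 3 4\n#\na 2\nb 4\nc 6\nd 8"

# skip = 2 starts at "File information"; skip = 1 starts at the "#" line,
# which read.table discards as a comment before reading nrows = 1 row
name_skip2 <- read.table(text = txt, nrows = 1, skip = 2)
name_skip1 <- read.table(text = txt, nrows = 1, skip = 1)

# Same for the data block: skip = 4 lands on the second "#" line
data_skip5 <- read.table(text = txt, skip = 5)
data_skip4 <- read.table(text = txt, skip = 4)

identical(name_skip1, name_skip2)
identical(data_skip4, data_skip5)
```

This is also why removing a "#" line breaks both versions: every line after it moves up by one, so the hard-coded skip values point at the wrong place.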

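This works, but the text file does not always have the same number of blank comment lines; sometimes they are there, sometimes they are not. If one (or both) of the comment lines in the example file is missing, neither of my solutions works any more.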

Is there a way to make skip never count the comment lines in the text file?


2017-01-13 hfisch

Something like 'lines <- readLines(txt); lines_clean <- lines[substr(lines, 1, 1) != "#"]' –
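The idea in this comment can be taken one step further so that positions only ever count non-comment lines. A minimal sketch, assuming comment lines start with "#" in the first column and that there are always exactly three metadata lines (as in the example); read_parts() is a hypothetical helper name, and in practice you would pass it readLines() of your file:

```r
# Hypothetical helper: drop comment lines first, then index by position,
# so the positions never depend on how many "#" lines the file contains.
# Assumes exactly 3 metadata lines before the data.
read_parts <- function(lines) {
  lines_clean <- lines[substr(lines, 1, 1) != "#"]
  list(
    file = read.table(text = lines_clean[1]),
    name = read.table(text = lines_clean[2]),
    vals = read.table(text = lines_clean[3]),
    data = read.table(text = lines_clean[-(1:3)])
  )
}

with_comments <- c("Test file", "#", "File information", "1 2 3 4", "#",
                   "a 2", "b 4", "c 6", "d 8")
without_comments <- with_comments[with_comments != "#"]

# Same result whether or not the comment lines are present
identical(read_parts(with_comments), read_parts(without_comments))
```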

(Assumption: the file metadata is at the top, and once the data starts there are no more comments.)

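(Using textConnection(...) tricks functions that expect a file connection into processing a character string; in your own code, replace that call with the file name.)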
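To illustrate the textConnection() trick: reading the string through a text connection gives the same lines as reading a file with that content by name. The temporary file here only stands in for your real file:

```r
txt <- "Test file\n#\nFile information\n1 2 3 4\n#\na 2\nb 4\nc 6\nd 8"

# Stand-in for a real file on disk
path <- tempfile(fileext = ".txt")
writeLines(txt, path)

# Read the string through a text connection, closing it afterwards
con <- textConnection(txt)
from_string <- readLines(con)
close(con)

# Read the file by name, as you would in your own code
from_file <- readLines(path)

identical(from_string, from_file)
```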

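One technique is to read the first n lines of the file (some number "guaranteed" to include all comment/non-data lines), find the last comment line, and then process everything before and after it: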

txt <- "Test file 
# 
File information 
1 2 3 4 
# 
a 2 
b 4 
c 6 
d 8" 
max_comment_lines <- 8 
(dat <- readLines(textConnection(txt), n = max_comment_lines)) 
# [1] "Test file"  "#"    "File information" "1 2 3 4"   
# [5] "#"    "a 2"    "b 4"    "c 6"    
(skip <- max(grep("^\\s*#", dat))) 
# [1] 5

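(Side note: you should probably check that there actually are comments. Otherwise grep() returns integer(0), max() of that is -Inf (with a warning), and that is not a usable skip or n value for the read* functions.)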
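One way to do that check, sketched as a hypothetical helper last_comment_line(). Falling back to 0L is an assumption: it only guarantees a valid integer, and for a file with no "#" lines you still have to decide where the data starts:

```r
# Hypothetical helper: index of the last comment line among the first
# max_comment_lines lines, or 0L when no comment line is found (so the
# result is always a usable integer instead of -Inf)
last_comment_line <- function(lines, max_comment_lines = 8) {
  hits <- grep("^\\s*#", head(lines, max_comment_lines))
  if (length(hits) == 0L) 0L else max(hits)
}

with_comments <- c("Test file", "#", "File information", "1 2 3 4", "#",
                   "a 2", "b 4", "c 6", "d 8")
no_comments <- with_comments[with_comments != "#"]

last_comment_line(with_comments)
# [1] 5
last_comment_line(no_comments)
# [1] 0
```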

Now that we "know" the last comment is on line 5, we can get the header information from the first 4 lines...

meta <- readLines(textConnection(txt), n = skip - 1) 
meta <- meta[! grepl("^\\s*#", meta) ] # remove the comment rows themselves 
meta 
# [1] "Test file"  "File information" "1 2 3 4"
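If you want the pieces separately (like file, name and vals in the question), you can index meta directly. A small sketch; using scan(text = ...) to turn the "1 2 3 4" line into a numeric vector is an illustrative choice, not part of the answer above:

```r
# meta as produced above (comment rows already removed)
meta <- c("Test file", "File information", "1 2 3 4")

file_line <- meta[1]
name_line <- meta[2]
# scan() splits the whitespace-separated numbers into a numeric vector
vals <- scan(text = meta[3], quiet = TRUE)
vals
# [1] 1 2 3 4
```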

...and skip 5 lines to get the data.

dat <- read.table(textConnection(txt), skip = skip) 
str(dat) 
# 'data.frame': 4 obs. of 2 variables: 
# $ V1: Factor w/ 4 levels "a","b","c","d": 1 2 3 4 
# $ V2: int 2 4 6 8


2017-01-13 23:01:01 r2evans

Of course... duh. Thanks. – r2evans

Thanks for the 'textConnection' trick, that's a nice bonus! – hfisch

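Technically, 'read.table' and friends have a 'text =' argument that accepts a string instead of looking for a file. Since 'readLines' has no 'text =' argument, I used 'textConnection' for consistency, although it isn't necessary. – r2evans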
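For comparison, a sketch reading the same data both ways, via textConnection() as in the answer and via read.table's text = argument; skip = 5 is the value found above:

```r
txt <- "Test file\n#\nFile information\n1 2 3 4\n#\na 2\nb 4\nc 6\nd 8"
skip <- 5

# Version used in the answer: wrap the string in a text connection
con <- textConnection(txt)
via_connection <- read.table(con, skip = skip)
close(con)

# Equivalent: hand the string to read.table's text argument directly
via_text <- read.table(text = txt, skip = skip)

identical(via_connection, via_text)
```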
