Creating tar files using a nested for loop in R

I have a bunch of files. They are named as follows:

BNT_20170301131740322_123456.csv, BNT_20170301131740322_7891011.csv

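In these file names, characters 5 through 12 are the date, and characters 13 and 14 are the hour. The rest is generated dynamically and keeps changing. In the example above, the date is 1 March 2017 and the hour is 13.

Task 1: I have to create a tar file of all the files that match a specific date and hour. So, depending on the date and hour at which the files were generated, I will have multiple tar files as output.

Task 2: The next task is to name the tar files in a specific pattern. Each tar file should be named in the following pattern: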

BNT_2017030111_2.tar

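In the name above, "BNT_" is retained, followed by the date and hour, and the 2 after the underscore indicates the number of files in the tar that match that date and hour. In the example above, the name indicates that the files dated 1 March 2017 with hour 11 are tarred together and that the tar contains 2 files.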
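As a small illustration of the naming rule just described, the name for a single date/hour group could be assembled with `sprintf`. The variable names `file_date`, `file_hour` and `n_files` are made up for this sketch and do not appear in the original code.

```r
# Values for one date/hour group, taken from the example name above.
# (file_date, file_hour and n_files are hypothetical names for illustration.)
file_date <- "20170301"
file_hour <- "11"
n_files   <- 2L

# "BNT_" + date + hour + "_" + number of files + ".tar"
tar_name <- sprintf("BNT_%s%s_%d.tar", file_date, file_hour, n_files)
tar_name
```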

What I have done so far:

#set the working directory 
setwd("/home/mycomp/Documents/filestotar/") 

#list all files 
files <- list.files(pattern = ".csv")

Listing the names of all the files here for reproducibility:

files <- c("BNT_20170301000000790_123456.csv", "BNT_20170301000000887_7891011.csv", 
"BNT_20170301000000947_7430180.csv", "BNT_20170301000001001_2243094.csv", 
"BNT_20170301000001036_14195326.csv", "BNT_20170301000001036_14770776.csv", 
"BNT_20170301000001078_10692013.csv", "BNT_20170301000001089_2966772.csv", 
"BNT_20170301000001100_10890506.csv", "BNT_20170301000001576_7430180.csv")

My code:

library(stringr) 
#extract date and time and set the pattern to identify the files in the folder 
#extracts date from the file name 
d <- substr(files, 5,12) 

#extracts hour from the file name 
e <- substr(files, 13,14) 

#creates a pattern that can be used to identify the files matching the pattern. 
pat <- paste("BNT","_",unique(d),unique(e),sep="") 

#creates the count of files with unique hour parameter. This will be used to create the name for the tar file. 
f <- table(paste(d,e,sep="")) 

#create unique names for the tar files 
g <- unique(paste("BNT",unique(d),unique(e),f,sep="_")) 

#pasting the extension .tar to the name of the file 
h <- paste(g,".tar",sep="") 



#create a nested forloop to tar the files recursively 
for (name in h) { 
    for (i in seq_along(pat)) { 
    filestotar = for (i in seq_along(pat)) {list.files(path = "/home/mycomp/Documents/filestotar/", pattern = pat[i])} 
    } 
    tar(tarfile = name, files = filestotar) 
}

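The above creates the required number of tar files. But the first tar file contains all the files in the folder, and every subsequent tar file recursively includes all the previously created tar files along with the original files in the folder.

For example, the first tar file has all the CSV files, not only those matching the pattern pat.

The second tar file has the first tar file plus all the CSV files, not only those matching the pattern pat.

This continues for every tar file created, and the last tar file contains all the previously created tar files plus all the files matching pat.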
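A likely explanation for this behaviour, offered as a sketch rather than a confirmed diagnosis: in R a `for` loop always evaluates to `NULL`, so assigning the loop to `filestotar` stores `NULL` and discards the `list.files()` results. The documented default for the `files` argument of `tar()` is to archive everything under the current directory, which would include any tar files already written there. The file names and patterns below are made up for illustration, and `grep()` on a vector stands in for `list.files()` so that nothing touches the file system.

```r
# Made-up file names and patterns, for illustration only.
files <- c("BNT_20170301110000001_1.csv",
           "BNT_20170301130000002_2.csv")
pat <- c("BNT_2017030111", "BNT_2017030113")

# Assigning the loop itself captures the value of `for`, which is NULL;
# the values computed inside the loop body are thrown away.
filestotar <- for (p in pat) grep(p, files, value = TRUE)
is.null(filestotar)  # TRUE

# Collecting the matches explicitly keeps one vector of names per pattern.
matches <- lapply(pat, function(p) grep(p, files, value = TRUE))
```

With the matches kept per pattern, each archive can be given only its own files instead of falling back to the whole directory.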

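The desired output is:

Tar only those files whose names match the date and hour, and name each tar file as BNT_ + date + hour + _ + number of files + .tar, which looks like this: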

BNT_2017030111_2.tar

I have created a folder with dummy files, just in case this helps:

https://drive.google.com/open?id=0BwPrNXRo3C1aaUN2WmMtS3dpZ1U

Source

2017-08-18 Apricot

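To avoid loops (not necessary, but my choice here), you can create a data.frame that holds all the information about the files. You can then slice and dice it into the desired file names.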

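# one row per file; d (date) and e (hour) are the vectors computed in the question
# stringsAsFactors = FALSE keeps the file names as plain character strings
xy <- data.frame(files = files, date = d, hour = e, stringsAsFactors = FALSE)

# one group per date/hour combination; drop = TRUE skips combinations with no files
out <- split(xy, f = list(xy$date, xy$hour), drop = TRUE)

result <- sapply(out, FUN = function(x) {
    nfiles <- nrow(x)
    name <- paste("BNT_", unique(x$date), unique(x$hour), "_", nfiles, ".tar", sep = "")

    ### just for show, you can remove ###
    message(sprintf("For %s archiving %s files:", name, nfiles))
    for (i in x$files) {
        message(i)
    }
    ### end just for show ###
    tar(tarfile = name, files = x$files)
})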
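The grouping step can also be separated from the archiving step, so the mapping from tar name to files can be inspected before anything is written. The following is only a sketch of that idea: `tar_plan` and `write_tars` are hypothetical helper names, and the third file name is invented so that the example has two hours.

```r
# Hypothetical helper: map each tar name to the files that belong in it.
tar_plan <- function(files) {
  key <- substr(files, 5, 14)   # date (characters 5-12) plus hour (13-14)
  groups <- split(files, key)   # one character vector per date/hour
  names(groups) <- paste0("BNT_", names(groups), "_", lengths(groups), ".tar")
  groups
}

# Hypothetical helper: write one archive per entry of the plan.
# Meant to be run from the directory that holds the CSV files.
write_tars <- function(plan) {
  for (nm in names(plan)) tar(tarfile = nm, files = plan[[nm]])
  invisible(names(plan))
}

files <- c("BNT_20170301131740322_123456.csv",
           "BNT_20170301131740322_7891011.csv",
           "BNT_20170301110000001_111.csv")  # invented name for illustration
plan <- tar_plan(files)
names(plan)
```

Calling `write_tars(plan)` would then create the archives. Keeping it as a separate call makes it easy to check `plan` first.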

Source

2017-08-18 04:55:19

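One of my requirements is to name the tar file so that it includes the number of files in the tar... and strsplit also extracts the complete string, not just the first 10 characters after the underscore following BNT. Since the remaining characters in my file names keep changing, I tried wrapping 'strsplit' in a 'substring' to get the first 10 characters... but I could not loop over the other file names, where the 9th and 10th characters (the hour) change... I am sorry if I have missed something in the code... Could you help me understand? – Apricot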

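@Apricot My solution only addresses the last part of your code. 'pat' is the same as when you created it (and presumably works). What remains is how to find similar file names and put them into the same tar file. Perhaps you can edit your question and specify which file should go where. –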

@akrun Could you help me solve this problem? – Apricot
