2016-01-28 50 views

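Select specific columns and add the CSV file name to the final CSV file

I am trying to extract the same first 16 columns from many CSV files that sit in different subdirectories, and to add the source CSV file name to every row of the final CSV. My code: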

getwd() 
root<-list.dirs(".", recursive=TRUE) 
# get list of files ending in csv in directory root 
dir(root, pattern='csv$', recursive = TRUE, full.names = TRUE) %>% 
# read files into data frames 
lapply(FUN = read.csv) %>% 
# bind all data frames into a single data frame 
rbind_all %>% 
# write into a single csv file 
write.csv("all.csv") 

I would like to know where to put the code that selects the columns and adds the file name.

Answer:

library(dplyr)
# get list of files ending in .csv below the working directory;
# one recursive listing from "." finds each file exactly once
dir(".", pattern = "\\.csv$", recursive = TRUE, full.names = TRUE) %>%
# read files into data frames, select first 16 columns and add filename
lapply(FUN = function(p) read.csv(p) %>% select(1:16) %>% mutate(file_name = p)) %>%
# bind all data frames into a single data frame
# (bind_rows() replaces the deprecated rbind_all())
bind_rows() %>%
# write into a single csv file
write.csv("all.csv")

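I would do it in the 'lapply' step, which is the last point where you still have access to the file name/path. Something like this: 'lapply(FUN = function(p) read.csv(p) %>% select(1:16) %>% mutate(file_name = p)) %>%' – scoa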


Thanks, scoa! I have edited the answer. – EJrandom

Answers


You should do it inside the lapply call, because that is the last step where you still have access to the file name/path:

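library(dplyr)

# search once from the working directory; passing every subdirectory
# together with recursive = TRUE would list nested files more than once
dir(".", pattern = "\\.csv$", recursive = TRUE, full.names = TRUE) %>%
    lapply(FUN = function(p) read.csv(p) %>% select(1:16) %>% mutate(file_name = p)) %>%
    bind_rows() %>%
    write.csv("all.csv")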

Use 'bind_rows' instead of 'rbind_all'; see http://rpackages.ianhowson.com/cran/dplyr/man/bind.html
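
For anyone who wants to try the idea without touching real data, here is a minimal sketch that should run in base R alone. It is not the code from the question: it swaps the dplyr verbs for base indexing and 'do.call(rbind, ...)', it stores only the file name via 'basename()' rather than the full path, and the names 'demo_root', 'make_df' and 'read_first_16' are made up for illustration. It assumes every file has at least 16 columns with matching names.

```r
# Hypothetical demo data: a throwaway directory tree with two CSV files.
demo_root <- file.path(tempdir(), "csv_demo")
dir.create(file.path(demo_root, "sub"), recursive = TRUE, showWarnings = FALSE)

make_df <- function(offset) {
  # 2 rows x 18 numeric columns, named V1..V18 by as.data.frame()
  as.data.frame(matrix(seq_len(36) + offset, nrow = 2, ncol = 18))
}
write.csv(make_df(0),   file.path(demo_root, "a.csv"),        row.names = FALSE)
write.csv(make_df(100), file.path(demo_root, "sub", "b.csv"), row.names = FALSE)

# A single recursive listing from the root, so each file appears once.
paths <- list.files(demo_root, pattern = "\\.csv$",
                    recursive = TRUE, full.names = TRUE)

read_first_16 <- function(p) {
  df <- read.csv(p)[, 1:16, drop = FALSE]  # keep the first 16 columns
  df$file_name <- basename(p)              # tag every row with its source file
  df
}

# Stack the per-file data frames and write the result outside demo_root,
# so a second run does not read the combined file back in.
combined <- do.call(rbind, lapply(paths, read_first_16))
write.csv(combined, file.path(tempdir(), "all.csv"), row.names = FALSE)
```

Writing the output somewhere other than the directory being scanned is a deliberate choice here: with "all.csv" in the scanned tree, rerunning the script would pick up the previous result as one more input file.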
