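R - loop through data.frames in a list - modify characters of a column (list element)
I have a few thousand *.csv files (all with unique file names), but the header columns are identical in every file, e.g. "Timestamp", "System_Name", "CPU_ID", etc.
My question is: how can I replace the "System_Name" value (a system name such as "as12535.org.at", or any other combination of characters) and anonymize it? I am grateful for any hint or a pointer in the right direction.
The structure of the CSV files is shown below: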
"Timestamp","System_Name","CPU_ID","User_CPU","User_Nice_CPU","System_CPU","Idle_CPU","Busy_CPU","Wait_IO_CPU","User_Sys_Pct"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
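I tried the R package anonymizer, which works fine at the vector level, but I ran into problems because I am reading thousands of CSV files into R. What I tried is the following: building a list that holds every CSV file as a data frame.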
# initialize a list (note: setwd() returns the previous working directory)
r.path <- setwd("mypath")
ldf <- list()
# creates the list of all the csv files in my directory - but filter for
# files with Unix in the filename for testing.
listcsv <- dir(pattern = ".UnixM.")
for (i in seq_along(listcsv)) {
  ldf[[i]] <- read.csv(file = listcsv[i], stringsAsFactors = FALSE)
}
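For comparison, the same reading step can be written without a loop or a change of working directory. This is only a minimal sketch: the temporary directory, the two demo file names and the stricter pattern "UnixM.*\\.csv$" are assumptions made so that the example runs on its own, and they are not taken from the real data.

```r
# Self-contained demo: write two small CSV files to a temporary directory.
# (demo_dir and the file names are hypothetical stand-ins for "mypath".)
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
sample_df <- data.frame(Timestamp = "1161025010002000",
                        System_Name = "as06240.org.xyz:LZ",
                        CPU_ID = "-1")
write.csv(sample_df, file.path(demo_dir, "a_UnixM_1.csv"), row.names = FALSE)
write.csv(sample_df, file.path(demo_dir, "b_UnixM_2.csv"), row.names = FALSE)

# list.files() with full.names = TRUE avoids calling setwd().
# The pattern is a regular expression: "UnixM" in the name, ending in ".csv".
csv_files <- list.files(demo_dir, pattern = "UnixM.*\\.csv$", full.names = TRUE)

# Read every file; stringsAsFactors = FALSE keeps System_Name as character.
ldf <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)

# Name each list element after its file so it can be traced back later.
names(ldf) <- basename(csv_files)
```

Naming the list elements after the files is optional, but it makes it easier to write each data frame back to a matching file name afterwards.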
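I am racking my brain because I cannot anonymize the System_Name column, or even replace some of its characters (pseudo-anonymization), by looping over the list (ldf) and its data frame elements.
My list ldf (which holds the individual CSV files as data frames) looks like this: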
summary(ldf)
Length Class Mode
[1,] 5 data.frame list
[2,] 5 data.frame list
[3,] 5 data.frame list
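How can I now change or anonymize all of the "System_Name" column, or even just part of it, across all the CSV files I have read in, and do this for every CSV in my directory in a loop in R? It does not need to be super elegant - I am happy as long as it works :-)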
Use 'lapply' to apply the function you want to each element of the list. I don't know how anonymizer works, but assuming the function is called like 'anonymizer(column)': 'lapply(list, function(x) anonymizer(x$System_Name))'
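The 'lapply' idea can be sketched in base R as follows. This is a hedged illustration, not the anonymizer package's API: 'pseudonymize' is a hypothetical helper, the "host_0001" naming scheme is an assumption, and the small 'ldf' built here only stands in for the list read from the CSV files. Note that the function passed to 'lapply' assigns the new column and then returns the whole data frame; returning only the anonymized vector would drop the other columns.

```r
# Hypothetical helper: map each system name to its pseudonym via a lookup table.
pseudonymize <- function(x, lookup) {
  unname(lookup[as.character(x)])
}

# Toy stand-in for the list `ldf` of data frames read from the CSV files.
ldf <- list(
  data.frame(Timestamp = c("1161025010002000", "1161025010002000"),
             System_Name = c("as06240.org.xyz:LZ", "as12535.org.at:LZ"),
             stringsAsFactors = FALSE),
  data.frame(Timestamp = "1161025010003000",
             System_Name = "as06240.org.xyz:LZ",
             stringsAsFactors = FALSE)
)

# Build ONE lookup over all files so a host gets the same pseudonym everywhere.
all_names <- unique(unlist(lapply(ldf, function(df) as.character(df$System_Name))))
lookup <- setNames(sprintf("host_%04d", seq_along(all_names)), all_names)

# Replace the column inside each data frame and return the full data frame.
ldf_anon <- lapply(ldf, function(df) {
  df$System_Name <- pseudonymize(df$System_Name, lookup)
  df
})
```

Each element of 'ldf_anon' can then be written back with 'write.csv(df, file, row.names = FALSE)'. The 'lookup' table is the key that reverses the pseudonyms, so it should be stored separately from the anonymized files or discarded.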