Can the code below be made more "R-like"? How can it be condensed?
Given the data.frame inDF:
V1 V2 V3 V4
1 a ha 1;2;3 A
2 c hb 4 B
3 d hc 5;6 C
4 f hd 7 D
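In inDF I want to:
- find all rows whose "V3" column holds multiple values separated by ";"
- replicate each of those rows as many times as there are individual values in its "V3" cell
- give each replicated row exactly one of the original values in its "V3" column
In short, the output data.frame (= outDF) should look like: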
V1 V2 V3 V4
1 a ha 1 A
1 a ha 2 A
1 a ha 3 A
2 c hb 4 B
3 d hc 5 C
3 d hc 6 C
4 f hd 7 D
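So, to get from inDF to outDF, I wrote the following code:
# str_count() and str_split() used below come from the stringr package
library(stringr)
#load inDF from csv file
inDF <- read.csv(file='example.csv', header=FALSE, sep=",", fill=TRUE)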
#search in inDF, on the V3 column, all the cells with multiple values
rowlist <- grep(";", inDF[,3])
# create empty data.frame and give it the column names of inDF
xDF <- data.frame(matrix(0, nrow=0, ncol=4))
colnames(xDF)=colnames(inDF)
#take every row from the inDF data.frame which has multiple values in col3 and break it in several rows with only one value
for(i in rowlist[])
{
#count the number of individual values in one cell
value_nr <- str_count(inDF[i,3], ";"); value_nr <- value_nr+1
# replicate each row a number of times equal with its value number, and transform it to character
extracted_inDF <- inDF[rep(i, times=value_nr[]),]
extracted_inDF <- data.frame(lapply(extracted_inDF, as.character), stringsAsFactors=FALSE)
# split the values in V3 cell in individual values, place them in a list
value_ls <- str_split(inDF[i, 3], ";")
#initialize f, to use it later to increment both row number and element in the list of values
f = 1
# replace the multiple values with individual values
for(j in extracted_inDF[,3])
{
extracted_inDF[f,3] <- value_ls[[1]][as.integer(f)]
f <- f+1
}
#put all the "demultiplied" rows in xDF
xDF <- merge(extracted_inDF[], xDF[], all=TRUE)
}
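# delete the rows with multiple values from the inDF
# (guarded: a negative index built from an empty rowlist would drop every row)
if (length(rowlist) > 0) inDF <- inDF[-rowlist,]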
#create outDF
outDF <- merge(inDF, xDF, all=TRUE)
Could you please
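A rule of good coding: never reinvent the wheel. It wastes time, and you are likely to make mistakes or at least end up with a non-optimal solution. –
I agree about not reinventing the wheel. I am just not sure I know exactly how to use the wheel. For the example above, do you have any suggestions on the proper way to use R? – CLM
Take a look at str_split and strsplit. In general, base R has some useful string functions, and the 'stringr' package has more. –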
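Following up on the strsplit suggestion, here is a minimal base R sketch of one way the whole loop could be condensed. It is not necessarily the approach the commenter had in mind. It assumes the example data from the question, built inline here rather than read from example.csv, and it assumes V3 holds character strings; the names parts and idx are made up for illustration.

```r
# Example data from the question, built inline (assumption: same content as example.csv)
inDF <- data.frame(
  V1 = c("a", "c", "d", "f"),
  V2 = c("ha", "hb", "hc", "hd"),
  V3 = c("1;2;3", "4", "5;6", "7"),
  V4 = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Split every V3 cell on ";" (a cell without ";" yields a length-1 vector)
parts <- strsplit(as.character(inDF$V3), ";", fixed = TRUE)

# Repeat each row index once per individual value in its V3 cell
idx <- rep(seq_len(nrow(inDF)), times = lengths(parts))

# Replicate the rows, then overwrite V3 with the individual values
outDF <- inDF[idx, ]
outDF$V3 <- unlist(parts)
rownames(outDF) <- NULL  # drop the 1, 1.1, 1.2, ... row names left by indexing

print(outDF)
```

Because rows with a single value simply get repeated once, there is no need to treat the multi-value rows separately or to merge anything back together, and the original row order is preserved. Two caveats: V3 stays a character column (wrap it in as.integer() if numbers are needed), and a row whose V3 is an empty string would be dropped, since strsplit returns a zero-length vector for it. If adding a dependency is acceptable, tidyr::separate_rows(inDF, V3, sep = ";") should give a similar result in a single call.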