2017-06-16

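Filter the rows of data frames stored in a list and create a new list

I have a list of 64 data frames. Data frames 1 and 5 must have the same row names, and likewise 2 and 6, 3 and 7, and so on. I can run a for loop and create a new list, but something is not working: I end up with the wrong number of rows.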

Here is a simple example to reproduce the problem:

# Create dataframes and store in list 
dfA <- data.frame(v1=c(1:6), v2=c("x1","x2","x3","x4","x5","x6")) 
dfB <- data.frame(v1=c(1:6), v2=c("x1","x2","x3","x4","x5","x6")) 
dfC <- data.frame(v1=c(1:5), v2=c("x1","x2","x3","x4","x5")) 
dfD <- data.frame(v1=c(1:4), v2=c("x1","x2","x3","x4")) 
example_dataframes = list(dfA, dfB, dfC, dfD) 

# These vectors give the order of the process 
vectorA = c(1,2) 
vectorB = c(3,4) 

# Create new list and start for loop 
filtered_dataframes = list() 
for (i in vectorA) {
    for (j in vectorB) {
        df1 = example_dataframes[[i]]
        df2 = example_dataframes[[j]]
        test = intersect(df1$v2, df2$v2)
        filtered_dataframes[[i]] <- df1[which(df1$v2 %in% test),]
        filtered_dataframes[[j]] <- df2[which(df2$v2 %in% test),]
    }
}

In this example, I would expect to get:

sapply(filtered_dataframes, nrow) 
> 5 4 5 4 

I think you only need a single for loop that iterates over the indices of vectorA and vectorB, rather than two nested for loops. – mt1022
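
A minimal sketch of what this comment suggests, assuming `vectorA[k]` is meant to be paired with `vectorB[k]` (1 with 3, 2 with 4), which matches the row counts the question expects. The nested loops visit all four combinations (1,3), (1,4), (2,3), (2,4), so each entry keeps whatever the last combination wrote; data frame 1 ends up filtered against data frame 4. The pre-allocation with `vector("list", ...)` and the name `common` are choices of this sketch, not part of the thread.

```r
# Rebuild the example data from the question
dfA <- data.frame(v1 = 1:6, v2 = paste0("x", 1:6), stringsAsFactors = FALSE)
dfB <- data.frame(v1 = 1:6, v2 = paste0("x", 1:6), stringsAsFactors = FALSE)
dfC <- data.frame(v1 = 1:5, v2 = paste0("x", 1:5), stringsAsFactors = FALSE)
dfD <- data.frame(v1 = 1:4, v2 = paste0("x", 1:4), stringsAsFactors = FALSE)
example_dataframes <- list(dfA, dfB, dfC, dfD)

vectorA <- c(1, 2)
vectorB <- c(3, 4)

# Pre-allocate so every data frame keeps its original position (sketch assumption)
filtered_dataframes <- vector("list", length(example_dataframes))

# One loop over positions: vectorA[k] is paired with vectorB[k] only
for (k in seq_along(vectorA)) {
  i <- vectorA[k]
  j <- vectorB[k]
  df1 <- example_dataframes[[i]]
  df2 <- example_dataframes[[j]]

  # Values of v2 present in both data frames of the pair
  common <- intersect(df1$v2, df2$v2)

  filtered_dataframes[[i]] <- df1[df1$v2 %in% common, ]
  filtered_dataframes[[j]] <- df2[df2$v2 %in% common, ]
}

sapply(filtered_dataframes, nrow)
```

With the example data this gives row counts of 5, 4, 5 and 4, in the original list order.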

Answers

This modified version should give you the result you need.

dfA <- data.frame(v1=c(1:6), v2=c("x1","x2","x3","x4","x5","x6")) 
dfB <- data.frame(v1=c(1:6), v2=c("x1","x2","x3","x4","x5","x6")) 
dfC <- data.frame(v1=c(1:5), v2=c("x1","x2","x3","x4","x5")) 
dfD <- data.frame(v1=c(1:4), v2=c("x1","x2","x3","x4")) 
example_dataframes = list(dfA, dfB, dfC, dfD) 

# Put the comparison vectors into a list. Example: to compare data frames 1 and 3, add c(1,3)
vector.list <- list(c(1,3),c(2,4)) 

# Create new list and start for loop 
filtered_dataframes = list() 

# Loop through the list of vectors 
for (i in vector.list) { 
    # Get the first dataframe from the current vector being processed 
    df1 = example_dataframes[[i[1]]] 

    # Get the second dataframe from the current vector being processed 
    df2 = example_dataframes[[i[2]]] 

    # Get the intersection of the two dataframes 
    test = intersect(df1$v2, df2$v2) 

    # Add the first filtered dataframe to the list of filtered dataframes 
    filtered_dataframes[[i[1]]] <- df1[which(df1$v2 %in% test),] 

    # Add the second filtered dataframe to the list of filtered dataframes 
    filtered_dataframes[[i[2]]] <- df2[which(df2$v2 %in% test),] 
}

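Thanks, but I am not getting what I expected. In the new list, dfA and dfC should have nrow = 5, and dfB and dfD should have nrow = 4. I want to keep them in their original order. – fibar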

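I have revised my answer to fix the problem you ran into. When setting up the data frames to compare in 'vector.list', put the two data frame indices in the same vector: 'c(1,3)' compares the data frames at indices 1 and 3 of the 'example_dataframes' list. I also changed it so that the data frames keep their original index positions when they are written to the 'filtered_dataframes' list. –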
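
As a rough sketch of how the pairing described in this comment could be reused for a longer list, the loop can be wrapped in a function and `vector.list` built from two index vectors with `Map`. The helper name `filter_pairs` and its `key` argument are hypothetical and introduced here only for illustration. Unlike the answer's loop, this version starts from a copy of the input, so any data frame that is not part of a pair passes through unchanged.

```r
# Hypothetical helper (not from the thread): filter each listed pair of
# data frames down to the key values they share, keeping list positions.
filter_pairs <- function(dfs, pairs, key = "v2") {
  out <- dfs  # copy: unpaired data frames stay as they are
  for (p in pairs) {
    # Key values present in both data frames of this pair
    common <- intersect(dfs[[p[1]]][[key]], dfs[[p[2]]][[key]])
    for (idx in p) {
      out[[idx]] <- dfs[[idx]][dfs[[idx]][[key]] %in% common, , drop = FALSE]
    }
  }
  out
}

# Example data from the question
example_dataframes <- list(
  data.frame(v1 = 1:6, v2 = paste0("x", 1:6), stringsAsFactors = FALSE),
  data.frame(v1 = 1:6, v2 = paste0("x", 1:6), stringsAsFactors = FALSE),
  data.frame(v1 = 1:5, v2 = paste0("x", 1:5), stringsAsFactors = FALSE),
  data.frame(v1 = 1:4, v2 = paste0("x", 1:4), stringsAsFactors = FALSE)
)

# Build list(c(1, 3), c(2, 4)) from the two index vectors
vector.list <- Map(c, c(1, 2), c(3, 4))

filtered_dataframes <- filter_pairs(example_dataframes, vector.list)
sapply(filtered_dataframes, nrow)
```

For the full list of 64 data frames, only the two index vectors passed to `Map` would need to change; the exact pairing pattern is not spelled out in the question, so it is left open here.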
