2016-05-21 29 views

as.matrix() and as.dist() give different results

I have a list "simil" that contains 7 vectors:

> dput(simil) 
structure(list(Monday = structure(c(0.889987253484581, 0.882957894295089, 
0.882232353177177, 0.874080268021168, 0.851760771472629, 0.811536071048775 
), .Names = c("Sunday", "Tuesday", "Friday", "Wednesday", "Thursday", 
"Saturday")), Tuesday = structure(c(0.901682757072732, 0.882957894295089, 
0.874716806575548, 0.869202937572079, 0.855248496101086, 0.818659253763272 
), .Names = c("Sunday", "Monday", "Wednesday", "Friday", "Thursday", 
"Saturday")), Wednesday = structure(c(0.88354911311872, 0.874716806575548, 
0.874080268021168, 0.853293126413937, 0.851921112754124, 0.841170795359615 
), .Names = c("Sunday", "Tuesday", "Monday", "Friday", "Thursday", 
"Saturday")), Thursday = structure(c(0.86579834238668, 0.855248496101086, 
0.851921112754124, 0.851760771472629, 0.851384896045153, 0.836732564057725 
), .Names = c("Sunday", "Tuesday", "Wednesday", "Monday", "Friday", 
"Saturday")), Friday = structure(c(0.882232353177177, 0.869202937572079, 
0.856441568566172, 0.853293126413937, 0.851384896045153, 0.80098779448239 
), .Names = c("Monday", "Tuesday", "Sunday", "Wednesday", "Thursday", 
"Saturday")), Saturday = structure(c(0.866654844262859, 0.841170795359615, 
0.836732564057725, 0.818659253763272, 0.811536071048775, 0.80098779448239 
), .Names = c("Sunday", "Wednesday", "Thursday", "Tuesday", "Monday", 
"Friday")), Sunday = structure(c(0.901682757072732, 0.889987253484581, 
0.88354911311872, 0.866654844262859, 0.86579834238668, 0.856441568566172 
), .Names = c("Tuesday", "Monday", "Wednesday", "Saturday", "Thursday", 
"Friday"))), .Names = c("Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Saturday", "Sunday"), class = c("similMatrix", "list" 
)) 

I now want to convert it into a dist object and then pass it to hclust(). So I used as.dist() and got:

> as.dist(simil,diag = TRUE, upper = TRUE) 
      Monday Sunday Tuesday Friday Wednesday Thursday Saturday 
Monday 0.0000000 0.8899873 0.8829579 0.8822324 0.8740803 0.8517608 0.8115361 
Sunday 0.8899873 0.0000000 1.0000000 0.8692029 0.8747168 0.8552485 0.8186593 
Tuesday 0.8829579 1.0000000 0.0000000 0.8532931 1.0000000 0.8519211 0.8411708 
Friday 0.8822324 0.8692029 0.8532931 0.0000000 0.8519211 1.0000000 0.8367326 
Wednesday 0.8740803 0.8747168 1.0000000 0.8519211 0.0000000 0.8513849 0.8009878 
Thursday 0.8517608 0.8552485 0.8519211 1.0000000 0.8513849 0.0000000 1.0000000 
Saturday 0.8115361 0.8186593 0.8411708 0.8367326 0.8009878 1.0000000 0.0000000 

However, this differs slightly from the result I get with as.matrix():

> as.matrix(simil) 
      Monday Tuesday Wednesday Thursday Friday Saturday Sunday 
Monday 1.0000000 0.8829579 0.8740803 0.8517608 0.8822324 0.8115361 0.8899873 
Sunday 0.8899873 0.9016828 0.8835491 0.8657983 0.8564416 0.8666548 1.0000000 
Tuesday 0.8829579 1.0000000 0.8747168 0.8552485 0.8692029 0.8186593 0.9016828 
Friday 0.8822324 0.8692029 0.8532931 0.8513849 1.0000000 0.8009878 0.8564416 
Wednesday 0.8740803 0.8747168 1.0000000 0.8519211 0.8532931 0.8411708 0.8835491 
Thursday 0.8517608 0.8552485 0.8519211 1.0000000 0.8513849 0.8367326 0.8657983 
Saturday 0.8115361 0.8186593 0.8411708 0.8367326 0.8009878 1.0000000 0.8666548 

With as.dist(), the matrix is not fully symmetric and some pairs come out wrong, which does not happen with as.matrix(). Why is that? How can I correct it?

As mentioned above, if it is a 'list', 'sapply/lapply' is the way to loop over the list. It would be better if you posted the output of this example. – akrun

I have updated the question with dput(). But I don't understand how I should use sapply/lapply to convert my list into a dist object. Isn't as.dist() supposed to do that? –

Based on your input, the code you used does not give the output you showed; however, "simplify2array(simil)" does give a matrix. – akrun

Answer

In the end I managed to solve this by first converting to a matrix, then swapping the row order, and finally converting to a dist object:

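simil = as.matrix(simil)                            # similMatrix list -> numeric matrix
simil = simil[ c(1,3,5,6,4,7,2),]                   # put rows in the same order as the columns (Monday..Sunday)
simil = as.dist(1-simil,diag = TRUE, upper = TRUE)  # similarity -> distance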

> simil 
       Monday Tuesday Wednesday Thursday  Friday Saturday  Sunday 
Monday 0.00000000 0.11704211 0.12591973 0.14823923 0.11776765 0.18846393 0.11001275 
Tuesday 0.11704211 0.00000000 0.12528319 0.14475150 0.13079706 0.18134075 0.09831724 
Wednesday 0.12591973 0.12528319 0.00000000 0.14807889 0.14670687 0.15882920 0.11645089 
Thursday 0.14823923 0.14475150 0.14807889 0.00000000 0.14861510 0.16326744 0.13420166 
Friday 0.11776765 0.13079706 0.14670687 0.14861510 0.00000000 0.19901221 0.14355843 
Saturday 0.18846393 0.18134075 0.15882920 0.16326744 0.19901221 0.00000000 0.13334516 
Sunday 0.11001275 0.09831724 0.11645089 0.13420166 0.14355843 0.13334516 0.00000000 
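
Why the reordering helps can be sketched with a toy example. as.dist() on a matrix reads the lower triangle by position and takes its labels from the row names, so if the rows are in a different order than the columns (as in the as.matrix() output in the question), values end up attached to the wrong pairs. The matrix and day names below are made up for illustration; indexing rows by `colnames()` is just one way to realign them without hard-coding positions.

```r
# Toy similarity matrix (made-up values), rows and columns in the same order.
days <- c("Mon", "Tue", "Wed")
m <- matrix(c(1.0, 0.9, 0.8,
              0.9, 1.0, 0.7,
              0.8, 0.7, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(days, days))

# Mimic the problem: rows in a different order than the columns.
shuffled <- m[c("Mon", "Wed", "Tue"), ]

# as.dist() uses positions, so the Wed/Tue entry picks up a diagonal value.
bad <- as.matrix(as.dist(1 - shuffled))

# Realign rows by name so that row i and column i refer to the same day.
aligned <- shuffled[colnames(shuffled), ]
good <- as.matrix(as.dist(1 - aligned))

bad["Wed", "Tue"]   # not the real Tue/Wed distance
good["Wed", "Tue"]  # 1 - 0.7
```

With the real data the same idea would be `simil[colnames(simil), ]` in place of the numeric index, assuming the row and column names are the same set of days.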

This is probably because "simil" was created with the similarity() function from the quanteda package.
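
If you would rather not depend on how the package converts the list, a base-R alternative is to fill the matrix by name yourself. This is only a sketch: `simil_list_to_dist` is a hypothetical helper (not part of quanteda), the toy list uses made-up values, and it assumes every self-similarity is 1 and that each vector is named by the other items, as in the dput() output in the question.

```r
# Hypothetical helper: convert a named list of named similarity vectors
# into a "dist" object, matching every value by name rather than by position.
simil_list_to_dist <- function(x) {
  x <- unclass(x)                      # drop any extra class, keep the plain list
  labels <- names(x)
  # Start from all ones so the diagonal (self-similarity) is 1 by assumption.
  m <- matrix(1, nrow = length(labels), ncol = length(labels),
              dimnames = list(labels, labels))
  for (item in labels) {
    m[names(x[[item]]), item] <- x[[item]]   # fill one column, rows looked up by name
  }
  as.dist(1 - m)                       # similarity -> distance
}

# Toy input with the same shape as "simil" (values are made up).
toy <- list(
  Mon = c(Tue = 0.9, Wed = 0.8),
  Tue = c(Mon = 0.9, Wed = 0.7),
  Wed = c(Mon = 0.8, Tue = 0.7)
)

d <- simil_list_to_dist(toy)
hc <- hclust(d)   # the dist object can go straight into hclust()
```

Because each value is placed by its name, the order in which the neighbours are listed inside each vector no longer matters.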