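I asked a question here, "how can I group based on similarity in strings", which turned out to be very hard to deal with. I found a good idea that I would like to try: how can I run a function once on every pair?
Here is my thinking and the data (the same data as in that question):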
df <-structure(list(label = structure(c(5L, 6L, 7L, 8L, 3L, 1L, 2L,
9L, 10L, 4L), .Label = c(" holand", " holandindia", " Holandnorway",
" USAargentinabrazil", "Afghanestan ", "Afghanestankabol", "Afghanestankabolindia",
"indiaAfghanestan ", "USA", "USAargentina "), class = "factor"),
value = structure(c(5L, 4L, 1L, 9L, 7L, 10L, 6L, 3L, 2L,
8L), .Label = c("1941029507", "2367321518", "2849255881",
"2913128511", "2927576083", "4550996370", "457707181.9",
"637943892.6", "796495286.2", "89291651.19"), class = "factor")), .Names = c("label",
"value"), class = "data.frame", row.names = c(NA, -10L))
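1- I try to count the number of letters in each string, row by row.
2- I try to run adist on every pair.
If the adist output for a pair is close to 1, the two strings belong to one group; if not, they are in two different groups.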
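For step 1, here is a minimal sketch of counting the letters per row. It assumes the labels are stored as a factor with stray leading/trailing spaces, as in the data above; the three values and the names `label`, `clean` and `n_letters` are only for illustration.

```r
# Hypothetical subset of df$label, kept as a factor on purpose
label <- factor(c("Afghanestan ", "Afghanestankabol", "Afghanestankabolindia"))

# nchar() does not accept factors, so convert first;
# trimws() drops the stray leading/trailing spaces
clean <- trimws(as.character(label))
n_letters <- nchar(clean)

data.frame(label = clean, n_letters = n_letters)
```

Whether the spaces should be trimmed is a judgment call; without trimming, they count as characters both here and in adist.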
To solve the above, I need to know how to run adist on all the strings in the first column of my data.
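As a rough sketch of what this could look like: adist accepts a whole character vector and, when given only one argument, returns the matrix of distances between all of its elements. The vector `labels` below is a trimmed, hypothetical subset of the first column.

```r
# Hypothetical subset of the labels, with surrounding spaces removed
labels <- trimws(c("Afghanestan ", "Afghanestankabol", "Afghanestankabolindia",
                   "indiaAfghanestan ", " Holandnorway", " holand"))

# With a single argument, adist() compares the vector with itself and
# returns a symmetric matrix of generalized Levenshtein distances
d <- adist(labels)
dimnames(d) <- list(labels, labels)  # label rows and columns for readability

d["Afghanestan", "Afghanestankabol"]  # distance for one specific pair
```

Note that the matrix contains every pair twice (once in each triangle), so it does not yet satisfy the "only once" requirement.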
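So my questions are:
1- Is there a function that does the opposite of adist?
2- How can I run adist on all combinations, going from the longest string down to the shortest? For example,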
adist("Afghanestankabolindia","Afghanestan")
adist("Afghanestankabolindia","Afghanestankabol")
adist("Afghanestankabolindia","indiaAfghanestan")
adist("Afghanestankabolindia","Holandnorway")
adist("Afghanestankabolindia","holand")
adist("Afghanestankabolindia","holandindia")
.
.
.
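The tricky part is that each distance should be taken relative to a reference string, and each pair should be computed only once. For example, it should compute the distance between
Afghanestankabolindia and Afghanestan
and not between
Afghanestan and Afghanestankabolindia
That is, the reference is always the longer string of the pair.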
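One possible way to get each pair only once with the longer string as the reference is sketched below. It assumes it is acceptable to sort the strings by decreasing length first and then keep only the upper triangle of the adist matrix; `labels`, `idx` and `res` are illustrative names, and the labels are a trimmed, hypothetical subset of the data.

```r
# Hypothetical subset of the labels, with surrounding spaces removed
labels <- trimws(c("Afghanestan ", "Afghanestankabol", "Afghanestankabolindia",
                   "indiaAfghanestan ", " Holandnorway", " holand"))

# Sort from longest to shortest so the earlier string is never the shorter one
labels <- labels[order(nchar(labels), decreasing = TRUE)]

d <- adist(labels)  # all pairwise distances (each pair appears twice)

# Keep only the upper triangle: row index < column index, so every
# unordered pair appears exactly once and the row string is the longer one
idx <- which(upper.tri(d), arr.ind = TRUE)
res <- data.frame(reference = labels[idx[, "row"]],
                  other     = labels[idx[, "col"]],
                  dist      = d[idx],
                  stringsAsFactors = FALSE)
res
```

Since the default distance in adist is symmetric, the order within a pair does not change the value; sorting only controls which string is reported as the reference. For strings of equal length, the one that came first in the original order ends up as the reference.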
Thank you very much. I like your answer and have accepted it.