2016-07-14

Imagine the following sorted sequence of binary strings in R:

0000 
0001 
0010 
0011 
0100 
0101 
0110 
0111 
1000 
1001 
1010 
1011 
1100 
1101 
1110 
1111 

I would like to sort the sequence by similarity, like this:

0000 
0001 
0010 
0100 
1000 
0011 
... 

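Lines 2, 3, 4 and 5 are equally similar to line 1, because each of them differs from it in only one bit. So lines 2, 3, 4 and 5 could just as well come in the order 3, 2, 5, 4.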

Next comes line 6, because it differs from line 1 in two bits.
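Stated as a formula (the symbols s, r, n and d are notation introduced here, not part of the question), the sort key for a string s = s_1 s_2 ... s_n is its Hamming distance to the first line r, i.e. the number of positions where the two strings differ:

```latex
d(s, r) = \sum_{i=1}^{n} \mathbf{1}\left[ s_i \neq r_i \right]
```

For example, with r = 0000: d(0001, r) = 1 and d(0011, r) = 2, which is why 0011 comes after the four one-bit strings.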

Can this be done in R?

Answers


x <- c("0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111", 
     "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111") 

1) Using the digitsum function from this answer:

digitsum <- function(x) sum(floor(x/10^(0:(nchar(x) - 1))) %% 10) 
x[order(sapply(as.numeric(x), digitsum))] 
# [1] "0000" "0001" "0010" "0100" "1000" "0011" "0101" "0110" "1001" "1010" "1100" 
# [12] "0111" "1011" "1101" "1110" "1111" 
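To see why digitsum() counts the 1 bits, here is a step-by-step trace of the arithmetic for one string. It is only an illustration; the variable names n, shifted and digits are made up for this walk-through. Each division by a power of 10 shifts a different digit into the units place, and %% 10 picks it out:

```r
# digitsum() as in the answer: read the string as a decimal number and add up
# its decimal digits (all 0 or 1 here), which equals the number of 1 bits.
digitsum <- function(x) sum(floor(x/10^(0:(nchar(x) - 1))) %% 10)

# Step by step for "1011", which as.numeric() turns into the number 1011
n <- as.numeric("1011")
shifted <- floor(n / 10^(0:3))  # 1011  101  10  1
digits <- shifted %% 10         #    1    1   0  1
sum(digits)                     # 3
digitsum(n)                     # 3
```

Leading zeros disappear in as.numeric("0011") (it becomes 11), but they would contribute 0 to the sum anyway. Because the trick goes through a double, it is only dependable for short strings like these.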

2) Using a regular expression:

x[order(gsub(0, "", x))] 
# [1] "0000" "0001" "0010" "0100" "1000" "0011" "0101" "0110" "1001" "1010" "1100" 
# [12] "0111" "1011" "1101" "1110" "1111" 
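Why the regex version works: after gsub() deletes every 0, each key consists only of 1s, so sorting the keys alphabetically is the same as sorting them by length, i.e. by the number of 1 bits. The small input vector bits below is made up for illustration:

```r
# Removing every "0" leaves a key made only of "1"s, so alphabetical order
# ("" < "1" < "11" < ...) matches the number of 1 bits.
# order() leaves tied elements in their original order.
bits <- c("0000", "0011", "0101", "0001", "1111", "1000")
keys <- gsub("0", "", bits)
keys             # ""  "11"  "11"  "1"  "1111"  "1"
bits[order(keys)] # "0000" "0001" "1000" "0011" "0101" "1111"
```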

Instead of the digitsum function, couldn't you just do x[order(sapply(strsplit(x, ""), function(x) sum(x == 1)))]? – eipi10


@eipi10, sure, but the regex solution is probably neater than any solution that involves summing digits. – Julius


I agree. But it is fun to find all the second-best ways of doing things in R. – eipi10


Well, here is my attempt. Try it and see whether it fits your needs. It relies on the stringr package.

library('stringr') 
# Creates a small test data frame to mimic the data you have. 
df <- data.frame(numbers = c('0000', '0001', '0010', '0011', '0100', '0101', '0111', '1000'), stringsAsFactors = FALSE) 
df$count <- str_count(df$numbers, '1') # Counts instances of 1 occurring in each string 
df[with(df, order(count)), ] # Orders data frame by number of counts. 

    numbers count 
1 0000  0 
2 0001  1 
3 0010  1 
5 0100  1 
8 1000  1 
4 0011  2 
6 0101  2 
7 0111  3 
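If you would rather not depend on stringr, a minimal base R sketch (a swapped-in technique, not part of the answer above) gets the same counts by deleting every character that is not "1" and measuring what is left with nchar():

```r
# Base R stand-in for stringr::str_count(df$numbers, "1"):
# strip everything except "1", then count the remaining characters.
df <- data.frame(numbers = c("0000", "0001", "0010", "0011", "0100", "0101", "0111", "1000"),
                 stringsAsFactors = FALSE)
df$count <- nchar(gsub("[^1]", "", df$numbers))
df[order(df$count), ]  # same row order as the stringr version
```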

This only works if the first entry is '0000'. The OP may need a more general solution.
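One way to generalise, sketched under the assumption that all strings have the same length: compare each string position by position with an arbitrary reference (by default the first element) and sort by the number of mismatches. The helper name sort_by_hamming and the sample vector are hypothetical, made up for this illustration:

```r
# Hypothetical helper: sort equal-length bit strings by the number of
# positions in which they differ from a reference (their Hamming distance).
sort_by_hamming <- function(x, ref = x[1]) {
  ref_bits <- strsplit(ref, "")[[1]]
  # For each string, count the characters that differ from the reference
  d <- vapply(strsplit(x, ""), function(bits) sum(bits != ref_bits), integer(1))
  x[order(d)]  # ties keep their original order
}

x <- c("0110", "0000", "0111", "1001", "0100", "1110")
sort_by_hamming(x)  # "0110" "0111" "0100" "1110" "0000" "1001"
```

Passing ref = "0000" reproduces the count-the-ones ordering from the answers above.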


Since we are talking about distances between strings, you may want to use the stringdist function from the stringdist package:

library(stringdist) 
x <- c("0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111", 
     "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111") 

#stringdistmatrix(x, '0000') computes the distance from each element of x
#to the reference string '0000' (the lowest value in this case)
distances <- stringdistmatrix(x, '0000') 

#use the distances to order the vector 
x[order(distances)] 
#[1] "0000" "0001" "0010" "0100" "1000" "0011" "0101" "0110" 
# "1001" "1010" "1100" "0111" "1011" "1101" "1110" "1111" 

Or in one go:

x[order(stringdist(x, '0000'))]
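A caveat if you reuse this with a reference other than '0000': stringdist's default method, "osa", counts a swap of two adjacent characters as a single edit, so "0110" and "1010" come out at distance 1 even though they differ in two bits; passing method = "hamming" counts differing positions instead. For bit strings the Hamming distance can also be computed arithmetically in base R, as the number of set bits in the XOR of the two values. The helper hamming_xor below is a hypothetical sketch, and because strtoi() returns an R integer it assumes strings of at most 31 bits:

```r
# Hypothetical helper: Hamming distance between two bit strings via XOR.
# strtoi(..., base = 2) parses the strings as integers (at most 31 bits).
hamming_xor <- function(a, b) {
  v <- bitwXor(strtoi(a, base = 2), strtoi(b, base = 2))
  sum(as.integer(intToBits(v)))  # count the bits that differ
}

x <- c("0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
       "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111")
d <- vapply(x, hamming_xor, integer(1), b = "0000", USE.NAMES = FALSE)
x[order(d)]                  # same order as the answers above
hamming_xor("0110", "1010")  # 2
```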