2017-07-07

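Duplicated factor level error from seqdef in R

I see this error every time I try to run seqdef on data that I have converted to STS format with seqformat. A sample of my data frame looks like this: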

head(df.new, 10)
   user_id orderdate         cart to
1        8         1      produce 30
2        8        31      produce 60
3        8        61      produce 70
4        8        71      produce 92
5       10         1      produce 30
6       10        31      produce 42
7       10        43 meat seafood 56
8       10        57         deli 77
9       17         1    beverages  3
10      17         4    beverages  8

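It has 14000 rows of orders in total, and for each user some orders start and end on the same day (i.e. orderdate == to). Below is the code I used to create the STS data that serves as input to seqdef.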

df.form <- seqformat(df.new, id='user_id', begin='orderdate', end='to', status='cart', from='SPELL', to='STS', process=FALSE) 
df.seq <- seqdef(df.form, left='DEL', right = 'unknown', xtstep=10, void = 'unknown') 

The error message I get when running seqdef is

[>] found missing values ('NA') in sequence data 
[>] preparing 35000 sequences 
[>] coding void elements with 'unknown' and missing values with '*' 
[>] 21 distinct states appear in the data: 
    1 = alcohol 
    2 = babies 
    3 = bakery 
    4 = beverages 
    5 = breakfast 
    6 = bulk 
    7 = canned goods 
    8 = dairy eggs 
    9 = deli 
    10 = dry goods pasta 
    11 = frozen 
    12 = household 
     ... 
[>] adding special state(s) to the alphabet: unknown 
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : 
    factor level [24] is duplicated 

I tried removing the orders where orderdate == to, and the same error still appears. I would appreciate any help in resolving this issue. Thanks.

Answer

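The error occurs because you use the same code ('unknown') for both right missing values (right = 'unknown') and voids (void = 'unknown'). The same label then appears twice among the state levels, which produces the "factor level is duplicated" error.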
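
A minimal sketch of the fix, assuming a small made-up STS data frame (sts.toy and its columns t1 to t3 are hypothetical, not the asker's data): keep right = 'unknown' but give void a different code, here '%', which is the default void symbol in seqdef.

```r
library(TraMineR)

# Hypothetical toy data in STS format: one row per user, one column per time step.
# Trailing NAs in the second row play the role of right missing values.
sts.toy <- data.frame(
  t1 = c("produce", "deli"),
  t2 = c("produce", NA),
  t3 = c("beverages", NA),
  stringsAsFactors = FALSE
)

# 'right' and 'void' use different codes, so no state label is duplicated.
seq.fixed <- seqdef(sts.toy, left = "DEL", right = "unknown", void = "%")
print(seq.fixed)
```

With the objects from the question, the analogous call would be seqdef(df.form, left = 'DEL', right = 'unknown', xtstep = 10, void = '%'). Setting right = 'DEL' or right = NA instead should also avoid the clash.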

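When sequences contain missing values, those missing values are treated as a distinct state when you set with.missing = TRUE in functions such as seqdist and seqdplot. Voids, in contrast, only serve to pad the rows to a common length and are simply ignored when plotting the sequences (seqplot) or computing dissimilarities (seqdist).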
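
To illustrate the difference, here is a rough sketch on made-up data (sts.toy, seq.missing, and seq.void are hypothetical names). It codes the trailing NAs once as real missing values (right = NA) and once as voids (right = 'DEL'), then computes LCS dissimilarities for each version. The choice of the LCS method is only an assumption for the example.

```r
library(TraMineR)

# Hypothetical toy data in STS format; the second sequence ends with two NAs.
sts.toy <- data.frame(
  t1 = c("produce", "deli"),
  t2 = c("produce", NA),
  t3 = c("beverages", NA),
  stringsAsFactors = FALSE
)

# right = NA keeps trailing NAs as missing values (coded '*' by default).
seq.missing <- seqdef(sts.toy, right = NA)
# right = "DEL" turns trailing NAs into voids, so the second sequence is shorter.
seq.void <- seqdef(sts.toy, right = "DEL")

# Missing values count as an extra state only when with.missing = TRUE.
d.missing <- seqdist(seq.missing, method = "LCS", with.missing = TRUE)
# Voids are ignored, so no with.missing argument is needed here.
d.void <- seqdist(seq.void, method = "LCS")

print(d.missing)
print(d.void)
```

Comparing the two matrices shows how the treatment of trailing positions changes the dissimilarities.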