2015-06-15 57 views
3

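Formatting data for mlogit

I am trying to get my dataset into shape for a multinomial logit analysis with mlogit. The dataset is available from the URL in the code below.

I get the following error: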

Error in `row.names<-.data.frame`(`*tmp*`, value = c("1.Accessible", "1.Accessible", : duplicate 'row.names' are not allowed

I have looked elsewhere, and this problem does seem to come up. I tried using the alt.levels argument instead of alt.var, but that does not work either.

# Load packages
library(RCurl)
library(mlogit)
library(tidyr)
library(dplyr)
# URL where the data is stored
dat.url <- 'https://raw.githubusercontent.com/sjkiss/Survey/master/mlogit.out.csv'
# Get data
dat <- read.csv(dat.url)
# Complete cases only, as it seems mlogit cannot handle missing values or tied data, which in this case you might get because of median imputation
dat <- dat[complete.cases(dat), ]
# Tidy data to get it into long format
dat.out <- dat %>%
  gather(Open, Rank, -c(1, 9:12))
# Try to replicate code on pp. 26-27 of http://cran.r-project.org/web/packages/mlogit/vignettes/mlogit.pdf
mlogit.out <- mlogit.data(dat.out, shape = 'long', alt.var = 'Open', choice = 'Rank', id.var = 'X', ranked = TRUE)
# Try this option as per a discussion on stackexchange
mlogit.out <- mlogit.data(dat.out, shape = 'long', alt.levels = 'Open', choice = 'Rank', id.var = 'X', ranked = TRUE)
+0

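Ah. **Using the *reshape/reshape2/cast* packages**. You're giving me hazy flashbacks to the two or three days I spent trying to massage data into the form mlogit wants, wrestling with *reshape/reshape2/cast*. In the end I found that mlogit underperformed other algorithms on my particular problem. Oh, how I laughed. Good times, good times. – smci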

Answers

0
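# Sort so that each respondent's rows are contiguous
dat.out <- dat %>%
  gather(Open, Rank, -c(1, 9:12)) %>%
  arrange(X, Open, Rank)
# Note: mlogit.data() documents the choice-id argument as chid.var, not child.var
mlogit.out <- mlogit.data(dat.out, shape = 'long', alt.var = 'Open', choice = 'Rank', ranked = TRUE, child.var = 'X')

head(mlogit.out)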
              X economic gender  age                     Job        Open  Rank
1.Accessible  1        5   Male 1970 Professional journalist  Accessible FALSE
1.Information 1        5   Male 1970 Professional journalist Information FALSE
1.Responsive  1        5   Male 1970 Professional journalist  Responsive  TRUE
1.Debate      1        5   Male 1970 Professional journalist      Debate FALSE
1.Officials   1        5   Male 1970 Professional journalist   Officials FALSE
1.Social      1        5   Male 1970 Professional journalist      Social FALSE
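
Why sorting matters can be illustrated without mlogit. The sketch below rests on an assumption that the answer itself does not state: when no choice-id column is supplied, rows are grouped into choice situations by position, so each block of consecutive rows is treated as one respondent. The toy data frame and the helper positional_names() are hypothetical and only mimic that naming scheme in base R.

```r
# Toy long-format data in the order gather() leaves it: all rows for one
# alternative first, then all rows for the next (hypothetical values).
toy <- data.frame(
  X    = rep(1:3, times = 2),
  Open = rep(c("Accessible", "Debate"), each = 3),
  Rank = c(1, 2, 1, 2, 1, 2)
)

n_alt <- length(unique(toy$Open))  # alternatives per respondent
n_id  <- nrow(toy) / n_alt         # number of respondents

# Assumed positional scheme: every block of n_alt consecutive rows is one
# respondent, and the row name is "<respondent>.<alternative>".
positional_names <- function(d) {
  paste(rep(seq_len(n_id), each = n_alt), d$Open, sep = ".")
}

unsorted_names <- positional_names(toy)
sorted_names   <- positional_names(toy[order(toy$X, toy$Open), ])

anyDuplicated(unsorted_names) > 0  # TRUE: "1.Accessible" appears twice
anyDuplicated(sorted_names) > 0    # FALSE: each name is unique
```

In the unsorted case the first two rows both become "1.Accessible", which is the same pair of values quoted in the error message from the question.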
0

My suggestion is to try the multinom() function from the nnet package. It does not need the special data format that mlogit or mnlogit require.

library(RCurl)
library(nnet)

Data <- getURL("https://raw.githubusercontent.com/sjkiss/Survey/master/mlogit.out.csv")
Data <- read.csv(text = Data, header = TRUE)
Data <- na.omit(Data) # Get rid of NAs
Data <- as.data.frame(Data)
# Relevel the dependent variable (must be a factor)
Data$Job <- factor(Data$Job)
# Using "Online blogger" as the reference; substitute your own choice
Data$Job <- relevel(Data$Job, ref = "Online blogger")
# Run the multinomial logistic regression
# (seems like an awful lot of variables, btw)
# Store the fit under its own name so the data frame is not overwritten
fit <- multinom(formula = Job ~ Accessible + Information + Responsive + Debate + Officials + Social + Trade.Offs + economic + gender + age, data = Data)
+0

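One of the reasons I wanted to use the mlogit package is that its vignette explicitly deals with ranked data. My data are ranked data, although all the covariates are individual-level covariates; there are no alternative-specific covariates. In that case, can I treat this as a straight multinomial logistic regression? – spindoctor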

+0

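Do you mean that the dependent variable is ordinal? If so, you can still use multinomial logistic regression, with only a slight loss of power compared with ordinal logistic regression. If you want to run a simple ordered logistic regression, you can use polr() from the MASS package. Whether the independent variables are ordinal does not matter. – Andy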
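
As a rough illustration of the two options mentioned in this comment, the sketch below fits both models to the housing data that ships with MASS (not the survey data from the question), where the response Sat is an ordered factor. The object names ordered_fit and nominal_fit are made up for the example.

```r
library(MASS)  # polr() and the built-in housing data
library(nnet)  # multinom()

# housing: satisfaction (Low < Medium < High) cross-classified by
# influence, housing type and contact, with cell counts in Freq.
ordered_fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                    data = housing, Hess = TRUE)
nominal_fit <- multinom(Sat ~ Infl + Type + Cont, weights = Freq,
                        data = housing, trace = FALSE)

# The ordered model has one set of slopes shared across outcome levels
# (plus cut points in ordered_fit$zeta); the multinomial model has a
# separate row of coefficients for each non-reference outcome level.
coef(ordered_fit)
coef(nominal_fit)
```

The ordered fit uses fewer parameters because it exploits the ordering of the response, which is where the small gain in power relative to the multinomial fit comes from.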

+0

You just need to sort by 'X, Open, Rank'. See my answer. – user227710