2015-05-20 74 views
0

我想分析马拉松数据。我建立一个简单的模型,并创造了一个决策树:R中的决策树公式

fit <- rpart(timeCategory ~ country + age.group + participated.times, data=data) 

我的目标是创建一个通用的公式预测结果,like in this article (page 4)enter image description here

我该如何在R中做到这一点,使用什么技术?因此,我希望有一个提供属性作为刺痛的公式。

数据:我使用的一些真实数据可能是downloaded here。 读取数据如下:

data = read.table("data/processedData.txt", header=T) 
data$timeCategory <- ntile(data$time, 10) 
+0

看看'预测。 rpart'。看起来语法是'predict(fit,newdata)'。如果你想输入图像中的函数,你必须从'fit'中提取系数并进行一些字符串操作。 – Frank

+0

@Frank你能提供更多的细节还是一些虚拟的例子?我不明白。 – Bob

+0

@Bob:你有责任提供这个例子。 –

回答

1

这些是使用时间作为连续值,这是在本例中被提供预测的类型的回归系数。它们可以用来构建您要求的配方类型。

> lmfit <- lm(time ~ country + age.group + particip.time, data=data) 
> lmfit 

Call: 
lm(formula = time ~ country + age.group + particip.time, data = data) 

Coefficients: 
     (Intercept)  countryJõgeva countryLääne-Viru  countryLäti 
     9526.702   345.930   122.513   -73.239 
    countryLeedu  countryPärnu  countryRapla countrySaaremaa 
      120.592   -78.086   -208.882   114.292 
    countryTallinn  countryTartu countryViljandi  age.groupM20 
      -37.536    55.771   -70.417   -142.600 
    age.groupM21  age.groupM35  age.groupM40  age.groupM45 
     -218.225   -218.067   -20.108   -196.331 
    age.groupM50  particip.time 
      88.342    -2.487 

如果你想他们都排着队则:

> as.matrix(coef(lmfit)) 
         [,1] 
(Intercept)  9526.702146 
countryJõgeva  345.930334 
countryLääne-Viru 122.513294 
countryLäti  -73.239333 
countryLeedu  120.591585 
countryPärnu  -78.086107 
countryRapla  -208.882244 
countrySaaremaa 114.291592 
countryTallinn  -37.535659 
countryTartu  55.771326 
countryViljandi -70.416659 
age.groupM20  -142.599598 
age.groupM21  -218.224754 
age.groupM35  -218.066655 
age.groupM40  -20.108242 
age.groupM45  -196.331263 
age.groupM50  88.341978 
particip.time  -2.486818 

进行进一步加工的文字:

> form <- as.matrix(coef(lmfit)) 
> rownames(form) <- gsub("try", "try == ", rownames(form)) 
> rownames(form) <- gsub("oup", "oup == ", rownames(form)) 
> form 
          [,1] 
(Intercept)   9526.702146 
country == Jõgeva  345.930334 
country == Lääne-Viru 122.513294 
country == Läti  -73.239333 
country == Leedu  120.591585 
country == Pärnu  -78.086107 
country == Rapla  -208.882244 
country == Saaremaa 114.291592 
country == Tallinn  -37.535659 
country == Tartu  55.771326 
country == Viljandi -70.416659 
age.group == M20  -142.599598 
age.group == M21  -218.224754 
age.group == M35  -218.066655 
age.group == M40  -20.108242 
age.group == M45  -196.331263 
age.group == M50  88.341978 
particip.time   -2.486818 

几乎完全:

cat(paste(form, paste0("(", rownames(form), ")"), sep="*", collapse="+\n")) 

9526.70214596473*((Intercept))+ 
345.93033373724*(country == Jõgeva)+ 
122.51329418344*(country == Lääne-Viru)+ 
-73.2393326763322*(country == Läti)+ 
120.591584530399*(country == Leedu)+ 
-78.0861070429056*(country == Pärnu)+ 
-208.882244416016*(country == Rapla)+ 
114.291592299937*(country == Saaremaa)+ 
-37.5356589458207*(country == Tallinn)+ 
55.771326363022*(country == Tartu)+ 
-70.4166587941724*(country == Viljandi)+ 
-142.599598141679*(age.group == M20)+ 
-218.224754448193*(age.group == M21)+ 
-218.066655292225*(age.group == M35)+ 
-20.1082422022072*(age.group == M40)+ 
-196.33126335145*(age.group == M45)+ 
88.3419781798024*(age.group == M50)+ 
-2.48681789339678*(particip.time)