2017-07-26

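Calling the R library "randomForest" from Python using rpy2

I want to embed some R libraries in my Python script using rpy2. I have successfully embedded "stats.lm", but now I want to embed "randomForest".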

import pandas as pd 
from rpy2.robjects.packages import importr 
from rpy2.robjects import r, pandas2ri 
import rpy2.robjects as robjects 

randomForest=importr('randomForest') 

pandas2ri.activate() 

#read data 
df = pd.read_csv('train.csv',index_col=0) 
rdf = pandas2ri.py2ri(df) 

#check 
print(type(rdf)) 
print(rdf) 

#Random Forest 
formula = 'target ~ .' 
fit_full = randomForest(formula, data=rdf) 

The output is:

Traceback (most recent call last): 

    File "<ipython-input-5-776f4072f19e>", line 2, in <module> 
    fit_full = randomForest(formula, data=rdf) 

TypeError: 'InstalledSTPackage' object is not callable 

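I have successfully used this package directly in R to fit a model on this dataset. "train.csv" is a matrix of several hundred thousand samples (rows) and about 94 columns: 93 features (integer levels) and 1 target (a factor). The target column has 9 classes (Class_1, ..., Class_9).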
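For reference, the `TypeError` comes from calling the package itself: `importr` returns a package object (`InstalledSTPackage`, as the traceback shows), and R's `randomForest()` function is an attribute of that object. The formula string is also usually wrapped in `robjects.Formula` before it is passed to R. A minimal sketch of the direct call, assuming rpy2 3.x (explicit converters instead of `pandas2ri.activate()`), R with randomForest installed, and a column named `target`; `fit_random_forest` and `r_formula_text` are hypothetical helper names:

```python
from typing import Any, List, Optional


def r_formula_text(target: str, predictors: Optional[List[str]] = None) -> str:
    """Build an R model-formula string such as 'target ~ .'."""
    rhs = " + ".join(predictors) if predictors else "."
    return f"{target} ~ {rhs}"


def fit_random_forest(df: Any, target: str = "target", **rf_args: Any) -> Any:
    """Fit R's randomForest on a pandas DataFrame (needs rpy2 and R)."""
    # Lazy imports so the pure-Python helper above works without R installed.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    from rpy2.robjects.packages import importr

    rf_pkg = importr("randomForest")  # a package object, not a function
    formula = ro.Formula(r_formula_text(target))

    df = df.copy()
    # A pandas categorical is converted to an R factor -> classification forest.
    df[target] = df[target].astype("category")

    with localconverter(ro.default_converter + pandas2ri.converter):
        # Call the function inside the package: rf_pkg.randomForest(...),
        # not rf_pkg(...), which raises "'InstalledSTPackage' object is not callable".
        return rf_pkg.randomForest(formula, data=df, **rf_args)
```

Usage might look like `fit = fit_random_forest(df, importance=True)`. As far as I know, arguments that only reach R through `...` (such as `do.trace`) are not renamed from underscores, so they would be passed as `**{"do.trace": 100}`.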

----------------- EDIT -----------------

A partial solution could be to embed the model-fitting and prediction code directly as an R function:

import rpy2.robjects as robjects 
import rpy2 
from rpy2.robjects import pandas2ri 

rpy2.__version__ 

robjects.r(''' 
      f <- function() { 

        library(randomForest) 

        train <- read.csv("train.csv") 
        train1 <- train[sample(c(1:60000), 5000, replace = TRUE),2:95] 

        train1.rf <- randomForest(target ~ ., data = train1, 
              importance = TRUE, 
              do.trace = 100) 

        pred <- as.data.frame(predict(train1.rf, train1[1:100,1:93])) 

      } 
      ''') 

r_f = robjects.globalenv['f'] 
pred=pandas2ri.ri2py(r_f()) 

But I still don't know whether there is a better solution (i.e., one that also keeps the fitted model "train1.rf").
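On keeping `train1.rf`: an R function returns the value of its last expression, so an R function that ends with the `randomForest(...)` call hands the fitted model back to Python as an R object. If the model should also outlive the Python session, base R's `saveRDS()`/`readRDS()` can be called through `robjects.r`. A sketch assuming rpy2 and R are installed; `with_rds_suffix`, `save_r_object` and `load_r_object` are illustrative names, not part of rpy2:

```python
def with_rds_suffix(path: str) -> str:
    """Append the conventional .rds extension unless it is already there."""
    return path if path.lower().endswith(".rds") else path + ".rds"


def save_r_object(obj, path: str) -> str:
    """Serialize an R object (e.g. a fitted randomForest) with base R's saveRDS()."""
    import rpy2.robjects as ro  # lazy import: requires rpy2 and R

    path = with_rds_suffix(path)
    ro.r["saveRDS"](obj, file=path)
    return path


def load_r_object(path: str):
    """Read an object written by saveRDS() back into the R session."""
    import rpy2.robjects as ro  # lazy import: requires rpy2 and R

    return ro.r["readRDS"](with_rds_suffix(path))
```

For example, `path = save_r_object(model, "train1_rf")` in one session and `model = load_r_object(path)` in a later one; the reloaded object can be passed to R's `predict()` like the original.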

Answer


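This is what I was looking for: the model is fitted in one R function, returned to Python as "rf_model", and then passed to a second R function for prediction: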

import rpy2.robjects as robjects 
from rpy2.robjects import pandas2ri 
import pandas as pd 
import random 

pandas2ri.activate() 

df = pd.read_csv('train.csv',index_col=0) 



# iloc positions are 0-based, so range(60000) covers rows 0..59999 (R's 1:60000)
train=df.iloc[random.sample(range(60000), 5000),0:94] 
test=df.iloc[random.sample(range(60000), 100),0:93] 


rtrain = pandas2ri.py2ri(train) 
print(rtrain) 
rtest = pandas2ri.py2ri(test) 
print(rtest) 


robjects.r(''' 
      f <- function(train) { 

        library(randomForest) 
        train1.rf <- randomForest(target ~ ., data = train, importance = TRUE, do.trace = 100) 

      } 
      ''') 
r_f = robjects.globalenv['f'] 
rf_model = r_f(rtrain)


robjects.r(''' 
      g <- function(model,test) { 

        pred <- as.data.frame(predict(model, test)) 

      } 
      ''') 

r_g = robjects.globalenv['g'] 
pred=pandas2ri.ri2py(r_g(rf_model,rtest))
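The code above uses `pandas2ri.activate()`, `py2ri` and `ri2py` from the rpy2 2.x releases; rpy2 3.0 renamed the conversion functions to `py2rpy`/`rpy2py` and favours explicit converter objects. A sketch of the same fit-then-predict flow under those assumptions (rpy2 3.x, R with randomForest, a `target` column); `sample_positions`, `fit_and_predict` and the R names `fit_rf`/`predict_rf` are hypothetical. It also draws row positions from `range(n_rows)`, because `iloc` is 0-based while R's `1:60000` is 1-based:

```python
import random
from typing import List, Optional


def sample_positions(n_rows: int, k: int, replace: bool = False,
                     seed: Optional[int] = None) -> List[int]:
    """Return k 0-based row positions in [0, n_rows) for DataFrame.iloc.

    Same idea as R's sample(1:n_rows, k, replace = ...), shifted to 0-based.
    """
    rng = random.Random(seed)
    if replace:
        return [rng.randrange(n_rows) for _ in range(k)]
    return rng.sample(range(n_rows), k)


# R helpers as in the answer; an R function returns its last expression.
R_HELPERS = """
fit_rf <- function(train) {
    library(randomForest)
    randomForest(target ~ ., data = train, importance = TRUE)
}
predict_rf <- function(model, test) {
    as.data.frame(predict(model, test))
}
"""


def fit_and_predict(train, test):
    """Fit randomForest in R on `train`, predict `test`; return (model, pandas DataFrame)."""
    # Lazy imports: requires rpy2 >= 3.0 and R with the randomForest package.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri

    cv = ro.default_converter + pandas2ri.converter
    ro.r(R_HELPERS)  # defines fit_rf and predict_rf in R's global environment

    train = train.copy()
    # A pandas categorical becomes an R factor, so R fits a classification forest.
    train["target"] = train["target"].astype("category")

    model = ro.globalenv["fit_rf"](cv.py2rpy(train))  # stays an R object
    r_pred = ro.globalenv["predict_rf"](model, cv.py2rpy(test))
    return model, cv.rpy2py(r_pred)  # R data.frame -> pandas DataFrame
```

With the question's data this would be roughly `train = df.iloc[sample_positions(len(df), 5000, seed=0), 0:94]`, `test = df.iloc[sample_positions(len(df), 100, seed=1), 0:93]`, then `model, pred = fit_and_predict(train, test)`. Note that the two samples are drawn independently and can overlap.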