2017-10-15 38 views

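Custom transformer

I am following the sklearn_pandas walkthrough found in the sklearn_pandas README on GitHub, and I am trying to modify its DateEncoder() custom transformer example, which splits dates into new columns, to do 2 additional things: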

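  • Convert string-type columns to datetime, taking the date format as a parameter.
  • Append the original column name when emitting the new columns. For example, if the input column is Date1, the outputs are Date1_year, Date1_month, Date1_day.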

Here is my attempt (with a fairly basic understanding of sklearn pipelines):

import pandas as pd 
import numpy as np 
from sklearn.base import TransformerMixin, BaseEstimator 
from sklearn_pandas import DataFrameMapper 

class DateEncoder(TransformerMixin):
    '''
    Specify date format using python strftime formats
    '''

    def __init__(self, date_format='%Y-%m-%d'):
        self.date_format = date_format

    def fit(self, X, y=None):
        self.dt = pd.to_datetime(X, format=self.date_format)
        return self

    def transform(self, X):
        dt = X.dt
        return pd.concat([dt.year, dt.month, dt.day], axis=1)


data = pd.DataFrame({
    'dates1': ['2001-12-20', '2002-10-21', '2003-08-22', '2004-08-23',
               '2004-07-20', '2007-12-21', '2006-12-22', '2003-04-23'],
    'dates2': ['2012-12-20', '2009-10-21', '2016-08-22', '2017-08-23',
               '2014-07-20', '2011-12-21', '2014-12-22', '2015-04-23'],
})

DATE_COLS = ['dates1', 'dates2']

Mapper = DataFrameMapper(
    [(i, DateEncoder(date_format='%Y-%m-%d')) for i in DATE_COLS],
    input_df=True, df_out=True)
test = Mapper.fit_transform(data)

But when I run it, I get the following error:

AttributeError: Can only use .dt accessor with datetimelike values 

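Why am I getting this error, and how can I fix it? Any help with naming the new columns after the original column, as mentioned above (Date1_year, Date1_month, Date1_day), would be greatly appreciated!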

You convert 'X' to datetime in 'fit' and store it in 'self.dt', but 'transform()' never uses 'self.dt'. 'X.dt' fails because 'X' is not of datetime type.
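A minimal sketch of the fix this comment points to, assuming the parsing is moved out of fit() and into transform() so that .dt is only called on datetime values. It is shown on a plain Series, on the assumption that the mapper hands each selected column to the transformer as a Series when input_df=True. The class defines fit_transform itself so the sketch needs only pandas; inheriting from TransformerMixin, as in the question, would provide it instead.

```python
import pandas as pd


class DateEncoder:
    """Split a string date column into year, month and day columns."""

    def __init__(self, date_format="%Y-%m-%d"):
        self.date_format = date_format

    def fit(self, X, y=None):
        # Nothing to learn: the parsing happens in transform().
        return self

    def transform(self, X):
        # Parse first, so .dt is used on datetime values, not on strings.
        parsed = pd.to_datetime(X, format=self.date_format)
        dt = parsed.dt
        return pd.concat([dt.year, dt.month, dt.day], axis=1)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


dates = pd.Series(["2001-12-20", "2002-10-21"], name="dates1")
encoded = DateEncoder().fit_transform(dates)
print(encoded.values.tolist())
```

Keeping fit() stateless also means the transformer can be applied to new data without refitting.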

Answer

I was able to split the date-format conversion and the date splitting into two separate transformers, and it worked.

import pandas as pd 
from sklearn.base import TransformerMixin 
from sklearn_pandas import DataFrameMapper 



data2 = pd.DataFrame({
    'dates1': ['2001-12-20', '2002-10-21', '2003-08-22', '2004-08-23',
               '2004-07-20', '2007-12-21', '2006-12-22', '2003-04-23'],
    'dates2': ['2012-12-20', '2009-10-21', '2016-08-22', '2017-08-23',
               '2014-07-20', '2011-12-21', '2014-12-22', '2015-04-23'],
})

class DateFormatter(TransformerMixin):

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        # works for a Series (parsed element-wise) or a DataFrame (parsed column by column)
        Xdate = X.apply(pd.to_datetime)
        return Xdate


class DateEncoder(TransformerMixin):

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        # X must already hold datetime values, which DateFormatter guarantees
        dt = X.dt
        return pd.concat([dt.year, dt.month, dt.day], axis=1)


DATE_COLS = ['dates1', 'dates2'] 

datemult = DataFrameMapper(
    [(i, [DateFormatter(), DateEncoder()]) for i in DATE_COLS],
    input_df=True, df_out=True)

df = datemult.fit_transform(data2)

This code outputs:

Out[4]:
   dates1_0  dates1_1  dates1_2  dates2_0  dates2_1  dates2_2
0      2001        12        20      2012        12        20
1      2002        10        21      2009        10        21
2      2003         8        22      2016         8        22
3      2004         8        23      2017         8        23
4      2004         7        20      2014         7        20
5      2007        12        21      2011        12        21
6      2006        12        22      2014        12        22
7      2003         4        23      2015         4        23
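The positional suffixes in this output follow the order in which DateEncoder concatenates its parts (year, month, day), so one option is to rename the columns after the mapper has run. The sketch below assumes the `<column>_<index>` naming shown in the output; `rename_date_parts` and `PART_NAMES` are hypothetical names introduced here, not part of sklearn_pandas.

```python
import pandas as pd

# Order of the parts as concatenated in DateEncoder.transform().
PART_NAMES = ["year", "month", "day"]


def rename_date_parts(df, date_cols, part_names=PART_NAMES):
    """Rename '<col>_<i>' columns to '<col>_<part>' (hypothetical helper)."""
    mapping = {
        f"{col}_{i}": f"{col}_{part}"
        for col in date_cols
        for i, part in enumerate(part_names)
    }
    return df.rename(columns=mapping)


# Stand-in for the mapper output (first row only).
mapped = pd.DataFrame(
    [[2001, 12, 20, 2012, 12, 20]],
    columns=["dates1_0", "dates1_1", "dates1_2",
             "dates2_0", "dates2_1", "dates2_2"],
)
renamed = rename_date_parts(mapped, ["dates1", "dates2"])
print(list(renamed.columns))
```

Columns that do not appear in the mapping are left unchanged by `DataFrame.rename`, so the helper is safe to call on a frame that also contains non-date columns.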

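However, I am still looking for a way to name the new columns while applying the DateEncoder() transformer. For example, dates1_0 should become dates1_year and dates2_1 should become dates2_month. I would be happy to accept such an answer as the solution.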
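One possible way to get the names at transform time is a transformer that receives the whole DataFrame and builds the labelled columns itself. This is only a sketch under that assumption: `NamedDateEncoder` is a hypothetical class, it runs on the DataFrame directly rather than inside DataFrameMapper, and the mapper may still apply its own naming if the class is used there.

```python
import pandas as pd


class NamedDateEncoder:
    """Hypothetical transformer: parse string date columns into named parts."""

    def __init__(self, columns, date_format="%Y-%m-%d"):
        self.columns = columns
        self.date_format = date_format

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        parts = {}
        for col in self.columns:
            dt = pd.to_datetime(X[col], format=self.date_format).dt
            # Prefix each part with the name of the column it came from.
            parts[f"{col}_year"] = dt.year
            parts[f"{col}_month"] = dt.month
            parts[f"{col}_day"] = dt.day
        return pd.DataFrame(parts, index=X.index)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


data = pd.DataFrame({
    "dates1": ["2001-12-20", "2002-10-21"],
    "dates2": ["2012-12-20", "2009-10-21"],
})
out = NamedDateEncoder(["dates1", "dates2"]).fit_transform(data)
print(out)
```

Because the class exposes fit and transform, it should also be usable as a step in a regular sklearn Pipeline, at the cost of handling column selection itself.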
