2016-12-16 33 views

How do I convert numerically encoded categorical data to a sparse tensor in TensorFlow?

My dataset is in the following format:

8,2,1,1,1,0,3,2,6,2,2,2,2 
8,2,1,2,0,0,15,2,1,2,2,2,1 
5,5,4,4,0,0,6,1,6,2,2,1,2 
8,2,1,3,0,0,2,2,6,2,2,2,2 
8,2,1,2,0,0,3,2,1,2,2,2,1 
8,2,1,4,0,1,3,2,1,2,2,2,1 
8,2,1,2,0,0,3,2,1,2,2,2,1 
8,2,1,3,0,0,2,2,6,2,2,2,2 
8,2,1,12,0,0,5,2,2,2,2,2,1 
3,1,1,2,0,0,3,2,1,2,2,2,1 

It consists entirely of categorical data, with each feature encoded as a number. I tried the following code:

    monthly_income = tf.contrib.layers.sparse_column_with_keys(
        "monthly_income", keys=['1', '2', '3', '4', '5', '6'])
    # Other columns are also declared in the same way

    m = tf.contrib.learn.LinearClassifier(
        feature_columns=[
            caste, religion, differently_abled, nature_of_activity, school,
            dropout, qualification, computer_literate, monthly_income,
            smoke, drink, tobacco, sex],
        model_dir=model_dir)

But I get the following error:

TypeError: Signature mismatch. Keys must be dtype <dtype: 'string'>, got <dtype: 'int64'>. 

Answer

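I think the problem lies outside the code you have shown. My guess is that the features in the CSV file are being read as integers, whereas declaring keys=['1', '2', ...] makes the column expect strings.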
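To illustrate that guess without depending on TensorFlow, here is a minimal plain-Python sketch. It assumes the CSV columns appear in the same order as the feature_columns list in the question (the question does not confirm this), and load_columns is a hypothetical helper, not part of any library. It only shows why integer values cannot match string keys, and that reading the cells as strings removes the mismatch.

```python
import csv
import io

# Two rows copied from the sample data in the question.
SAMPLE = "8,2,1,1,1,0,3,2,6,2,2,2,2\n8,2,1,2,0,0,15,2,1,2,2,2,1\n"

# Assumption: the CSV columns follow the order of the feature_columns list.
COLUMNS = ["caste", "religion", "differently_abled", "nature_of_activity",
           "school", "dropout", "qualification", "computer_literate",
           "monthly_income", "smoke", "drink", "tobacco", "sex"]

# The keys declared for monthly_income in the question.
KEYS = ['1', '2', '3', '4', '5', '6']


def load_columns(text, as_string):
    """Hypothetical helper: return {column name: list of values}."""
    cast = str if as_string else int
    columns = {name: [] for name in COLUMNS}
    for row in csv.reader(io.StringIO(text)):
        for name, cell in zip(COLUMNS, row):
            columns[name].append(cast(cell.strip()))
    return columns


as_ints = load_columns(SAMPLE, as_string=False)
as_strs = load_columns(SAMPLE, as_string=True)

# Integer values never equal string keys, which mirrors the dtype mismatch.
int_matches = [value in KEYS for value in as_ints["monthly_income"]]
# Values kept as strings can be looked up in the string keys.
str_matches = [value in KEYS for value in as_strs["monthly_income"]]

print(int_matches)
print(str_matches)
```

If you want to keep sparse_column_with_keys, the equivalent fix would be to make whatever reads the CSV produce string values for these columns.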

However, since your features are already integers, I would suggest using sparse_column_with_integerized_feature instead:

monthly_income = tf.contrib.layers.sparse_column_with_integerized_feature("monthly_income", bucket_size=7) 
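As far as I understand it, an integerized feature expects every value to fall in the range [0, bucket_size), so bucket_size should be larger than the largest value in the column. The sketch below is plain Python, not TensorFlow: it scans a few sample rows and reports the smallest bucket size that would cover each column. min_bucket_sizes is a hypothetical helper name, and only three rows from the question are used, so run it over your full file before relying on the numbers.

```python
import csv
import io

# Three rows copied from the sample data in the question.
SAMPLE = """8,2,1,1,1,0,3,2,6,2,2,2,2
8,2,1,2,0,0,15,2,1,2,2,2,1
5,5,4,4,0,0,6,1,6,2,2,1,2
"""


def min_bucket_sizes(text):
    """Hypothetical helper: per column position, largest value + 1."""
    rows = [[int(cell) for cell in row]
            for row in csv.reader(io.StringIO(text)) if row]
    # zip(*rows) transposes the rows into columns.
    return [max(column) + 1 for column in zip(*rows)]


sizes = min_bucket_sizes(SAMPLE)
print(sizes)
```

For the ninth column, which would be monthly_income if the CSV order matches the feature_columns list, the largest value in these rows is 6, which agrees with bucket_size=7 above.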