
I'm new to Vowpal Wabbit, so I may be missing something very obvious. I'm getting a large number of NaN predictions when using Vowpal Wabbit for classification.

I have training data in CSV, which I've split into 80% training and 20% test. It contains 62 features (x0-x61) and a total of 7 classes (0-6).

x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56,x57,x58,x59,x60,x61,y 
190436e528,16a14a2d17,06330986ed,ca63304de0,b7584c2d52,1746600cb0,1,1,0.4376549941058605,5624b8f759,152af2cb2f,91bb549494,e33c63cf35,1178.0,cc69cbe29a,617a4ad3f9,e8a040423a,c82c3dbd33,ee3501282b,199ce7c484,5f17dedd5c,5c5025bd0a,9aba4d7f51,24.94393348850157,-0.8146595838365664,-0.7083080633874904,1.5,-0.5124221809900756,-0.7339666422629345,0.3333333333333333,14.837727864583336,11.0,0.0,24.0,0.0,0.0,1.0,29.0,0.0,3.0,11.0,4.42,0.15,0.161,0.2,1.0,1.0,1.0,1.0,1.0,0.52,0.5329999999999999,0.835,-0.5865396521883026,0.6724356815192951,0.0,0.6060606060606061,0.12121212121212124,0.21212121212121213,0.060606060606060615,0.0,33.0,3 
a4c3095b75,16a14a2d17,06330986ed,ca63304de0,b7584c2d52,1746600cb0,1,1,0.4809765809625592,7e5c97705a,e071d01df5,91bb549494,e33c63cf35,5777.0,6e40247e69,617a4ad3f9,4b9480aa42,e84655292c,527b6ca8cc,dd9c9e0da2,17c99905b6,0fc56ea1f0,9aba4d7f51,31.08028213771883,-0.3717867728837517,-0.3676156090868885,1.6666666666666663,0.2713072335472944,0.013112469951855535,17.333333333333325,1713.439127604167,33.0,0.0,6.0,1.0,0.6666666666666665,8.0,108.0,1.0,4.0,86.0,1.58,0.05,2.032,2.4,0.348,0.762,0.55,0.392,0.489,0.517,1.0,0.642,0.9609909328437232,0.7909897767145201,0.020161290322580645,0.6451612903225806,0.25806451612903225,0.03629032258064517,0.04032258064516129,0.0,248.0,3 
aa2f3cd34a,16a14a2d17,06330986ed,ca63304de0,b7584c2d52,1746600cb0,1,1,-2.9847503675733384,f67f142e40,c28b638881,91bb549494,e33c63cf35,-1.0,fe8fb80553,617a4ad3f9,718c61545b,c26d08129a,cac4fc8eaf,199ce7c484,60299bc448,76ba8f7080,9aba4d7f51,41.40215922501433,-0.043850620710912905,-0.043227755140810106,3.5,0.19464028583619075,-0.2926973864217809,11.333333333333332,732.8046875,106.0,0.0,14.0,0.0,0.0,1.0,21.0,2.0,3.0,14.0,7.17,0.24,0.645,0.6,0.25,0.5,0.5,0.0,0.773,0.899,0.0,0.0,-0.0818699491854678,0.6639345368601952,0.0,0.0,0.0,0.0,1.0,0.0,1.0,4 
bfff7d2d9e,16a14a2d17,06330986ed,ca63304de0,a62168d626,1746600cb0,1,1,0.6542629283893542,7b1f0ca4c1,1d42d0c490,669ea3d319,b38690945d,1602.0,6e40247e69,617a4ad3f9,718c61545b,d3dc404c37,7263b01813,dd9c9e0da2,17c99905b6,2cc3e04172,9aba4d7f51,32.11392568242685,0.2843684594325347,0.23249501198439226,5.0,-0.19979368911718315,0.3743375351985674,1101.0,0.44580078125,16.0,0.16666666666666666,6.0,1.0,0.5,5.0,209.0,3.0,2.0,43.0,12.08,0.4,2.613,2.8,0.5,0.556,0.875,0.612,0.064,0.0,0.435,0.785,0.5158309700290646,-0.1150902907744278,0.05945945945945946,0.8,0.06486486486486487,0.045045045045045036,0.014414414414414416,0.016216216216216217,555.0,2 

I've been using phraug2/csv2vw.py to convert the CSV into Vowpal format. The converted data looks like this:

3.0 |n c1_b4d8a653ea c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_a62168d626 c6_1746600cb0 7:1 8:1 9:-0.6887062641683063 c10_7e5c97705a c11_e5df3eff9b c12_91bb549494 c13_e33c63cf35 14:3694.0 c15_6e40247e69 c16_617a4ad3f9 c17_718c61545b c18_c26d08129a c19_634e3cf3ac c20_dd9c9e0da2 c21_17c99905b6 c22_513a3e3f36 c23_9aba4d7f51 24:40.57961189718329 25:-0.11269265451935975 26:-0.17219069579806134 27:1.1666666666666663 28:1.6745384722167482 29:0.6308894281294708 30:37.0 31:1.294921875 32:55.0 33:0.16666666666666666 34:10.0 37:1.0 38:9.0 40:1.0 41:23.0 42:3.67 43:0.12 44:1.935 45:2.2 46:0.625 47:0.25 48:0.125 50:0.813 51:0.07400000000000001 52:0.634 53:0.5479999999999999 54:0.2353332208066929 55:0.2649521447821752 57:0.3333333333333333 58:0.3333333333333333 59:0.3333333333333333 62:9.0 
5.0 |n c1_467f9617a3 c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_b7584c2d52 c6_1746600cb0 7:1 8:1 9:0.8708708626728477 c10_5624b8f759 c11_fa0b797a92 c12_669ea3d319 c13_f178803074 14:18156.0 c15_01ede04b4b c16_617a4ad3f9 c17_718c61545b c18_d342e2765f c19_bb20e1ca06 c20_8a6c8cef83 c21_1b02793146 c22_992153ed65 c23_9aba4d7f51 24:28.76550293196428 25:2.6122849082704658 26:2.1590908057403015 27:4.0 28:1.7107137612171608 29:1.7135384162978815 30:0.16666666666666666 31:0.027669270833333325 32:109.0 34:31.0 37:1.0 38:244.0 39:1.0 40:1.0 41:68.0 42:17.25 43:0.57 44:3.452 45:4.0 46:0.409 47:0.619 48:0.579 49:0.248 50:0.34600000000000003 51:0.541 52:0.522 54:1.782346041542782 55:1.3224094711633876 56:0.011647254575707157 57:0.39767054908485855 58:0.2396006655574044 59:0.2495840266222961 60:0.06821963394342763 61:0.033277870216306155 62:601.0 
4.0 |n 1:190436e528 c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_b7584c2d52 c6_1746600cb0 7:1 8:1 9:0.4376549941058605 c10_5624b8f759 c11_152af2cb2f c12_91bb549494 c13_e33c63cf35 14:1178.0 c15_cc69cbe29a c16_617a4ad3f9 c17_e8a040423a c18_c82c3dbd33 c19_ee3501282b c20_199ce7c484 c21_5f17dedd5c c22_5c5025bd0a c23_9aba4d7f51 24:24.94393348850157 25:-0.8146595838365664 26:-0.7083080633874904 27:1.5 28:-0.5124221809900756 29:-0.7339666422629345 30:0.3333333333333333 31:14.837727864583336 32:11.0 34:24.0 37:1.0 38:29.0 40:3.0 41:11.0 42:4.42 43:0.15 44:0.161 45:0.2 46:1.0 47:1.0 48:1.0 49:1.0 50:1.0 51:0.52 52:0.5329999999999999 53:0.835 54:-0.5865396521883026 55:0.6724356815192951 57:0.6060606060606061 58:0.12121212121212124 59:0.21212121212121213 60:0.060606060606060615 62:33.0 
5.0 |n c1_43859085bc c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_a62168d626 c6_1746600cb0 7:1 8:1 9:0.004439125538873309 c10_f67f142e40 c11_c4dd2197c3 c12_91bb549494 c13_e33c63cf35 14:14559.0 c15_6e40247e69 c16_617a4ad3f9 c17_718c61545b c18_c26d08129a c19_9e166b965d c20_466f8951b0 c21_fde72a6d5c c22_acfadc5c01 c23_9aba4d7f51 24:41.57685954242976 25:-0.9078334231173404 26:-0.7617355673740658 27:0.5 28:-0.6275253732641191 29:-0.8058011722835874 30:1.1666666666666663 31:0.00439453125 33:0.5 37:7.0 38:7.0 40:3.0 41:15.0 42:8.92 43:0.29 44:0.226 45:0.8 51:1.0 54:-1.6003257882042399 55:-1.8386800640762528 57:1.0 62:1.0 
3.0 |n c1_bfff7d2d9e c2_16a14a2d17 c3_06330986ed c4_ca63304de0 c5_a62168d626 c6_1746600cb0 7:1 8:1 9:0.6542629283893542 c10_7b1f0ca4c1 c11_1d42d0c490 c12_669ea3d319 c13_b38690945d 14:1602.0 c15_6e40247e69 c16_617a4ad3f9 c17_718c61545b c18_d3dc404c37 c19_7263b01813 c20_dd9c9e0da2 c21_17c99905b6 c22_2cc3e04172 c23_9aba4d7f51 24:32.11392568242685 25:0.2843684594325347 26:0.23249501198439226 27:5.0 28:-0.19979368911718315 29:0.3743375351985674 30:1101.0 31:0.44580078125 32:16.0 33:0.16666666666666666 34:6.0 35:1.0 36:0.5 37:5.0 38:209.0 39:3.0 40:2.0 41:43.0 42:12.08 43:0.4 44:2.613 45:2.8 46:0.5 47:0.556 48:0.875 49:0.612 50:0.064 52:0.435 53:0.785 54:0.5158309700290646 55:-0.1150902907744278 56:0.05945945945945946 57:0.8 58:0.06486486486486487 59:0.045045045045045036 60:0.014414414414414416 61:0.016216216216216217 62:555.0 

I then tried to build a multiclass model using one-against-all:

vw ./train_my.text -f predictor.vw --oaa 7 --passes 5 --cache_file cache 

However, I get a lot of NaN predictions:

NAN prediction in example 21643, forcing 0.000000 
NAN prediction in example 21643, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21644, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21705, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21707, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21735, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21790, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21794, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 
NAN prediction in example 21796, forcing 0.000000 

and the average loss suggests the model can't really predict anything:

number of examples per pass = 36063 
passes used = 4 
weighted example sum = 144252.000000 
weighted label sum = 0.000000 
average loss = 0.801797 h 
total feature number = 7598612 

What am I doing wrong?


Hi! Could you provide at least a few examples for each class? I'm getting the same error as you. –


The bug is in example #3. Following the convention used elsewhere, the feature 1:190436e528 should be c1_190436e528. –
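That stray token is likely worse than it looks: VW parses everything after the colon as a numeric weight, and a hex hash like 190436e528 happens to resemble scientific notation (190436e528, i.e. 190436 × 10^528), which overflows to infinity and can poison training. A small hypothetical checker for such tokens (helper name and logic are my own, not part of VW or csv2vw.py):

```python
import math

def malformed_tokens(vw_line):
    """Return feature tokens whose weight is non-numeric or non-finite."""
    _, _, feats = vw_line.partition("|")
    bad = []
    for tok in feats.split()[1:]:  # first token is the namespace name ("n")
        _, sep, weight = tok.partition(":")
        if not sep:
            continue  # bare categorical token; its weight defaults to 1
        try:
            if not math.isfinite(float(weight)):
                bad.append(tok)  # e.g. 190436e528 parses to infinity
        except ValueError:
            bad.append(tok)  # weight is not a number at all
    return bad
```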


@xeon Hmm.. do you think the csv2vw script could have messed up the data? Here is a link to the CSV file and the same file converted to VW format. – intellion

Answer


This is caused by training Vowpal Wabbit with feature weights of extreme magnitude, e.g. x1:12334234 or x1:1e-30. If you drop the weights from your features, or scale them, the problem goes away. You may also want to scale the values if you use logistic regression.
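A minimal sketch of that scaling step (assuming the numeric features have already been parsed into rows of floats; the helper name is my own): standardizing each column to zero mean and unit variance keeps values like 12334234 and 1e-30 from coexisting in the same model.

```python
import math

def standardize_columns(rows):
    """Scale each numeric column to zero mean and unit variance.

    rows: list of equal-length lists of floats. Columns with zero
    variance are left centered but undivided (std falls back to 1).
    """
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        n = len(col)
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = math.sqrt(var) or 1.0  # avoid dividing by zero
        scaled_cols.append([(x - mean) / std for x in col])
    return [list(r) for r in zip(*scaled_cols)]
```

After scaling, rewrite the numeric index:value pairs in the VW file from the scaled rows and retrain.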


Check the responses to this bug report: https://github.com/JohnLangford/vowpal_wabbit/issues/756 –
