我有许多树的xgboost.dump文本文件。 我想查找所有路径以获取每个路径的值。 这是一棵树。找到来自xgboost.dump的二叉树的所有路径
tree[0]:
0:[a<0.966398] yes=1,no=2,missing=1
1:[b<0.323071] yes=3,no=4,missing=3
3:[c<0.461248] yes=7,no=8,missing=7
7:leaf=0.00972768
8:leaf=-0.0179376
4:[a<0.379082] yes=9,no=10,missing=9
9:leaf=0.0146003
10:leaf=0.0454369
2:[b<0.322352] yes=5,no=6,missing=5
5:[c<0.674868] yes=11,no=12,missing=11
11:leaf=0.0497964
12:leaf=0.00953781
6:[f<0.598267] yes=13,no=14,missing=13
13:leaf=0.0504545
14:leaf=0.0867654
我想所有的路径转换成
path1, a<0.966398, b<0.323071, c<0.461248, leaf = 0.00097268
path2, a<0.966398, b<0.323071, c>0.461248, leaf = -0.0179376
path3, a<0.966398, b>0.323071, a<0.379082, leaf = 0.0146003
path4, a<0.966398, b>0.323071, a>0.379082, leaf = 0.0454369
path5, a>0.966398, b<0.322352, c<0.674868, leaf = 0.0497964
path6, a>0.966398, b<0.322352, c>0.674868, leaf = 0.00953781
path7, a>0.966398, b>0.322352, f<0.598267, leaf = 0.0504545
path8, a>0.966398, b>0.322352, f>0.598267, leaf = 0.0864654
我已经尝试列出像
array([[ 0, 1, 3, 7],
[ 0, 1, 3, 8],
[ 0, 1, 4, 9],
[ 0, 1, 4, 10],
[ 0, 2, 5, 11],
[ 0, 2, 5, 12],
[ 0, 2, 6, 13],
[ 0, 2, 6, 14]])
所有可能的路径,但一旦MAX_DEPTH较高这样会导致错误,一些分支将停止增长,路径将错误。 所以我需要解析文本文件中的yes,no来生成真实的,正确的路径。 有什么建议吗? 谢谢!