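While preparing a course recently, I found that scikit-learn's decision tree models have changed how leaf nodes are stored. `tree_.value` used to hold the per-class sample counts, but it now holds the class probability (fraction) distribution directly. The function below converts these fractions back to counts.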
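The conversion is just each node's fractions multiplied by that node's sample count. Below is a minimal pure-Python sketch. It assumes `value` is a per-node list of class fractions for a single output and `n_node_samples` is the per-node sample count, mirroring the `clf.tree_` attributes of the same names. The helpers `looks_like_fractions` and `fractions_to_counts` and the example numbers are hypothetical.

```python
import math


def looks_like_fractions(value, tol=1e-9):
    """True if every node's class values sum to 1, i.e. fractions rather than counts."""
    return all(math.isclose(sum(fracs), 1.0, abs_tol=tol) for fracs in value)


def fractions_to_counts(value, n_node_samples):
    """Convert per-node class fractions back to integer sample counts.

    round() is used instead of int(): 0.29 * 100 evaluates to 28.999999999999996,
    which int() would truncate to 28.
    """
    return [[round(f * n) for f in fracs] for fracs, n in zip(value, n_node_samples)]


# Hypothetical 3-node tree: a root with 100 samples split into leaves of 60 and 40.
value = [[0.71, 0.29], [0.85, 0.15], [0.5, 0.5]]
n_node_samples = [100, 60, 40]
if looks_like_fractions(value):
    print(fractions_to_counts(value, n_node_samples))  # → [[71, 29], [51, 9], [20, 20]]
```

Rounding matters here. Fractions stored as floats rarely multiply back to exact integers, so truncating can silently lose one sample per node.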
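Version 1: the visualization function stopped working well, and now the rule extraction has stopped working too.
Version 2: updated the visualization library and tidied up the format of the extracted rules.
Version 3: building on version 2, updated the rule-extraction function.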
```python
import numpy as np
from sklearn.tree import _tree
from sklearn import tree


def XiaoWuGe_Get_Rules(clf, X):
    """Extract one rule per leaf from a fitted binary DecisionTreeClassifier.

    X must be a pandas DataFrame (its column names are used as feature names).
    Each rule has the form "cond1 & cond2 & ...:<class 0 count>:<class 1 count>".
    """
    n_nodes = clf.tree_.node_count
    children_left = clf.tree_.children_left
    children_right = clf.tree_.children_right
    threshold = clf.tree_.threshold

    # Newer sklearn stores per-class fractions in tree_.value; older versions stored counts.
    # If every node's values sum to 1, convert the fractions back to counts.
    # np.rint rounds instead of truncating (int(0.29 * 100) would give 28, not 29).
    if np.allclose(clf.tree_.value.sum(axis=2), 1):
        value = np.rint(clf.tree_.value * clf.tree_.n_node_samples[:, None, None]).astype(int)
    else:
        value = clf.tree_.value

    # Depth of every node, and whether it is a leaf (a leaf's two children are both -1).
    node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
    is_leaves = np.zeros(shape=n_nodes, dtype=bool)
    stack = [(0, 0)]
    while stack:
        node_id, depth = stack.pop()
        node_depth[node_id] = depth
        if children_left[node_id] != children_right[node_id]:
            stack.append((children_left[node_id], depth + 1))
            stack.append((children_right[node_id], depth + 1))
        else:
            is_leaves[node_id] = True

    feature_name = [
        X.columns[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in clf.tree_.feature
    ]

    # Walk nodes in index order, keeping the current root-to-node path on parallel stacks.
    # This relies on depth-first (preorder) node numbering: a left child's id is its parent's id + 1.
    ways = []    # condition for each node on the path
    depth = []   # depth of each node on the path
    feat = []    # split feature of each node on the path
    nodes = []   # node ids on the path
    rules = []
    for i in range(n_nodes):
        if i > 0:
            # Pop back up the path until its last node is node i's parent.
            while depth[-1] >= node_depth[i]:
                depth.pop()
                ways.pop()
                feat.pop()
                nodes.pop()
            parent = nodes[-1]
            th = round(threshold[parent], 4)
            # Record which branch of the parent leads to node i.
            if i == children_left[parent]:
                ways[-1] = '{f}<={th}'.format(f=feat[-1], th=th)
            else:
                ways[-1] = '{f}>{th}'.format(f=feat[-1], th=th)
        if is_leaves[i]:
            # Binary classification assumed: value[i][0] = [class 0 count, class 1 count].
            rules.append(' & '.join(ways) + ':' + str(value[i][0][0]) + ':' + str(value[i][0][1]))
        else:
            # Push the split node; its condition is filled in once a child is visited.
            ways.append(round(threshold[i], 4))
            depth.append(node_depth[i])
            feat.append(feature_name[i])
            nodes.append(i)
    return rules
```
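The function above walks nodes in index order, so it assumes node ids are numbered depth-first (a left child's id is its parent's id + 1). An alternative technique, not the author's function, is a recursive walk over the same parallel arrays. It builds each rule from the root-to-leaf path directly and does not depend on node ordering. This pure-Python sketch assumes plain lists shaped like `clf.tree_.children_left`, `children_right`, `feature` and `threshold`, plus per-node class counts. `extract_rules` and the example tree are hypothetical.

```python
TREE_LEAF = -1  # sklearn marks "no child" with -1 (sklearn.tree._tree.TREE_LEAF)


def extract_rules(children_left, children_right, feature, threshold, counts, feature_names):
    """Return one 'cond & cond:count:count' string per leaf, in left-to-right order."""
    rules = []

    def walk(node, conditions):
        left, right = children_left[node], children_right[node]
        if left == TREE_LEAF and right == TREE_LEAF:
            # Leaf: join the path conditions and append every class count.
            rules.append(" & ".join(conditions) + ":" + ":".join(str(c) for c in counts[node]))
            return
        name = feature_names[feature[node]]
        th = round(threshold[node], 4)
        walk(left, conditions + [f"{name}<={th}"])
        walk(right, conditions + [f"{name}>{th}"])

    walk(0, [])
    return rules


# Hypothetical tree in sklearn's parallel-array layout:
#   node 0: age <= 30.5 ? node 1 : node 4
#   node 1: income <= 5000.0 ? node 2 : node 3
#   nodes 2, 3, 4: leaves
children_left = [1, 2, -1, -1, -1]
children_right = [4, 3, -1, -1, -1]
feature = [0, 1, -2, -2, -2]  # -2 = TREE_UNDEFINED for leaves
threshold = [30.5, 5000.0, -2.0, -2.0, -2.0]
counts = [[55, 45], [50, 20], [40, 5], [10, 15], [5, 25]]
for rule in extract_rules(children_left, children_right, feature, threshold, counts, ["age", "income"]):
    print(rule)
```

This prints `age<=30.5 & income<=5000.0:40:5`, `age<=30.5 & income>5000.0:10:15` and `age>30.5:5:25`. Because each rule comes from the recursion path, the sketch should also work if node ids are not depth-first. That may be the case, for example, for trees grown best-first with `max_leaf_nodes`.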
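My latest course also covers this method, more systematically and with cleaner output; take a look if you need it. The results add many metrics, such as hit rate, hit count and lift.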
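Metrics like these can be computed directly from the rule strings. The definitions below are my assumptions and not necessarily the course's:

- Class 1 is the positive class (for example, bad or fraudulent accounts).
- Hit count is the number of samples a rule matches.
- Hit rate is the share of all samples the rule matches.
- Lift is the rule's positive rate divided by the overall positive rate.

`rule_metrics` is a hypothetical helper. It parses the `conditions:class0:class1` format returned by `XiaoWuGe_Get_Rules`.

```python
def rule_metrics(rules):
    """Per-rule metrics from 'conditions:neg_count:pos_count' strings (assumed definitions)."""
    parsed = []
    for rule in rules:
        cond, neg, pos = rule.rsplit(":", 2)
        # float() first so both "40" and "40.0" (older float counts) parse.
        parsed.append((cond, int(float(neg)), int(float(pos))))
    total = sum(n + p for _, n, p in parsed)          # leaves partition the training set
    base_rate = sum(p for _, _, p in parsed) / total  # overall positive rate
    metrics = []
    for cond, n, p in parsed:
        hits = n + p
        pos_rate = p / hits
        metrics.append({
            "rule": cond,
            "hit_count": hits,          # samples matched by the rule
            "hit_rate": hits / total,   # assumed: share of all samples matched
            "pos_rate": pos_rate,       # share of positives among matched samples
            "lift": pos_rate / base_rate if base_rate else float("nan"),
        })
    return metrics


rules = ["age<=30.5 & income<=5000.0:40:5", "age<=30.5 & income>5000.0:10:15", "age>30.5:5:25"]
for m in rule_metrics(rules):
    print(m["rule"], m["hit_count"], round(m["hit_rate"], 2), round(m["lift"], 2))
```

A lift above 1 means the rule concentrates positives more than the population as a whole. This is usually the first thing to check when choosing which leaves to turn into rules.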
It also includes a more systematic explanation of the theory.
Course link: