Generating Thousands of Strategies in Minutes with Decision Trees: Code Update

Digest, 2024-07-25 08:27, Zhejiang

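Hi everyone, I'm Xiaowuge (小伍哥). I previously wrote articles on automatically mining risk-control strategies with decision trees, and they were widely read, each reaching seven or eight thousand views. The libraries used in that code have since changed significantly, so here is updated code: if the old version throws errors, switch to the code below.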

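While preparing a course recently, I found that sklearn changed how decision tree nodes store their values: `tree_.value` used to hold the number of samples of each class at a node, but in newer versions it holds the class probability distribution instead. The code below converts the probabilities back into sample counts.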
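A minimal sketch of that conversion on its own, assuming a single-output binary classifier trained without `sample_weight` or `class_weight` (so the stored fractions are plain sample proportions). The helper name `node_class_counts` and the synthetic data are illustrative only. It detects the probability format (every node's row sums to 1), multiplies by `tree_.n_node_samples`, and rounds rather than truncates, since a product like 0.333... × 3 can come out as 0.999... in floating point.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def node_class_counts(clf):
    """Return an (n_nodes, n_classes) int array of per-class sample counts.

    Hypothetical helper; assumes no sample_weight/class_weight was used.
    """
    value = clf.tree_.value[:, 0, :]  # single-output tree: (n_nodes, n_classes)
    # Newer sklearn: each row is a probability distribution summing to 1.
    # Older sklearn: rows are already counts, so leave them alone.
    if np.allclose(value.sum(axis=1), 1.0):
        value = value * clf.tree_.n_node_samples[:, None]
    # Round before casting so 36.999... becomes 37, not 36.
    return np.rint(value).astype(int)

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
counts = node_class_counts(clf)
print(counts[0])  # root node: class counts over all 200 training samples
```

Checking the row sums (rather than a single threshold on the mean) keeps the detection independent of the number of classes.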

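Version 1: the visualization function no longer works well, and now the rule extraction is broken too.

Automated Generation of Risk-Control Strategies: Generating Thousands of Strategies in Minutes with Decision Trees

Version 2: updated the visualization library and the tidying of the strategy results.

Automated Generation of Risk-Control Strategies: Generating Thousands of Strategies in Minutes with Decision Trees

Automated Mining of Risk-Control Strategies on [Non-Continuous Features] with Decision Trees

Version 3: builds on version 2 with an updated rule-extraction function.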

```python
import numpy as np
from sklearn.tree import _tree
from sklearn import tree


def XiaoWuGe_Get_Rules(clf, X):
    """Return one rule string per leaf of a fitted binary DecisionTreeClassifier.

    Format: 'cond1 & cond2 & ...:count_class0:count_class1'.
    X is the pandas DataFrame used for training (its columns become feature names).
    The loop assumes node ids follow depth-first pre-order, which is what
    sklearn's default builder produces when max_leaf_nodes is not set.
    """
    n_nodes = clf.tree_.node_count
    children_left = clf.tree_.children_left
    children_right = clf.tree_.children_right
    threshold = clf.tree_.threshold

    # sklearn changed tree_.value from per-class sample counts to class probabilities;
    # multiply back by each node's sample count (exact when no sample/class weights).
    # Round before casting to int, otherwise 0.333... * 3 = 0.999... truncates to 0.
    if np.allclose(clf.tree_.value.sum(axis=-1), 1.0):
        value = np.rint(clf.tree_.value * clf.tree_.n_node_samples[:, None, None]).astype(int)
    else:
        value = clf.tree_.value

    # Depth-first pass: record each node's depth and whether it is a leaf
    node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
    is_leaves = np.zeros(shape=n_nodes, dtype=bool)
    stack = [(0, 0)]  # (node_id, depth), starting at the root
    while len(stack) > 0:
        node_id, depth = stack.pop()
        node_depth[node_id] = depth
        is_split_node = children_left[node_id] != children_right[node_id]
        if is_split_node:
            stack.append((children_left[node_id], depth + 1))
            stack.append((children_right[node_id], depth + 1))
        else:
            is_leaves[node_id] = True

    feature_name = [
        X.columns[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in clf.tree_.feature
    ]

    # Parallel stacks describing the current root-to-node path
    ways, depth, feat, nodes, rules = [], [], [], [], []
    for i in range(n_nodes):
        if i > 0:
            # Pop back up until the top of the path is node i's parent
            while depth[-1] >= node_depth[i]:
                depth.pop()
                ways.pop()
                feat.pop()
                nodes.pop()
            parent = nodes[-1]
            th = round(threshold[parent], 4)
            if children_left[parent] == i:  # left child: feature <= threshold
                ways[-1] = '{f}<={th}'.format(f=feat[-1], th=th)
            else:                           # right child: feature > threshold
                ways[-1] = '{f}>{th}'.format(f=feat[-1], th=th)

        if is_leaves[i]:
            # Leaf: the whole path is one rule, followed by the class counts
            rules.append(' & '.join(ways) + ':' + str(value[i][0][0]) + ':' + str(value[i][0][1]))
        else:
            # Split node: push it; its condition is filled in when a child is visited
            ways.append(round(threshold[i], 4))
            depth.append(node_depth[i])
            feat.append(feature_name[i])
            nodes.append(i)
    return rules
```
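For comparison, the same leaf-by-leaf rules can be produced with a short recursive walk instead of explicit depth stacks. This is a sketch, not the author's function: `extract_rules` is a hypothetical name, and it assumes a binary classifier trained on a pandas DataFrame, emitting the same `conditions:count_class0:count_class1` string per leaf. Because it follows the `children_left`/`children_right` pointers rather than iterating over node ids, it does not rely on the ids being in pre-order.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, _tree

def extract_rules(clf, X):
    """Hypothetical recursive rule extractor for a fitted binary tree."""
    t = clf.tree_
    value = t.value[:, 0, :]
    if np.allclose(value.sum(axis=1), 1.0):  # newer sklearn: probabilities -> counts
        value = value * t.n_node_samples[:, None]
    value = np.rint(value).astype(int)
    rules = []

    def walk(node, conds):
        if t.children_left[node] == _tree.TREE_LEAF:  # leaf: no children (-1)
            rules.append(' & '.join(conds) + f':{value[node][0]}:{value[node][1]}')
            return
        name = X.columns[t.feature[node]]
        th = round(t.threshold[node], 4)
        walk(t.children_left[node], conds + [f'{name}<={th}'])   # left: <= threshold
        walk(t.children_right[node], conds + [f'{name}>{th}'])   # right: > threshold

    walk(0, [])
    return rules

X_arr, y = make_classification(n_samples=300, n_features=4, random_state=0)
X = pd.DataFrame(X_arr, columns=['f1', 'f2', 'f3', 'f4'])
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
rules = extract_rules(clf, X)
for r in rules:
    print(r)
```

Recursion depth equals tree depth, so this is fine for the shallow trees typically used for rule mining.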

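My latest course also covers this method in a more systematic and polished form; have a look if you need it. Its output adds many metrics, such as hit rate, hit count, and lift.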
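The course's exact metric definitions are not given here, so the following is only a sketch under common assumptions: label 1 is the "bad" class (e.g. fraud), hit count is the number of samples in a leaf, hit rate is hit count over all samples, precision is the share of bad samples in the leaf, and lift is precision divided by the overall bad rate. The function name `rules_to_frame` and the rule strings are made up for illustration; they follow the `conditions:count_class0:count_class1` format returned by the extraction function above.

```python
import pandas as pd

def rules_to_frame(rules):
    """Hypothetical: turn 'conditions:n_class0:n_class1' strings into a metrics table.

    Leaves of one tree partition the training set, so totals are summed from the rules.
    """
    rows = []
    for r in rules:
        conds, n_good, n_bad = r.rsplit(':', 2)
        # int(float(...)) accepts '90' and '90.0' (older sklearn stored float counts)
        n_good, n_bad = int(float(n_good)), int(float(n_bad))
        rows.append({'rule': conds, 'hit_count': n_good + n_bad, 'bad_count': n_bad})
    df = pd.DataFrame(rows)
    total, total_bad = df['hit_count'].sum(), df['bad_count'].sum()
    df['hit_rate'] = df['hit_count'] / total                 # coverage of the rule
    df['precision'] = df['bad_count'] / df['hit_count']      # bad rate inside the rule
    df['lift'] = df['precision'] / (total_bad / total)       # vs. overall bad rate
    return df.sort_values('lift', ascending=False).reset_index(drop=True)

# Made-up rule strings, for illustration only
rules = ['x1<=0.5 & x2<=3.0:90:10', 'x1<=0.5 & x2>3.0:40:20', 'x1>0.5:50:40']
table = rules_to_frame(rules)
print(table[['rule', 'hit_count', 'hit_rate', 'precision', 'lift']])
```

Sorting by lift puts the leaves that concentrate bad samples most strongly at the top, which is usually where candidate strategies are picked from.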

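The course also offers a more systematic theoretical treatment.

Course links:

Course outline: Automated Mining of Risk-Control Strategies

The "Risk-Control Strategy Automation" course is now live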


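Past highlights:

[Course] Everything Is a Network: Network Mining Methods in Risk Control

Complex Networks in Risk Control: A Learning Roadmap

[Hands-on] Building a GCN Algorithm from Raw Data

Credit Card Fraud Detection with Isolation Forest: A Case Study, with Optimal Parameter Selection and Visualization

Automated Generation of Risk-Control Strategies: Generating Thousands of Strategies in Minutes with Decision Trees

SynchroTrap: A Fraudulent Account Detection Algorithm Based on Loose Behavioral Similarity

20 Text Classification Algorithms for Risk Control, Part 6: Hands-on BERT Text Classification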


http://mp.weixin.qq.com/s?__biz=MzA4OTAwMjY2Nw==&mid=2650196089&idx=1&sn=7456b8942f5b29fe70daa8adc8244b5f

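小伍哥聊风控 (Xiaowuge Talks Risk Control)

Risk-control strategies and algorithms: content risk control, complex network mining, graph neural networks, anomaly detection, strategy automation, black-market (fraud ring) mining, anti-fraud, anti-cheating, and more.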