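Prof. Tao Yao of Shanghai Jiao Tong University Publishes in Top Journal Management Science: A Multi-Armed Bandit Algorithm for Decision-Making with High-Dimensional Data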

Academic   2024-11-16 11:37   Beijing

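According to the Management Science website, the paper “Online Learning and Decision Making Under Generalized Linear Model with High-Dimensional Data,” co-authored by Xue Wang of Alibaba DAMO Academy, Mike Mingcheng Wei of the University at Buffalo, The State University of New York, and Tao Yao of Shanghai Jiao Tong University, has been officially published online in Management Science, a leading international journal in management.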



Title: Online Learning and Decision Making Under Generalized Linear Model with High-Dimensional Data




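About the Authors


Xue Wang

Alibaba DAMO Academy

Mike Mingcheng Wei

University at Buffalo, The State University of New York

Tao Yao

Antai College of Economics and Management, Shanghai Jiao Tong University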




Abstract


We propose a minimax concave penalized multiarmed bandit algorithm under the generalized linear model (G-MCP-Bandit) for decision-makers facing high-dimensional data in an online learning and decision-making environment. We demonstrate that in the data-rich regime, the G-MCP-Bandit algorithm attains the optimal cumulative regret in the sample size dimension and a tight bound in the covariate dimension and the significant covariate dimension. In the data-poor regime, the G-MCP-Bandit algorithm maintains a tight regret upper bound. In addition, we develop a local linear approximation method, the two-step weighted Lasso procedure, to identify the minimax concave penalty (MCP) estimator for the G-MCP-Bandit algorithm when samples are not independent and identically distributed. Under this procedure, the MCP estimator can match the oracle estimator with high probability and converge to the true parameters at the optimal convergence rate. Finally, through experiments based on both synthetic and real data sets, we show that the G-MCP-Bandit algorithm outperforms other benchmarking algorithms in terms of cumulative regret and that the benefits of the G-MCP-Bandit algorithm increase in the data’s sparsity level and the size of the decision set.
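The abstract names a two-step weighted Lasso procedure (a local linear approximation) for computing the MCP estimator but does not spell out its details. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the textbook MCP definition, a linear-Gaussian model as the simplest special case of a generalized linear model, i.i.d. synthetic data (the paper targets non-i.i.d. bandit samples), and a plain coordinate-descent solver. All function names and parameter values (`lam`, `a`, `n`, `d`) are hypothetical choices made for illustration.

```python
import random


def mcp_penalty(x, lam, a):
    """Textbook MCP: lam*|x| - x^2/(2a) if |x| <= a*lam, else a*lam^2/2."""
    ax = abs(x)
    if ax <= a * lam:
        return lam * ax - ax * ax / (2.0 * a)
    return 0.5 * a * lam * lam


def mcp_derivative(x, lam, a):
    """Derivative of MCP with respect to |x|: max(0, lam - |x|/a)."""
    return max(0.0, lam - abs(x) / a)


def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, weights, n_sweeps=100):
    """Minimize (1/2n)*||y - X b||^2 + sum_j weights[j]*|b_j| by coordinate descent."""
    n, d = len(X), len(X[0])
    cols = [list(c) for c in zip(*X)]          # column-major copy of X
    col_sq = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * d
    resid = list(y)                            # residual y - X beta at beta = 0
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes coordinate j.
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, cols[j])]
                beta[j] = new
    return beta


def two_step_mcp(X, y, lam, a=3.0):
    """Two-step weighted Lasso: plain Lasso, then Lasso reweighted by the MCP derivative."""
    d = len(X[0])
    beta0 = weighted_lasso(X, y, [lam] * d)               # step 1: ordinary Lasso
    weights = [mcp_derivative(b, lam, a) for b in beta0]  # local linear approximation
    return weighted_lasso(X, y, weights)                  # step 2: weighted Lasso


def demo(n=200, d=20, lam=0.1, seed=0):
    """Synthetic sparse regression (assumed setup): two nonzero coefficients out of d."""
    rng = random.Random(seed)
    true_beta = [2.0, -1.5] + [0.0] * (d - 2)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, 0.1) for row in X]
    lasso = weighted_lasso(X, y, [lam] * d)
    mcp = two_step_mcp(X, y, lam)
    return lasso, mcp
```

Calling `demo()` returns the plain Lasso estimate and the two-step estimate for the same synthetic data. In this setup, coefficients that are large after the first step receive zero weight in the second step, so they are refit without shrinkage, while coefficients estimated as zero keep the full penalty `lam`. That is the mechanism by which an MCP-type estimator reduces the bias of the ordinary Lasso on strong signals.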










