Online Academic Talk | Researcher Wei Luo: Determining the Number of Clusters via Data Augmentation

Academic   Education   2024-11-16 07:03   Guangdong


  

  


Abstract

Determining the number of clusters is crucial for the successful application of clustering. In this paper, we propose a new order-determination method for general model-based clustering, called the data augmentation estimator (DAE). The estimator is based on a novel idea: augmenting the data with an independently generated small cluster, which lets us study how the instability of clustering changes with the number of clusters assumed. The pattern of instability characterizes the true number of clusters in a way that differs from the commonly used goodness-of-fit measure. By combining the two sources of information appropriately, the proposed estimator is asymptotically consistent under general conditions and is easy to implement. It is also more efficient than conventional BIC-type approaches, which use the goodness-of-fit measure only. These properties are illustrated with simulation studies and real data examples.
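To make the augmentation idea concrete, here is a minimal toy sketch in Python. It is not the DAE from the paper: the abstract does not give the estimator's definition, so everything below is an assumption chosen for illustration. That includes the 1-D k-means standing in for model-based clustering, the comparison of a k-cluster fit on the original data with a (k+1)-cluster fit on the augmented data, the Rand-index-based instability score, and the hypothetical function names `kmeans_1d`, `rand_index`, and `instability_profile`. The sketch only tabulates a goodness-of-fit number (within-cluster sum of squares) next to an instability number for each k and does not attempt to combine them.

```python
import random

def kmeans_1d(xs, k, iters=20):
    """Lloyd's algorithm in 1-D with deterministic farthest-point initialisation."""
    centers = [min(xs)]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(xs, key=lambda x: min(abs(x - c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centers[j] = sum(members) / len(members)
    wss = sum((x - centers[lab]) ** 2 for x, lab in zip(xs, labels))
    return labels, wss

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

def instability_profile(data, extra, k_max):
    """For each k, compare a k-cluster fit on data with a (k+1)-cluster fit on data + extra.

    Illustrative assumption only; the paper's instability measure may differ.
    """
    augmented = data + extra
    rows = []
    for k in range(1, k_max + 1):
        base_labels, wss = kmeans_1d(data, k)
        aug_labels, _ = kmeans_1d(augmented, k + 1)
        # Only the original points are compared; the added points are dropped.
        instability = 1.0 - rand_index(base_labels, aug_labels[:len(data)])
        rows.append((k, wss, instability))
    return rows

rng = random.Random(0)
# Three well-separated toy clusters with bounded noise, 50 points each.
data = [c + rng.uniform(-1, 1) for c in (0, 10, 20) for _ in range(50)]
# Independently generated small cluster, placed far from the data (an arbitrary choice).
extra = [40 + rng.uniform(-0.5, 0.5) for _ in range(5)]
profile = instability_profile(data, extra, k_max=5)
for k, wss, instability in profile:
    print(f"k={k}  WSS={wss:8.1f}  instability={instability:.3f}")
```

With these well-separated toy clusters, the instability at k = 3 is zero, because both fits recover the same three groups on the original points. How the two columns should be combined into a single estimate is the subject of the talk and is not reproduced here.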

About the Speaker

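Wei Luo graduated from Pennsylvania State University in the United States in 2014, then worked at Baruch College in the United States, and joined Zhejiang University in 2018. Luo's research interests include sufficient dimension reduction and causal inference. Luo has published multiple papers in international statistics and machine learning journals, including the Annals of Statistics, Biometrika, JRSSB, and JMLR, and currently leads a project funded by the Excellent Young Scientists Fund of the National Natural Science Foundation of China.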


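The 狗熊会 (CluBear) online academic lecture hall is open to scholars and practitioners in data science and related fields. We warmly invite our followers to sign up as speakers or to recommend speakers. For inquiries, please contact: Ying Chang, ying.chang@clubear.org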




