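About the author

Shun Yu (禹舜) is a student in the combined master's and doctoral program in mathematical statistics at the Central University of Finance and Economics, advised by Professor Yuehan Yang (杨玥含) and Professor Yujie Gai (盖玉洁). Yu's main research interests include the modeling of high-dimensional linear models with non-sparse, correlated structures. Related results have been published in Statistical Methods in Medical Research.

The paper shared today is "A structured iterative division approach for non-sparse regression models and applications in biological data analysis". Original paper: Shun Yu and Yuehan Yang, A structured iterative division approach for non-sparse regression models and applications in biological data analysis. Statistical Methods in Medical Research. 2024;33(7):1233-1248.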