1. Zhang S, Metaxas D. On the challenges and perspectives of foundation models for medical image analysis. Med Image Anal 2024;91:102996.
2. Wang X, Wang D, Li X, et al. Editorial for special issue on Foundation models for Medical Image Analysis. Med Image Anal 2025;100:103389.
3. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017.
4. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929.
5. Radford A, Kim JW, Hallacy C, et al. Learning transferable visual models from natural language supervision. International conference on machine learning, PMLR 2021;8748-63.
6. Kirillov A, Mintun E, Ravi N, et al. Segment anything. Proceedings of the IEEE/CVF International Conference on Computer Vision 2023;4015-26.
7. Wu S, Koo M, Blum L, et al. Benchmarking open-source large language models, GPT-4 and Claude 2 on multiple-choice questions in nephrology. NEJM AI 2024;1:AIdbp2300092.
8. Deng Z, Shen Y, Kim H, et al. Foundation Models for General Medical AI. Second International Workshop, MedAGI 2024, Held in Conjunction with MICCAI 2024, Marrakesh, Morocco, October 6, 2024, Proceedings. Lecture Notes in Computer Science 15184, Springer 2025.
9. Tu T, Azizi S, Driess D, et al. Towards generalist biomedical AI. NEJM AI 2024;1:AIoa2300138.
10. Zhao T, Gu Y, Yang J, et al. A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nat Methods 2024 Nov 18. (Epub Ahead of Print)
11. Zhang K, Zhou R, Adhikarla E, et al. A generalist vision–language foundation model for diverse biomedical tasks. Nat Med 2024;30:3129-41.
12. Zhou Y, Chia MA, Wagner SK, et al. A foundation model for generalizable disease detection from retinal images. Nature 2023;622:156-63.
13. Kim C, Gadgil SU, DeGrave AJ, et al. Transparent medical image AI via an image–text foundation model grounded in medical literature. Nat Med 2024;30:1-12.
14. Huang Z, Bianchi F, Yuksekgonul M, et al. A visual–language foundation model for pathology image analysis using medical Twitter. Nat Med 2023;29:2307-16.
15. Lu MY, Chen B, Williamson DFK, et al. A visual-language foundation model for computational pathology. Nat Med 2024;30:863-74.
16. Chen RJ, Ding T, Lu MY, et al. Towards a general-purpose foundation model for computational pathology. Nat Med 2024;30:850-62.
17. Xu H, Usuyama N, Bagga J, et al. A whole-slide foundation model for digital pathology from real-world data. Nature 2024;630:181-8.
18. Huang W, Li C, Zhou HY, et al. Enhancing representation in radiography-reports foundation model: A granular alignment algorithm using masked contrastive learning. Nat Commun 2024;15:7620.
19. Chen J, Mei J, Li X, et al. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med Image Anal 2024;97:103280.
20. Liu J, Yang H, Zhou HY, et al. Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining. International Conference on Medical Image Computing and Computer-Assisted Intervention 2024;615-25.
21. Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision 2021;10012-22.
22. Song M, Wang J, Yu Z, et al. PneumoLLM: Harnessing the power of large language model for pneumoconiosis diagnosis. Med Image Anal 2024;97:103248.
23. Gong S, Zhong Y, Ma W, et al. 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation. Med Image Anal 2024;98:103324.
24. Chen C, Miao J, Wu D, et al. MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation. Med Image Anal 2024;98:103310.
25. Paranjape JN, Nair NG, Sikder S, et al. AdaptiveSAM: Towards efficient tuning of SAM for surgical scene segmentation. Annual Conference on Medical Image Understanding and Analysis. Cham: Springer Nature Switzerland, 2024:187-201.
26. Peng L, Cai S, Wu Z, et al. MMGPL: Multimodal medical data analysis with graph prompt learning. Med Image Anal 2024;97:103225.
27. Zu W, Xie S, Zhao Q, et al. Embedded prompt tuning: Towards enhanced calibration of pretrained models for medical images. Med Image Anal 2024;97:103258.
28. Lu J, Yan F, Zhang X, et al. PathoTune: Adapting visual foundation model to pathological specialists. International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024:395-406.
29. Li W, Qu C, Chen X, et al. AbdomenAtlas: A large-scale, detailed-annotated, & multi-center dataset for efficient transfer learning and open algorithmic benchmarking. Med Image Anal 2024;97:103285.
30. Gu H, Colglazier R, Dong H, et al. SegmentAnyBone: A universal model that segments any bone at any location on MRI. arXiv:2401.12974.
31. Hu X, Gu L, Kobayashi K, et al. Interpretable medical image visual question answering via multi-modal relationship graph learning. Med Image Anal 2024;97:103279.
32. Holste G, Zhou Y, Wang S, et al. Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge. Med Image Anal 2024;97:103224.
33. Ikezogwo W, Seyfioglu S, Ghezloo F, et al. Quilt-1M: One million image-text pairs for histopathology. Advances in Neural Information Processing Systems, 2024.
34. Dippel J, Feulner B, Winterhoff T, et al. RudolfV: a foundation model by pathologists for pathologists. arXiv:2401.04079.
35. Lei W, Xu W, Li K, et al. MedLSAM: Localize and segment anything model for 3D CT images. Med Image Anal 2025;99:103370.
36. Jiao J, Zhou J, Li X, et al. USFM: A universal ultrasound foundation model generalized to tasks and organs towards label efficient image analysis. Med Image Anal 2024;96:103202.
37. Kang Q, Lao Q, Gao J, et al. Deblurring masked image modeling for ultrasound image analysis. Med Image Anal 2024;97:103256.
38. Hua S, Yan F, Shen T, et al. PathoDuet: Foundation models for pathological slide analysis of H&E and IHC stains. Med Image Anal 2024;97:103289.
39. Chen RJ, Ding T, Lu MY, et al. Towards a general-purpose foundation model for computational pathology. Nat Med 2024;30:850-62.
40. Wang X, Zhao J, Marostica E, et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 2024;634:970-8.
41. Vorontsov E, Bozkurt A, Casson A, et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat Med 2024;30:2924-35.
42. Xiang J, Wang X, Zhang X, et al. MUSK: a vision-language foundation model for precision oncology. Nature 2024, in press.
43. Xie Y, Gu L, Harada T, et al. Rethinking masked image modelling for medical image representation. Med Image Anal 2024;98:103304.
44. Li C, Huang W, Yang H, et al. Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement. Med Image Anal 2024;97:103299.
45. Zhang T, Lin M, Guo H, et al. Incorporating Clinical Guidelines Through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring. International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024:360-70.
46. Liu J, Zhang Y, Wang K, et al. Universal and extensible language-vision models for organ segmentation and tumor detection from abdominal computed tomography. Med Image Anal 2024;97:103226.
47. Kondepudi A, Pekmezci M, Hou X, et al. Foundation models for fast, label-free detection of glioma infiltration. Nature 2024 November 13. (Epub Ahead of Print)
48. Ma J, He Y, Li F, et al. Segment anything in medical images. Nat Commun 2024;15:654.
49. Dippel J, Prenißl N, Hense J, et al. AI-based anomaly detection for clinical-grade histopathological diagnostics. NEJM AI 2024;1:AIoa2400468.
50. Fang X, Lin Y, Zhang D, et al. Aligning Medical Images with General Knowledge from Large Language Models. International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024:57-67.
51. Cox J, Liu P, Stolte SE, et al. BrainSegFounder: towards 3D foundation models for neuroimage segmentation. Med Image Anal 2024;97:103301.
52. Qiu J, Wu J, Wei H, et al. Development and validation of a multimodal multitask vision foundation model for generalist ophthalmic artificial intelligence. NEJM AI 2024;1:AIoa2300221.
53. Huang W, Li C, Zhou HY, et al. Enhancing representation in radiography-reports foundation model: A granular alignment algorithm using masked contrastive learning. Nat Commun 2024;15:7620.
54. Lu MY, Chen B, Williamson DFK, et al. A multimodal generative AI copilot for human pathology. Nature 2024;634:466-73.
55. Christensen M, Vukadinovic M, Yuan N, et al. Vision–language foundation model for echocardiogram interpretation. Nat Med 2024;30:1481-8.
56. Qiu P, Wu C, Zhang X, et al. Towards building multilingual language model for medicine. Nat Commun 2024;15:8384.
57. Wan Z, Liu C, Zhang M, et al. Med-UniC: Unifying cross-lingual medical vision-language pre-training by diminishing bias. Advances in Neural Information Processing Systems, 2024.
58. Ong JCL, Chang SYH, William W, et al. Medical ethics of large language models in medicine. NEJM AI 2024;1:AIra2400038.
59. OpenAI. Learning to Reason with LLMs. https://openai.com/index/learning-to-reason-with-llms/.
60. Wang X, Zhang X, Wang G, et al. OpenMEDLab: An open-source platform for multi-modality foundation models in medicine. arXiv:2402.18028.