A Preoperative CT-based AI Prognostic Model for Segmentectomy: When Will a Trustworthy Model Arrive?

Digest   Science   2024-04-19 20:54   Beijing


Preface

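Hello everyone! Today I'd like to share a recent Radiology article that applies CT-based deep learning (a radiomics-style imaging analysis) to sublobar resection. Zhuge Liang launched five Northern Expeditions and made progress every time, so why can't we make progress with every paper we read? We already know many of these points, don't we, yet when we actually do the work we fail to push them to the limit. After JCOG0804, where else can early-stage lung cancer research break new ground? Let's learn together!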


About 3,989 characters, with many figures.

Careful reading takes 5-10 min.


Clinical Utility of a CT-based AI Prognostic Model for Segmentectomy in Non–Small Cell Lung Cancer

Kwon Joong Na, Young Tae Kim, Jin Mo Goo, Hyungjin Kim 

Radiology 16 Apr 2024

Background: Currently, no tool exists for risk stratification in patients undergoing segmentectomy for non–small cell lung cancer (NSCLC).


Purpose: To develop and validate a deep learning (DL) prognostic model using preoperative CT scans and clinical and radiologic information for risk stratification in patients with clinical stage IA NSCLC undergoing segmentectomy.


Materials and Methods: In this single-center retrospective study, transfer learning of a pretrained model was performed for survival prediction in patients with clinical stage IA NSCLC who underwent lobectomy from January 2008 to March 2017. The internal set was divided into training, validation, and testing sets based on the assignments from the pretraining set. The model was tested on an independent test set of patients with clinical stage IA NSCLC who underwent segmentectomy from January 2010 to December 2017. Its prognostic performance was analyzed using the time-dependent area under the receiver operating characteristic curve (AUC), sensitivity, and specificity for freedom from recurrence (FFR) at 2 and 4 years and lung cancer–specific survival and overall survival at 4 and 6 years. The model sensitivity and specificity were compared with those of the Japan Clinical Oncology Group (JCOG) eligibility criteria for sublobar resection.
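The time-dependent AUC named above asks, at a fixed horizon t, how often a patient who had the event by t receives a higher risk score than a patient still event-free beyond t (the "cumulative/dynamic" definition). Below is a minimal sketch assuming that definition and simply dropping patients censored before t; published estimators typically reweight for censoring instead of dropping, so treat this as illustrative only. The function name and toy data are hypothetical, not the study's.

```python
# Naive cumulative/dynamic time-dependent AUC at a horizon (illustrative sketch).
# Assumption: cases = event observed at or before the horizon; controls =
# anyone with follow-up beyond the horizon; patients censored before the
# horizon are dropped (IPCW-type estimators would reweight them instead).


def time_dependent_auc(times, events, scores, horizon):
    """Fraction of case-control pairs in which the case has the higher score."""
    cases = [s for t, e, s in zip(times, events, scores) if e == 1 and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0  # case correctly ranked above control
            elif case_score == control_score:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy data (months, event flag, risk score); not from the paper.
    times = [5, 12, 20, 30, 40, 50, 60, 15]
    events = [1, 1, 0, 1, 0, 0, 0, 0]
    scores = [0.30, 0.20, 0.10, 0.05, 0.04, 0.25, 0.02, 0.15]
    print(round(time_dependent_auc(times, events, scores, horizon=24), 3))  # 0.875
```

Note that the patient with an event at 30 months counts as a control at the 24-month horizon, which is exactly what "cumulative/dynamic" means.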


Results: The pretraining set included 1756 patients. Transfer learning was performed in an internal set of 730 patients (median age, 63 years [IQR, 56–70 years]; 366 male), and the segmentectomy test set included 222 patients (median age, 65 years [IQR, 58–71 years]; 114 male). The model performance for 2-year FFR was as follows: AUC, 0.86 (95% CI: 0.76, 0.96); sensitivity, 87.4% (7.17 of 8.21 patients; 95% CI: 59.4, 100); and specificity, 66.7% (136 of 204 patients; 95% CI: 60.2, 72.8). The model showed higher sensitivity for FFR than the JCOG criteria (87.4% vs 37.6% [3.08 of 8.21 patients], P = .02), with similar specificity.
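The sensitivity figures above use fractional patient counts (7.17 of 8.21, 3.08 of 8.21), while specificity uses whole numbers (136 of 204). That pattern is consistent with inverse probability of censoring weighting (IPCW), although this summary does not name the estimator, so IPCW is an assumption here. In the sketch, each case (event by the horizon) is weighted by 1/G(T-) and each control (event-free beyond the horizon) by 1/G(t), where G is the Kaplan-Meier curve of the censoring times; rescaling all weights by G(t) gives every control a weight of exactly 1, which is why a weighted specificity can still read like a plain count. Function names and toy data are hypothetical.

```python
# Hypothetical sketch: IPCW-weighted time-dependent sensitivity/specificity.
# Assumption: cases = event observed at or before the horizon; controls =
# event-free beyond the horizon; patients censored before the horizon get
# weight 0. Weights are rescaled so that every control weighs exactly 1.


def censoring_km(times, events, s, inclusive):
    """Kaplan-Meier estimate of the censoring survival G, treating censorings
    (event == 0) as the 'events'. inclusive=True gives G(s) (product over
    censoring times u <= s); inclusive=False gives G(s-) (over u < s)."""
    g = 1.0
    for u in sorted({t for t, e in zip(times, events) if e == 0}):
        if u > s or (u == s and not inclusive):
            break
        at_risk = sum(1 for t in times if t >= u)
        n_censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - n_censored / at_risk
    return g


def weighted_sens_spec(times, events, scores, horizon, cutoff):
    """Return (weighted true positives, weighted cases, true negatives, controls)."""
    g_t = censoring_km(times, events, horizon, inclusive=True)
    tp = cases = 0.0
    tn = controls = 0
    for t, e, score in zip(times, events, scores):
        flagged_high = score > cutoff
        if e == 1 and t <= horizon:  # case: weight G(t) / G(T-)
            w = g_t / censoring_km(times, events, t, inclusive=False)
            cases += w
            if flagged_high:
                tp += w
        elif t > horizon:  # control: weight G(t) / G(t) = 1
            controls += 1
            if not flagged_high:
                tn += 1
        # otherwise: censored before the horizon, weight 0


    return tp, cases, tn, controls


if __name__ == "__main__":
    # Invented toy cohort (months, event flag, risk score); not study data.
    times = [3, 6, 8, 10, 14, 18, 26, 30, 36, 40]
    events = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
    scores = [0.9, 0.2, 0.7, 0.3, 0.1, 0.4, 0.6, 0.5, 0.2, 0.1]
    tp, cases, tn, controls = weighted_sens_spec(times, events, scores, 24, 0.45)
    print(f"sensitivity: {tp:.2f} of {cases:.2f} = {tp / cases:.1%}")
    print(f"specificity: {tn} of {controls} = {tn / controls:.1%}")
    # sensitivity: 1.30 of 2.10 = 61.8%
    # specificity: 2 of 4 = 50.0%
```

As in the abstract, the weighted number of cases (2.10) is smaller than the raw count (3), because cases observed after some censoring are down-weighted relative to controls.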


Conclusions: The CT-based DL model identified patients at high risk among those with clinical stage IA NSCLC who underwent segmentectomy, outperforming the JCOG criteria.


Keywords: Lung Cancer; surgery; complete resection; residual disease


Summary: A CT-based deep learning model identified individuals at high risk among patients with clinical stage IA non–small cell lung cancer who underwent segmentectomy.

Key Results

■ In this retrospective study of 222 patients who underwent segmentectomy for clinical stage IA non–small cell lung cancer, a CT-based deep learning (DL) model showed a time-dependent area under the receiver operating characteristic curve of 0.86, sensitivity of 87.4%, and specificity of 66.7% for recurrence within 2 years.

■ The model showed higher sensitivity for 2-year recurrence than the Japan Clinical Oncology Group eligibility criteria for sublobar resection (87.4% vs 37.6%, P = .02).



Study Notes

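1. First, a knowledge update for clinical prediction model papers that use deep learning or machine learning. The TRIPOD+AI project was launched long ago and, though late, it has finally arrived; the detailed reporting items it lists are of great benefit to research in this field.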


doi: 10.1136/bmj-2023-078378 


2. Now for the details:

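First, the study's design goal is forward-looking: it targets the current problems with the indications and prognostic stratification for sublobar resection. As is well known, the JCOG eligibility criteria rely on measuring the solid-component proportion (consolidation-to-tumor ratio), which carries potential bias because results vary considerably between readers, so both clinicians and researchers need more robust and precise preoperative biomarkers. This paper performs transfer learning from a model built on clinical stage IA lobectomy data. The advantage is that high-risk or protective imaging features can first be sought in the lobectomy cohort and then validated in the segmentectomy cohort.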

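Second, the paper explores the imaging methodology systematically, and Seoul National University College of Medicine's long-term investment in its lung cancer cohorts is evident. However, the stratification performance of the resulting model is clearly not strong. As a senior colleague of mine put it, the authors did not even dare to enter the DL risk score in Table 4 as a dichotomized variable, nor did they dare to enter tumor size as a three-level category (T1a/T1b/T1c). In addition, by Radiology's standards, the paper still lacks in-depth exploration (for example, the supplementary material does not present the conventional high-risk pathologic features).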
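To make this critique concrete, entering the DL score as a binary high/low-risk indicator and tumor size as T1a/T1b/T1c dummy variables would look roughly like the sketch below. The T1 cut points (≤1 cm, >1 to ≤2 cm, >2 to ≤3 cm) are the TNM 8th-edition definitions, and the 1.36% threshold is the DL-driven 2-year median cutoff quoted in the figure legends; treating a score exactly at the cutoff as low risk, and which size measurement feeds the function, are assumptions. The helper names are hypothetical, not the authors' code.

```python
# Hypothetical covariate encoding for a Cox model (not the authors' code).
DL_2Y_CUTOFF_PCT = 1.36  # median 2-year DL risk in the internal validation set


def t1_category(size_cm):
    """Map clinical tumor size (cm) to the TNM 8th-edition T1 subcategory."""
    if not 0 < size_cm <= 3:
        raise ValueError("clinical T1 requires 0 < size <= 3 cm")
    if size_cm <= 1:
        return "T1a"
    if size_cm <= 2:
        return "T1b"
    return "T1c"


def encode_covariates(size_cm, dl_risk_pct):
    """Binary DL risk group plus T1b/T1c dummies (T1a is the reference level)."""
    category = t1_category(size_cm)
    return {
        "dl_high_risk": int(dl_risk_pct > DL_2Y_CUTOFF_PCT),
        "t1b": int(category == "T1b"),
        "t1c": int(category == "T1c"),
    }


if __name__ == "__main__":
    # Illustrative values only.
    print(encode_covariates(2.5, 3.65))  # {'dl_high_risk': 1, 't1b': 0, 't1c': 1}
    print(encode_covariates(1.5, 0.16))  # {'dl_high_risk': 0, 't1b': 1, 't1c': 0}
```

Dummy coding with one reference level keeps the three size categories from being forced onto a single linear trend, which is the usual reason to prefer it over a continuous size term.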

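Finally, for a paper with surgeons among its authors, it records neither the exact segment location nor the resection extent for the 200-plus patients in the segmentectomy test set, and it performs no further in-depth subgroup or sensitivity analyses. That hardly matches the famously exhaustive, squeeze-every-last-drop Korean research style, does it? In principle, CT scans contain more data beyond tumor dimension, density, and location, but what exactly are the clinically meaningful factors? Answering that requires more detailed disclosure of larger-scale data.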



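3. Finally, an interesting theory to share: the Yerkes-Dodson law. Don't over-prepare; first learn to produce something rough, because people who try too hard don't go far. Whatever grand multi-omics project you are working on, steady incremental progress is probably more beneficial than an all-or-nothing attempt to finish everything in one stroke.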

The relationship between motivation intensity and work efficiency is not linear but follows an inverted-U curve.

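Across tasks of different difficulty, higher motivation does not always mean higher efficiency. Excessive motivation leads to anxiety, procrastination, and similar problems, which in turn impair efficiency and problem-solving ability, whereas too little motivation is not enough to get people moving.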

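Experiments show that:

For easy tasks, efficiency is highest when motivation stays at a relatively high level.

For tasks of medium difficulty, efficiency is highest at a medium level of motivation.

For difficult tasks, efficiency is highest at a relatively low level of motivation, because excessive motivation triggers anxiety and other negative emotions.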


https://x.com/Aiims1742/status/1769937917927911789






Contents

1. INTRODUCTION

2. Materials and Methods (Figure 1)

    2.1 Study Patients (Appendix S1)

    2.2 Model Pretraining and Transfer Learning

    2.3 Study Outcomes

    2.4 Statistical Analysis (Appendix S1)

3. Results

    3.1 Study Patients (Appendix S2 and Tables 1 and S1-2)(Figure 2)

    3.2 Prognostication Using the DL-driven Risk Scores (Table 2, Fig S1)(Figures 3 and S2)

    3.3 Benchmarking of the DL Model against the Randomized Clinical Trial Eligibility Criteria (Table 3)

    3.4 Subgroup Analyses in the Randomized Clinical Trial–eligible Patients (Figure 4)(Table S3)

    3.5 Multivariable Cox Regression Analyses (Table 4, S4)(Figure 5)

    3.6 Multivariable Cox Regression Analyses in Patients with Adenocarcinoma (Appendix S2, Tables S5, S6)

4. DISCUSSION



Figures and Tables

2. Materials and Methods

Figure 1: Schematic shows the overall study design.

The pretraining set included patients with non–small cell lung cancer with any tumor size and lymph node involvement but without metastasis, confirmed at pathologic examination (pTanyNanyM0).

LN = lymph node, LVI = lymphovascular invasion, 3D = three-dimensional, VPI = visceral pleural invasion.




3. Results

    3.1 Study Patients

Appendix S2 and Table 1.


Table S1. Patient and Tumor Characteristics of the Pretraining Set 

Note.—Data are numbers of patients with percentages in parentheses, unless otherwise specified. Data were missing for some variables, as follows: history of malignancy other than lung cancer in 0.2% (4/1756); family history of lung cancer in 0.2% (3/1756); and smoking status in 0.1% (2/1756). NA = not available. 

* Data are medians with IQRs in parentheses. 

† Data in brackets are ranges. 

‡ Pathologic stage was determined according to the seventh edition staging system. 


Table S2. Survival Rates According to the Study Outcomes


Figure 2: Flowcharts show patient inclusion and exclusion for the (A) pretraining and internal set and (B) independent segmentectomy test set.

The pretraining set (A) (including patients with any tumor size and lymph node involvement, but without metastasis, confirmed at pathologic examination [pTanyNanyM0]) was split randomly for training, validation, and testing at a ratio of 6:2:2. Transfer learning was applied to a subset of the pretraining set, specifically in patients with clinical stage IA non–small cell lung cancer (ie, the internal set). The internal set was divided based on the assignments from the pretraining set, such that the ratio of the internal training set (ie, for transfer learning) to the internal validation set to the internal testing set was 6:1.8:2.2. FEV1 = forced expiratory volume in first second of exhalation.
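The split logic in this legend, a random 6:2:2 split of the pretraining set with the internal set inheriting each patient's assignment, can be sketched as follows. Because a patient never changes partition, no patient the pretrained model was trained on ends up in the transfer-learning test set, and the inherited ratios only approximate 6:2:2, which is consistent with the 6:1.8:2.2 reported. The seed, the random subset standing in for the internal set, and the function names are assumptions for illustration.

```python
import random
from collections import Counter


def split_pretraining(patient_ids, seed=0, ratios=(0.6, 0.2, 0.2)):
    """Randomly assign each pretraining patient to train/validation/test."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * ratios[0])
    n_val = round(len(ids) * ratios[1])
    split = {}
    for i, pid in enumerate(ids):
        if i < n_train:
            split[pid] = "train"
        elif i < n_train + n_val:
            split[pid] = "validation"
        else:
            split[pid] = "test"
    return split


def inherit_split(internal_ids, pretraining_split):
    """The internal set is a subset of the pretraining set: each patient keeps
    the partition it had during pretraining."""
    return {pid: pretraining_split[pid] for pid in internal_ids}


if __name__ == "__main__":
    pretraining = split_pretraining(range(1756))
    # Hypothetical stand-in for the 730 clinical stage IA patients.
    internal_ids = random.Random(1).sample(range(1756), 730)
    internal = inherit_split(internal_ids, pretraining)
    print(Counter(pretraining.values()))
    print(Counter(internal.values()))
```

With a random stand-in subset the internal counts drift around 6:2:2; in the real study, the internal set is defined by clinical stage, so the drift reflects how stage IA patients happened to fall across the pretraining partitions.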




    3.2 Prognostication Using the DL-driven Risk Scores

Table 2


Fig S1: Prognostication performances of the deep learning-driven risk scores. Time-dependent receiver operating characteristic curves for (A) overall survival in the internal test set, (B) freedom from recurrence, (C) lung cancer–specific survival, and (D) overall survival in the independent segmentectomy test set.





Figure 3: Kaplan-Meier survival curves stratified according to the dichotomized deep learning (DL)–driven risk scores show (A) overall survival (OS) in the internal test set using the DL-driven 4-year risk scores and (B) freedom from recurrence, (C) lung cancer–specific survival, and (D) OS in the segmentectomy test set using the DL-driven 2-year, 4-year, and 6-year risk scores.

The cutoffs were determined empirically as the median values in the internal validation set, which were 1.36% for the DL-driven 2-year risk score and 4.36% for the 4-year risk score. The cutoffs remained unchanged regardless of the study outcome.
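The dichotomization described here, a cutoff fixed at the median risk score of the internal validation set and then applied unchanged to the test set, followed by a Kaplan-Meier curve per group, can be sketched as below. Treating scores above the median as high risk is an assumption, the function names are hypothetical, and all numbers are toy values (the validation scores are simply chosen so that their median is 1.36 for readability).

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, survival just after it)."""
    curve, surv = [], 1.0
    for u in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for t in times if t >= u)
        n_events = sum(1 for t, e in zip(times, events) if t == u and e == 1)
        surv *= 1.0 - n_events / at_risk
        curve.append((u, surv))
    return curve


def km_by_risk_group(times, events, scores, cutoff):
    """Split patients at a fixed cutoff and estimate one KM curve per group."""
    groups = {"low": ([], []), "high": ([], [])}
    for t, e, score in zip(times, events, scores):
        group_times, group_events = groups["high" if score > cutoff else "low"]
        group_times.append(t)
        group_events.append(e)
    return {name: kaplan_meier(ts, es) for name, (ts, es) in groups.items()}


if __name__ == "__main__":
    validation_scores = [0.4, 0.9, 1.36, 2.1, 5.0]  # toy 2-year risks (%)
    cutoff = median(validation_scores)  # fixed before touching the test set
    times = [6, 12, 18, 24, 30, 36, 10, 20]  # months
    events = [1, 0, 1, 0, 0, 0, 1, 1]  # 1 = recurrence observed
    scores = [3.0, 0.5, 2.2, 0.9, 1.1, 0.4, 4.0, 0.8]
    for group, curve in km_by_risk_group(times, events, scores, cutoff).items():
        print(group, [(t, round(s, 3)) for t, s in curve])
```

Fixing the cutoff on the validation set, rather than choosing it on the test set, avoids optimistically tuning the split to the very patients used for evaluation.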


Fig S2: Kaplan-Meier survival curves stratified by the dichotomized DL-driven risk scores. (A) Overall survival in the internal test set using the DL-driven 6-year risk scores. (B) Freedom from recurrence, (C) lung cancer–specific survival, and (D) overall survival in the segmentectomy test set using the DL-driven 4-year, 6-year, and 6-year risk scores, respectively.

The cutoffs were determined empirically as the median values in the internal validation set, which were 4.36% for the DL-driven 4-year risk score and 7.74% for the 6-year risk score. The cutoffs remained unchanged regardless of the study outcome. In the segmentectomy test set, the same patients were consistently classified into the DL-based low-risk group across different time points (ie, 2-year, 4-year, and 6-year). Therefore, figure components (B–D) are identical to those in Figure 3. DL = deep learning.


    3.3 Benchmarking of the DL Model against the Randomized Clinical Trial Eligibility Criteria

Table 3


    3.4 Subgroup Analyses in the Randomized Clinical Trial–eligible Patients

Figure 4: Kaplan-Meier survival curves according to the dichotomized deep learning (DL)–driven risk scores in segmentectomy subgroups of patients who met clinical trial eligibility. (A–C) Graphs show freedom from recurrence (FFR) (A), lung cancer–specific survival (LCSS) (B), and overall survival (OS) (C) in patients eligible for the Cancer and Leukemia Group B 140503 trial. (D–F) Graphs show FFR (D), LCSS (E), and OS (F) in patients eligible for the Japan Clinical Oncology Group (JCOG) trials (JCOG0802, JCOG1211, and JCOG0804).

The DL-driven 2-year risk score was used for FFR, and the 4-year risk score was used for LCSS and OS. The cutoffs were determined empirically as the median values in the internal validation set, which were 1.36% for the DL-driven 2-year risk score and 4.36% for the 4-year risk score. The cutoffs were not altered according to the study outcomes. In the segmentectomy test set, the same patients were consistently classified into the DL-based low-risk group across different time points (2, 4, and 6 years).


Table S3. Subgroup Analyses in the Randomized Clinical Trial-eligible Patients

Note.—FFR = freedom from recurrence, LCSS = lung cancer-specific survival, NA = not available, OS = overall survival. 

* Log-rank test was done. P values were adjusted for multiple comparisons across time and different outcomes. 
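This footnote names the log-rank test and says P values were adjusted across time points and outcomes, but not which adjustment was used. Below is a minimal two-group log-rank test plus a Holm step-down adjustment; Holm is a stand-in chosen here for illustration, not necessarily the authors' method, and the function names and toy data are hypothetical.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, two-sided P)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    o_minus_e, variance = 0.0, 0.0
    for u in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for t in times if t >= u)
        n_a = sum(1 for t, a in zip(times, in_a) if t >= u and a)
        d = sum(1 for t, e in zip(times, events) if t == u and e == 1)
        d_a = sum(1 for t, e, a in zip(times, events, in_a) if t == u and e == 1 and a)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def holm(pvalues):
    """Holm step-down adjusted P values, returned in the input order."""
    m = len(pvalues)
    adjusted, running_max = [0.0] * m, 0.0
    for rank, i in enumerate(sorted(range(m), key=lambda j: pvalues[j])):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    chi2, p = logrank([1, 2], [1, 1], [3, 4], [1, 1])  # toy data
    print(round(chi2, 3), round(p, 3))
    print(holm([0.01, 0.04, 0.03]))
```

The `erfc` call works because a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper tail equals the two-sided normal tail at the square root of the statistic.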



    3.5 Multivariable Cox Regression Analyses

Table 4


Table S4. Multivariable Cox Regression Analyses for the Independent Segmentectomy Test Set Using the DL-driven 4-year and 6-year Risk Scores


Figure 5: Representative CT images with heat map visualization. From left to right: Axial nonenhanced CT images show a preoperative scan with overlaid gradient-weighted activation maps for visceral pleural invasion, lymphovascular invasion, lymph node, and survival prediction, respectively. 

(A) Images in an 83-year-old male patient with clinical stage IA3 adenocarcinoma. The deep learning (DL)–driven 2-year risk score was 3.65% and the 4-year risk score was 10.2%. The tumor recurred 36.5 months after surgery. 

(B) Images in a 79-year-old female patient with clinical stage IA2 adenocarcinoma. The DL-driven 2-year risk score was 8.60% and the 4-year risk score was 20.2%. Tumor recurrence was observed 25.9 months after surgery. 

(C) Images in a 71-year-old male patient with clinical stage IA2 adenocarcinoma. The DL-driven 2-year risk score was 0.16% and the 4-year risk score was 0.74%. There was no evidence of disease recurrence until 60.8 months of postoperative follow-up. 

(D) Images in a 76-year-old female patient with clinical stage IA3 adenocarcinoma. The DL-driven 2-year risk score was 0.63% and the 4-year risk score was 2.28%. No recurrence was noted at a 22-month follow-up visit. The DL model predicted the cumulative overall survival probability in patients with clinical stage IA lung cancer, and the prediction was enhanced by the multitask learning of CT features for visceral pleural invasion, lymphovascular invasion, and lymph node metastasis. The color bar transitions from dark blue to dark red, indicating pixel activation ranging from a low to high degree on the heat maps.
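The heat maps in Figure 5 are gradient-weighted class activation maps (Grad-CAM). Their core step weights each feature map of a convolutional layer by the spatially averaged gradient of a chosen output and keeps the positive part of the weighted sum; it is sketched below on a tiny 2D example. In the paper the inputs are 3D CT volumes and, presumably, one map is computed per task output (visceral pleural invasion, lymphovascular invasion, lymph node, survival). The 2D nested lists, the max-based rescaling, and the function name are simplifying assumptions, and extracting activations and gradients from a real network, as well as upsampling the map onto the CT, are left out.

```python
def grad_cam(activations, gradients):
    """Grad-CAM combination step.

    activations, gradients: K feature maps (each H x W nested lists) from one
    convolutional layer, with gradients taken of the chosen output score.
    Returns an H x W map rescaled to [0, 1].
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel importance = global average of that channel's gradient map.
    alphas = [sum(map(sum, g)) / (height * width) for g in gradients]
    # Weighted sum of feature maps, keeping only positive evidence (ReLU).
    cam = [
        [max(0.0, sum(a * fmap[i][j] for a, fmap in zip(alphas, activations)))
         for j in range(width)]
        for i in range(height)
    ]
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


if __name__ == "__main__":
    acts = [[[1, 0], [0, 1]], [[0, 2], [2, 0]]]  # 2 channels, 2 x 2 each
    grads = [[[1, 1], [1, 1]], [[-0.5, -0.5], [-0.5, -0.5]]]
    print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 1.0]]
```

In the toy example the second channel has a negative average gradient, so the pixels it activates are suppressed to zero, which is how Grad-CAM highlights only regions that push the chosen output upward.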


    3.6 Multivariable Cox Regression Analyses in Patients with Adenocarcinoma

Appendix S2


Table S5. Multivariable Cox Regression Analyses in Adenocarcinomas of the Independent Segmentectomy Test Set


Table S6. Multivariable Cox Regression Analyses in Adenocarcinomas of the Independent Segmentectomy Test Set Using the DL-driven 4-year and 6-year Risk Scores





Stay full of vigor, like a dragon-horse!




知识城邦
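One of the newest and most valuable medical papers every week. Follow 知识城邦 and let's advance together toward understanding the nature of disease and improving patient outcomes. (Areas of focus: malignant tumors, thoracic tumors, comprehensive cancer therapy, tumor microenvironment, perioperative complications, anesthesia, consciousness, psychology, pain, vital organ protection, rehabilitation, big data, artificial intelligence, and more.)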