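Hello everyone! Today I would like to share a study recently published in The Journal of Pathology that analyzes how immunophenotypes derived from histopathology slide features relate to immunotherapy outcomes. We all know the three commonly cited immune microenvironment phenotypes, but how should immunophenotyping actually be quantified in clinical practice and in clinical research? Precise diagnosis is what makes precise intervention possible, so how exactly should the pathological features of each immune subtype be delineated?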
Automated tumor immunophenotyping predicts clinical benefit from anti-PD-L1 immunotherapy
Xiao Li, Jeffrey Eastham, Jennifer M Giltnane, Wei Zou, Andries Zijlstra, Evgeniy Tabatsky, Romain Banchereau, Ching-Wei Chang, Barzin Y Nabet, Namrata S Patil, Luciana Molinero, Steve Chui, Maureen Harryman, Shari Lau, Linda Rangell, Yannick Waumans, Mark Kockx, Darya Orlova, Hartmut Koeppen
Journal of Pathology 2024 March 26
Background: Cancer immunotherapy has transformed the clinical approach to patients with malignancies, as profound benefits can be seen in a subset of patients. To identify this subset, biomarker analyses increasingly focus on phenotypic and functional evaluation of the tumor microenvironment to determine if density, spatial distribution, and cellular composition of immune cell infiltrates can provide prognostic and/or predictive information. Attempts have been made to develop standardized methods to evaluate immune infiltrates in the routine assessment of certain tumor types; however, broad adoption of this approach in clinical decision-making is still missing.
Methods: We developed approaches to categorize solid tumors into 'desert', 'excluded', and 'inflamed' types according to the spatial distribution of CD8+ immune effector cells to determine the prognostic and/or predictive implications of such labels. To overcome the limitations of this subjective approach, we incrementally developed four automated analysis pipelines of increasing granularity and complexity for density and pattern assessment of immune effector cells.
Results: We show that categorization based on 'manual' observation is predictive for clinical benefit from anti-programmed death ligand 1 therapy in two large cohorts of patients with non-small cell lung cancer or triple-negative breast cancer. For the automated analysis we demonstrate that a combined approach outperforms individual pipelines and successfully relates spatial features to pathologist-based readouts and the patient's response to therapy.
Conclusions: Our findings suggest that tumor immunophenotype generated by automated analysis pipelines should be evaluated further as potential predictive biomarkers for cancer immunotherapy.
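1. First, the broader backdrop: economic growth is slowing, technological growth is slowing, and everyone looking for a growth area gravitates toward AI and immunology. Call it the prevailing trend. Imaging-based artificial intelligence has already raised the curtain on AI applications in medicine, and the complex, variable morphology seen in pathology is more than rich enough to support broader and deeper medical research.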
10.1001/jamaoncol.2023.7263
10.1016/j.cell.2024.02.041
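2. On to the details:
First, the study included histopathology slides from several RCTs (POPLAR, OAK, IMpassion130) and, on the slides that were available, compared AI-based automated immunophenotype assessment with manual immunophenotype assessment. P.S. Although several cohorts were brought in, no clear clinical scenario was defined; the more questions a study tries to answer, the more confounders it introduces. In addition, the analysis was not run on the whole WSI (the input was manually annotated tumor areas), which may weaken its power.
Second, on the methods side, four different modeling approaches were used to fit automated immunophenotyping. The advantage of centering on CD8+ T cells is that their clinical role is relatively well established, but focusing on this feature alone may overlook the contribution of other, more complex immune components. The background section puts it well: "The vast number of known interactions between immune cells and an established tumor, together with the cellular complexity and plasticity of such an immune response, makes it difficult to identify a single parameter with sufficient predictive power."
Third, the study puts its emphasis on an automated workflow for IHC-stained pathology slides, so that samples of different sizes can all receive a reasonably reliable immunophenotype. This offers a relatively simple route to pathological assessment for immunotherapy.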
10.1136/jitc-2023-008655corr1
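3. This article can be read alongside a recent paper in Nature Medicine. Comparing the two helps in thinking about how pathology-AI assessment in immuno-oncology should approach follow-up exploratory studies in single clinical scenarios. Real-world application of pathology AI still depends on deeper interpretability and mechanistic exploration.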
10.1101/2024.03.25.586460
Contents
1. Introduction
2. Materials and methods
2.1 Patient cohorts and histology slides
2.2 Immunohistochemistry
2.3 Immunophenotype assessment – manual approach
2.4 Immunophenotype assessment – automated approach
2.4.1 WSI segmentation and tiling
2.4.2 Pipeline #1 – density cut-off
2.4.3 Pipeline #2 – binned CD8 density
2.4.4 Pipeline #3 – randoM fOrest Classifier witH spAtial statistics (MOCHA)
2.4.5 Pipeline #4 – randoM fOrest Classifier witH spAtial statistics and BInned CD8 T-cell dEnsity (MOCHA-BITE)
3. Results
3.1 Manual tumor immunophenotype classification
3.2 Automated tumor immunophenotype classification
4. Discussion
Summary of figures and tables
2. Materials and methods
2.1 Patient cohorts and histology slides
2.2 Immunohistochemistry
2.3 Immunophenotype assessment – manual approach
Figure 1. Manual immunophenotype assessment: examples and association with treatment outcome.
(A) Representative immunophenotype examples of (i) desert, (ii) excluded, and (iii, iv) inflamed; intra-tumoral heterogeneity with areas of (v) desert and (vi) excluded in the same section. Shown are five cases of NSCLC stained for CD8 and panCK. 200x magnification; scale bar, 40 μm.
(B) Schematic of the three immunophenotype categories: desert with rare CD8+ effector cells (green), excluded with CD8+ cells in stroma surrounding a tumor cell nest, and inflamed with numerous CD8+ cells colocalizing with tumor cells (pink).
(C) Kaplan–Meier curves with log-rank p values for overall survival for OAK and IMpassion130 according to manually assigned immunophenotypes for patients receiving atezolizumab.
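The Kaplan–Meier curves and median survival times referred to in panel (C) rest on the standard product-limit estimator. The following is a minimal sketch of that estimator, not the authors' analysis code; the function names and the toy follow-up data are illustrative assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time per patient (e.g. months)
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which the estimate drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy data (hypothetical): six patients, two of them censored.
toy_times = [2, 3, 3, 5, 8, 9]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
print(toy_curve, median_survival(toy_curve))
```

In practice one curve would be estimated per immunophenotype category and per treatment arm, and the curves compared with a log-rank test.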
2.4 Immunophenotype assessment – automated approach
Figure 2. Workflow and output of automated tumor immunophenotype classification pipelines.
(A) Input for all four pipelines are WSI with manually annotated tumor areas (excluding artifacts) and manual immunophenotype calls from POPLAR.
(B) CD8+ and CKpos pixels are automatically identified in the manually annotated tumor area. CD8 density measurements within CKpos regions are the only input for the slide-level density cut-off pipeline (#1). CD8 and CK regions are then incorporated in a tile-based analysis, with individual tile densities resulting in a multidimensional readout classified by a RF into one of three immunophenotype classes, binned CD8 density pipeline (#2).
(C) Data analysis workflow in the MOCHA pipeline (#3): based on 50 spatial features (e.g. colocalization of CD8 T cells and tumor cells) extracted for each tiled WSI, an RF classifier with manual immunophenotype call is trained in a supervised learning framework. See Methods for additional details. Parts of this figure (A) were created with BioRender.com.
2.4.1 WSI segmentation and tiling
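Figure 2B describes identifying CD8+ and CKpos pixels and then carrying them into a tile-based analysis. A rough sketch of what per-tile measurements could look like is given here, assuming the segmentation step has already produced two binary masks of equal size. The tile size, the `ck_threshold` used to call a tile CKpos, and all names are hypothetical and not taken from the paper.

```python
def tile_stats(cd8_mask, ck_mask, tile_size, ck_threshold=0.25):
    """Split two same-sized binary masks into square tiles.

    cd8_mask, ck_mask : lists of rows containing 0/1 pixel labels
    tile_size         : tile side length in pixels (hypothetical value)
    ck_threshold      : minimum CK area fraction for a tile to count as
                        CKpos (hypothetical; the source gives no value here)
    """
    height, width = len(ck_mask), len(ck_mask[0])
    tiles = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            rows = range(top, min(top + tile_size, height))
            cols = range(left, min(left + tile_size, width))
            n_pixels = len(rows) * len(cols)
            cd8 = sum(cd8_mask[r][c] for r in rows for c in cols)
            ck = sum(ck_mask[r][c] for r in rows for c in cols)
            tiles.append({
                "top": top,
                "left": left,
                "cd8_density": cd8 / n_pixels,   # CD8+ area fraction
                "ck_fraction": ck / n_pixels,    # tumor (CK) area fraction
                "is_ckpos": ck / n_pixels >= ck_threshold,
            })
    return tiles


# Toy 4x4 masks, tiled into four 2x2 tiles.
toy_ck = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
toy_cd8 = [[1, 0, 1, 1],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 1]]
for tile in tile_stats(toy_cd8, toy_ck, tile_size=2):
    print(tile)
```

Edge tiles that do not fill a full square are kept and normalized by their actual pixel count; whether the original pipeline keeps or discards them is not stated in this summary.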
2.4.2 Pipeline #1 – density cut-off
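According to Figure 2B, the slide-level CD8 density within CKpos regions is the only input to this pipeline. A minimal sketch of the idea follows, under two assumptions that are not confirmed by this summary: that two cut-offs are used with the ordering desert < excluded < inflamed, and that the cut-offs are chosen by brute-force agreement with manual calls on a training cohort. The function names are illustrative.

```python
from itertools import combinations


def classify_by_density(cd8_density, low_cut, high_cut):
    """Assign an immunophenotype from one slide-level density value.

    Assumes (hypothetically) that density increases from desert to
    excluded to inflamed.
    """
    if cd8_density < low_cut:
        return "desert"
    if cd8_density < high_cut:
        return "excluded"
    return "inflamed"


def fit_cutoffs(densities, manual_calls):
    """Pick the pair of cut-offs that best reproduces the manual calls.

    Candidate cut-offs are the observed density values themselves;
    this exhaustive search is only practical for small training sets.
    """
    candidates = sorted(set(densities))
    best_hits, best_pair = -1, None
    for low, high in combinations(candidates, 2):
        hits = sum(
            classify_by_density(d, low, high) == call
            for d, call in zip(densities, manual_calls)
        )
        if hits > best_hits:
            best_hits, best_pair = hits, (low, high)
    return best_pair


# Toy training data (hypothetical densities and manual calls).
train_density = [0.001, 0.002, 0.01, 0.02, 0.08, 0.1]
train_calls = ["desert", "desert", "excluded", "excluded",
               "inflamed", "inflamed"]
low, high = fit_cutoffs(train_density, train_calls)
print(low, high, classify_by_density(0.05, low, high))
```

A single scalar cannot encode where the CD8+ cells sit relative to tumor nests, which is presumably why the later pipelines add tile-level and spatial features.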
2.4.3 Pipeline #2 – binned CD8 density
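The captions describe this pipeline as turning individual tile densities into a multidimensional readout, binned separately for CKpos and CKneg tiles, which a random forest then classifies. One plausible way to build such a readout is a normalized histogram per tile type. The bin edges and names below are hypothetical, and the classifier itself is left out.

```python
from bisect import bisect_right


def binned_density_features(tiles, edges=(0.0, 0.01, 0.05, 0.2)):
    """Turn per-tile CD8 densities into a fixed-length feature vector.

    tiles : iterable of (cd8_density, is_ckpos) pairs, densities >= 0
    edges : lower bin edges (hypothetical values); the last bin is open
    Returns the fraction of CKpos tiles in each bin followed by the
    fraction of CKneg tiles in each bin.
    """
    tiles = list(tiles)

    def histogram(values):
        counts = [0] * len(edges)
        for v in values:
            # Index of the last edge that is <= v.
            counts[max(bisect_right(edges, v) - 1, 0)] += 1
        total = len(values)
        return [c / total if total else 0.0 for c in counts]

    ckpos = [d for d, is_pos in tiles if is_pos]
    ckneg = [d for d, is_pos in tiles if not is_pos]
    return histogram(ckpos) + histogram(ckneg)


# Toy slide with four CKpos tiles and one CKneg tile.
toy_tiles = [(0.0, True), (0.02, True), (0.3, True), (0.3, True),
             (0.005, False)]
print(binned_density_features(toy_tiles))
```

Normalizing to fractions keeps the vector comparable between a small biopsy and a large resection, since the number of tiles per slide can differ widely.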
2.4.4 Pipeline #3 – randoM fOrest Classifier witH spAtial statistics (MOCHA)
Table S1. The 50 spatial features used in MOCHA and MOCHA-BITE pipelines.
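One spatial feature named in the Figure S1 caption is the Bhattacharyya coefficient (BC), which measures the overlap between two discrete distributions: BC = Σ sqrt(p_i · q_i). The sketch below assumes, without confirmation from this summary, that it is applied to the per-tile distributions of CD8 and CK signal; the function name and inputs are illustrative.

```python
import math


def bhattacharyya_coefficient(p_counts, q_counts):
    """Overlap between two discrete distributions given as raw counts.

    Both inputs are normalized to sum to 1 before computing
    sum(sqrt(p_i * q_i)). Returns a value in [0, 1]: 1 for identical
    distributions, 0 for distributions with no common support.
    """
    p_total, q_total = sum(p_counts), sum(q_counts)
    if p_total == 0 or q_total == 0:
        return 0.0
    return sum(
        math.sqrt((p / p_total) * (q / q_total))
        for p, q in zip(p_counts, q_counts)
    )


# Hypothetical per-tile signal for four tiles of one slide.
cd8_per_tile = [5, 5, 0, 0]
ck_colocalized = [4, 6, 0, 0]   # tumor where the CD8 cells are
ck_separated = [0, 0, 7, 3]     # tumor in tiles without CD8 cells
print(bhattacharyya_coefficient(cd8_per_tile, ck_colocalized))
print(bhattacharyya_coefficient(cd8_per_tile, ck_separated))
```

Under this reading, a high BC means CD8+ cells and tumor cells occupy the same tiles and a low BC means they occupy different tiles.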
Figure S1. Features extracted in the automated pipelines reasonably approximate the manually assigned tumor IP.
(A) MOCHA-BITE pipeline features, such as CD8-CK ratio in CKpos regions and Bhattacharyya coefficient (BC), represent the manual tumor IP well. Each dot corresponds to a unique patient sample. Manual tumor IP category is represented using color-code.
(B) Top five out of fifty total MOCHA-BITE features ranked according to their contribution towards the separation of the IP classes, using POPLAR as the training data set. The feature ranking was done via permutation of the feature of interest. The permutation feature importance is the decrease in model performance (e.g. classification accuracy) when the values of a single feature are randomly shuffled. The larger the drop in model performance, the more important the feature is.
(C) Model performance using only CD8-CK ratio in CKpos regions and BC as features (Sub-model) versus MOCHA and MOCHA-BITE.
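Panel (B) defines permutation feature importance as the drop in model performance when one feature is shuffled. The procedure can be sketched as follows, with a trivial threshold rule standing in for the trained random forest; the stand-in model, the toy data, and all names are assumptions made for illustration.

```python
import random


def accuracy(model, X, y):
    """Fraction of samples for which the model reproduces the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    model : callable mapping one feature row (list) to a predicted label
    X     : list of feature rows (lists of equal length)
    y     : list of reference labels
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:]
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Stand-in classifier: uses feature 0 only, ignores feature 1."""
    return "inflamed" if row[0] > 0.5 else "desert"


toy_X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.4],
         [0.1, 0.3], [0.2, 0.9], [0.3, 0.6]]
toy_y = ["inflamed", "inflamed", "inflamed",
         "desert", "desert", "desert"]
print(permutation_importance(toy_model, toy_X, toy_y))
```

A feature the model never uses gets an importance of exactly zero, which is what makes the ranking in panel (B) interpretable.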
2.4.5 Pipeline #4 – randoM fOrest Classifier witH spAtial statistics and BInned CD8 T-cell dEnsity (MOCHA-BITE)
3. Results
3.1 Manual tumor immunophenotype classification
Figure S2. OS log-rank tests for tumor IP subgroups, as classified by MOCHA-BITE and manual methods.
Kaplan–Meier curves for OS for the atezolizumab and control treatment arms for (A) OAK and (B) IMpassion130 with median survival times for the three IP categories as determined by MOCHA-BITE or manually.
Figure S3. Inter-observer concordance for manual IP calls between two pathologists (HK, MK).
Confusion matrix of concordance data for a large subset of OAK (n=634).
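A confusion matrix between two readers can be tallied directly from paired calls. The sketch below also computes Cohen's kappa as a chance-corrected summary; kappa is added here as a common choice and this summary does not state which agreement statistic the authors report. Names and toy calls are hypothetical.

```python
LABELS = ("desert", "excluded", "inflamed")


def confusion_matrix(calls_a, calls_b, labels=LABELS):
    """matrix[i][j] = cases called labels[i] by reader A and labels[j] by B."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, b in zip(calls_a, calls_b):
        matrix[index[a]][index[b]] += 1
    return matrix


def cohens_kappa(matrix):
    """Chance-corrected agreement computed from a square confusion matrix."""
    n = sum(map(sum, matrix))
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    if expected == 1.0:
        return float("nan")  # both readers used a single class throughout
    return (observed - expected) / (1.0 - expected)


# Toy calls from two readers on six cases.
reader_1 = ["desert", "desert", "excluded", "excluded",
            "inflamed", "inflamed"]
reader_2 = ["desert", "excluded", "excluded", "excluded",
            "inflamed", "inflamed"]
toy_matrix = confusion_matrix(reader_1, reader_2)
print(toy_matrix, cohens_kappa(toy_matrix))
```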
3.2 Automated tumor immunophenotype classification
Figure 3. Overlay of the manual tumor immunophenotype calls with the features generated by the respective automated analysis pipelines for POPLAR, OAK, and IMpassion130.
(A) Distribution of patient samples based on the averaged slide-level CD8 density readout in CKpos regions (pipeline #1); counts (y-axis) refer to the number of unique patient samples.
(B–D) UMAP plots showing overlays of manual immunophenotype calls with features generated by automated pipelines #2–4: binned CD8+ cell density in CKpos and CKneg tiles (pipeline #2; B); a set of spatial features derived from tile-based measurements (pipeline #3; C); a combination of spatial features with binned tile-based CD8 density (pipeline #4; D). Cases are color-coded based on manually assigned tumor immunophenotype categories.
Figure 4. Concordance between manually and algorithmically assigned tumor immunophenotype calls and association with treatment outcome.
(A) Confusion matrices showing the distribution of manually assigned tumor immunophenotype calls (rows) among the automated immunophenotype calls (columns) for the four automated pipelines for OAK and IMpassion130.
(B) Kaplan–Meier curves for overall survival for atezolizumab-containing treatment arms for OAK and IMpassion130 according to immunophenotype categories determined by MOCHA-BITE or manually (see supplementary material, Figures S2–S9 for progression-free survival curves).
(C) Performance of the four automated pipelines based on agreement of immunophenotype classification using manual immunophenotype calls as ground truth and overall survival/progression-free survival log-rank tests; listed for each pipeline are the classification macro-accuracy, AUC, and precision/recall as well as median overall survival and progression-free survival times for desert (D), excluded (E), and inflamed (I) immunophenotypes in descending order for the atezolizumab-containing and control arms. Blue and red values indicate statistically significant and non-significant differences, respectively, according to the log-rank test; separation of overall survival (or progression-free survival) Kaplan–Meier curves with the given immunophenotype categorization.
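Macro-averaged precision and recall can be read off a confusion matrix laid out as in panel (A), with manual calls as rows and automated calls as columns. This is a minimal sketch using standard definitions; how the paper defines "macro-accuracy" is not given in this summary, so only overall accuracy is computed, and the function name is illustrative.

```python
def macro_metrics(matrix):
    """Overall accuracy plus macro-averaged precision and recall.

    matrix[i][j] : number of cases with manual class i (row, treated as
                   ground truth) and automated class j (column)
    A class with no cases in its row or column gives an undefined (NaN)
    recall or precision, which propagates into the macro average.
    """
    k = len(matrix)
    total = sum(map(sum, matrix))
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    recalls, precisions = [], []
    for c in range(k):
        true_pos = matrix[c][c]
        row_total = sum(matrix[c])
        recalls.append(true_pos / row_total if row_total else float("nan"))
        precisions.append(
            true_pos / col_totals[c] if col_totals[c] else float("nan")
        )
    return {
        "accuracy": sum(matrix[c][c] for c in range(k)) / total,
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
    }


# Toy matrix: rows = manual desert/excluded/inflamed, columns = automated.
toy = [[1, 1, 0],
       [0, 2, 0],
       [0, 0, 2]]
print(macro_metrics(toy))
```

Macro averaging weights the three immunophenotypes equally, so a rare class such as desert influences the score as much as a common one.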
Figure S4. Performance of the unsupervised IP calls assignment approach.
(A) UMAP plots based on the MOCHA-BITE pipeline features. Cases are color-coded based on the k-means (k=3) clustering decision using TIL’s density feature value to assign the tumor IP to k-means cluster id.
(B) OS and PFS log-rank test results for Desert (D), Excluded (E) and Inflamed (I) IP in descending order for the atezolizumab-containing and control arms. Blue and red values indicate statistically significant and non-significant differences, respectively, according to the log-rank test; separation of OS (or PFS) Kaplan–Meier curves with the given IP categorization.
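Panel (A) says that a TIL density feature is used to map each k-means cluster id to a tumor IP. One simple way to do that mapping is to rank the clusters by mean density, sketched here. The ordering desert < excluded < inflamed and every name are assumptions, and the cluster ids are taken as given from any k-means implementation run with k=3.

```python
from collections import defaultdict


def label_clusters_by_density(cluster_ids, til_densities,
                              ordered_labels=("desert", "excluded",
                                              "inflamed")):
    """Name clusters by ranking them on mean TIL density.

    cluster_ids   : cluster id per sample, e.g. from k-means with k=3
    til_densities : TIL density feature per sample
    Returns (per-sample label list, {cluster id: label} mapping).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cid, density in zip(cluster_ids, til_densities):
        sums[cid] += density
        counts[cid] += 1
    # Lowest mean density first (assumed to correspond to "desert").
    ranked = sorted(sums, key=lambda cid: sums[cid] / counts[cid])
    if len(ranked) != len(ordered_labels):
        raise ValueError("number of clusters must match number of labels")
    mapping = dict(zip(ranked, ordered_labels))
    return [mapping[cid] for cid in cluster_ids], mapping


# Toy clustering result for six samples (hypothetical values).
toy_ids = [2, 2, 0, 0, 1, 1]
toy_density = [0.001, 0.002, 0.3, 0.2, 0.03, 0.05]
calls, mapping = label_clusters_by_density(toy_ids, toy_density)
print(mapping)
print(calls)
```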
Figure S5. PFS log-rank tests for tumor IP subgroups, as classified by MOCHA-BITE and manual methods.
Kaplan–Meier curves for PFS for the atezolizumab and control treatment arms for (A) OAK and (B) IMpassion130 with median survival times for the three IP categories as determined by MOCHA-BITE or manually.
Figure S6. Kaplan–Meier curves for OS in atezolizumab and docetaxel arms in OAK.
Log-rank p-values are shown for tumor IP classes identified by the automated pipelines #1–3 and the manual method for (A) the atezolizumab arm and (B) the docetaxel arm.
Figure S7. Kaplan–Meier curves for PFS in atezolizumab and docetaxel arms in OAK.
Log-rank p-values are shown for tumor IP classes identified by the automated pipelines #1–3 and the manual method for (A) the atezolizumab arm and (B) the docetaxel arm.
Figure S8. Kaplan–Meier curves for OS in atezolizumab and placebo arms in IMpassion130.
Log-rank p-values are shown for tumor IP classes identified by the automated pipelines #1–3 and the manual method for (A) the atezolizumab arm and (B) the placebo arm.
Figure S9. Kaplan–Meier curves for PFS in atezolizumab and placebo arms in IMpassion130.
Log-rank p-values are shown for tumor IP classes identified by the automated pipelines #1–3 and the manual method for (A) the atezolizumab arm and (B) the placebo arm.
Figure 5. Association of immunophenotype with atezolizumab outcome in OAK and IMpassion130.
(A) Forest plot analysis (upper portion) and Kaplan–Meier curves for patients in OAK with ‘inflamed’ versus ‘non-inflamed’ tumors based on manually assigned immunophenotype (left side) or immunophenotype based on MOCHA-BITE (right side).
(B) The same analysis as in (A) but for IMpassion130. The vertical line of no effect in the forest plots indicates a hazard ratio of 1; shown are the number of patients in the control (CON) and treatment (TRT) groups for each immunophenotype category, median survival times (MST) with confidence intervals (CI), and p values.
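The p values attached to Kaplan–Meier comparisons of this kind typically come from a log-rank test. The following is a sketch of the standard two-group version, for example treatment versus control within one immunophenotype category. It is not the authors' code, the toy data are invented, and hazard ratios would need a separate model such as Cox regression.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value).

    times_*  : lists of follow-up times
    events_* : lists with 1 for an observed event, 0 for censoring
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)      # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)      # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): group B survives clearly longer than group A.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
print(round(chi2, 2), p)
```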
4. Discussion
Figure S10. OS log-rank tests for tumor IP subgroups, as classified by MIL and manual methods.
Kaplan–Meier curves for OS for the atezolizumab and control treatment arms for (A) OAK and (B) IMpassion130 with median survival times for the three IP categories as determined by MIL or manually.
Table S2. Classification performance of automated pipelines.
Performance of the five automated pipelines based on agreement of IP classification using manual IP calls as ground truth. Performance on POPLAR was assessed based on using randomly split 80% of POPLAR data for training and the remaining 20% of POPLAR data for testing. The optimized MIL model was then identified based on the performance on the POPLAR test set, in exploring various models with number of instances in [100, 200, 400, 1000, 2000] and batch_size in [1, 20] in increments of 5 and with/without color augmentation. MIL has a “NaN” Recall value in IMpassion130 since the model did not assign any of the samples in the test cohort to the Desert class.
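The random 80%/20% split of POPLAR described above can be sketched as follows. The seed, the function name, and the sample identifiers are hypothetical, and the split is not stratified because the text only says "randomly split".

```python
import random


def split_train_test(sample_ids, train_fraction=0.8, seed=0):
    """Randomly split sample ids into training and test sets.

    A fixed seed (hypothetical value) keeps the split reproducible so
    that all pipelines are evaluated on the same held-out samples.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Toy cohort of 100 made-up sample identifiers.
toy_ids = [f"sample_{i:03d}" for i in range(100)]
train_ids, test_ids = split_train_test(toy_ids)
print(len(train_ids), len(test_ids))
```

Splitting by patient rather than by slide or tile matters whenever one patient contributes more than one sample; otherwise information from the same tumor can leak into the test set.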