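Hello everyone! Today I am venturing to share an article from the flagship journal Cell, which covers the technical principles and application scenarios of weakly supervised learning for 3D pathology scans. It leans technical, but please don't scroll past too quickly, because the flagship Nature journal has published related work as well. Let's read them side by side and learn together!
There is only one true heroism in the world: to love life after seeing it for what it is. Much like not quite understanding the literature, yet still loving to read it...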
Analysis of 3D pathology samples using weakly supervised AI
Andrew H. Song, Mane Williams, Drew F.K. Williamson, Sarah S.L. Chow, Guillaume Jaume, Gan Gao, Andrew Zhang, Bowen Chen, Alexander S. Baras, Robert Serafin, Richard Colling, Michelle R. Downes, Xavier Farré, Peter Humphrey, Clare Verrill, Lawrence D. True, Anil V. Parwani, Jonathan T.C. Liu, Faisal Mahmood
Cell 9 May 2024
Human tissue, which is inherently three-dimensional (3D), is traditionally examined through standard-of-care histopathology as limited two-dimensional (2D) cross-sections that can insufficiently represent the tissue due to sampling bias. To holistically characterize histomorphology, 3D imaging modalities have been developed, but clinical translation is hampered by complex manual evaluation and lack of computational platforms to distill clinical insights from large, high-resolution datasets. We present TriPath, a deep-learning platform for processing tissue volumes and efficiently predicting clinical outcomes based on 3D morphological features. Recurrence risk-stratification models were trained on prostate cancer specimens imaged with open-top light-sheet microscopy or microcomputed tomography. By comprehensively capturing 3D morphologies, 3D volume-based prognostication achieves superior performance to traditional 2D slice-based approaches, including clinical/histopathological baselines from six certified genitourinary pathologists. Incorporating greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, further emphasizing the value of capturing larger extents of heterogeneous morphology.
Highlights
TriPath is a 3D pathology deep learning platform for clinical endpoint prediction
Patient prognostication with 3D tissue volume outperforms 2D slice-based approaches
3D prognostication outperforms pathologist baselines, suggesting its clinical potential
Larger tissue volume mitigates sampling bias and accounts for tissue heterogeneity
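1. First, I am no expert on the technical side, so I won't embarrass myself there; let's focus on the application scenarios instead. I highly recommend this video. If you are interested, get your notebook out and be ready to learn.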
https://www.youtube.com/watch?v=rcNCHQnK454&ab_channel=TIAWarwick
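2. After the video, have a look at the pancreatic cancer study published in the same issue. If you are short on time, the short video that accompanies it is worth watching too.
After all, most prognostic models require pixel-level or slide-level annotations. What sets this study apart is that TriPath works mainly with patient-level labels (clinical endpoints) and needs no manual annotation from clinicians. (P.S. Didn't a certain expert say they were working on annotating surgical videos? Is there some kind of smart labeling that could help there?)
The study compared the weakly supervised model against pathologists' Gleason grading. The results suggest that it is hard for humans to analyze a huge stack of 2D slices (a 100-fold increase in the number of slices per biopsy) and retain the essential information, especially when there are no existing guidelines for interpreting 3D pathology (indeed, there is a limit to how well the human brain can reconstruct a 3D structure from 2D views).
3. Last but not least, some more technical material for reference!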
https://www.nature.com/articles/s41551-020-00681-x
Contents
1. INTRODUCTION (Figure 1)
2. Methods
2.1 Key resources table
2.2 Resource availability
2.2.1 Lead contact
2.2.2 Materials availability
2.2.3 Data and code availability
2.2.4 Experimental model and subject details
2.2.5 Human participants
2.2.6 Method details
2.3 Patient cohorts
2.4 Data acquisition
Simulation data
MicroCT
OTLS
2.5 Volumetric image preprocessing
Volume segmentation
3D patching & 2D patching
2.6 Clinical validation
OTLS cohort
MicroCT cohort
2.7 Model architecture
Feature encoder choice
Feature encoding step
Aggregation module
Classification module
Training & evaluation
2.8 Integrated gradients interpretability analysis
2.9 Cross-modal analysis
2.10 Visualization
False-coloring the raw input
Integrated gradients heatmap
2.11 Quantification and statistical analysis
2.12 Computational hardware and software
2.13 Additional resources
3. Results
3.1 Computational platform for weakly supervised analysis of 3D pathology samples (Figure 1)(Tables S1 and S2)
3.2 Validation with simulated 3D data (Figures S1-2)
3.3 Evaluation on the OTLS cohort (Figure 2)(Figures S3-5)
3.4 Evaluation on the microCT cohort (Figure 3)(Figures S6-7)
3.5 Comparison with clinical baselines (Figure 4)(Figure S8)
3.6 Mitigation of sampling bias with 3D volume analysis (Figures 5-6)
3.7 Cross-modal evaluation between the OTLS and microCT cohorts (Figure 7)
4. DISCUSSION
- Figure and table summary -
3.1 Computational platform for weakly supervised analysis of 3D pathology samples
Figure 1. TriPath computational workflow
(A) The 3D imaging modalities can capture high-resolution volumetric images of tissue specimens.
(B) TriPath accepts raw volumetric tissue images from diverse imaging modalities as inputs. TriPath first separates the volumetric image of tissue from the background. In a common version of the workflow, the segmented volume is then treated as a stack of cuboids (3D planes) and further tessellated into smaller 3D patches (instances). Alternatively, the segmented volume can be treated as a stack of 2D planes and tessellated into smaller 2D patches.
(C) The patches are processed with a pretrained feature encoder network of choice, e.g., 3D convolutional neural network (CNN) or 3D vision transformer, leveraging transfer learning to produce a set of compact and representative features. Feature encoding with 3D CNN is illustrated in the figure. The encoded features are compressed with a domain-adapted shallow, fully connected network. Next, an aggregator module aggregates the set of features representing all instances, automatically weighting them according to their importance toward contributing to a volume-level feature to render a patient-level prediction. TriPath also provides saliency heatmaps for clinical interpretation and validation. The computational workflow of TriPath with 2D processing is identical. Further details are described in the STAR Methods. NN, generic neural network layers dependent on the feature encoder choice; channel C, K, intermediate channels in feature encoder; Attn, attention module; Fc1, Fc2, fully connected layers.
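The patching and aggregation steps in this caption can be illustrated with a small standard-library sketch. It is not the TriPath implementation: the tanh-based attention form, the function names, and the toy weights `v` and `w` are assumptions made for illustration, since the caption only states that instances are weighted by their importance.

```python
import math
from itertools import product


def tessellate(shape, patch):
    """Return the origin (z, y, x) of every full 3D patch that fits in the volume."""
    ranges = [range(0, dim - size + 1, size) for dim, size in zip(shape, patch)]
    return list(product(*ranges))


def attention_pool(features, v, w):
    """Attention-weighted average of instance features (assumed tanh attention).

    features: list of N feature vectors, one per patch
    v: hidden projection, a list of H rows, each as long as a feature vector
    w: list of H weights that map the hidden vector to a scalar score
    """
    scores = []
    for h in features:
        hidden = [math.tanh(sum(vi * hi for vi, hi in zip(row, h))) for row in v]
        scores.append(sum(wi * ai for wi, ai in zip(w, hidden)))
    # Softmax over instances, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# A 4x4x4 volume cut into 2x2x2 patches gives 8 instances.
origins = tessellate(shape=(4, 4, 4), patch=(2, 2, 2))

# Hypothetical 2-dimensional features for three patches.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(features, v=[[1.0, -1.0]], w=[2.0])
```

In a real pipeline the pooled vector would be passed to a classifier to produce the patient-level risk, and the attention parameters would be learned from the patient-level labels rather than fixed by hand.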
Table S1: Clinical data summary for OTLS and microCT cohorts, related to Figures 2 and 3. Gleason grades are based on the standard histologic examination of prostatectomy specimens found in original medical reports.
Table S2: Data characteristics for OTLS and microCT images, related to Figures 2 and 3
3.2 Validation with simulated 3D data
Figure S1. 3D phantom datasets and analysis with TriPath, related to Figure 1
(A) Examples of single-channel 3D phantom data samples for the binary classification task (n = 100), false-colored for different cell types. Samples from the first class are dominated by normal cells (blue), while samples from the second class are dominated by abnormal cells with large eccentricity (red).
(B) Binary classification task AUC for a TriPath model trained and tested on a random plane from each volume (random plane), the targeted plane that contains both cell types (targeted plane) from each volume, all planes, and cuboids within the whole volume (whole-volume 2D and 3D). Statistical significance was assessed with the unpaired t test. ∗∗∗p ≤ 0.001 and ∗∗∗∗p ≤ 0.0001.
(C) Principal-component feature space plot for the sample-level attention-aggregated volume features for the whole-volume 3D approach, with the colors indicating ground truth labels. The good separation between the two classes supports the observed high AUC performance.
(D) Kaplan-Meier survival analysis stratified at 50th percentile by TriPath-predicted risk on survival prediction phantom dataset (n = 150) for 2D targeted single-plane and whole-volume 3D approaches. Error bars indicate one standard deviation from the mean, over five different experiments.
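For readers less familiar with the AUC reported in panel (B), here is a minimal sketch of how it can be computed from predicted risks. It uses the rank interpretation of AUC (the probability that a positive sample receives a higher score than a negative one); the labels and scores are made-up toy values, not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = abnormal-dominated volume, 0 = normal-dominated volume.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is fine for cohorts of this size (n = 100 here).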
3.3 Evaluation on the OTLS cohort
Figure S2. Additional metrics for high-risk and low-risk patient classification task, related to Figures 2 and 3
Cohort-level balanced accuracy and F1 score for high-risk and low-risk patient classification task for (A) simulation, (B) OTLS (development cohort), and (C) microCT dataset. For OTLS and microCT cohorts, we also display performance metrics for the clinical baseline based on histologic examination of the prostatectomy specimen. In all three datasets and metrics, the 3D treatment of the whole volume and 3D patching are superior to the 2D plane-based alternatives. Error bars indicate one standard deviation from the mean, over five different experiments.
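The two metrics in Figure S2 can be written down in a few lines. This is a generic sketch of balanced accuracy and the F1 score for a binary high-risk (1) versus low-risk (0) task, assuming both classes are present; the labels below are invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = high risk."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the high-risk class."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


# Toy cohort: 3 high-risk and 5 low-risk patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
bacc = balanced_accuracy(y_true, y_pred)  # (2/3 + 4/5) / 2
f1 = f1_score(y_true, y_pred)             # 4 / 6
```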
Figure S3. Comparison between different feature encoders in OTLS and microCT cohort, related to Figures 2 and 3
TriPath uses transfer learning to extract representative and compressed features from 2D patches and 3D patches. Consequently, the feature encoder architecture (e.g., CNN and ViT) and the pretraining dataset (e.g., radiology, videos, and images) affect the downstream performance.
(A) In the OTLS cohort, five encoders of different architectures and pretraining datasets were considered. ResNet-2D and SwinViT (2D) are defined by averaging 2D features of each level across depth within each 3D patch.
(B) The 2D slice with the largest tissue area and slices at ±20 μm from the level were selected for feature extraction with 2D feature encoders.
(C) Even with a different 3D feature encoder, we observe increased AUC as larger tissue volume is used for risk prediction.
(D–F) The same set of experiments for the microCT cohort. Results demonstrate that different feature encoders and pretraining datasets lead to varying performance levels. CNNs and ViTs trained on natural images or videos lead to better performance than those pretrained on domain-specific datasets (radiology and histology). We attribute the low performance of the radiology-pretrained SwinViT (3D) to a large difference in image resolution (3D radiology: 1–2 mm/voxel vs. 3D pathology: 1–4 μm/voxel) and inherently different morphology. This suggests that a feature encoder pretrained on datasets from the same data domain is warranted, which we leave for future work. For the main analyses, we use spatiotemporal CNN [111] (i.e., ResNet-(2 + 1)D) for 3D analysis as it provides consistently good performance across both cohorts. For 2D analysis, we use ResNet-2D since it shares the same residual network backbone as ResNet-(2 + 1)D, thereby allowing for a fair comparison between 2D and 3D tasks. Error bars indicate one standard deviation from the mean, over five different experiments.
Figure S4. Integrated gradient (IG) analysis for open-top light-sheet microscopy (OTLS) dataset, related to Figure 2
(A) Patches from the high IG cluster (top 10%) exhibit infiltrative carcinoma that resembles predominantly poorly differentiated glands (Gleason pattern 4), exhibiting cribriform architecture. Patches from the middle IG cluster (middle 10% around 0) exhibit infiltrative carcinoma that resembles mixtures of Gleason patterns 3 and 4. Patches from the low IG cluster (bottom 10%) predominantly exhibit large, benign glands, with occasional corpora amylacea.
(B) Scatterplot of the normalized IG patch scores averaged within each sample as a function of predicted risk (the predicted probability for the high-risk group).
(C) The scatterplot of the proportion of the number of high, middle, and low IG group patches in each sample as a function of predicted risk, which shows that a sample with a higher predicted risk profile has a larger (smaller) fraction of high (low) IG patches.
(D) Kaplan-Meier curve for the cohort stratified (50%) by the ratio of the number of patches in the high and low IG group. The good stratification suggests that the extent to which prognostic morphologies manifest in each sample is also important. For survival curve comparison, statistical significance was assessed with the log-rank test. The scale bar is 100 μm.
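The grouping of patches into high, middle, and low IG clusters can be sketched as follows. Reading "middle 10% around 0" as the patches whose scores have the smallest absolute value is an assumption on my part, and the scores are toy numbers.

```python
def ig_clusters(scores, fraction=0.10):
    """Return index lists for the high, middle, and low IG groups.

    high:   the top `fraction` of patches by IG score
    middle: the `fraction` of patches whose scores are closest to zero (assumed reading)
    low:    the bottom `fraction` of patches by IG score
    """
    k = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    low = order[:k]
    high = order[-k:]
    middle = sorted(range(len(scores)), key=lambda i: abs(scores[i]))[:k]
    return high, middle, low


# Ten hypothetical normalized IG scores, one per patch.
scores = [-1.0, -0.8, -0.5, -0.2, -0.05, 0.01, 0.1, 0.3, 0.6, 0.9]
high, middle, low = ig_clusters(scores)
```

Counting how many patches of each sample land in the high and low groups then gives the per-sample proportions plotted in panel (C).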
Figure S5. Examples of integrated gradient (IG) heatmaps for open-top light-sheet microscopy (OTLS) cohort, related to Figure 2
An IG score is assigned to each patch, with a high (low) IG score indicating that the patch contributes to an unfavorable (favorable) prognosis.
(A) High IG areas in the high-risk sample contain cancerous glands that resemble poorly differentiated tumor glands (Gleason pattern 4).
(B) In the low-risk sample, the high IG areas are those with cancerous glands that are smaller, more tortuous, and more closely resemble Gleason pattern 4, as well as regions with a cellular stroma. All scale bars are 200 μm. The heatmaps can also be visualized in our interactive demo.
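To make the idea of an IG score concrete, here is a minimal numerical sketch of integrated gradients. A tiny logistic model stands in for the trained network, and each input dimension plays the role of one patch; the weights, baseline, and step count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy risk model: logistic regression over the input features."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG with a midpoint Riemann sum along the straight path
    from the baseline to the input."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        p = model(point, w, b)
        for i, wi in enumerate(w):
            totals[i] += p * (1.0 - p) * wi  # analytic gradient of the sigmoid model
    return [(xi - b0) * t / steps for xi, b0, t in zip(x, baseline, totals)]


w, b = [2.0, -1.0, 0.0], -0.5
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
ig = integrated_gradients(x, baseline, w, b)
```

A positive attribution pushes the predicted risk up (unfavorable), a negative one pushes it down (favorable), and the attributions sum to approximately the difference between the prediction at the input and at the baseline.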
3.4 Evaluation on the microCT cohort
Figure 3. TriPath analysis of microcomputed tomography (microCT) prostate cancer cohort
The microCT cohort contains volumetric tissue images of prostatectomy tissue from prostate cancer patients with 4 μm/voxel resolution.
(A) Cohort level AUC on 45-patient cohort for TriPath model trained and tested on the 3 planes separated by 20 μm, with the middle plane representing the largest tissue area within the biopsy (2D planes), and the 3D patches within the whole volume are processed with 2D and 3D feature encoders (whole-volume 2D and 3D, respectively). A clinical baseline based on the Gleason grade diagnosis of the whole prostatectomy specimen (prostatectomy grade) is also displayed. All baselines are repeated over five different experiments.
(B) Kaplan-Meier survival analysis with median BCR diagnosis date specified for each risk group, stratified at 50th percentile based on TriPath-predicted risk, for 2D planes and whole-volume 3D approaches.
(C) Ablation analysis with training and testing on increasing portions from the top of each volume.
(D) Principal-component feature space plot for 3D patches with high (unfavorable outcome), middle (no influence), and low (favorable outcome) 10% integrated gradient (IG) scores aggregated across the entire cohort. Representative 3D patches and 2D horizontal slices within the cuboid are displayed for each cluster.
(E and F) 3D IG heatmap with representative 2D horizontal planes displaying unfavorable (red) and favorable (blue) prognostic regions.
For AUC comparison, statistical significance was assessed with the unpaired t test with respect to the whole-volume 3D performance. ∗∗p ≤ 0.01 and ∗∗∗∗p ≤ 0.0001. For survival curve comparison, statistical significance was assessed with the log-rank test. Error bars indicate one standard deviation from the mean, over five different experiments. All scale bars are 250 μm.
See also Figures S2, S3, S6, and S7.
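The Kaplan-Meier analysis in panel (B) splits the cohort at the median predicted risk and estimates a recurrence-free survival curve per group. The sketch below shows that procedure on invented follow-up data; the variable names and numbers are hypothetical.

```python
import statistics


def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where an event was observed.

    events[i] is 1 if recurrence was observed at times[i], 0 if censored.
    """
    survival, at_risk, curve = 1.0, len(times), []
    for t in sorted(set(times)):
        recurrences = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if recurrences:
            survival *= 1.0 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_split(risks):
    """True for patients whose predicted risk is above the cohort median."""
    cutoff = statistics.median(risks)
    return [r > cutoff for r in risks]


# Hypothetical cohort: predicted risk, follow-up in months, BCR observed (1) or not (0).
risks = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
months = [5, 8, 20, 12, 30, 36]
bcr = [1, 1, 1, 0, 1, 0]

high = median_split(risks)
high_curve = kaplan_meier([t for t, h in zip(months, high) if h],
                          [e for e, h in zip(bcr, high) if h])
low_curve = kaplan_meier([t for t, h in zip(months, high) if not h],
                         [e for e, h in zip(bcr, high) if not h])
```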
Figure S6. Integrated gradient (IG) heatmaps for the microcomputed tomography (microCT) cohort, related to Figure 3
An IG score is assigned to each patch, with a high (low) IG score indicating that the patch contributes to an unfavorable (favorable) prognosis.
(A) In this high-risk sample, high IG values are localized in areas with the smallest and densest cancerous glands, especially when they are in or adjacent to the capsule of the prostate, as well as dense stroma that resembles the prostate capsule.
(B) Similar to the high-risk case, high IG regions in this low-risk sample correspond to areas with small, dense cancerous glands and dense stroma. The juxtaposition of these two morphologies has particularly high IG values. All scale bars are 500 μm. The heatmaps can also be visualized in our interactive demo.
Figure S7. Integrated gradient (IG) analysis for microcomputed tomography (microCT) dataset, related to Figure 3
(A) The high IG cluster (top 10%) consists of patches with infiltrative carcinoma that most closely resembles Gleason pattern 4; however, the lower resolution and lack of H&E staining make definitive grading infeasible by visual inspection of the microCT images alone. In the middle IG cluster (middle 10% around 0), most patches contain infiltrating carcinoma that resembles Gleason patterns 3 and 4. The low IG cluster (bottom 10%) consists mostly of patches containing benign prostatic tissue with occasional foci of infiltrative carcinoma that resemble Gleason pattern 3.
(B) Scatterplot of the normalized IG patch scores averaged within each sample as a function of predicted risk (the predicted probability for the high-risk group).
(C) The scatterplot of the proportion of the number of high, middle, and low IG group patches in each sample as a function of predicted risk, which shows that the sample with higher predicted risk has a larger (smaller) fraction of high (low) IG patches.
(D) Kaplan-Meier curve for the cohort stratified (50%) by the ratio of the number of patches in the high and low IG group. The good stratification suggests that the extent to which prognostic morphologies manifest in each sample is also important. For survival curve comparison, statistical significance was assessed with the log-rank test. The scale bar is 250 μm.
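The log-rank test used to compare the two survival curves can be sketched with the standard library alone. This is the textbook two-group version, with the p-value taken from a chi-square distribution with one degree of freedom; the follow-up times are made up.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # still at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # still at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed += d_a
        expected += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("not enough events to compare the two groups")
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical groups: early recurrences versus late recurrences.
chi2, p_value = logrank([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                        [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
```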
3.5 Comparison with clinical baselines
Figure 4. Clinical validation of TriPath for 3D pathology
TriPath is validated against clinical baselines separately on the OTLS and microCT cohorts.
(A) For each biopsy sample in the OTLS cohort, 3 image slices (levels) taken from the center and ±20 μm of the 3D OTLS dataset (1 μm/voxel resolution) were presented to a panel of 6 certified pathologists. Each pathologist provided a biopsy-level Gleason grade diagnosis.
(B) Quadratic weighted kappa to assess agreement between pairs of pathologists. Each point (black dot) represents an agreement between two pathologists.
(C) Cohort-level (n = 50) BCR status prediction AUCs are shown based on 6 pathologists’ diagnoses of 3 image slices (individual and consensus), the diagnosis from standard post-operative histopathology of the whole prostatectomy specimen, and TriPath-predicted risks (3D pathology). Each dot represents the cohort-level AUC repeated over five different random data splits.
(D) For each tissue block that was imaged with microCT, we obtained the adjacent tissue section and prepared an H&E-stained whole-slide image (WSI). The resulting WSI and ROI (where the ROI matches the lateral field of view of the microCT scan) were used for risk prediction with 2D TriPath.
(E) Cohort-level (n = 45) BCR status prediction AUC based on the diagnosis of the whole prostatectomy specimen (original pathology report) and TriPath-predicted risks from H&E-histology (WSI and ROI) and microCT datasets are shown.
Whiskers extend to data points within 1.5× the interquartile range. Statistical significance was assessed with the unpaired t test with respect to TriPath performance. ∗p ≤ 0.05, ∗∗p ≤ 0.01, ∗∗∗p ≤ 0.001, and ∗∗∗∗p ≤ 0.0001.
See also Figure S8.
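The inter-reader agreement in panel (B) is measured with quadratic weighted kappa. Below is a generic sketch of that statistic and of the pairwise comparison across readers; the three readers and their grades (coded 0 to 4) are invented for illustration.

```python
import statistics
from itertools import combinations


def quadratic_weighted_kappa(a, b, n_classes):
    """Agreement between two raters on ordinal grades coded 0..n_classes-1.

    Disagreements are penalized by the squared distance between the grades.
    """
    n = len(a)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for i, j in zip(a, b):
        observed[i][j] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_a[i] * hist_b[j] / n  # expected count under independence
    return 1.0 - num / den


# Hypothetical per-biopsy grades from three readers.
readers = {
    "A": [0, 1, 2, 3, 4, 2],
    "B": [0, 1, 2, 3, 3, 2],
    "C": [1, 1, 2, 4, 4, 1],
}
pairwise = [quadratic_weighted_kappa(readers[x], readers[y], 5)
            for x, y in combinations(readers, 2)]
median_kappa = statistics.median(pairwise)
```

With 6 pathologists there would be 15 such pairs, one per dot in the panel.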
Figure S8. Clinical validation of 3D pathology for OTLS cohort, related to Figure 4
TriPath’s performance is further validated in OTLS cohort (development dataset) with the second part of the reader study.
(A) The web interface for the reader study, where a pathologist could scroll through images of OTLS biopsy (three and all slices for the first and second round, respectively).
(B) After a 2-month washout period following the first part, each pathologist was shown all slices of the OTLS biopsy image to provide per-biopsy diagnoses.
(C) Cohort-level (n = 50) BCR status prediction AUCs are shown based on 6 pathologists’ diagnoses of all image slices (individual and consensus) and the diagnosis from standard post-operative histopathology of the whole prostatectomy specimen. We employ two versions of TriPath: whole-volume 2D slices (blue), where 2D patches are generated from the 2D slices across the whole volume, and whole-volume 3D (orange), where 3D patches are generated from the whole volume (our 3D pathology baseline).
(D) Quadratic weighted kappa to assess agreement between pairs of pathologists. The median kappa value of 0.662 is slightly lower than that of the 2D reader study (median kappa: 0.677). Although the pathologists’ consensus performance increases compared with that of the first reader study (all slices AUC: 0.799 vs. three slices AUC: 0.744), we observe that whole-volume 3D TriPath still outperforms all clinical baselines. Combined with the fact that the median kappa value did not change significantly, the results suggest that it is non-trivial for humans to process a huge stack of 2D slices (100-fold increase in number of slices per biopsy). In addition, the fact that whole-volume 3D TriPath outperforms both the pathologist baselines and whole-volume 2D slices TriPath, both of which use the entire volume and rely on interpreting 2D morphology, suggests the importance of encoding 3D morphology. Whiskers extend to data points within 1.5× the interquartile range.
3.6 Mitigation of sampling bias with 3D volume analysis
Figure 5. Plane variability analysis for open-top light-sheet microscopy (OTLS) dataset
A TriPath model with 2D feature encoder is trained on 2D patches from all planes of whole volume and predicts risk (the probability for the high-risk group) of individual planes of the test sample. This yields the predicted risk profile at the plane-level granularity.
(A) Given the plane-level predicted risks for each sample, the difference between the 5th- and 95th-percentile values is computed (risk difference).
(B) An arbitrary risk decision threshold (e.g., 0.5) falls within the 90% risk interval for several patients, for whom the associated risk group can change depending on the plane chosen within the tissue volume.
(C) Plane-level predicted risk, which fluctuates from low risk to high risk, as a function of depth within the volume for a patient.
(D) Principal-component feature space for attention-aggregated plane-level features for the sample. The separation into two clusters along the risk group reflects the risk variation observed in (C).
(E) Morphological analysis of the low-risk (depth 10) and high-risk plane (depth 275). The higher-risk plane contains a larger proliferation of glands resembling Gleason pattern 4 than the lower-risk plane, which is dominated by Gleason pattern 3.
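The plane-variability analysis in panels (A) and (B) boils down to a percentile interval per sample. Here is a small sketch, assuming the plane-level risks of one sample are available as a list; the percentile method and the synthetic risk profiles are illustrative choices, not taken from the paper.

```python
import statistics


def risk_interval(plane_risks):
    """Return the 5th and 95th percentile of the plane-level predicted risks."""
    cuts = statistics.quantiles(plane_risks, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def risk_difference(plane_risks):
    """Width of the 90% risk interval for one sample."""
    low, high = risk_interval(plane_risks)
    return high - low


def straddles_threshold(plane_risks, threshold=0.5):
    """True if the risk group could flip depending on the plane chosen."""
    low, high = risk_interval(plane_risks)
    return low < threshold < high


# Synthetic profiles: one sample whose risk drifts with depth, one that is stable.
drifting = [i / 100 for i in range(101)]
stable = [0.8] * 10
low, high = risk_interval(drifting)
```

For the drifting sample, a pathologist who happens to look at a shallow plane and one who looks at a deep plane would place the same patient in different risk groups, which is the sampling bias the figure illustrates.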
Figure 6. Comparison between whole-volume and partial-volume analysis
Given the model trained on whole-volume 3D, the cohort-level AUC is computed with 5-fold cross-validation for the whole volume (whole volume) or for 15% of the tissue volume randomly sampled (partial volume). For partial volume, we repeat the experiment 50 times, randomly sampling different portions of tissue volumes each time while keeping the data split the same.
(A) OTLS cohort AUC spread for the partial-volume analysis (teal, each dot representing an experiment) and AUC for the whole-volume analysis (red). Whiskers extend to data points within 1.5× the interquartile range.
(B) IG score ranking for 3D patches when tested on the whole volume and partial volume of a given OTLS sample, where a higher ranking corresponds to a larger integrated gradient (IG) score.
(C and D) The same analyses for the microCT cohort. All scale bars are 100 μm.
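The partial-volume experiment (15% of the tissue, repeated 50 times) can be mimicked with a simple resampling loop. In this sketch the mean of hypothetical patch-level scores stands in for the model's sample-level prediction, which is a deliberate simplification of the trained TriPath model.

```python
import random
import statistics


def sample_partial(patches, fraction, rng):
    """Randomly keep a fraction of the patches from one tissue volume."""
    k = max(1, round(len(patches) * fraction))
    return rng.sample(patches, k)


def partial_volume_spread(patch_scores, fraction=0.15, repeats=50, seed=0):
    """Repeat the partial-volume prediction with different random subsets."""
    rng = random.Random(seed)
    return [statistics.mean(sample_partial(patch_scores, fraction, rng))
            for _ in range(repeats)]


# Hypothetical volume in which 20% of the patches look high-risk.
patch_scores = [0.1] * 80 + [0.9] * 20
whole_volume = statistics.mean(patch_scores)
partial_volume = partial_volume_spread(patch_scores)
spread = max(partial_volume) - min(partial_volume)
```

The whole-volume value is a single number, while the partial-volume values scatter around it; the more heterogeneous the tissue, the wider that scatter.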
3.7 Cross-modal evaluation between the OTLS and microCT cohorts
Figure 7. Cross-modal and cross-institutional evaluation between OTLS and microCT cohorts
A TriPath model was trained with the whole-volume 3D on one cohort and tested on the other, to assess whether the model learns generalizable prostate cancer prognostic morphologies across imaging modalities and institutions. To match the 4 μm/voxel resolution and single-channel characteristics of the microCT dataset, the OTLS dataset is downsampled by a factor of 4, and only the nuclear channel is retained, resulting in a converted OTLS dataset.
(A) Test AUC for the microCT cohort with TriPath trained on converted OTLS or microCT cohorts and the cross-modal Kaplan-Meier curve for cohort stratification of high-risk and low-risk groups.
(B) Identical analyses to (A) but tested on converted OTLS with the TriPath model trained on microCT or converted OTLS cohorts.
(C and D) Integrated gradient (IG) heatmaps for cross-modal experiments. Despite the difference in train and test modalities, the TriPath model identifies poorly differentiated glands (C) and infiltrative carcinoma (D) as unfavorable prognostic morphologies, concurring with IG heatmaps from the same-modality setting. For survival curve comparison, statistical significance was assessed with the log-rank test. Error bars indicate one standard deviation from the mean, over five different experiments. All scale bars are 250 μm.
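The conversion of the OTLS data described above (downsample by a factor of 4, keep only the nuclear channel) can be sketched on nested lists. The caption does not say how the downsampling is done, so block averaging is an assumption here, as are the channel names `nuclear` and `eosin`.

```python
def downsample(volume, factor):
    """Average non-overlapping factor x factor x factor blocks of a [z][y][x] volume."""
    nz = len(volume) // factor
    ny = len(volume[0]) // factor
    nx = len(volume[0][0]) // factor
    out = []
    for z in range(nz):
        plane = []
        for y in range(ny):
            row = []
            for x in range(nx):
                block = [volume[z * factor + dz][y * factor + dy][x * factor + dx]
                         for dz in range(factor)
                         for dy in range(factor)
                         for dx in range(factor)]
                row.append(sum(block) / len(block))
            plane.append(row)
        out.append(plane)
    return out


def convert_otls(channels, keep="nuclear", factor=4):
    """Keep a single channel and downsample it to the coarser resolution."""
    return downsample(channels[keep], factor)


def constant_volume(value, nz, ny, nx):
    return [[[value] * nx for _ in range(ny)] for _ in range(nz)]


# Hypothetical two-channel OTLS volume of 8 x 4 x 4 voxels at 1 um/voxel.
otls = {
    "nuclear": constant_volume(0.0, 4, 4, 4) + constant_volume(1.0, 4, 4, 4),
    "eosin": constant_volume(0.5, 8, 4, 4),
}
converted = convert_otls(otls)  # 2 x 1 x 1 voxels at 4 um/voxel
```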