Nat Med.作者提供全文的绘图代码,对于学习作图很有帮助,但你不一定能成功绘制......

文摘   2024-09-23 17:40   云南  

一边学习,一边总结,一边分享!

由于微信改版,一直有同学反映。存在长时间接收不到公众号的推文。那么请跟随以下步骤,将小杜的生信筆記设置为星标,不错过每一条推文教程。

欢迎关注《小杜的生信笔记》!!

如何加入社群

小杜的生信笔记仅有微信社群

1. 微信群:付费社群。添加小杜好友,加友请知:加友须知!!,加入社群请查看小杜生笔记付费加友入群声明

入群声明

2. 小杜个人微信:若你有好的教程或想法,可添加小杜个人微信。值得注意的是,小杜个人微信并不支持免费咨询长时间咨询,但支持小问题2-3个免费咨询。

小杜微信:

知识星球:

本期教程


获得本期教程全文代码:在订阅号后台回复关键词:20240923

2022年教程总汇

https://mp.weixin.qq.com/s/Lnl258WhbK2a8pRZFuIyVg

2023年教程总汇

https://mp.weixin.qq.com/s/wCTswNP8iHMNvu5GQauHdg

引言

今天分享的文章是2024发表在Nat Med.期刊中,来自上海交通大学医学院的文章,作者提供了全文的绘图代码,确实,对于学习绘图提供充分的素材。也是一个学习作图具重大意义的文章。但是,依旧是基于你有一定基础技能之上。

文中并未给你提供作图原数据,因此,这些都是需要自己去摸索。

文章

Prospective observational study on biomarkers of response in pancreatic ductal adenocarcinoma

文章图形亮点:

1. 全文的图形并不复杂,因此,我们在制图中,并不需要追求那些很复杂的图形,除非,你很牛X,可以自己绘制。
2. 图形中网络图。这个是一个相对值得我们学习的图形之一,WGNCA和DEG都可以做网路图。
3. 提供你全部的绘图代码。

文章图形

<<<>>>

Code

1. Figure 1

1.1 PCA

library(scatterplot3d)
library(tidyverse)
library(openxlsx)
library(factoextra) # Extract and Visualize the Results of Multivariate Data Analyses
color.bin <- c("#00599F","#D01910")
dir.create("./results/Figure1",recursive = T)
#----------------------------------------------------------------------------------
# Step 1: Load Proteomics data
#----------------------------------------------------------------------------------
pro <- readRDS("./data/proteomics/20230412_PDAC_PRO_exp.rds") # 4787 protein * 281 samples
meta <- readRDS("./data/proteomics/20230412_PDAC_PRO_meta.rds")
identical(rownames(meta),colnames(pro)) # check names
#----------------------------------------------------------------------------------
# Step 2: PCA
#----------------------------------------------------------------------------------
res.pca.comp <- prcomp(pro, scale = F)
#----------------------------------------------------------------------------------
# Step 3: Plot
#----------------------------------------------------------------------------------
plot.data <- as.data.frame(res.pca.comp$rotation[, 1:10])
plot.data <- plot.data %>%
mutate(ID=rownames(plot.data),
Type=meta$Type,
TypeColor=color.bin[as.numeric(as.factor(Type))])
pdf("./results/Figure1/1B.PRO_PCA3d.pdf", width = 7, height = 7)
scatterplot3d(x = plot.data$PC2,
y = plot.data$PC1,
z = plot.data$PC3,
color = plot.data$TypeColor,
pch = 16, cex.symbols = 1,
scale.y = 0.7, angle = 45,
xlab = "PC2", ylab = "PC1", zlab = "PC3",
main="3D Scatter Plot of proteomics",
col.axis = "#444444", col.grid = "#CCCCCC")
legend("bottom", legend = levels(as.factor(meta$Type)),
col = color.bin, pch = 16,
inset = -0.15, xpd = TRUE, horiz = TRUE)
dev.off()
write.xlsx(plot.data, "./results/Figure1/1B.PRO_PCA3d.xlsx", overwrite = T)
# variance percent of each PC
p <- fviz_eig(res.pca.comp)
var_explained <- get_eig(res.pca.comp)
# var_explained <- res.pca.comp$sdev^2 / sum(res.pca.comp$sdev^2)
ggsave("./results/Figure1/1B.PRO_PCA_percent.pdf",p,width = 5, height = 5)
write.xlsx(var_explained, "./results/Figure1/1B.PRO_PCA_percant.xlsx",rowNames=T, overwrite = T)

1.2 Volcano plot

library(limma)
library(tidyverse)
library(openxlsx)
library(ggpubr)
library(ggthemes)
#----------------------------------------------------------------------------------
# Step 1: Load Proteomics data
#----------------------------------------------------------------------------------
exp <- readRDS("./data/proteomics/20230412_PDAC_PRO_exp.rds") # 4787 protein * 281 samples
meta <- readRDS("./data/proteomics/20230412_PDAC_PRO_meta.rds")
identical(rownames(meta),colnames(exp)) # check names
#----------------------------------------------------------------------------------
# Step 2: limma
#----------------------------------------------------------------------------------
meta <- meta %>% mutate(contrast=as.factor(Type))
design <- model.matrix(~ 0 + contrast , data = meta) # un-paired
fit <- lmFit(exp, design)
contrast <- makeContrasts( Tumor_Normal = contrastT - contrastN , levels = design)
fits <- contrasts.fit(fit, contrast)
ebFit <- eBayes(fits)
limma.res <- topTable(ebFit, coef = "Tumor_Normal", adjust.method = 'fdr', number = Inf)
## result
limma.res <- limma.res %>% filter(!is.na(adj.P.Val)) %>%
mutate( logP = -log10(adj.P.Val) ) %>%
mutate( tag = "Tumor -vs- Normal")%>%
mutate( Gene = ID)
# cutoff: FC:1.5 adj.p:0.05
limma.res <- limma.res %>% mutate(group=case_when(
(adj.P.Val < 0.05 & logFC > 0.58) ~ "up",
(adj.P.Val < 0.05 & logFC < -0.58) ~ "down",
.default = "not sig"))
table(limma.res$group) # UP:1213 ; DOWN:864 ; not:2710
## output
write.xlsx( limma.res, "./results/Figure1/1C.PRO_Limma_fc1.5.xlsx", overwrite = T, rowNames = F)
#----------------------------------------------------------------------------------
# Step 3: Vasualization
#----------------------------------------------------------------------------------
## volcano
limma.res <- limma.res %>% mutate(group=factor(group,levels = c("up","down","not sig")))
my_label <- paste0( "FC:1.5 ; AdjP:0.05 ; ",
"Up:",table(limma.res$group)[1]," ; ",
"Down:",table(limma.res$group)[2])
p <- ggscatter(limma.res,
x = "logFC", y = "logP",
color = "group", size = 2,
main = paste0("Tumor -vs- TAC"), # ***
xlab = "log2FoldChange", ylab = "-log10(adjusted P.value)",
palette = c("#D01910","#00599F","#CCCCCC"),
ylim = c(-1, 70),xlim=c(-8,8))+
theme_base()+
geom_hline(yintercept = -log10(0.05), linetype="dashed", color = "#222222") +
geom_vline(xintercept = 0.58 , linetype="dashed", color = "#222222") +
geom_vline(xintercept = -0.58, linetype="dashed", color = "#222222") +
labs(subtitle = my_label)+
theme(plot.background = element_blank())
ggsave("./results/Figure1/1C.PRO_Limma_fc1.5.pdf", p, width = 10, height = 10)

1.3 (d) Enrichment plot

# data : enriched pathways table
plot.data <- read.xlsx("./data/Extended Data Table 3.xlsx", sheet = 2, startRow = 2)
plot.pathway <- c("GO:0006730~one-carbon metabolic process","GO:0006888~ER to Golgi vesicle-mediated transport","hsa00020:Citrate cycle (TCA cycle)","hsa00071:Fatty acid degradation","hsa00190:Oxidative phosphorylation","hsa00250:Alanine, aspartate and glutamate metabolism","hsa00260:Glycine, serine and threonine metabolism","hsa00280:Valine, leucine and isoleucine degradation","hsa00480:Glutathione metabolism","hsa00520:Amino sugar and nucleotide sugar metabolism","hsa00620:Pyruvate metabolism","hsa00630:Glyoxylate and dicarboxylate metabolism","hsa00640:Propanoate metabolism","hsa01100:Metabolic pathways","hsa01200:Carbon metabolism","hsa01212:Fatty acid metabolism","hsa01240:Biosynthesis of cofactors","hsa03010:Ribosome","hsa03060:Protein export","hsa04141:Protein processing in endoplasmic reticulum","hsa04972:Pancreatic secretion","GO:0001916~positive regulation of T cell mediated cytotoxicity","GO:0006096~glycolytic process","GO:0007165~signal transduction","GO:0007266~Rho protein signal transduction","GO:0045087~innate immune response","GO:0050778~positive regulation of immune response","GO:0050853~B cell receptor signaling pathway","GO:0050870~positive regulation of T cell activation","GO:0071346~cellular response to interferon-gamma","hsa04015:Rap1 signaling pathway","hsa04062:Chemokine signaling pathway","hsa04066:HIF-1 signaling pathway","hsa04151:PI3K-Akt signaling pathway","hsa04512:ECM-receptor interaction","hsa04610:Complement and coagulation cascades","hsa04621:NOD-like receptor signaling pathway","hsa04666:Fc gamma R-mediated phagocytosis")
plot.data <- plot.data %>% filter(Term %in% plot.pathway) %>%
mutate(LogFDR= -log10(FDR))
# plot
color.bin <- c("#00599F","#D01910")
p <- ggscatter(plot.data, x = "LogFDR", y = "Fold.Enrichment", color = "Type",
main = "Enrichment of tumor/TAC protein",
size = "Ratio", shape = 16,
label = plot.data$Term, palette = color.bin) + theme_base()
p <- p + scale_size(range = c(4,20)) +
scale_x_continuous(limit = c(-10, 40)) +
theme(plot.background = element_blank())
ggsave(paste0("./results/Figure1/1D.PRO_deg.tn_enrich.pdf"), p, width = 11, height = 10)

2. Figure 2

2.1 (a) WGCNA

library(WGCNA)
dir.create("./results/Figure2/WGCNA",recursive = T)
#----------------------------------------------------------------------------------
# Step 1: Loading expression data and set parameters
#----------------------------------------------------------------------------------
protein.nona.tumor <- readRDS("./data/proteomics/wgcna/20230412_PDAC_PRO_Tumor_exp.rds")
corType = "pearson"
corFnc = ifelse(corType=="spearman", cor, bicor)
maxPOutliers = ifelse(corType=="pearson",1, 0.05)
# input data normalization
plot.mat <- t(protein.nona.tumor - rowMeans(protein.nona.tumor)) # row: samples | col: genes
#----------------------------------------------------------------------------------
# Step 2: identification of outlier samples
#----------------------------------------------------------------------------------
sampleTree = hclust(dist(plot.mat), method = "ward.D")
pdf(paste0("./results/Figure2/WGCNA/1.tree.pdf"), width = 20, height = 9)
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="")
dev.off()
#----------------------------------------------------------------------------------
# Step 3: analysis of network topology
#----------------------------------------------------------------------------------
# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to = 30, by = 2))
# Call the network topology analysis function
sft = pickSoftThreshold(plot.mat, powerVector=powers,
networkType="unsigned", verbose=5)
# Plot the results:
pdf(paste0("./results/Figure2/WGCNA/2.power.pdf"), width = 12, height = 8)
par(mfrow = c(1,2))
cex1 = 0.9
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",
ylab="Scale Free Topology Model Fit, signed R^2",type="n",
main = paste("Scale independence"))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,cex=cex1,col="red")
abline(h=0.85,col="red")
plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers,
cex=cex1, col="red")
dev.off()
#----------------------------------------------------------------------------------
# Step 4: soft threshold : power
#----------------------------------------------------------------------------------
power = sft$powerEstimate
power
#----------------------------------------------------------------------------------
# Step 5: One-step network construction and module detection
#----------------------------------------------------------------------------------
net = blockwiseModules(plot.mat, power = power, maxBlockSize = ncol(plot.mat),
TOMType = "signed", minModuleSize = 20,
reassignThreshold = 0, mergeCutHeight = 0.0001,
numericLabels = TRUE, pamRespectsDendro = FALSE,
saveTOMs=TRUE, corType = corType,
maxPOutliers = 1, loadTOMs=TRUE,
randomSeed = 931, # seed
saveTOMFileBase = paste0("./results/Figure2/WGCNA/WGCNA.tom"),
verbose = 3, pearsonFallback = "none", deepSplit = 3 )
# module:
table(net$colors) # 0 corresponds to unclassified genes
# Convert [number] labels to [colors] for plotting
moduleLabels = net$colors
moduleColors = labels2colors(moduleLabels)
# plot
pdf(paste0( "./results/Figure2/WGCNA/3.module.pdf"), width = 8, height = 6)
plotDendroAndColors(net$dendrograms[[1]], moduleColors[net$blockGenes[[1]]],
"Module colors",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)
dev.off()
#----------------------------------------------------------------------------------
# Step 6: Vasualization
#----------------------------------------------------------------------------------
## data
node.data <- readRDS("./data/proteomics/wgcna/2D-node.data.rds")
edge.data <- readRDS("./data/proteomics/wgcna/2D-edge.data.rds")
color.module <- c("#CCCCCC","#ecb888","#af88bb","#a032cb","#efbed6","#fc496a","#b6d37f","#589336","#7fd68e","#52c465","#3372e0","#84d7f6","#5394c3","#6376b3","#7f6cd7","#c4ceff","#fc9d40","#5c95e0","#cd7560","#ff70e4", "#ff8738", "#ffcead","#1cbf8b", "#b76d38", "#1584ff", "#7f006d", "#ffd35f","#E66F73","#F57F20","#1DBB95","#9CB79F","#F0B8D2","#A0485E","#A0688E","#C7E1DF","#51B1DF","#6D97D7","#5D6193","#CEC3E0","#A9917E","#7C7D80","#F4E192","#ADD666")
names(color.module) <- paste0("ME", seq(0,length(color.module)-1) %>% str_pad(width = 2,side = "left",pad = "0"))
## plot network
gg <- ggplot()
gg <- gg + geom_segment(mapping = aes(x = from.x, y = from.y, xend = to.x, yend = to.y),
color = "#CCCCCC", size = 0.01, data = edge.data) # draw a straight line
gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = Module),
size = 2, data = node.data) # add point
gg <- gg + scale_size(range = c(0, 6) * 2) # specifies the minimum and maximum size
gg <- gg + theme_void()
gg <- gg + labs(x = "", y = "", title = paste0("PPI"))
gg <- gg + scale_colour_manual(values = color.module)
ggsave(paste0("./results/Figure2/2A-PRO-modules.png"), gg, width = 10, height = 8)

2.2 (b) Significantly up/down-regulated proteins

#----------------------------------------------------------------------------------------------
# Overlay the log2 fold change between tumor and TAC on PRO in the WGCNA network
#----------------------------------------------------------------------------------------------
## data
node.data <- readRDS("./data/proteomics/wgcna/2D-node.data.rds") # DEG & module genes
edge.data <- readRDS("./data/proteomics/wgcna/2D-edge.data.rds")
## plot
gg <- ggplot()
gg <- gg + geom_segment(mapping = aes(x = from.x, y = from.y, xend = to.x, yend = to.y),
color = "#CCCCCC", size = 0.01, data = edge.data)
gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = proSig), size = 1,
data = node.data[which(node.data$proSig == "zz"), ])
gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = proSig), size = 2.5,
data = node.data[which(node.data$proSig != "zz"), ])
gg <- gg + scale_size(range = c(0, 6) * 2)
gg <- gg + theme_void()
gg <- gg + labs(x = "", y = "", title = paste0("color by proSig"))
gg <- gg + scale_colour_manual(values = c("#00599F","#D01910","#AAAAAA"))
ggsave(paste0("./results/Figure2/2B-proSig.png"), gg, width = 8.5, height = 8)

2.3 (c) Modules enrichment score

color.bin <- c("#00599F","#d80700")
#----------------------------------------------------------------------------------
# Step 1: Load the RJ-cohort 1 Data
#----------------------------------------------------------------------------------
# Scores of the 32 modules in TACs and PDACs of RJ-cohort 1
plot.data <- read.xlsx("./data/Extended Data Table 4.xlsx", sheet = 2, startRow = 2, rowNames = T)
# ME01 ME02 ME03 ME04
# RJ-01-0143-N_PRO 0.3474261 -0.114677437 -0.04291552 0.06708347
# RJ-01-0768-N_PRO 0.2532539 0.009051615 -0.03585508 0.03486392
# RJ-01-0697-N_PRO 0.2839048 -0.032966990 -0.02943711 0.05803330
# RJ-01-0609-N_PRO 0.3206755 -0.087559353 -0.05749618 0.10310726
plot.info <- NULL
module.name <- colnames(plot.data)
plot.stat <- data.frame(Module = module.name,
Wilcox.P = NA,
Wilcox.Padj = NA)
#----------------------------------------------------------------------------------
# Step 2: wilcox.test
#----------------------------------------------------------------------------------
for (i in 1:ncol(plot.data)) {
sub <- data.frame(Sample = rownames(plot.data),
Score = as.numeric(plot.data[, i]),
ScoreScale = scale(as.numeric(plot.data[, i])),
Module = colnames(plot.data)[i],
Type = substr(rownames(plot.data), 12, 16))
plot.stat$Wilcox.P[i] <- wilcox.test(sub$ScoreScale ~ sub$Type)$p.value
plot.info <- rbind(plot.info, sub)
}
plot.info <- plot.info %>% mutate(Type=factor(as.character(Type), levels = c("N_PRO","T_PRO")))
plot.stat <- plot.stat %>% mutate(Wilcox.Padj=p.adjust(Wilcox.P, method = "fdr"))
#----------------------------------------------------------------------------------
# Step 3: plot
#----------------------------------------------------------------------------------
p <- ggbarplot(plot.info, x = "Module", y = "ScoreScale",
color = "Type", fill = "Type",
palette = color.bin, width = 0.5, size = 0,
add = c("mean_se"), add.params = list(width = 0.5),
order = module.name,
position = position_dodge(0.6),
xlab = "", ylab = "Module Score of Protein")
p <- p + theme_base()
p <- p + geom_hline(yintercept = 0, color = "black")
p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
p <- p + stat_compare_means(aes(group = Type, label = ..p.format..), size = 1, method = "wilcox.test", label.y = 1.2) + theme(plot.background = element_blank())
#----------------------------------------------------------------------------------
# Step 4: output
#----------------------------------------------------------------------------------
ggsave(paste0("./results/Figure2/Figure2c-1.pdf"), p, width = 12, height = 4)
write.xlsx(plot.stat, paste0("./results/Figure2/Figure2c.padj-1.xlsx"))

3. Figure 3

3.1 (a) LASSO model for PDAC

library(glmnet)
library(openxlsx)
library(survminer)
library(survival)
library(tidyverse)
library(RColorBrewer)
library(ComplexHeatmap)
library(forestmodel)
library(ROCR)
dir.create("./results/Figure3/lasso",recursive = T)
#----------------------------------------------------------------------------------
# Step 1: load data
#----------------------------------------------------------------------------------
## PDAC Tumor proteomics
pdac.pro <- readRDS('./data/proteomics/20230412_PDAC_PRO_exp.rds')
pdac.pro <- pdac.pro [,grep('_T_',colnames(pdac.pro))] %>% t() %>% as.data.frame() # 191 tumor samples * 4787 protein
## WGCNA protein
wgcna <- readRDS("./data/proteomics/wgcna/2D-node.data.rds")
pdac.pro <- pdac.pro [,intersect(wgcna$Gene,colnames(pdac.pro))] # 191 samples * 3906 protein
## PDAC clinical data
cli.pdac <- readRDS('./data/clinical/20230412_PDAC_191_patients_clinical.rds')
rownames(cli.pdac) <- cli.pdac$ID
## Lasso data
y.train <- data.frame(time=cli.pdac$Time_OS,
status=cli.pdac$Censoring_OS,
row.names = row.names(cli.pdac)) %>%
na.omit(y.train) %>%
as.matrix()
x.train <- pdac.pro[row.names(y.train),] %>% as.matrix() %>% na.omit()
y.train <- y.train[row.names(x.train),]
#----------------------------------------------------------------------------------
# Step 2: Fit the Lasso Regression Model
#----------------------------------------------------------------------------------
set.seed(1000)
fit <-glmnet(x=x.train,
y=y.train,
family = "cox", # cox : cox , binomial : logistic
alpha = 1) # 1 : lasso penalty, 0 : ridge penalty.
pdf('./results/Figure3/lasso/0.Coefficients.pdf',width = 5,height = 4)
plot(fit,xvar="lambda",label=F)
dev.off()
# Cross-validation
cvfit <- cv.glmnet(x=x.train,
y=y.train,
family = "cox",
type.measure="deviance",
alpha = 1,
nfolds = 10)
png(paste0('./results/Figure3/lasso/S5A-Coeff.png'), width = 5,height = 5,res=1200,units="in")
plot(fit,xvar="lambda",label=F)
abline(v = log(cvfit$lambda.min), col = "black",lty=2)
text(-4,-4, labels = paste0("lambda.min: ", round(cvfit$lambda.min,4)),col = "black",cex = 0.75)
dev.off()
png(paste0('./results/Figure3/lasso/S5B-cvfit.png'), width = 5,height = 5,res=1200,units="in")
plot(cvfit)
text(-3,250, labels = paste0("lambda.min: ", round(cvfit$lambda.min,4)),col = "red",cex = 0.75)
text(-3,200, labels = paste0("lambda.1se: ", round(cvfit$lambda.1se,4)),col = "black",cex = 0.75)
dev.off()
#----------------------------------------------------------------------------------
# Step 3: Best Lasso model
#----------------------------------------------------------------------------------
lasso_best <- glmnet(x = x.train,
y = y.train,
alpha = 1,
lambda = cvfit$lambda.min,
nfolds = 10,
family = "cox")
gene.coef <- as.matrix(round(coef(lasso_best), 4))
coef.lasso <- gene.coef[which(gene.coef[, 1] != 0), ]
coef.lasso <- data.frame(genes=names(coef.lasso),
coefficient=as.numeric(coef.lasso)) %>% arrange(-coefficient)
write.xlsx(coef.lasso, "./results/Figure3/lasso/2.coef_PDAC.Pro.xlsx", rowNames = F,colNames = T,overwrite = T)
## Lasso score 
y <- as.data.frame(y.train) %>%
mutate(LassoScore=predict(lasso_best,s=cvfit$lambda.min,newx=x.train)) %>%
mutate(LassoScore=as.numeric(LassoScore), ID=rownames(y.train))
y <- y %>% mutate(LASSO.level=ifelse(LassoScore>median(LassoScore),"High" ,"Low")) %>%
mutate(LASSO.level=factor(LASSO.level,levels = c( "Low","High"))) ## cutoff : median
write.xlsx(y,paste0('./results/Figure3/lasso/3.Lasso_Score.xlsx'),rowNames = T,colNames = T,overwrite = T)
#----------------------------------------------------------------------------------
# Step 4: Visualization
#----------------------------------------------------------------------------------
## bar plot : gene cofficient
df.coef <- coef.lasso
rownames(df.coef) <- df.coef$genes
df.coef <- df.coef %>% transmute(gene=genes,cor=coefficient) %>%
mutate(gene=paste0(gene," (",cor,")")) %>%
arrange(cor) %>%
mutate(gene=factor(gene,levels = gene)) %>%
mutate(group=ifelse(cor>0,"A","B"))
p1 <- ggplot(df.coef, aes(gene, cor, fill = group)) +
geom_bar(stat = 'identity') +
ylab('coefficient') +
xlab('') +
guides(colour=F,fill=F) +
theme_bw(base_size = 12) +
scale_fill_manual(values = brewer.pal(9,'Set1')) +
theme(panel.grid =element_blank(),
axis.text = element_text(colour = 'black'),
axis.text.x = element_text(size = 8)) +
coord_flip()
ggsave("./results/Figure3/5A-Lasso_cofficient_barplot.png",p1,width = 3,height = 4)
## dot plot : lasso score
df.score <- y
df.score <- df.score %>% mutate(samples=rownames(df.score)) %>%
arrange(LassoScore) %>%
mutate(samples=factor(samples,levels = samples))
p2 <- ggscatter(df.score, x="samples", y="LassoScore", color = "LASSO.level",
palette = c("#0ac9c9","#ea358d"))+
geom_hline(yintercept=median(df.score$LassoScore), linetype=2,color='black') +
geom_vline(xintercept=median(1:nrow(df.score))+0.5, linetype=2,color='black') +
annotate('text',
x=median(1:nrow(df.score))+15,
y=median(df.score$LassoScore)+0.5,
label=paste0('cutoff: ', round(median(df.score$LassoScore),4)),size=5, color='black') +
ylab('PDAC Score') +
xlab('')
ggsave("./results/Figure3/5A-Lasso_score_dotplot.png",p2,width = 8,height = 4)
## Heatmap : protein abundance
heat.data <- t(x.train)
heat.data <- heat.data[rev(rownames(df.coef)),row.names(df.score)]
heat.data <- t(scale(t(heat.data)))
png("./results/Figure3/5A-Lasso_gene_heatmap.png",width = 8,height = 4,res=1200,units="in")
p3 <- Heatmap(heat.data, name = "Z-score",
col = colorRampPalette(c("#00599F","#1c7ec9","#FFFFFF",
"#e53f39","#D01910"))(100),
cluster_rows = F,cluster_columns = F,show_column_names = F)
draw(p3)
dev.off()
#----------------------------------------------------------------------------------
# Step 5: receiver operating characteristic (ROC)
#----------------------------------------------------------------------------------
# performance of model
lasso.prob <- predict(cvfit, newx=x.train , family = "cox",
s=c(cvfit$lambda.min, cvfit$lambda.1se) )
re <- cbind(as.data.frame(y.train),lasso.prob) %>% as.data.frame()
colnames(re) <- c('time','status','prob_min','prob_1se')
# model: lambda.min
pred_min <- prediction(predictions=re$prob_min, labels=re$status) # predictions: containing the prediction. labels: true class labels
perf_min <- performance(pred_min,"tpr","fpr")
auc_min <- performance(pred_min,"auc")@y.values[[1]]
# mocel: lambda.1se
pred_1se <- prediction(re$prob_1se, re$status)
perf_1se <- performance(pred_1se,"tpr","fpr")
auc_1se <- performance(pred_1se,"auc")@y.values[[1]]
## plot
png(paste0('./results/Figure3/lasso/S5C-ROC.png'), width = 5,height = 5,res=1200,units="in")
plot(perf_min,colorize=F, col="red")
plot(perf_1se,colorize=F, col="blue",add = T)
lines(c(0,1),c(0,1), col = "gray", lty = 4 )
text(0.7,0.3, labels = paste0("lambda.min: AUC=", round(auc_min,2)),col = "red")
dev.off()
#----------------------------------------------------------------------------------
# Step 6: Multivariate analysis
#----------------------------------------------------------------------------------
# data
y_forest <- y %>% left_join(cli.pdac , by =c('ID'='ID'))
y_forest <- y_forest %>% select( c('time','status','LASSO.level',
'age','gender','CA125',
'CA19_9','CEA','Smoking',"Drinking",'AJCC_V8',
'Censoring_chemo'))
y_forest <- y_forest %>% rename(LassoLevel=LASSO.level,
Age=age,Gender=gender,
CA199=CA19_9,
Adjuvant_therapy=Censoring_chemo,
AJCC_stage=AJCC_V8)
y_forest <- y_forest %>% mutate(Gender=factor(Gender,levels = c("Female","Male")))
# AJCC_stage
table(y_forest$AJCC_stage)
a <- y_forest$AJCC_stage
b <- recode(a, Ia = "I", Ib = "I",IIa= "II", IIb= "II",III="III",IV="IV")
y_forest$AJCC_stage <- factor(b,levels = c("I","II","III"))
# coxph
cfit <- coxph(Surv(time,status)~LassoLevel+Age+Gender+CA125+CA199+
CEA+Smoking+Drinking+AJCC_stage+Adjuvant_therapy,
data = y_forest)
# Forest plots
p <- forest_model(cfit,
format_options = forest_model_format_options(colour = brewer.pal(9,'Set1')[2], shape = 16,text_size = 15,point_size = 2.5, banded = T),
factor_separate_line = F )
ggsave(paste0("./results/Figure3/S5D-multivariate_forest.png"), p, width = 8, height = 6)

3.2 (b) Kaplan-Meier curves in OS/DFS

#----------------------------------------------------------------------------------
# Step 1: Loading data
#----------------------------------------------------------------------------------
# (1) Sample information and clinical characteristics of the 191 PDAC patients (the RJ-cohort 1)
rj1.cohort <- read.xlsx("./data/Extended Data Table 2.xlsx", startRow = 2)
# ID Proteomic_ID RNAseq_ID Age Gender BMI DM Smoking
# 1 RJ-01-0020 RJ-01-0020-T_PRO RJ-01-0020-T_RNA 61 Female 19.63000 1 0
# 2 RJ-01-0038 RJ-01-0038-T_PRO RJ-01-0038-T_RNA 73 Male 22.46003 0 0
# 3 RJ-01-0050 RJ-01-0050-T_PRO RJ-01-0050-T_RNA 69 Female 17.63085 0 0
# 4 RJ-01-0069 RJ-01-0069-T_PRO RJ-01-0069-T_RNA 69 Female 17.48179 1 0
# 5 RJ-01-0070 RJ-01-0070-T_PRO RJ-01-0070-T_RNA 65 Male 23.43750 0 0
# 6 RJ-01-0074 RJ-01-0074-T_PRO RJ-01-0074-T_RNA 61 Male 23.66144 0 0
rj1.cohort <- rj1.cohort %>% mutate(LassoLevel=factor(as.character(LassoLevel), levels = c("Low","High")))
color.bin.lasso <- c("#00599F","#d80700")
#----------------------------------------------------------------------------------
# Step 2: OS survival stratified by lasso level
#----------------------------------------------------------------------------------
info <- summary(coxph(Surv(OS_month, OS_status) ~ LassoLevel, data = rj1.cohort))
anno.text <- ""
for (i in 1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ LassoLevel, data = rj1.cohort)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "LassoLevel", "")
fit <- survfit(Surv(OS_month, OS_status) ~ LassoLevel, data = rj1.cohort)
p1 <- ggsurvplot(fit,
data = rj1.cohort,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
conf.int.alpha = 0.05,
conf.int = TRUE,
palette = color.bin.lasso,
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 48),
title= paste0("OS LassoLevel \n", anno.text))
#----------------------------------------------------------------------------------
# Step 2: DFS survival stratified by lasso level
#----------------------------------------------------------------------------------
info <- summary(coxph(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj1.cohort))
anno.text <- ""
for (i in 1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj1.cohort)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "LassoLevel", "")
fit <- survfit(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj1.cohort)
p2 <- ggsurvplot(fit,
data = rj1.cohort,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
conf.int.alpha = 0.05,
conf.int = TRUE,
palette = color.bin.lasso,
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 48),
title= paste0("DFS LassoLevel \n", anno.text))
## output
p <- arrange_ggsurvplots(list(p1, p2), ncol = 2, nrow = 1, print = FALSE)
ggsave(paste0("./results/Figure3/Figure3b.pdf"), p, width = 16, height = 10)

3.3 (c) Lasso proteins in module

## data
node.data <- readRDS("./data/proteomics/wgcna/2D-node.data.rds")
edge.data <- readRDS("./data/proteomics/wgcna/2D-edge.data.rds")
## lasso gene plot
color.module <- c("#ecb888","#af88bb","#a032cb","#efbed6","#fc496a","#b6d37f","#589336","#7fd68e","#52c465","#3372e0","#84d7f6","#5394c3","#6376b3","#7f6cd7","#c4ceff","#fc9d40","#5c95e0","#cd7560","#ff70e4", "#ff8738", "#ffcead","#1cbf8b", "#b76d38", "#1584ff", "#7f006d", "#ffd35f","#E66F73","#F57F20","#1DBB95","#9CB79F","#F0B8D2","#A0485E","#A0688E","#C7E1DF","#51B1DF","#6D97D7","#5D6193","#CEC3E0","#A9917E","#7C7D80","#F4E192","#ADD666")
gg <- ggplot()
gg <- gg + geom_segment(mapping = aes(x = from.x, y = from.y, xend = to.x, yend = to.y), color = "#CCCCCC", size = 0.01, data = edge.data)
gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = Module), size = 1, data = node.data)
gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = Module), fill = "red", size = 5, shape = 21, data = node.data[which(node.data$ISlassoGene == "yes"), ])
gg <- gg + geom_text(mapping = aes(x = pos.x, y = pos.y, label = Gene), size = 5, color = "black", data = node.data[which(node.data$ISlassoGene == "yes"), ])
gg <- gg + scale_size(range = c(0, 6) * 2)
gg <- gg + theme_void()
gg <- gg + labs(x = "", y = "", title = paste0("color by ISlassoGene"))
gg <- gg + scale_colour_manual(values = color.module)
gg <- gg + theme(legend.position = "none")
ggsave(paste0("./results/Figure3/3c-Lasso_gene_in_module.png"), gg, width = 8, height = 8.2)

3.4 (d) LassoScore~ModuleScore

library(ggcorrplot)
#----------------------------------------------------------------------------------
# Step 1: Load the Data
#----------------------------------------------------------------------------------
# (1) Sample information and clinical characteristics of the 191 PDAC patients (the RJ-cohort 1)
rj1.cohort <- read.xlsx("./data/Extended Data Table 2.xlsx", startRow = 2)
# ID Proteomic_ID RNAseq_ID Age Gender BMI DM Smoking
# 1 RJ-01-0020 RJ-01-0020-T_PRO RJ-01-0020-T_RNA 61 Female 19.63000 1 0
# 2 RJ-01-0038 RJ-01-0038-T_PRO RJ-01-0038-T_RNA 73 Male 22.46003 0 0
# 3 RJ-01-0050 RJ-01-0050-T_PRO RJ-01-0050-T_RNA 69 Female 17.63085 0 0
# 4 RJ-01-0069 RJ-01-0069-T_PRO RJ-01-0069-T_RNA 69 Female 17.48179 1 0
# 5 RJ-01-0070 RJ-01-0070-T_PRO RJ-01-0070-T_RNA 65 Male 23.43750 0 0
# 6 RJ-01-0074 RJ-01-0074-T_PRO RJ-01-0074-T_RNA 61 Male 23.66144 0 0

# (2) Scores of the 32 modules in TACs and PDACs of RJ-cohort 1.
module.ssgsea.pro.all <- read.xlsx("./data/Extended Data Table 4.xlsx", sheet = 2, startRow = 2, rowNames = T)
module.ssgsea.pro.all <- t(module.ssgsea.pro.all)
# RJ-01-0143-N_PRO RJ-01-0768-N_PRO RJ-01-0697-N_PRO RJ-01-0609-N_PRO RJ-01-1055-N_PRO
# ME01 0.34742611 0.253253916 0.28390477 0.32067554 0.325797851
# ME02 -0.11467744 0.009051615 -0.03296699 -0.08755935 -0.096947986
# ME03 -0.04291552 -0.035855079 -0.02943711 -0.05749618 0.016639009
# ME04 0.06708347 0.034863920 0.05803330 0.10310726 0.007498371
# ME05 0.19099018 0.100153070 0.14246697 0.16675071 0.127932887
# ME06 0.03893163 0.240865770 0.14178269 0.07618613 0.038306853
#----------------------------------------------------------------------------------
# Step 2: Spearman correlation : lasso score ~ module score
#----------------------------------------------------------------------------------
plot.mat <- data.frame(t(module.ssgsea.pro.all[, rj1.cohort$Proteomic_ID]), LassoScore = rj1.cohort[, c("LassoScore")] )
corr <- cor(plot.mat, use = "complete.obs", method = "spearman")
p.mat <- cor_pmat(plot.mat, use = "complete.obs", method = "spearman")
p.adj <- p.mat
for (i in 1:ncol(p.adj)) {
p.adj[, i] <- p.adjust(p.adj[, i], method = "fdr")
}
write.xlsx(list(Corr = corr, p.mat = p.mat, p.adj = p.adj), "./results/Figure3/Figure3d-1.xlsx", overwrite = T, rowNames = T)
# plot
pdf(paste0("./results/Figure3/Figure3d-1.pdf"), width = 10, height = 10)
corrplot::corrplot(corr[, ], method = "ellipse",
col = colorRampPalette(c("#1B3361","#76C7FF","#FFFFFF","#FF987A","#6A0D28"))(200))
dev.off()
#----------------------------------------------------------------------------------
# Step 3: Spearman correlation : lasso score ~ module score in Cao et al cohort
#----------------------------------------------------------------------------------
module.ssgsea.pro.cell <- readRDS("./data/Figure2/cao.module.ssgsea.pro.rds")
cell.meta.data <- read.xlsx("./data/Figure2/cao.meta.data.xlsx")
cell.meta.data$LassoLevel <- factor(as.character(cell.meta.data$LassoLevel), levels = c("Low","High"))
plot.mat <- data.frame(t(module.ssgsea.pro.cell[, cell.meta.data$Proteomic_ID]), LassoScore = cell.meta.data[, c("LassoScore")] )
# Spearman correlation
corr <- cor(plot.mat, use = "complete.obs", method = "spearman")
p.mat <- cor_pmat(plot.mat, use = "complete.obs", method = "spearman")
p.adj <- p.mat
for (i in 1:ncol(p.adj)) {
p.adj[, i] <- p.adjust(p.adj[, i], method = "fdr")
}
write.xlsx(list(Corr = corr, p.mat = p.mat, p.adj = p.adj), "./results/Figure3/Figure3d-2.xlsx", overwrite = T, rowNames = T)
# plot
pdf(paste0("./results/Figure3/Figure3d-2.pdf"), width = 10, height = 10)
corrplot::corrplot(corr[, ], method = "ellipse",
col = colorRampPalette(c("#1B3361","#76C7FF","#FFFFFF","#FF987A","#6A0D28"))(200))
dev.off()

3.5 (e) Cao et al validation

## data
cell.meta.data <- read.xlsx("./data/Figure2/cao.meta.data.xlsx")
color.bin.lasso <- c("#00599F","#d80700")
cell.meta.data <- cell.meta.data %>% mutate(LassoLevel=factor(LassoLevel,level=c("Low","High")))
## coxph
info <- summary(coxph(Surv(OS_month, OS_status) ~ LassoLevel, data = cell.meta.data))
anno.text <- ""
for (i in 1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ LassoLevel, data = cell.meta.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "LassoLevel", "")
fit <- survfit(Surv(OS_month, OS_status) ~ LassoLevel, data = cell.meta.data)
p1 <- ggsurvplot(fit,
data = cell.meta.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
conf.int.alpha = 0.05,
conf.int = TRUE,
palette = color.bin.lasso,
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 48),
title= paste0("OS LassoLevel \n", anno.text))
p <- arrange_ggsurvplots(list(p1), ncol = 1, nrow = 1, print = FALSE)
ggsave(paste0("./results/Figure3/Figure3e.pdf"), p, width = 8, height = 10)

3.6 (f) RJ-cohort 2 validation

#----------------------------------------------------------------------------------
# Step 1: Load the Data
#----------------------------------------------------------------------------------
# The LASSO score in RJ-cohort 2
rj2.cohort <- read.xlsx("./data/Extended Data Table 5.xlsx", sheet = 3, startRow = 2)
rj2.cohort <- rj2.cohort %>% mutate(LassoLevel=factor(as.character(LassoLevel), levels = c("Low","High")))
color.bin.lasso <- c("#00599F","#d80700")
#----------------------------------------------------------------------------------
# Step 2: OS survival stratified by lasso level
#----------------------------------------------------------------------------------
info <- summary(coxph(Surv(OS_month, OS_status) ~ LassoLevel, data = rj2.cohort))
anno.text <- ""
for (i in 1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ LassoLevel, data = rj2.cohort)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "LassoLevel", "")
fit <- survfit(Surv(OS_month, OS_status) ~ LassoLevel, data = rj2.cohort)
p1 <- ggsurvplot(fit,
data = rj2.cohort,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
conf.int.alpha = 0.05,
conf.int = TRUE,
palette = color.bin.lasso,
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 48),
title= paste0("OS LassoLevel \n", anno.text))
#----------------------------------------------------------------------------------
# Step 3: DFS survival stratified by lasso level
#----------------------------------------------------------------------------------
info <- summary(coxph(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj2.cohort))
anno.text <- ""
for (i in 1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj2.cohort)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "LassoLevel", "")
fit <- survfit(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj2.cohort)
p2 <- ggsurvplot(fit,
data = rj2.cohort,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
conf.int.alpha = 0.05,
conf.int = TRUE,
palette = color.bin.lasso,
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 48),
title= paste0("DFS LassoLevelRNA \n", anno.text))
# output
p <- arrange_ggsurvplots(list(p1, p2), ncol = 2, nrow = 1, print = FALSE)
ggsave(paste0("./results/Figure3/Figure3f.pdf"), p, width = 16, height = 10)

4. Figure 4

4.1 (a) Chemotherapy biomarker

library(openxlsx)
library(tidyverse)
library(stringr)
library(survival)
library(survminer)
library(pbapply)
library(ggthemes)
dir.create("./results/Figure4")
#----------------------------------------------------------------------------------
# Step 1: Load the Data
#----------------------------------------------------------------------------------
deg <- readRDS("./data/proteomics/wgcna/2D-node.data.rds") # DEG & module genes
pro <- readRDS("./data/proteomics/20230412_PDAC_PRO_exp.rds")
cli <- readRDS("./data/clinical/20230412_PDAC_191_patients_clinical.rds")
#----------------------------------------------------------------------------------
# Step 2: Coxph data
#----------------------------------------------------------------------------------
# clinical data + DEG genes' expression data
deg <- deg %>% filter(proSig !="zz") %>% pull(Gene) # 1860 DEG genes
rownames(cli) <- cli$ID
id.inter <- intersect(rownames(cli),colnames(pro)) # 191 patients
pro <- pro[deg,id.inter] %>% t()
identical(rownames(cli),rownames(pro))
biomarker_data <- cbind(cli %>% dplyr::select(OS_Time,OS_Status,Censoring_chemo),pro)
# category attribute: chemotherapy treatment
biomarker_data <- biomarker_data %>% mutate(chem=factor(Censoring_chemo,levels = c("0","1")))
#----------------------------------------------------------------------------------
# Step 3: Coxph
#----------------------------------------------------------------------------------
# Coxph : OS ~ chemotherapy*gene
ls_re <- pbapply::pblapply(colnames(pro),FUN=function(g){
data <- biomarker_data %>% select(OS_Time,OS_Status,chem) %>%
mutate(Gene=biomarker_data%>%pull(g))
fit <- coxph( Surv(OS_Time,OS_Status) ~ Gene*chem ,data=data) ## interaction term
broom::tidy(fit) %>% mutate(HR=exp(estimate)) %>% select(p.value,HR) %>% t() %>% as.vector()
})
# result
res <- do.call(rbind,ls_re)
colnames(res) <- c("biomarker_p","biomarker_HR","chemotherapy_p",
"chemotherapy_HR","interaction_p","interaction_HR")
## predictive biomarker
res <- res %>% as_tibble() %>% mutate(Gene=colnames(pro)) %>%
arrange(interaction_p) %>%
dplyr::select(Gene,everything())
# cutoff
res[which(res$interaction_HR>1 & res$interaction_p <0.05),"group"] = "bad_biomarker"
res[which(res$interaction_HR<1 & res$interaction_p <0.05),"group"] = "good_biomarker"
table(res$group)
#----------------------------------------------------------------------------------
# Step 4: Visualization
#----------------------------------------------------------------------------------
color.tri <- c("#D01910","#00599F","#444444") # up ,down ,not
## volcano
res <- res %>% mutate(group=factor(group,levels = c("bad_biomarker","good_biomarker")),
Label=Gene,
Label=replace(Label,is.na(group),""))
res <- res %>% mutate(log2HR=log2(interaction_HR)) %>%
mutate(logP= -log10(interaction_p))
p <- ggscatter(res,
x = "log2HR", y = "logP",
color = "group", size = 1,
main = paste0("Predictive biomarker for chemotherapy"),
xlab = "log2(HR)", ylab = "-log10(Pvalue)",
palette = color.tri,
label = res$Label,font.label = 6, , repel = T)+
theme_base()+
geom_hline(yintercept = -log10(0.05), linetype="dashed", color = "#222222") +
theme(plot.background = element_blank())
ggsave("./results/Figure4/4A-Predictive-biomarker.pdf", p, width = 10, height = 8)
write.xlsx(as.data.frame(res),"./results/Figure4/4A-Predictive-biomarker.xlsx",colNames=T,rowNames=F,overwrite = T)

4.2 (b) Forest plots of multivariable Cox

library(forestmodel)
library(RColorBrewer)
#----------------------------------------------------------------------------------
# Step 1: Load the Data
#----------------------------------------------------------------------------------
# Sample information and clinical characteristics of the 191 PDAC patients (the RJ-cohort 1)
rj1 <- read.xlsx("./data/Extended Data Table 2.xlsx", startRow = 2)
rj1$`LassoLevel (High vs. Low)` <- ifelse(rj1$LassoLevel == 'High', 1, 0)
rj1$`Adjuvant_therapy (Yes vs. No)` <- ifelse(rj1$Censoring_chemo == 'Chem', 1, 0)
rj1$`Stage (III vs. I_II)` <- ifelse(rj1$Stage == 'III', 1, 0)
rj1$`NDUFB8 (High vs. Low)` <- ifelse(rj1$NDUFB8_l == 'High', 1, 0)
rj1$`CEMIP2 (High vs. Low)` <- ifelse(rj1$CEMIP2_l == 'High', 1, 0)
rj1$`Sex (Male vs. Female)` <- ifelse(rj1$Gender == 'Male', 1, 0)
rj1$`Margin (R1 vs. R0)` <- ifelse(rj1$Margin == 'R1', 1, 0)
rj1$`Smoking (Yes vs. No)` <- rj1$Smoking
rj1$`Drinking (Yes vs. No)` <- rj1$Drinking
#----------------------------------------------------------------------------------
# Step 2: plot for NDUFB8 * Adjuvant_therapy
#----------------------------------------------------------------------------------
# custom panels for column: Adjust the font style, thickness, and column width of variables in the graph
forest.panels <- list(
list(width = 0.01),
list(width = 0.1, display = ~variable,
fontface = "bold", heading = "Variable"),
list(width = 0.1, display = ~level),
list(width = 0.05, display = ~n, hjust = 1, heading = "N"),
list(width = 0.03, item = "vline", hjust = 0.5),
list(width = 0.55, item = "forest", hjust = 0.5,
heading = "Hazard ratio", linetype = "dashed",line_x = 0),
list(width = 0.03, item = "vline", hjust = 0.5),
list(width = 0.12, display = ~ ifelse(reference, "Reference", sprintf("%0.3f (%0.3f, %0.3f)",
trans(estimate), trans(conf.low), trans(conf.high))), display_na = NA),
list(width = 0.03,
display = ~ ifelse(reference, "", format.pval(p.value, digits = 1, eps = 0.001) ),
display_na = NA, hjust = 1, heading = "p"),
list(width = 0.03))

Interaction <- rj1[["NDUFB8"]] * rj1[['Adjuvant_therapy (Yes vs. No)']]

pdf('./results/Figure4/Figure4b-1.pdf', width = 10, height = 6)
print(forest_model(coxph(as.formula(paste0('Surv(OS_month, OS_status) ~', '`', "NDUFB8", '`', '+ `Adjuvant_therapy (Yes vs. No)` + Interaction + Age + `Sex (Male vs. Female)` + `Stage (III vs. I_II)` + `Margin (R1 vs. R0)` + CA19_9 + `Smoking (Yes vs. No)` + `Drinking (Yes vs. No)`')), data = rj1), panels = forest.panels, factor_separate_line = F))
dev.off()
#----------------------------------------------------------------------------------
# Step 3: plot for CEMIP2 * Adjuvant_therapy
#----------------------------------------------------------------------------------
Interaction <- rj1[["CEMIP2"]] * rj1[['Adjuvant_therapy (Yes vs. No)']]
pdf('./results/Figure4/Figure4b-2.pdf', width = 10, height = 6)
print(forest_model(coxph(as.formula(paste0('Surv(OS_month, OS_status) ~', '`', "CEMIP2", '`', '+ `Adjuvant_therapy (Yes vs. No)` + Interaction + Age + `Sex (Male vs. Female)` + `Stage (III vs. I_II)` + `Margin (R1 vs. R0)` + CA19_9 + `Smoking (Yes vs. No)` + `Drinking (Yes vs. No)`')), data = rj1), panels = forest.panels, factor_separate_line = F))
dev.off()

4.3 (cd) Kaplan-Meier curves of NDUFB8/CEMIP2

#----------------------------------------------------------------------------------
# Step 1: Load the data
#----------------------------------------------------------------------------------
# (1) Sample information and clinical characteristics of the 191 PDAC patients (the RJ-cohort 1)
rj1.cohort <- read.xlsx("./data/Extended Data Table 2.xlsx", startRow = 2)
# ID Proteomic_ID RNAseq_ID Age Gender BMI DM Smoking
# 1 RJ-01-0020 RJ-01-0020-T_PRO RJ-01-0020-T_RNA 61 Female 19.63000 1 0
# 2 RJ-01-0038 RJ-01-0038-T_PRO RJ-01-0038-T_RNA 73 Male 22.46003 0 0
# 3 RJ-01-0050 RJ-01-0050-T_PRO RJ-01-0050-T_RNA 69 Female 17.63085 0 0
# 4 RJ-01-0069 RJ-01-0069-T_PRO RJ-01-0069-T_RNA 69 Female 17.48179 1 0
# 5 RJ-01-0070 RJ-01-0070-T_PRO RJ-01-0070-T_RNA 65 Male 23.43750 0 0
# 6 RJ-01-0074 RJ-01-0074-T_PRO RJ-01-0074-T_RNA 61 Male 23.66144 0 0

plot.cohort <- rj1.cohort %>%
mutate(Censoring_chemo_rev = Censoring_chemo,
Censoring_chemo = factor(as.character(Censoring_chemo), levels = c("Chem","NoChem")),
Censoring_chemo_rev = factor(as.character(Censoring_chemo_rev), levels = c("NoChem","Chem"))) %>%
mutate(NDUFB8_l = factor(as.character(NDUFB8_l), levels = c("Low","High")),
CEMIP2_l = factor(as.character(CEMIP2_l), levels = c("Low","High"))) %>%
mutate(GemType_rev = GemType,
GemType = factor(as.character(GemType), levels = c("Gem","NonGem","NoChem")),
GemType_rev = factor(as.character(GemType_rev), levels = c("NoChem","Gem","NonGem")))

summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev * NDUFB8, data = plot.cohort))
summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev * CEMIP2, data = plot.cohort))
#----------------------------------------------------------------------------------
# Step 2: OS survival stratified by lasso level in [NDUFB8 low] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$NDUFB8_l == "Low"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p1 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 48),
title= paste0("OS NDUFB8_low Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 3: OS survival stratified by lasso level in [NDUFB8 High] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$NDUFB8_l == "High"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p2 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 48),
title= paste0("OS NDUFB8_high Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 4: OS survival stratified by lasso level in [CEMIP2 low] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$CEMIP2_l == "Low"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p3 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 48),
title= paste0("OS CEMIP2_low Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 5: OS survival stratified by lasso level in [CEMIP2 High] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$CEMIP2_l == "High"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p4 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 48),
title= paste0("OS CEMIP2_high Censoring_chemo \n", anno.text))
# output
p <- arrange_ggsurvplots(list(p1, p2, p3, p4), ncol = 4, nrow = 1, print = FALSE)
ggsave(paste0("./results/Figure4/Figure4cd.pdf"), p, width = 20, height = 8)

4.4 (e) GSEA stratified by NDUFB8

#----------------------------------------------------------------------------------
# Step 1: Load the data
#----------------------------------------------------------------------------------
# enrichment results
plot.data.1 <- read.xlsx("./data/Figure4/1.protein.deg.NH_NL.GSEA_1k.xlsx")
plot.data.2 <- read.xlsx("./data/Figure4/1.CFPAC.deg.NH_NL.GSEA_1k.xlsx")
plot.data.3 <- read.xlsx("./data/Figure4/1.8988.deg.NH_NL.GSEA_1k.xlsx")
# ID Description setSize enrichmentScore
# 1 KEGG_RIBOSOME KEGG_RIBOSOME 87 0.6233584
# 2 KEGG_SPLICEOSOME KEGG_SPLICEOSOME 126 0.4505875
# 3 HALLMARK_MYC_TARGETS_V1 HALLMARK_MYC_TARGETS_V1 199 0.3968739
# 4 HALLMARK_TNFA_SIGNALING_VIA_NFKB HALLMARK_TNFA_SIGNALING_VIA_NFKB 199 -0.3993534
# 5 HALLMARK_INFLAMMATORY_RESPONSE HALLMARK_INFLAMMATORY_RESPONSE 200 -0.3683088
# 6 KEGG_HEMATOPOIETIC_CELL_LINEAGE KEGG_HEMATOPOIETIC_CELL_LINEAGE 85 -0.3989387
plot.data.1$Omics <- "Protein"
plot.data.2$Omics <- "CFPAC"
plot.data.3$Omics <- "8988"
plot.data <- rbind(plot.data.1, plot.data.2, plot.data.3) %>%
mutate(SignP= -log10(pvalue))
plot.data$SignP[which(plot.data$NES < 0)] <- plot.data$SignP[which(plot.data$NES < 0)] * -1
plot.pathway <- c("KEGG_RIBOSOME","KEGG_OXIDATIVE_PHOSPHORYLATION")
plot.data.sub <- plot.data[plot.data$ID %in% plot.pathway, ]
plot.data.sub$Omics <- factor(as.character(plot.data.sub$Omics), levels = c("Protein","CFPAC","8988"))
#----------------------------------------------------------------------------------
# Step 2: plot
#----------------------------------------------------------------------------------
p <- ggbarplot(plot.data.sub, x = "ID", y = "SignP",
color = "#FFC25F", fill = "#FFC25F",
position = position_dodge(0.8),
order = rev(plot.pathway),
main = "GSEA NDUFB8 high vs low")
p <- p + geom_hline(yintercept = c(1.30), linetype="dashed", color = "#222222")
p <- p + facet_wrap(~ Omics, ncol = 3)
p <- p + scale_y_continuous(limits = c(0, 3.5), expand = c(0,0))
p <- p + coord_flip() + theme_base()
ggsave(paste0("./results/Figure4/Figure4e.pdf"), p, width = 13, height = 3)

4.5 (f) GSEA stratified by CEMIP2

#----------------------------------------------------------------------------------
# Step 1: Load the data
#----------------------------------------------------------------------------------
# enrichment results
plot.data.1 <- read.xlsx("./data/Figure4/2.protein.deg.CH_CL.GSEA_1k.xlsx")
plot.data.2 <- read.xlsx("./data/Figure4/2.CFPAC.deg.CH_CL.GSEA_1k.xlsx")
plot.data.3 <- read.xlsx("./data/Figure4/2.8988.deg.CH_CL.GSEA_1k.xlsx")
# ID Description setSize
# 1 HALLMARK_TNFA_SIGNALING_VIA_NFKB HALLMARK_TNFA_SIGNALING_VIA_NFKB 199
# 2 KEGG_RIBOSOME KEGG_RIBOSOME 87
# 3 HALLMARK_HYPOXIA HALLMARK_HYPOXIA 200
# 4 HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 197
# 5 HALLMARK_COAGULATION HALLMARK_COAGULATION 138
# 6 HALLMARK_KRAS_SIGNALING_UP HALLMARK_KRAS_SIGNALING_UP 200
plot.data.1$Omics <- "Protein"
plot.data.2$Omics <- "CFPAC"
plot.data.3$Omics <- "8988"
plot.data <- rbind(plot.data.1, plot.data.2, plot.data.3)
plot.data$SignP <- -log10(plot.data$pvalue)
plot.data$SignP[which(plot.data$NES < 0)] <- plot.data$SignP[which(plot.data$NES < 0)] * -1
plot.pathway <- c("NABA_ECM_GLYCOPROTEINS", "HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION")
plot.data.sub <- plot.data[plot.data$ID %in% plot.pathway, ]
plot.data.sub$Omics <- factor(as.character(plot.data.sub$Omics), levels = c("Protein","CFPAC","8988"))
#----------------------------------------------------------------------------------
# Step 2: plot
#----------------------------------------------------------------------------------
p <- ggbarplot(plot.data.sub, x = "ID", y = "SignP",
color = "#FFC2C1", fill = "#FFC2C1",
position = position_dodge(0.8),
order = rev(plot.pathway),
main = "GSEA CEMIP2 high vs low")
p <- p + geom_hline(yintercept = c(1.30), linetype="dashed", color = "#222222")
p <- p + facet_wrap(~ Omics, ncol = 3)
p <- p + scale_y_continuous(limits = c(0, 3.5), expand = c(0,0))
p <- p + coord_flip() + theme_base()
ggsave(paste0("./results/Figure4/Figure4f.pdf"), p, width = 13, height = 3)

5 Figure 5

5.1 (cd) Kaplan-Meier curves in RJ-cohort 3

dir.create("./results/Figure5")
#----------------------------------------------------------------------------------
# Step 1: Load the Data
#----------------------------------------------------------------------------------
# Sample information, clinical features and protein levels of markers in RJ-cohort 3
rj3.cohort <- read.xlsx("./data/Extended Data Table 7.xlsx", startRow = 2)

plot.cohort <- rj3.cohort %>%
mutate(Censoring_chemo_rev = Censoring_chemo,
Censoring_chemo = factor(as.character(Censoring_chemo), levels = c("Chem","NoChem")),
Censoring_chemo_rev = factor(as.character(Censoring_chemo_rev), levels = c("NoChem","Chem"))) %>%
mutate(NDUFB8_l = factor(as.character(NDUFB8_l), levels = c("Low","High")),
CEMIP2_l = factor(as.character(CEMIP2_l), levels = c("Low","High"))) %>%
mutate(GemType_rev = GemType,
GemType = factor(as.character(GemType), levels = c("Gem","NonGem","NoChem")),
GemType_rev = factor(as.character(GemType_rev), levels = c("NoChem","Gem","NonGem")))
summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev * NDUFB8_l, data = plot.cohort))
summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev * CEMIP2_l, data = plot.cohort))
idx <- which(plot.cohort$OS_month > 60)
plot.cohort$OS_month[idx] <- 60
plot.cohort$OS_status[idx] <- 0
#----------------------------------------------------------------------------------
# Step 2: OS survival stratified by lasso level in [NDUFB8 low] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$NDUFB8_l == "Low"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p1 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS NDUFB8_low Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 3: OS survival stratified by lasso level in [NDUFB8 High] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$NDUFB8_l == "High"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p2 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS NDUFB8_high Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 4: OS survival stratified by lasso level in [CEMIP2 low] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$CEMIP2_l == "Low"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p3 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS CEMIP2_low Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 5: OS survival stratified by lasso level in [CEMIP2 High] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$CEMIP2_l == "High"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p4 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS CEMIP2_high Censoring_chemo \n", anno.text))
# output
p <- arrange_ggsurvplots(list(p1, p2, p3, p4), ncol = 4, nrow = 1, print = FALSE)
ggsave(paste0("./results/Figure5/Figure5cd.pdf"), p, width = 20, height = 8)

5.2 (ef) Kaplan-Meier curves in SH-cohort

#----------------------------------------------------------------------------------
# Step 1: Load the Data
#----------------------------------------------------------------------------------
# Sample information, clinical features and protein levels of markers in SH-cohort.
sh.cohort <- read.xlsx("./data/Extended Data Table 8.xlsx", startRow = 2)

plot.cohort <- sh.cohort %>%
mutate(Censoring_chemo_rev = Censoring_chemo,
Censoring_chemo = factor(as.character(Censoring_chemo), levels = c("Chem","NoChem")),
Censoring_chemo_rev = factor(as.character(Censoring_chemo_rev), levels = c("NoChem","Chem"))) %>%
mutate(NDUFB8_l = factor(as.character(NDUFB8_l), levels = c("Low","High")),
CEMIP2_l = factor(as.character(CEMIP2_l), levels = c("Low","High")))

summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev * NDUFB8_l, data = plot.cohort))
summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev * CEMIP2_l, data = plot.cohort))

idx <- which(plot.cohort$OS_month > 60)
plot.cohort$OS_month[idx] <- 60
plot.cohort$OS_status[idx] <- 0
#----------------------------------------------------------------------------------
# Step 2: OS survival stratified by lasso level in [NDUFB8 low] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$NDUFB8_l == "Low"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p1 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS NDUFB8_low Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 3: OS survival stratified by lasso level in [NDUFB8 High] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$NDUFB8_l == "High"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p2 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS NDUFB8_high Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 4: OS survival stratified by lasso level in [CEMIP2 low] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$CEMIP2_l == "Low"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p3 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS CEMIP2_low Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 5: OS survival stratified by lasso level in [CEMIP2 High] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$CEMIP2_l == "High"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p4 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS CEMIP2_high Censoring_chemo \n", anno.text))
# output
p <- arrange_ggsurvplots(list(p1, p2, p3, p4), ncol = 4, nrow = 1, print = FALSE)
ggsave(paste0("./results/Figure5/Figure5ef.pdf"), p, width = 20, height = 8)

5.3 (gh) Kaplan-Meier curves in FR-cohort

#----------------------------------------------------------------------------------
# Step 1: Load the Data
#----------------------------------------------------------------------------------
# Sample information, clinical features and protein levels of markers in FR-cohort.
fr.cohort <- read.xlsx("./data/Extended Data Table 9.xlsx", startRow = 2)

plot.cohort <- fr.cohort %>%
mutate(Censoring_chemo_rev = Censoring_chemo,
Censoring_chemo = factor(as.character(Censoring_chemo), levels = c("Chem","NoChem")),
Censoring_chemo_rev = factor(as.character(Censoring_chemo_rev), levels = c("NoChem","Chem"))) %>%
mutate(NDUFB8_l = factor(as.character(NDUFB8_l), levels = c("Low","High")),
CEMIP2_l = factor(as.character(CEMIP2_l), levels = c("Low","High")))

summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev * NDUFB8_l, data = plot.cohort))
summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev * CEMIP2_l, data = plot.cohort))

idx <- which(plot.cohort$OS_month > 60)
plot.cohort$OS_month[idx] <- 60
plot.cohort$OS_status[idx] <- 0
#----------------------------------------------------------------------------------
# Step 2: OS survival stratified by lasso level in [NDUFB8 low] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$NDUFB8_l == "Low"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p1 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS NDUFB8_low Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 3: OS survival stratified by lasso level in [NDUFB8 High] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$NDUFB8_l == "High"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p2 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS NDUFB8_high Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 4: OS survival stratified by lasso level in [CEMIP2 low] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$CEMIP2_l == "Low"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p3 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS CEMIP2_low Censoring_chemo \n", anno.text))
#----------------------------------------------------------------------------------
# Step 5: OS survival stratified by lasso level in [CEMIP2 High] patients
#----------------------------------------------------------------------------------
plot.data <- plot.cohort[which(plot.cohort$CEMIP2_l == "High"), ]
info <- summary(coxph(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data))
anno.text <- ""
for (i in1:nrow(info$conf.int)) {
anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
}
anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ Censoring_chemo_rev, data = plot.data)$pvalue, 4) )
anno.text <- str_replace_all(anno.text, "Censoring_chemo_rev", "")
fit <- survfit(Surv(OS_month, OS_status) ~ Censoring_chemo, data = plot.data)
p4 <- ggsurvplot(fit,
data = plot.data,
xlab = 'Time (Months)',
pval = TRUE,
risk.table = TRUE,
risk.table.height = 0.28,
legend.labs = c("Chem","NoChem"),
conf.int = TRUE, conf.int.alpha = 0.05,
palette = c("#00599F","#d80700"),
axes.offset = TRUE,
break.time.by = 12, xlim = c(0, 60),
title= paste0("OS CEMIP2_high Censoring_chemo \n", anno.text))
# output
p <- arrange_ggsurvplots(list(p1, p2, p3, p4), ncol = 4, nrow = 1, print = FALSE)
ggsave(paste0("./results/Figure5/Figure5gh.pdf"), p, width = 20, height = 8)


本期教程参考:

Jiang, L., Qin, J., Dai, Y., Zhao, S., Zhan, Q., Cui, P., Ren, L., Wang, X., Zhang, R., Gao, C., Zhou, Y., Cai, S., Wang, G., Xie, W., Tang, X., Shi, M., Ma, F., Liu, J., Wang, T., Wang, C., … Shen, B. (2024). Prospective observational study on biomarkers of response in pancreatic ductal adenocarcinoma. Nature medicine30(3), 749–761. https://doi.org/10.1038/s41591-023-02790-x


往期部分文章

1. 最全WGCNA教程(替换数据即可出全部结果与图形)

推荐大家购买最新的教程,若是已经购买以前WGNCA教程的同学,可以在对应教程留言,即可获得最新的教程。(注:此教程也仅基于自己理解,不仅局限于此,难免有不恰当地方,请结合自己需求,进行改动。)


2. 精美图形绘制教程

3. 转录组分析教程

4. 转录组下游分析

小杜的生信筆記 ,主要发表或收录生物信息学教程,以及基于R分析和可视化(包括数据分析,图形绘制等);分享感兴趣的文献和学习资料!!

小杜的生信筆記
小杜的生信筆記,生信小白,初来乍到请多指教。 主要学习分享,转录组数据分析,基于R语言数据分析和绘制图片等,以及相关文献的分享。
 最新文章