文末福利!与WGCNA、差异分析媲美的时间序列分析Mfuzz!

学术   2024-12-31 00:06   上海  

聚类是数据分析中的常用方法,可用于揭示隐藏模式(大型复杂数据集中的对象集群)。大多数聚类方法将一个对象分配给一个聚类。这种所谓的硬聚类方法适用于多种应用,所检测到的共表达基因簇表明共调节。然而,基因表达通常不是以简单的“开”-“关”方式调节的,而是受到许多微调的转录机制的严格调节,这也反映在微阵列产生的表达数据集中。这是一个常见的观察结果,许多基因显示出类似于几个簇模式的表达谱。

在理想情况下,微阵列分析的聚类方法应该能够以适当的方式处理这种复杂性。不仅应该区分基因与簇的主要表达模式的紧密程度,而且如果它们的表达模式相似,它们还应该能将基因分配到几个簇cluster。软聚类可以提供这些有利的能力。研究表明,软聚类应用于微阵列数据分析会导致①具有信息丰富结构的更充分聚类,②增加的抗噪性和③改进调节序列基序的识别。时间序列分析的Mfuzz包就是一类常用的软聚类算法。

Mfuzz包最初是为处理芯片数据时间序列分析而开发的一种软聚类算法包。绝大多数聚类算法都会对数据进行硬划分,即每个基因或蛋白都被准确地分配到一个聚类中。如果集群分离良好,硬集群是有利的。然而,这通常不是基因表达时间过程数据的情况,其中基因或蛋白簇经常重叠。并且,硬聚类算法通常对噪声高度敏感。为了克服硬聚类的局限性,软聚类可以避免从数据分析中排除生物学相关的基因/蛋白。

########--------GSE37178--------########
library(GEOquery) #芯片数据专用R包
## 载入需要的程序包:Biobase
## 载入需要的程序包:BiocGenerics
## 
## 载入程序包:'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
## tapply, union, unique, unsplit, which.max, which.min
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
gset <- getGEO('GSE37178',        # 加载数据
destdir = '.', # 本地读取
getGPL = F, # 不获取探针信息
AnnotGPL = T) # 获取芯片平台
## Found 1 file(s)
## GSE37178_series_matrix.txt.gz
## Using locally cached version: ./GSE37178_series_matrix.txt.gz
## Warning in grep(pattern, bfr, value = TRUE): unable to translate '
## <c7><fd><b6><af><c6><f7> C <d6>еľ<ed>û<d3>б<ea>ǩ<a1><a3>' to a wide string
## Warning in grep(pattern, bfr, value = TRUE): input string 1 is invalid
## Warning in grep(pattern, bfr, value = TRUE): unable to translate '
## <be><ed><b5><c4><d0><f2><c1>к<c5><ca><c7> F2DA-F2F6' to a wide string
## Warning in grep(pattern, bfr, value = TRUE): input string 2 is invalid
## Warning in grep(pattern, bfr, value = TRUE): unable to translate '
## C:\Users\Administrator\AppData\Local\Temp <b5><c4>Ŀ¼' to a wide string
## Warning in grep(pattern, bfr, value = TRUE): input string 4 is invalid
## Warning in grep(pattern, bfr, value = TRUE): unable to translate '2024-08-29
## 22:15 191,910
## <c9><ea><bc>γ<bf>-<b1><a6><b1><b4><ce><d2><cf><eb><b6><d4><c4><e3>˵.docx' to a
## wide string
## Warning in grep(pattern, bfr, value = TRUE): input string 1637 is invalid
## Warning in grep(pattern, bfr, value = TRUE): unable to translate ' 1583
## <b8><f6><ce>ļ<fe> 230,208,024 <d7>ֽ<da>' to a wide string
## Warning in grep(pattern, bfr, value = TRUE): input string 1638 is invalid
## Warning in grep(pattern, bfr, value = TRUE): unable to translate ' 49
## <b8><f6>Ŀ¼ 799,337,431,040 <bf><c9><d3><c3><d7>ֽ<da>' to a wide string
gset <- gset[[1]]          # 转为对象
expr <- exprs(gset) # 表达矩阵
pdata <- pData(gset) # 样本信息
gset@annotation # 查看芯片平台
## [1] "GPL6947"
## 第二步,加载探针信息 #############################
### 不推荐的自带 R 包读取
probe <- read.table(file = 'GPL6947-13512.txt',
sep = '\t',
quote = '',
comment.char = '#', # 过滤掉文件中以‘#’开头的注释信息
header = T,
fill = T, # 如果文件中某行的数据少于其他行,则自动添加空白域。
stringsAsFactors = F) # 字符串不改为因子

ids <- probe[probe$Symbol != '', # 提取探针和geneID
c(1,14)]

## 第三步,清洗数据,筛选探针和表达数据 ########
library(dplyr)
## 
## 载入程序包:'dplyr'
## The following object is masked from 'package:Biobase':
##
## combine
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## Warning: 程序包'ggplot2'是用R版本4.4.2 来建造的
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.1 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::combine() masks Biobase::combine(), BiocGenerics::combine()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ ggplot2::Position() masks BiocGenerics::Position(), base::Position()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
colnames(ids)
## [1] "ID"     "Symbol"
expr <- as.data.frame(expr)
colnames(expr) <- pdata$geo_accession
expr$ID <- rownames(expr)
exprSet <- inner_join(ids,expr,by = 'ID')

#### 用aggregate包,取最大值(强烈推荐)
exprset1 <- aggregate(.~Symbol,max,data = exprSet)

rownames(exprset1) <- exprset1$Symbol
exprset1 <- exprset1[,-c(1:2)]

## 批量转换为数值
exprset = as.data.frame(lapply(exprset1,as.numeric))
rownames(exprset) = rownames(exprset1)

### 绘制样本表达的箱线图
p1 <- boxplot(exprset,
outline = F, # 超出值
las = 2, # 间距值
col = 'blue', # 颜色
xaxt = 'n', # 不显示x轴的刻度和标签
ann = F) # 注释

title(main = list('Before normalization',cex = 2 ,font = 2), # 符号大小和字体大小
xlab = list('Sample list',cex = 1.5,font = 2),
ylab = '',line = 0.7)
mtext('Expression value',side = 2,padj = -3,font = 2,cex = 1.5)

### 分位数标准化预处理
library(limma)
## Warning: 程序包'limma'是用R版本4.4.1 来建造的
## 
## 载入程序包:'limma'
##
## The following object is masked from 'package:BiocGenerics':
##
## plotMA
normalized_expr <- normalizeBetweenArrays(exprset) # 分位数标准化 method默认为quantile

p2 <- boxplot(normalized_expr,outline = FALSE,las=2,col = 'red',xaxt = 'n',ann = F)

title(main = list('Normalization',cex = 2 ,font = 2),
xlab = list('Sample list',cex = 1.5,font = 2),
ylab = '',line = 0.7)

mtext('Expression value',side = 2,padj = -3,font = 2,cex = 1.5)

## 第四步,样本分组 ########
### 分为两组
(group_list <- pdata$title)
##  [1] "Colorectal cancer tissue patient 09/1601 T0 A CL145" 
## [2] "Colorectal cancer tissue patient 09/1601 T0 B CL146"
## [3] "Colorectal cancer tissue patient 09/1601 T0 C CL147"
## [4] "Colorectal cancer tissue patient 09/1601 T1 A CL148"
## [5] "Colorectal cancer tissue patient 09/1601 T3 A CL149"
## [6] "Colorectal cancer tissue patient 09/1601 T2 A CL150"
## [7] "Colorectal cancer tissue patient 09/1066 T0 A CL109"
## [8] "Colorectal cancer tissue patient 09/1066 T0 B CL110"
## [9] "Colorectal cancer tissue patient 09/1066 T0 C CL111"
## [10] "Colorectal cancer tissue patient 09/1066 T1 A CL112"
## [11] "Colorectal cancer tissue patient 09/1066 T3 A CL113"
## [12] "Colorectal cancer tissue patient 09/1066 T2 A CL114"
## [13] "Colorectal cancer tissue patient 09/1825 T0 A CL157"
## [14] "Colorectal cancer tissue patient 09/1825 T0 B CL158"
## [15] "Colorectal cancer tissue patient 09/1825 T0 C CL159"
## [16] "Colorectal cancer tissue patient 09/1825 T2 A CL160"
## [17] "Colorectal cancer tissue patient 09/1825 T1 A CL161"
## [18] "Colorectal cancer tissue patient 09/1825 T3 A CL162"
## [19] "Colorectal cancer tissue patient 09/1056 T0 A CL121"
## [20] "Colorectal cancer tissue patient 09/1056 T0 B CL122"
## [21] "Colorectal cancer tissue patient 09/1056 T0 C CL123"
## [22] "Colorectal cancer tissue patient 09/1056 T3 A CL124"
## [23] "Colorectal cancer tissue patient 09/1056 T1 A CL125"
## [24] "Colorectal cancer tissue patient 09/1056 T2 A CL126"
## [25] "Colorectal cancer tissue patient 08/16477 T0 A CL133"
## [26] "Colorectal cancer tissue patient 08/16477 T0 B CL134"
## [27] "Colorectal cancer tissue patient 08/16477 T0 C CL135"
## [28] "Colorectal cancer tissue patient 08/16477 T3 A CL136"
## [29] "Colorectal cancer tissue patient 08/16477 T2 A CL137"
## [30] "Colorectal cancer tissue patient 08/16477 T1 A CL138"
## [31] "Colorectal cancer tissue patient 08/15233 T0 A CL097"
## [32] "Colorectal cancer tissue patient 08/15233 T0 B CL098"
## [33] "Colorectal cancer tissue patient 08/15233 T0 C CL099"
## [34] "Colorectal cancer tissue patient 08/15233 T1 A CL100"
## [35] "Colorectal cancer tissue patient 08/15233 T3 A CL101"
## [36] "Colorectal cancer tissue patient 08/15233 T2 A CL102"
## [37] "Colorectal cancer tissue patient 08/15827 T0 A CL049"
## [38] "Colorectal cancer tissue patient 08/15827 T0 B CL050"
## [39] "Colorectal cancer tissue patient 08/15827 T0 C CL051"
## [40] "Colorectal cancer tissue patient 08/15827 T2 A CL052"
## [41] "Colorectal cancer tissue patient 08/15827 T1 A CL053"
## [42] "Colorectal cancer tissue patient 08/15827 T3 A CL054"
## [43] "Colorectal cancer tissue patient 08/15824 T0 A CL061"
## [44] "Colorectal cancer tissue patient 08/15824 T0 B CL062"
## [45] "Colorectal cancer tissue patient 08/15824 T0 C CL063"
## [46] "Colorectal cancer tissue patient 08/15824 T2 A CL064"
## [47] "Colorectal cancer tissue patient 08/15824 T3 A CL065"
## [48] "Colorectal cancer tissue patient 08/15824 T1 A CL066"
## [49] "Colorectal cancer tissue patient 08/15607 T0 A CL001"
## [50] "Colorectal cancer tissue patient 08/15607 T0 B CL002"
## [51] "Colorectal cancer tissue patient 08/15607 T0 C CL003"
## [52] "Colorectal cancer tissue patient 08/15607 T2 A CL004"
## [53] "Colorectal cancer tissue patient 08/15607 T1 A CL005"
## [54] "Colorectal cancer tissue patient 08/15607 T3 A CL006"
## [55] "Colorectal cancer tissue patient 08/14856 T0 A CL013"
## [56] "Colorectal cancer tissue patient 08/14856 T0 B CL014"
## [57] "Colorectal cancer tissue patient 08/14856 T0 C CL015"
## [58] "Colorectal cancer tissue patient 08/14856 T1 A CL016"
## [59] "Colorectal cancer tissue patient 08/14856 T2 A CL017"
## [60] "Colorectal cancer tissue patient 08/14856 T3 A CL018"
## [61] "Colorectal cancer tissue patient 08/14623 T0 A CL025"
## [62] "Colorectal cancer tissue patient 08/14623 T0 B CL026"
## [63] "Colorectal cancer tissue patient 08/14623 T0 C CL027"
## [64] "Colorectal cancer tissue patient 08/14623 T1 A CL028"
## [65] "Colorectal cancer tissue patient 08/14623 T2 A CL029"
## [66] "Colorectal cancer tissue patient 08/14623 T3 A CL030"
## [67] "Colorectal cancer tissue patient 08/15161 T0 A CL037"
## [68] "Colorectal cancer tissue patient 08/15161 T0 B CL038"
## [69] "Colorectal cancer tissue patient 08/15161 T0 C CL039"
## [70] "Colorectal cancer tissue patient 08/15161 T2 A CL040"
## [71] "Colorectal cancer tissue patient 08/15161 T3 A CL041"
## [72] "Colorectal cancer tissue patient 08/15161 T1 A CL042"
## [73] "Colorectal cancer tissue patient 08/15000 T0 A CL073"
## [74] "Colorectal cancer tissue patient 08/15000 T0 B CL074"
## [75] "Colorectal cancer tissue patient 08/15000 T0 C CL075"
## [76] "Colorectal cancer tissue patient 08/15000 T1 A CL076"
## [77] "Colorectal cancer tissue patient 08/15000 T3 A CL077"
## [78] "Colorectal cancer tissue patient 08/15000 T2 A CL078"
## [79] "Colorectal cancer tissue patient 08/14630 T0 A CL085"
## [80] "Colorectal cancer tissue patient 08/14630 T0 B CL086"
## [81] "Colorectal cancer tissue patient 08/14630 T0 C CL087"
## [82] "Colorectal cancer tissue patient 08/14630 T1 A CL088"
## [83] "Colorectal cancer tissue patient 08/14630 T2 A CL089"
## [84] "Colorectal cancer tissue patient 08/14630 T3 A CL090"
T0 <- normalized_expr[,grep('T0',group_list)]
T1 <- normalized_expr[,grep('T1',group_list)]
T2 <- normalized_expr[,grep('T2',group_list)]
T3 <- normalized_expr[,grep('T3',group_list)]

exprSet1 <- cbind(T0,T1,T2,T3)
group_list <- c(rep('T0',ncol(T0)),
rep('T1',ncol(T1)),
rep('T2',ncol(T2)),
rep('T3',ncol(T3))
)
head(exprSet1)
##        GSM913048 GSM913049 GSM913050 GSM913054 GSM913055 GSM913056 GSM913060
## 7A5 8.537764 8.548547 8.441245 8.410595 8.414263 8.616209 8.515499
## A1BG 8.512669 8.530805 8.700733 8.691493 8.481080 8.629389 8.670392
## A1CF 8.654236 8.873897 8.839549 8.941091 9.076200 8.679202 8.768690
## A26A1 8.518424 8.452777 8.421283 8.471795 8.460606 8.650432 8.498228
## A26B1 8.457336 8.383686 8.622979 8.483531 8.455759 8.536761 8.359836
## A26C1B 8.533794 8.331390 8.416182 8.485151 8.505959 8.553145 8.545388
## GSM913061 GSM913062 GSM913066 GSM913067 GSM913068 GSM913072 GSM913073
## 7A5 8.490087 8.400916 8.510602 8.586458 8.536003 8.501388 8.608650
## A1BG 8.633758 8.636521 8.476212 8.620177 8.533209 8.598709 8.521600
## A1CF 9.423402 8.984643 9.170350 9.492706 9.198246 9.403652 8.777734
## A26A1 8.474302 8.391699 8.460444 8.468957 8.484902 8.455495 8.514507
## A26B1 8.559591 8.373386 8.338834 8.457792 8.445911 8.429789 8.556293
## A26C1B 8.597674 8.417813 8.388652 8.410312 8.312372 8.453813 8.548926
## GSM913074 GSM913078 GSM913079 GSM913080 GSM913084 GSM913085 GSM913086
## 7A5 8.566945 8.485387 8.332163 8.563685 8.481738 8.576301 8.562440
## A1BG 8.604458 8.715413 8.582538 8.466035 8.713205 8.752326 8.623740
## A1CF 8.869712 9.632351 9.045018 8.844624 8.914730 8.868131 8.886039
## A26A1 8.458701 8.375024 8.380636 8.407303 8.444645 8.542998 8.472254
## A26B1 8.356074 8.312429 8.431713 8.487154 8.489230 8.437695 8.514529
## A26C1B 8.425353 8.432517 8.477540 8.508398 8.908141 8.583687 8.459102
## GSM913090 GSM913091 GSM913092 GSM913096 GSM913097 GSM913098 GSM913102
## 7A5 8.398004 8.577844 8.575319 8.351676 8.351389 8.338270 8.403866
## A1BG 8.707946 8.578197 8.541272 8.735042 8.762820 8.549134 8.655556
## A1CF 8.918084 9.645060 8.704864 8.682492 8.810641 8.890091 8.888089
## A26A1 8.356659 8.533072 8.492119 8.583453 8.413861 8.473275 8.447700
## A26B1 8.473917 8.429333 8.451270 8.446517 8.371802 8.572044 8.513722
## A26C1B 8.538081 8.416325 8.462330 8.492622 8.486831 8.522631 8.555365
## GSM913103 GSM913104 GSM913108 GSM913109 GSM913110 GSM913114 GSM913115
## 7A5 8.488898 8.333868 8.541359 8.732981 8.541494 8.340063 8.492536
## A1BG 8.889002 8.571107 8.518282 8.697950 8.614696 8.649994 8.722199
## A1CF 8.555566 8.959074 8.633046 8.646464 8.832119 8.775655 8.622818
## A26A1 8.529220 8.413418 8.580921 8.415736 8.460494 8.507114 8.501783
## A26B1 8.472724 8.395020 8.423822 8.432968 8.424050 8.432166 8.467958
## A26C1B 8.556802 8.657343 8.415226 8.499466 8.472102 8.546444 8.493911
## GSM913116 GSM913120 GSM913121 GSM913122 GSM913126 GSM913127 GSM913128
## 7A5 8.337011 8.383782 8.381134 8.415736 8.553787 8.692006 8.562566
## A1BG 8.648586 8.612120 8.653166 8.565024 8.631850 8.663689 8.788348
## A1CF 8.778847 8.622213 8.752072 8.763191 9.029017 8.973425 9.156202
## A26A1 8.476363 8.437354 8.480842 8.472897 8.401005 8.484041 8.471051
## A26B1 8.376489 8.484435 8.291979 8.465818 8.607175 8.513410 8.383766
## A26C1B 8.449951 8.395895 8.411001 8.619189 8.420490 8.616816 8.463050
## GSM913051 GSM913057 GSM913064 GSM913070 GSM913077 GSM913081 GSM913088
## 7A5 8.317402 8.375310 8.331546 8.537624 8.425729 8.445478 8.588992
## A1BG 8.566594 8.725533 8.570961 8.645525 8.629407 8.692089 8.476494
## A1CF 8.679879 8.811827 8.886601 9.380615 8.693547 8.985770 8.736010
## A26A1 8.425044 8.434419 8.568114 8.475647 8.510293 8.421432 8.471222
## A26B1 8.443940 8.456357 8.483554 8.599432 8.338423 8.513228 8.436931
## A26C1B 8.436286 8.479615 8.480258 8.491078 8.478000 8.432661 8.451069
## GSM913095 GSM913100 GSM913105 GSM913111 GSM913119 GSM913123 GSM913129
## 7A5 8.760944 8.375475 8.376840 8.418904 8.537841 8.434130 8.580650
## A1BG 8.591186 8.880060 8.686938 8.554570 8.593187 8.568553 8.699546
## A1CF 8.967839 8.755390 8.768656 8.940446 9.854297 8.649289 8.891473
## A26A1 8.466526 8.636393 8.667059 8.450892 8.498323 8.460993 8.516290
## A26B1 8.468394 8.439708 8.449154 8.574446 8.353320 8.442900 8.409688
## A26C1B 8.494759 8.496363 8.546163 8.512258 8.491877 8.490632 8.497134
## GSM913053 GSM913059 GSM913063 GSM913071 GSM913076 GSM913083 GSM913087
## 7A5 8.505096 8.534322 8.344130 8.648560 8.433921 8.496799 8.648533
## A1BG 8.610832 8.705259 8.624638 8.614193 8.576803 8.736069 8.632939
## A1CF 8.708524 8.675985 8.827869 9.128261 8.830862 9.768302 8.774044
## A26A1 8.569030 8.507556 8.611225 8.436261 8.389166 8.425749 8.517450
## A26B1 8.495660 8.452468 8.405867 8.410602 8.385142 8.460522 8.620479
## A26C1B 8.461980 8.482152 8.488556 8.425761 8.480559 8.491284 8.447808
## GSM913093 GSM913099 GSM913106 GSM913112 GSM913117 GSM913125 GSM913130
## 7A5 8.572106 8.375917 8.582969 9.194208 8.369164 8.530705 8.346284
## A1BG 8.527215 8.526758 8.483598 8.583972 8.657044 8.687815 8.669618
## A1CF 8.672130 8.942210 8.911944 8.722469 8.932235 8.594556 9.030243
## A26A1 8.454469 8.666757 8.740436 8.495673 8.510980 8.512781 8.587438
## A26B1 8.568751 8.429729 8.567181 8.421079 8.397150 8.538010 8.406256
## A26C1B 8.441631 8.423899 8.576186 8.463679 8.564298 8.454349 8.561757
## GSM913052 GSM913058 GSM913065 GSM913069 GSM913075 GSM913082 GSM913089
## 7A5 8.380664 8.307257 8.404990 8.641728 8.444766 8.404998 8.478766
## A1BG 8.546639 8.924156 8.561134 8.563479 8.737350 8.701359 8.510950
## A1CF 8.712546 8.821639 8.754917 8.996373 9.891539 9.921730 8.748095
## A26A1 8.511990 8.410070 8.603449 8.396327 8.512534 8.452303 8.517746
## A26B1 8.426556 8.330715 8.371459 8.615856 8.393094 8.358854 8.529640
## A26C1B 8.545413 8.615610 8.470870 8.440960 8.519927 8.433883 8.543080
## GSM913094 GSM913101 GSM913107 GSM913113 GSM913118 GSM913124 GSM913131
## 7A5 8.421420 8.286649 8.533063 8.674214 8.330775 8.406755 8.652578
## A1BG 8.587943 8.511492 8.528971 8.606641 8.545403 8.631003 8.588577
## A1CF 9.861112 9.194680 8.965774 8.609866 8.716887 8.577866 8.825236
## A26A1 8.497880 8.488757 8.467583 8.532427 8.622994 8.470501 8.464269
## A26B1 8.465917 8.352749 8.398465 8.460788 8.414046 8.439099 8.423422
## A26C1B 8.538857 8.495782 8.544623 8.599415 8.512185 8.455956 8.557736
colnames(exprSet1) <- group_list
head(exprSet1)
##              T0       T0       T0       T0       T0       T0       T0       T0
## 7A5 8.537764 8.548547 8.441245 8.410595 8.414263 8.616209 8.515499 8.490087
## A1BG 8.512669 8.530805 8.700733 8.691493 8.481080 8.629389 8.670392 8.633758
## A1CF 8.654236 8.873897 8.839549 8.941091 9.076200 8.679202 8.768690 9.423402
## A26A1 8.518424 8.452777 8.421283 8.471795 8.460606 8.650432 8.498228 8.474302
## A26B1 8.457336 8.383686 8.622979 8.483531 8.455759 8.536761 8.359836 8.559591
## A26C1B 8.533794 8.331390 8.416182 8.485151 8.505959 8.553145 8.545388 8.597674
## T0 T0 T0 T0 T0 T0 T0 T0
## 7A5 8.400916 8.510602 8.586458 8.536003 8.501388 8.608650 8.566945 8.485387
## A1BG 8.636521 8.476212 8.620177 8.533209 8.598709 8.521600 8.604458 8.715413
## A1CF 8.984643 9.170350 9.492706 9.198246 9.403652 8.777734 8.869712 9.632351
## A26A1 8.391699 8.460444 8.468957 8.484902 8.455495 8.514507 8.458701 8.375024
## A26B1 8.373386 8.338834 8.457792 8.445911 8.429789 8.556293 8.356074 8.312429
## A26C1B 8.417813 8.388652 8.410312 8.312372 8.453813 8.548926 8.425353 8.432517
## T0 T0 T0 T0 T0 T0 T0 T0
## 7A5 8.332163 8.563685 8.481738 8.576301 8.562440 8.398004 8.577844 8.575319
## A1BG 8.582538 8.466035 8.713205 8.752326 8.623740 8.707946 8.578197 8.541272
## A1CF 9.045018 8.844624 8.914730 8.868131 8.886039 8.918084 9.645060 8.704864
## A26A1 8.380636 8.407303 8.444645 8.542998 8.472254 8.356659 8.533072 8.492119
## A26B1 8.431713 8.487154 8.489230 8.437695 8.514529 8.473917 8.429333 8.451270
## A26C1B 8.477540 8.508398 8.908141 8.583687 8.459102 8.538081 8.416325 8.462330
## T0 T0 T0 T0 T0 T0 T0 T0
## 7A5 8.351676 8.351389 8.338270 8.403866 8.488898 8.333868 8.541359 8.732981
## A1BG 8.735042 8.762820 8.549134 8.655556 8.889002 8.571107 8.518282 8.697950
## A1CF 8.682492 8.810641 8.890091 8.888089 8.555566 8.959074 8.633046 8.646464
## A26A1 8.583453 8.413861 8.473275 8.447700 8.529220 8.413418 8.580921 8.415736
## A26B1 8.446517 8.371802 8.572044 8.513722 8.472724 8.395020 8.423822 8.432968
## A26C1B 8.492622 8.486831 8.522631 8.555365 8.556802 8.657343 8.415226 8.499466
## T0 T0 T0 T0 T0 T0 T0 T0
## 7A5 8.541494 8.340063 8.492536 8.337011 8.383782 8.381134 8.415736 8.553787
## A1BG 8.614696 8.649994 8.722199 8.648586 8.612120 8.653166 8.565024 8.631850
## A1CF 8.832119 8.775655 8.622818 8.778847 8.622213 8.752072 8.763191 9.029017
## A26A1 8.460494 8.507114 8.501783 8.476363 8.437354 8.480842 8.472897 8.401005
## A26B1 8.424050 8.432166 8.467958 8.376489 8.484435 8.291979 8.465818 8.607175
## A26C1B 8.472102 8.546444 8.493911 8.449951 8.395895 8.411001 8.619189 8.420490
## T0 T0 T1 T1 T1 T1 T1 T1
## 7A5 8.692006 8.562566 8.317402 8.375310 8.331546 8.537624 8.425729 8.445478
## A1BG 8.663689 8.788348 8.566594 8.725533 8.570961 8.645525 8.629407 8.692089
## A1CF 8.973425 9.156202 8.679879 8.811827 8.886601 9.380615 8.693547 8.985770
## A26A1 8.484041 8.471051 8.425044 8.434419 8.568114 8.475647 8.510293 8.421432
## A26B1 8.513410 8.383766 8.443940 8.456357 8.483554 8.599432 8.338423 8.513228
## A26C1B 8.616816 8.463050 8.436286 8.479615 8.480258 8.491078 8.478000 8.432661
## T1 T1 T1 T1 T1 T1 T1 T1
## 7A5 8.588992 8.760944 8.375475 8.376840 8.418904 8.537841 8.434130 8.580650
## A1BG 8.476494 8.591186 8.880060 8.686938 8.554570 8.593187 8.568553 8.699546
## A1CF 8.736010 8.967839 8.755390 8.768656 8.940446 9.854297 8.649289 8.891473
## A26A1 8.471222 8.466526 8.636393 8.667059 8.450892 8.498323 8.460993 8.516290
## A26B1 8.436931 8.468394 8.439708 8.449154 8.574446 8.353320 8.442900 8.409688
## A26C1B 8.451069 8.494759 8.496363 8.546163 8.512258 8.491877 8.490632 8.497134
## T2 T2 T2 T2 T2 T2 T2 T2
## 7A5 8.505096 8.534322 8.344130 8.648560 8.433921 8.496799 8.648533 8.572106
## A1BG 8.610832 8.705259 8.624638 8.614193 8.576803 8.736069 8.632939 8.527215
## A1CF 8.708524 8.675985 8.827869 9.128261 8.830862 9.768302 8.774044 8.672130
## A26A1 8.569030 8.507556 8.611225 8.436261 8.389166 8.425749 8.517450 8.454469
## A26B1 8.495660 8.452468 8.405867 8.410602 8.385142 8.460522 8.620479 8.568751
## A26C1B 8.461980 8.482152 8.488556 8.425761 8.480559 8.491284 8.447808 8.441631
## T2 T2 T2 T2 T2 T2 T3 T3
## 7A5 8.375917 8.582969 9.194208 8.369164 8.530705 8.346284 8.380664 8.307257
## A1BG 8.526758 8.483598 8.583972 8.657044 8.687815 8.669618 8.546639 8.924156
## A1CF 8.942210 8.911944 8.722469 8.932235 8.594556 9.030243 8.712546 8.821639
## A26A1 8.666757 8.740436 8.495673 8.510980 8.512781 8.587438 8.511990 8.410070
## A26B1 8.429729 8.567181 8.421079 8.397150 8.538010 8.406256 8.426556 8.330715
## A26C1B 8.423899 8.576186 8.463679 8.564298 8.454349 8.561757 8.545413 8.615610
## T3 T3 T3 T3 T3 T3 T3 T3
## 7A5 8.404990 8.641728 8.444766 8.404998 8.478766 8.421420 8.286649 8.533063
## A1BG 8.561134 8.563479 8.737350 8.701359 8.510950 8.587943 8.511492 8.528971
## A1CF 8.754917 8.996373 9.891539 9.921730 8.748095 9.861112 9.194680 8.965774
## A26A1 8.603449 8.396327 8.512534 8.452303 8.517746 8.497880 8.488757 8.467583
## A26B1 8.371459 8.615856 8.393094 8.358854 8.529640 8.465917 8.352749 8.398465
## A26C1B 8.470870 8.440960 8.519927 8.433883 8.543080 8.538857 8.495782 8.544623
## T3 T3 T3 T3
## 7A5 8.674214 8.330775 8.406755 8.652578
## A1BG 8.606641 8.545403 8.631003 8.588577
## A1CF 8.609866 8.716887 8.577866 8.825236
## A26A1 8.532427 8.622994 8.470501 8.464269
## A26B1 8.460788 8.414046 8.439099 8.423422
## A26C1B 8.599415 8.512185 8.455956 8.557736
data_avereps <- t(limma::avereps(t(exprSet1),ID = colnames(exprSet1)))

save(data_avereps,file = "data_avereps.Rdata")

##mfuzz,聚类分析########
library(Mfuzz)
## 载入需要的程序包:e1071
## Warning: 程序包'e1071'是用R版本4.4.2 来建造的
## 
## 载入程序包:'widgetTools'
##
## The following object is masked from 'package:dplyr':
##
## funs
##
##
## 载入程序包:'DynDoc'
##
## The following object is masked from 'package:BiocGenerics':
##
## path
library(tidyverse)
library(limma)
library(clusterProfiler)
## Warning: 程序包'clusterProfiler'是用R版本4.4.1 来建造的
## 
## clusterProfiler v4.12.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/
##
## Please cite:
##
## S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang,
## W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize
## multiomics data. Nature Protocols. 2024, doi:10.1038/s41596-024-01020-z
##
## 载入程序包:'clusterProfiler'
##
## The following object is masked from 'package:purrr':
##
## simplify
##
## The following object is masked from 'package:stats':
##
## filter
library(ggplot2)
library(ggstatsplot)
## Warning: 程序包'ggstatsplot'是用R版本4.4.2 来建造的
## You can cite this package as:
## Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
## Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167
load("data_avereps.Rdata")

protein <- data_avereps
protein <- as.matrix(protein)

#构建对象
mfuzz_class <- new('ExpressionSet',exprs = protein)

#预处理缺失值或者异常值
mfuzz_class <- filter.NA(mfuzz_class, thres = 0.25)
## 0 genes excluded.
mfuzz_class <- fill.NA(mfuzz_class, mode = 'mean')
mfuzz_class <- filter.std(mfuzz_class, min.std = 0)
## 0 genes excluded.

#标准化数据
mfuzz_class <- standardise(mfuzz_class)

#Mfuzz 基于 fuzzy c-means 的算法进行聚类
set.seed(123)
cluster_num <- 10
mfuzz_cluster <- mfuzz(mfuzz_class, c = cluster_num, m = mestimate(mfuzz_class))

#作图,详情 ?mfuzz.plot2
#time.labels 参数设置时间轴,需要和原基因表达数据集中的列对应
#颜色、线宽、坐标轴、字体等细节也可以添加其他参数调整,此处略,详见函数帮助
mfuzz.plot2(mfuzz_class, cl = mfuzz_cluster, mfrow = c(2, 5), time.labels = colnames(protein))

#查看每个聚类群中各自包含的蛋白数量
cluster_size <- mfuzz_cluster$size
names(cluster_size) <- 1:cluster_num
cluster_size
##    1    2    3    4    5    6    7    8    9   10 
## 2428 2574 2604 2369 2440 2299 2775 2576 2609 2485
#查看每个蛋白所属的聚类群
head(mfuzz_cluster$cluster)
##    7A5   A1BG   A1CF  A26A1  A26B1 A26C1B 
## 1 8 5 1 10 9
#Mfuzz 通过计算 membership 值判断蛋白质所属的聚类群,以最大的 membership 值为准
#查看各蛋白的 membership 值
head(mfuzz_cluster$membership)
##                 1          2          3          4          5          6
## 7A5 0.56158702 0.09177368 0.04225036 0.06374335 0.02785055 0.02939087
## A1BG 0.04807359 0.06399578 0.04117682 0.03624329 0.04978102 0.12602771
## A1CF 0.04827192 0.04889955 0.10864039 0.08169000 0.25773246 0.07765183
## A26A1 0.16948013 0.07951809 0.07191140 0.15703302 0.07442994 0.05993009
## A26B1 0.13049075 0.09497807 0.04461358 0.05527676 0.04070962 0.05137946
## A26C1B 0.05612422 0.05931843 0.17103348 0.09901397 0.17309102 0.07926009
## 7 8 9 10
## 7A5 0.03321991 0.04529607 0.03371560 0.07117259
## A1BG 0.12110744 0.37523179 0.03665506 0.10170750
## A1CF 0.05936079 0.04682827 0.22624699 0.04467780
## A26A1 0.08501830 0.06948703 0.10306746 0.13012454
## A26B1 0.07095177 0.10724701 0.04107946 0.36327352
## A26C1B 0.05782595 0.05141744 0.20515120 0.04776420
#最后,提取所有蛋白所属的聚类群,并和它们的原始表达值整合在一起
protein_cluster <- mfuzz_cluster$cluster
protein_cluster <- cbind(protein[names(protein_cluster), ], protein_cluster)
head(protein_cluster)
##              T0       T1       T2       T3 protein_cluster
## 7A5 8.487630 8.464776 8.541622 8.454902 1
## A1BG 8.629773 8.634332 8.616911 8.610364 8
## A1CF 8.928172 8.928688 8.894260 9.042733 5
## A26A1 8.469947 8.500189 8.530355 8.496345 1
## A26B1 8.450493 8.457820 8.468493 8.427190 10
## A26C1B 8.494933 8.484154 8.483135 8.519593 9
write.table(protein_cluster, 'p_cluster.txt', sep = '\t', col.names = NA, quote = FALSE)

#如果您想提取数据分析过程中,标准化后的表达值(绘制曲线图用的那个值,而非原始蛋白表达值)
protein_cluster <- mfuzz_cluster$cluster
protein_standard <- mfuzz_class@assayData$exprs
protein_standard_cluster <- cbind(protein_standard[names(protein_cluster), ], protein_cluster)
head(protein_standard_cluster)
##                 T0          T1         T2         T3 protein_cluster
## 7A5 0.01025974 -0.57931111 1.4030925 -0.8340412 1
## A1BG 0.62303545 1.03304571 -0.5336421 -1.1224391 8
## A1CF -0.31275575 -0.30480005 -0.8354671 1.4530229 5
## A26A1 -1.18300390 0.03962164 1.2591758 -0.1157936 1
## A26B1 -0.02890642 0.38957938 0.9991443 -1.3598173 10
## A26C1B -0.03072416 -0.66647302 -0.7265358 1.4237330 9

文末福利,mfuzz实操_Mango代码和数据集链接:

https://pan.baidu.com/s/1hrex6QpuVe-yfBdhz8nSOA?pwd=yjz9 

提取码: yjz9

参考资料:

  1. https://www.bioconductor.org/packages/release/bioc/html/Mfuzz.html

  2. https://www.bioconductor.org/packages/release/bioc/manuals/Mfuzz/man/Mfuzz.pdf

芒果师兄
1.生信技能和基因编辑。2.论文发表和基金写作。3. 健康管理和医学科研资讯。4.幸福之路,读书,音乐和娱乐。
 最新文章