一边学习,一边总结,一边分享!
由于微信改版,一直有同学反映。存在长时间接收不到公众号的推文。那么请跟随以下步骤,将小杜的生信筆記设置为星标,不错过每一条推文教程。
欢迎关注《小杜的生信笔记》!!
如何加入社群
小杜的生信笔记
,仅有微信社群。
1. 微信群:付费社群。添加小杜好友,加友请知:加友须知!!,加入社群请查看小杜生笔记付费加友入群声明。
2. 小杜个人微信:若你有好的教程或想法,可添加小杜个人微信。值得注意的是,小杜个人微信并不支持免费咨询长时间咨询,但支持小问题2-3个免费咨询。
小杜微信:
知识星球:
2022年教程总汇
2023年教程总汇
6. 比对到本物种mRNA.fa
# 构建index
bowtie-build mRNA.fa mRNA
## 比对本物种mRNA序列
bowtie -f -v 0 -p 1 -a --best --strata \
../S0.ref_prepare/03.genome/mRNA \ # bowtie index
P11.mapped.fa > P11.exon.bwt 2>P11.exon.log
7. 比对到本物种intron序列,用TBtools插件生成gff文件
参考文章:https://mp.weixin.qq.com/s/KQBe4IlMpD2qQqJqwIp8aQ
7.1 获得本物种intron序列
下载getIntronGXF脚本
https://github.com/nongxinshengxin/getIntronGXF
2. 使用TBtools进行操作
安装安装成功!后面的操作,直接拖拖拽拽就可以。https://mp.weixin.qq.com/s/KQBe4IlMpD2qQqJqwIp8aQ
# 需要先将 gff 1-based 格式转化为 bed 0-based格式
# 用bedtools,提取bed中的序列
awk '{print $1"\t"$4-1"\t"$5}' intron.gtf > intron.bed
bedtools getfasta -fi genome.fa -fo intron.fa -bed intron.bed -name
7.2 进行比对
# 构建index
bowtie-build intorn.fa intorn
## 比对本物种intron序列
bowtie -f -v 0 -p 1 -a --best --strata \
../S0.ref_prepare/03.genome/intron \ # bowtie index
P11.mapped.fa > P11.intron.bwt 2>P11.intron.log
8. 进行最后的注释分类
## 对small rna进行分类注释
## known mirna > ncrna > repeat > exon > intron
perl ../script/srna_anno.pl \
-fa P11.mapped.fa \ # 用于分类的reads文件
-rfam ../S0.ref_prepare/02.Rfam/family.txt1 \ #rfam家族信息文件
-mirna P11.miRNA.bwt \ # 已知miRNA比对结果
-ncrna P11.ncRNA.bwt \ # 与其他ncRNA比对结果
-intron P11.intron.bwt \ # 与intron比对结果
-repeat P11.repeat.bwt \ # 与repeat比对结果
-exon P11.exon.bwt \ # 与mRNA比对结果
-outpre P11.out # 输出结果前缀
srna_anno.pl脚本
#!/usr/bin/env perl
use strict ;
use Getopt::Long;
use Bio::SeqIO;
my ($fa, $rfam, $mirna, $ncrna, $exon, $intron, $repeat, $outpre);
GetOptions(
"fa=s" => \$fa,
"rfam=s" => \$rfam,
"mirna=s" => \$mirna,
"ncrna=s" => \$ncrna,
"exon=s" => \$exon,
"intron=s" => \$intron,
"repeat=s" => \$repeat,
"outpre=s" => \$outpre
);
my $usage="usage:
perl $0 -fa srna.fa -rfam family.txt -mirna mirna.bwt -ncrna rfam.bwt -exon exon.bwt -intron intron.bwt -repeat repeat.bwt -outpre out.stat\n
[options]
-fa srna.fa :small rna in fasta format;
-rfam family.txt : rfam family file, two col;
-mirna mirna.bwt : mirna alignment by bowtie;
-ncrna rfam.bwt : Rfam(rm mirna) alignment by bowtie;
-exon exon.bwt or none : exon aligment by bowtie ;
-intron intron.bwt or none : intron alignment by bowtie;
-repeat repeat.bwt or none : repeat aligment by bowtie;
-outpre outprefix : prefix for output
";
#print "$fa, $rfam, $mirna, $ncrna, $exon, $intron, $repeat, $outpre\n";
unless ( $fa && $rfam && $mirna && $ncrna && $exon && $intron && $repeat && $outpre ){
die $usage;
}
my %mirna = &gene_bwt($mirna);
my %exon = $exon eq "none" ? () : &gene_bwt($exon);
my %intron = $intron eq "none" ? () : &gene_bwt($intron);
my %repeat = $repeat eq "none" ? () : &gene_bwt($repeat);
my %ncrna = &ncrna_bwt($rfam, $ncrna);
## annotation order: mirna > ncrna > repeat > exon > intron
## ncrna order: rRNA > tRNA > snRNA > snoRNA > other_ncRNA
my $read_ann = "$outpre.read_anno.txt";
my $stat_ann = "$outpre.read_anno.stat";
my $total = 0 ;
my $uniq = 0 ;
open ANN, ">$read_ann" or die $!;
print ANN "#name\tsequence\tnumber\tlength\tannotation\n";
my %ann;
#my %ann_keep;
my $fasta = Bio::SeqIO->new(-file => $fa, -format => 'fasta');
while(my $obj = $fasta->next_seq()){
my $id = $obj->id;
my $seq = $obj->seq;
my $len = $obj->length;
my $nu = $1 if ($id =~ /^\S+_x(\d+)/);
my $ann = "unknown";
if(exists $mirna{$id}){
$ann = "miRNA";
}
elsif(exists $ncrna{$id}){
$ann = $ncrna{$id};
}
elsif(exists $repeat{$id}){
$ann = "repeat";
}
elsif(exists $exon{$id}){
my $strand = "-" ;
if(exists $exon{$id}{"+"}){
$strand = "+"
}
if($strand eq "+"){
$ann = "exon_sense";
}
else{
$ann = "exon_antisense"
}
}
elsif(exists $intron{$id}){
my $strand = "-" ;
if(exists $intron{$id}{"+"}){
$strand = "+";
}
if($strand eq "+"){
$ann = "intron_sense";
}
else{
$ann = "intron_antisense"
}
}
$ann{$ann}{"total"} += $nu;
$ann{$ann}{"uniq"} ++;
#$ann_keep{$ann} = 1;
$total += $nu;
$uniq ++;
print ANN "$id\t$seq\t$nu\t$len\t$ann\n";
}
close ANN;
open OUT, ">$stat_ann" or die $!;
my @ann_type = ("miRNA",
"rRNA", "tRNA", "snRNA", "snoRNA" , "other_ncRNA" ,
"repeat",
"exon_sense", "exon_antisense",
"intron_sense", "intron_antisense",
"unknown"
);
print OUT "ann_type\ttotal_reads\ttotal_percent\tuniq_reads\tuniq_percent\n";
foreach my $ann_type ( @ann_type ){
my $tnum = 0;
my $unum = 0;
next if ($exon eq "none" && ($ann_type eq "exon_sense" || $ann_type eq "exon_antisense"));
next if ($intron eq "none" && ( $ann_type eq "intron_sense" || $ann_type eq "intron_antisense"));
next if ($repeat eq "none" && $ann_type eq "repeat");
if( exists $ann{$ann_type}){
$tnum = $ann{$ann_type}{"total"};
$unum = $ann{$ann_type}{"uniq"};
}
my $tpercent = $tnum/$total;
my $upercent = $unum/$uniq;
printf OUT "$ann_type\t$tnum\t%.6f\t$unum\t%.6f\n",$tpercent,$upercent;
}
close OUT;
sub gene_bwt{
my $bwt = shift;
my %out ;
open IN, $bwt or die $!;
while(<IN>){
chomp;
# P11_2460498_x5615 + ccr-mir-26a 16 TTCAAGTAATCCAGGATAGGCT IIIIIIIIIIIIIIIIIIIIII 0
my @t = split /\t/, $_;
$out{$t[0]}{$t[1]} = 1;
}
close IN;
return %out;
}
sub ncrna_bwt{
my $fam = shift;
my $bwt = shift;
my %fam;
open FM, $fam or die $!;
#Cis-reg;
#Cis-reg;
#Cis-reg; frameshift_element;
#Cis-reg; IRES;
#Cis-reg; leader;
#Cis-reg; riboswitch;
#Cis-reg; thermoregulator;
#Cis-reg; thermoregulator;
#Gene;
#Gene; antisense;
#Gene; antisense;
#Gene; antitoxin;
#Gene; CRISPR;
#Gene; lncRNA;
#Gene; ribozyme;
#Gene; rRNA;
#Gene; snRNA;
#Gene; snRNA; snoRNA; CD-box;
#Gene; snRNA; snoRNA; HACA-box;
#Gene; snRNA; snoRNA; scaRNA;
#Gene; snRNA; splicing;
#Gene; sRNA;
#Gene; tRNA;
#Intron;
while(<FM>){
chomp;
my ($rf,$name, $type) = split /\t/,$_;
if ($type =~ /rRNA;/){
$fam{$rf} = "rRNA";
}
elsif($type =~ /tRNA;/){
$fam{$rf} = "tRNA";
}
elsif($type =~ /snoRNA;/ ){
$fam{$rf} = "snoRNA";
}
elsif($type =~ /snRNA;/){
$fam{$rf} = "snRNA";
}
else{
$fam{$rf} = "other_ncRNA";
}
#print "$rf\t$fam{$rf}\n";
}
close FM;
my %ty_order = { "rRNA" => 1, "tRNA" => 2, "snRNA" => 3, "snoRNA" => 4, "orther_ncRNA" => 5};
my %out;
open BWT, $bwt or die $!;
while(<BWT>){
chomp;
#P11_1019552_x109273 + RF00005:CU928174.1/736663-736781 37 GAGTACAACTCTTAGCGG
my @t = split /\t/, $_;
my $id = $t[0];
my @f = split /:/, $t[2];
my $ty = $fam{$f[0]};
if( !exists $out{$id} ){
$out{$id} = $ty;
}
else{
my $old = $out{$id};
if( $ty_order{$ty} < $ty_order{$old}){
$out{$id} = $ty;
}
}
}
close BWT;
return %out;
}
若我们的教程对你有所帮助,请点赞+收藏+转发,大家的支持是我们更新的动力!!
往期部分文章
1. 最全WGCNA教程(替换数据即可出全部结果与图形)
推荐大家购买最新的教程,若是已经购买以前WGNCA教程的同学,可以在对应教程留言,即可获得最新的教程。(注:此教程也仅基于自己理解,不仅局限于此,难免有不恰当地方,请结合自己需求,进行改动。)
2. 精美图形绘制教程
3. 转录组分析教程
4. 转录组下游分析
小杜的生信筆記 ,主要发表或收录生物信息学教程,以及基于R分析和可视化(包括数据分析,图形绘制等);分享感兴趣的文献和学习资料!!