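Motivation
The road through bioinformatics is long and full of obstacles. I write down what I learn along the way, in the hope that it helps whoever happens to need it.
Ramblings
Seize the moment and spur yourself on; time spares no one. (Tao Yuanming)
Sample data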
library(tibble)  # provides tibble()

# Sample data for questions 1 and 2
df <- tibble(column_name = c("apple", "banana", "cherry", "date", "elderberry", NA, "", "--"))
my_vector <- c("ban", "dat")
data <- data.frame(
  a = 1:5,
  b = c("red", "blue", "green", "yellow", "blue"),
  c = c(2.5, 3.6, 1.9, 4.2, 5.0),
  d = c("high", "low", "medium", "high", "medium")
)
# Sample data for question 3
rename_df <- tibble(oldname = c("a", "b", "c"), newname = c("alpha", "beta", "gamma"))
data_count <- tibble(a = 1:5, b = 6:10, c = 11:15)
# Sample data for question 5 (note: this overwrites the `data` object defined above)
data <- tibble(
  Protein.accession = c("P12345", "P67890", "P23456", "P98765"),
  Gene = c("Gene1", "Gene2", "Gene3", "Gene4")
)
combined_data <- tibble(
  Protein.accession = c("P12345", "P23456", "P11111"),
  KEGG.pathway = c("pathway1", "pathway2", "pathway3")
)
Code
How to keep rows whose value in a given column contains any element of a vector
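library(dplyr)
library(stringr)
# Use filter() and str_detect() to keep rows containing any of the substrings,
# and also keep rows whose value is NA, empty, or "--"
filtered_df <- df %>%
  filter(
    str_detect(column_name, paste(my_vector, collapse = "|")) |  # matches any element of my_vector
      is.na(column_name) |
      column_name %in% c("", "--")
  )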
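One caveat with joining the vector by "|": each element is then treated as a regular expression, so an element such as "a.b" would also match "axb". Below is a minimal sketch of literal matching, assuming the elements should be compared as plain text. The helper `matches_any()` and the objects `df_demo` and `literal_terms` are hypothetical names invented for this illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data: some values contain regex metacharacters
df_demo <- tibble(column_name = c("a.b", "axb", "c+d", "cd", NA))
literal_terms <- c("a.b", "c+d")

# Test each term as a fixed (non-regex) string and combine the results with OR
matches_any <- function(x, terms) {
  hits <- lapply(terms, function(term) str_detect(x, fixed(term)))
  Reduce(`|`, hits)
}

# Rows where no term matches, and NA rows, are dropped by filter()
filtered_demo <- df_demo %>%
  filter(matches_any(column_name, literal_terms))
```

Here only the rows that literally contain "a.b" or "c+d" survive; if NA rows should be kept as well, an extra `| is.na(column_name)` condition would be needed.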
How to select all columns after a given column? Here, all columns after column b
data %>% select((match("b",names(data))+1):ncol(data))
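The one-liner above works when the target column is not the last one. If it is the last column, `(idx + 1):ncol(data)` becomes a descending sequence and picks up the wrong columns. The following is a rough sketch of a safer variant in base R; `select_after()` and `demo` are hypothetical names used only for this example.

```r
# Hypothetical helper: keep only the columns positioned after `col`
select_after <- function(df, col) {
  idx <- match(col, names(df))
  if (is.na(idx)) stop("Column not found: ", col)
  # A logical index avoids the descending sequence that (idx + 1):ncol(df)
  # would produce when `col` is the last column
  df[, seq_len(ncol(df)) > idx, drop = FALSE]
}

demo <- data.frame(a = 1:2, b = c("x", "y"), c = c(0.5, 1.5), d = c("u", "v"))
after_b <- select_after(demo, "b")  # columns c and d
after_d <- select_after(demo, "d")  # no columns, since d is the last one
```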
How to rename columns in R with a mapping, the way you would with a Python dictionary
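# A named vector acts like a dictionary: names are the old column names, values are the new ones
col_map <- setNames(rename_df$newname, rename_df$oldname)
# Look up each current column name in the map (assumes every column has an entry in rename_df)
colnames(data_count) <- unname(col_map[colnames(data_count)])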
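If the mapping table and the data do not line up exactly (some columns have no entry, or the table lists names that are not in the data), dplyr's `rename()` together with `any_of()` may be a more forgiving option. This is a small sketch under that assumption; `rename_demo`, `count_demo`, and `lookup` are hypothetical objects made up for the illustration.

```r
library(dplyr)

# Hypothetical mapping table; "z" is deliberately absent from the data
rename_demo <- tibble(oldname = c("a", "c", "z"), newname = c("alpha", "gamma", "zeta"))
count_demo <- tibble(a = 1:3, b = 4:6, c = 7:9)

# rename() expects new_name = old_name, so the vector's names are the new names
lookup <- setNames(rename_demo$oldname, rename_demo$newname)

# any_of() skips mapping entries whose old name is not a column;
# columns without a mapping entry (here b) keep their name
renamed_demo <- count_demo %>% rename(any_of(lookup))
```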
How to batch-read files in R and record which file each row came from?
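# full.names = TRUE returns paths that can be passed straight to read.delim()
paths <- list.files(path = "your_folder", full.names = TRUE)
df_list <- lapply(paths, function(x) read.delim(x, check.names = FALSE) %>% mutate(source = x))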
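Once every file carries a `source` column, the list can be stacked into a single data frame. The sketch below writes two tiny tab-delimited files to a temporary folder so that it runs on its own; the folder name, file names, and values are all hypothetical, and it assumes the files share the same columns.

```r
library(dplyr)

# Write two small tab-delimited files to a temporary folder (hypothetical data)
demo_dir <- file.path(tempdir(), "batch_read_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(id = 1:2, value = c(10, 20)),
            file.path(demo_dir, "sample_A.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(id = 3:4, value = c(30, 40)),
            file.path(demo_dir, "sample_B.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

paths <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file, tag its rows with the file name, then stack everything
combined <- lapply(paths, function(p) {
  read.delim(p, check.names = FALSE) %>% mutate(source = basename(p))
}) %>%
  bind_rows()
```

Using `basename()` keeps only the file name in `source`; drop it if the full path is more useful.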
How to merge two data frames and fill rows that have no KEGG pathway with "--"
# Method 1 (basic): combine the data with inner_join and anti_join
# Rows that have a matching pathway
data2 <- data %>%
  inner_join(combined_data, by = "Protein.accession")
# Rows without a match get "--" as their pathway
data_del <- data %>%
  anti_join(combined_data, by = "Protein.accession") %>%
  mutate(KEGG.pathway = "--")
data_org <- rbind(data2, data_del)
# Method 2 (advanced): left_join, then fill the missing values with mutate
data_org <- data %>%
  left_join(combined_data, by = "Protein.accession") %>%
  mutate(KEGG.pathway = coalesce(KEGG.pathway, "--"))
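When more than one annotation column needs the same treatment, `tidyr::replace_na()` can fill several columns in one call. This is only a sketch of that variant, assuming tidyr is installed; the `GO.term` column and the objects `proteins`, `annotations`, and `annotated` are hypothetical and not part of the sample data above.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: P67890 has no annotation at all
proteins <- tibble(
  Protein.accession = c("P12345", "P67890", "P23456"),
  Gene = c("Gene1", "Gene2", "Gene3")
)
annotations <- tibble(
  Protein.accession = c("P12345", "P23456"),
  KEGG.pathway = c("pathway1", "pathway2"),
  GO.term = c("term1", "term2")
)

# left_join keeps every protein; replace_na fills the gaps column by column
annotated <- proteins %>%
  left_join(annotations, by = "Protein.accession") %>%
  replace_na(list(KEGG.pathway = "--", GO.term = "--"))
```

Unlike the rbind approach in method 1, a left join keeps the rows in the original order of the left table.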