R Language Notes (Part 1): Data Tricks

Digest   2024-12-08 23:10   Singapore  

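Motivation

The road through bioinformatics is long and hard. I write down what I learn in the hope that it helps whoever happens to need it.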

Ramblings

Seize the moment and push yourself; time waits for no one. (Tao Yuanming)

Sample data

# Sample data for problems 1 and 2
library(tibble)

df <- tibble(column_name = c("apple", "banana", "cherry", "date", "elderberry", NA, "", "--"))
my_vector <- c("ban", "dat")

data <- data.frame(
  a = 1:5,
  b = c("red", "blue", "green", "yellow", "blue"),
  c = c(2.5, 3.6, 1.9, 4.2, 5.0),
  d = c("high", "low", "medium", "high", "medium")
)

# Sample data for problem 3
rename_df <- tibble(oldname = c("a", "b", "c"), newname = c("alpha", "beta", "gamma"))
data_count <- tibble(a = 1:5, b = 6:10, c = 11:15)

# Sample data for problem 5
# Note: this reuses the name `data` and replaces the data frame above, so run the problem 2 code first.
data <- tibble(
  Protein.accession = c("P12345", "P67890", "P23456", "P98765"),
  Gene = c("Gene1", "Gene2", "Gene3", "Gene4")
)

combined_data <- tibble(
  Protein.accession = c("P12345", "P23456", "P11111"),
  KEGG.pathway = c("pathway1", "pathway2", "pathway3")
)

Code

  1. How to keep the rows whose value in a given column contains any element of a given vector

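library(dplyr)
library(stringr)

# Join the vector into one regex ("ban|dat") and keep the rows where column_name matches it.
# Rows holding NA, "" or "--" are dropped: str_detect() returns NA for NA and FALSE for the other two.
filtered_df <- df %>%
  filter(str_detect(column_name, paste(my_vector, collapse = "|")))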
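
If the placeholder rows (NA, empty string, "--") should survive the filter as well, one possible variant is to OR the pattern test with explicit placeholder checks. This is only a sketch: the list of placeholder values and the names `pattern` and `kept_df` are assumptions for illustration, not part of the original snippet.

```r
library(dplyr)
library(stringr)
library(tibble)

df <- tibble(column_name = c("apple", "banana", "cherry", "date", "elderberry", NA, "", "--"))
my_vector <- c("ban", "dat")

# One regex alternation built from the vector: "ban|dat"
pattern <- paste(my_vector, collapse = "|")

kept_df <- df %>%
  filter(
    is.na(column_name) |                 # keep missing values
      column_name %in% c("", "--") |     # keep the assumed placeholder strings
      str_detect(column_name, pattern)   # keep rows that contain a vector element
  )
```

Because `my_vector` is pasted into a regular expression, elements that contain regex metacharacters (such as `.` or `(`) would need escaping first.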
  2. How to select all the columns after a given column? Example: select every column after column b

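# match() gives the position of "b"; keep positions b+1 through the last column (assumes "b" is not the last column)
data %>% select((match("b", names(data)) + 1):ncol(data))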
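
The one-liner above counts backwards when the reference column is the last one, because `(n + 1):n` yields `c(n + 1, n)`. A base R sketch that sidesteps this with a logical mask is shown below; `cols_after` is a hypothetical helper name introduced here, not something from the original post.

```r
# Hypothetical helper: return the columns positioned after `col`
cols_after <- function(df, col) {
  pos <- match(col, names(df))
  if (is.na(pos)) stop("column not found: ", col)
  # Logical mask over column positions; drop = FALSE keeps a data frame
  df[, seq_along(df) > pos, drop = FALSE]
}

data <- data.frame(
  a = 1:5,
  b = c("red", "blue", "green", "yellow", "blue"),
  c = c(2.5, 3.6, 1.9, 4.2, 5.0),
  d = c("high", "low", "medium", "high", "medium")
)

after_b <- cols_after(data, "b")  # columns c and d
after_d <- cols_after(data, "d")  # zero columns, still five rows
```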
  3. How to rename columns in R with a mapping, the way a Python dict would

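# A named vector works like a dictionary: names are the old column names, values the new ones
col_map <- setNames(rename_df$newname, rename_df$oldname)
# Look up each current column name in the map (assumes every column has an entry)
colnames(data_count) <- unname(col_map[colnames(data_count)])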
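
When only some columns appear in the mapping, a lookup over all column names would turn the unmapped ones into NA. A small base R sketch that renames only the columns found in the map follows; `rename_with_map` and the extra column `z` are hypothetical, added purely for illustration.

```r
# Hypothetical helper: rename only the columns that have an entry in `map`
rename_with_map <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

rename_df <- data.frame(oldname = c("a", "b", "c"), newname = c("alpha", "beta", "gamma"))
col_map <- setNames(rename_df$newname, rename_df$oldname)

# `z` has no entry in the map and should keep its name
data_count <- data.frame(a = 1:5, b = 6:10, c = 11:15, z = 0)
renamed <- rename_with_map(data_count, col_map)
```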
  4. How to batch-read files in R and record which file each row came from?

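# full.names = TRUE returns paths that read.delim() can open from any working directory
paths <- list.files(path = "your_folder", full.names = TRUE)
# Read each tab-delimited file and tag its rows with the path it came from
df_list <- lapply(paths, function(x) read.delim(x, check.names = FALSE) %>% mutate(source = x))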
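
To try the pattern end to end without real input files, the sketch below writes two small tab-delimited files to a temporary folder, reads them back with a `source` column, and stacks the results into one data frame. The folder name, file names, and toy columns are all made up for the demonstration, and it uses base R only.

```r
# Create a throwaway folder with two small tab-delimited files (demo data only)
tmp_dir <- file.path(tempdir(), "batch_read_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.table(data.frame(id = 1:2, value = c(0.1, 0.2)),
            file.path(tmp_dir, "sample_A.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(id = 3:5, value = c(0.3, 0.4, 0.5)),
            file.path(tmp_dir, "sample_B.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Read every .txt file and record the file name in a `source` column
paths <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
df_list <- lapply(paths, function(x) {
  d <- read.delim(x, check.names = FALSE)
  d$source <- basename(x)
  d
})

# Stack the per-file data frames; this assumes all files share the same columns
combined <- do.call(rbind, df_list)
```

Using `basename(x)` keeps the `source` column short; storing the full path instead is the safer choice when files in different folders share a name.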
  5. How to merge two data frames and fill the rows that have no KEGG pathway with "--"

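# Method 1 (basic): inner_join for the matched rows, anti_join for the rest, then stack the two
data2 <- data %>%
  inner_join(combined_data, by = "Protein.accession")

data_del <- data %>%
  anti_join(combined_data, by = "Protein.accession") %>%
  mutate(KEGG.pathway = "--")
# Unmatched rows end up at the bottom of the result
data_org <- rbind(data2, data_del)


# Method 2 (advanced): left_join, then replace the resulting NA with "--"
# The original row order of `data` is preserved
data_org <- data %>%
  left_join(combined_data, by = "Protein.accession") %>%
  mutate(KEGG.pathway = coalesce(KEGG.pathway, "--"))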

