[Reading Notes] R Language Basics

Digest   2024-09-25 17:03   Australia

R is an open-source interpreted language for analysing and visualising data, designed by statisticians (alternatives: Python, Matlab, SAS, …).

=====================

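Contents:

I. Basic Data Types

II. Basic Operators

III. Conditional Statements and Loops

IV. Basic Functions

V. File Input and Output

VI. Probability and Statistics Functions

VII. Base Plotting Functions

VIII. Programming Tools

IX. Books and References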

=====================

I. Basic Data Types

1. R has four basic classes of objects:

1.1 numeric:

1.1.1 double (real numbers)

1.1.2 integer: append L to a number to get an integer explicitly, e.g. typeof(2L) is "integer" while typeof(2) is "double"

1.2 character

1.3 logical: TRUE and FALSE 

1.4 complex


2. Numbers in R are treated as numeric (double) by default. Special numbers:

2.1 Inf, infinity, for 1/0 

2.2 NaN, not a number, for 0/0 

2.3 NA can be thought of as a missing value


3. Explicit coercion changes the class of a value:

3.1 as.numeric(): convert to numeric if possible (otherwise the result is NA, with a warning)

3.2 as.logical(): convert to logical if possible

3.3 as.character()

3.4 as.complex()

3.5 as.integer()


4. Missing values: in R, missing values are displayed as NA (not available).

4.1 is.na(): checks whether a value is missing (i.e., NA). is.na() detects both NA and NaN.

4.2 is.nan(): checks whether a value is "Not a Number" (NaN). is.nan() detects only NaN.
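A minimal sketch of the special values, the missing-value checks and the as.*() coercion functions above. The example values are arbitrary; as.numeric("abc") is wrapped in suppressWarnings() only to keep the output quiet.

```r
# Special values produced by arithmetic
x <- c(1/0, 0/0, NA, 5)   # Inf NaN NA 5
na_flags  <- is.na(x)     # TRUE for both NA and NaN
nan_flags <- is.nan(x)    # TRUE only for NaN
print(na_flags)
print(nan_flags)

# Explicit coercion with the as.*() family
num  <- as.numeric("3.14")                   # 3.14
bad  <- suppressWarnings(as.numeric("abc"))  # NA (R normally warns "NAs introduced by coercion")
flag <- as.logical("TRUE")                   # TRUE
int  <- as.integer(3.9)                      # 3: the fractional part is dropped

# Storage types
typeof(2)    # "double"
typeof(2L)   # "integer"
```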


5. Vectors can be created with c(), vector(), seq(), rep(), etc.

5.1 v <- c(1, 2, 3)  # v[1] is 1, because R indexing starts at 1

5.2 v <- vector("numeric", length = 5)  # five zeros

5.3 v <- seq(2, 8); equivalently v <- seq(from = 2, to = 8, by = 1)

5.4 v <- rep(3, 4)  # repeats 3 four times: 3 3 3 3

5.5 v[-1] returns all elements except the first one.

6. Vectorised operations: many R functions operate on an entire vector at once, for example max(x), range(x), etc.
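The vector constructors, negative indexing and vectorised functions above can be tried together in a short sketch; the numbers are only illustrative.

```r
v1 <- c(1, 2, 3)
first <- v1[1]                          # 1: R indexing starts at 1
v2 <- vector("numeric", length = 5)     # 0 0 0 0 0
v3 <- seq(from = 2, to = 8, by = 1)     # 2 3 4 5 6 7 8
v4 <- rep(3, 4)                         # 3 3 3 3
v5 <- v3[-1]                            # 3 4 5 6 7 8: drops the first element

# Vectorised operations work on the whole vector without a loop
biggest <- max(v3)      # 8
span    <- range(v3)    # 2 8
doubled <- v3 * 2       # 4 6 8 10 12 14 16
```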


7. A list is very similar to a vector, but it can contain objects of different classes.

7.1 L1 <- list(5, "a", 2)  # L1 has 3 elements, and each element is itself a vector (here of length 1)


8. Subsetting

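8.1 [ ] returns an object of the same class as the original; for example, subsetting a list with [ ] returns a list. (Note that selecting a single column of a matrix or data frame with m[, 1] or df[, 1] drops to a vector unless drop = FALSE is given.)

8.2 [[ ]] extracts elements from a list or data frame. It always returns a single element.

8.3 $ extracts elements from a list or data frame by name.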
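A small sketch contrasting [ ], [[ ]] and $ on lists. The named list person and its values are made-up example data.

```r
L1 <- list(5, "a", 2)
n_elements <- length(L1)        # 3
sub  <- L1[1]                   # [ ] keeps the container: still a list
elem <- L1[[2]]                 # [[ ]] pulls out the element itself: "a"
class(sub)                      # "list"
class(elem)                     # "character"

# Named list (hypothetical example data)
person <- list(name = "Ann", age = 30)
person$name                     # "Ann"
person[["age"]]                 # 30
```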


9. Factors: categorical data in R are represented using factors. By default, R sorts factor levels in alphabetical order. Factors can be ordered or unordered.

9.1 The gl() function generates a factor from a specified pattern of levels.

9.2 gl(n, k, labels): n is the number of levels, k is the number of repetitions of each level, and labels is a vector of labels for the levels.
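A sketch of default (alphabetical) level ordering, an explicitly ordered factor, and gl(). The size and group labels are hypothetical example values.

```r
sizes <- factor(c("small", "large", "medium", "small"))
levels(sizes)          # "large" "medium" "small": alphabetical by default

# Supplying levels and ordered = TRUE gives a meaningful order
sizes_ord <- factor(c("small", "large", "medium", "small"),
                    levels = c("small", "medium", "large"),
                    ordered = TRUE)
sizes_ord[1] < sizes_ord[2]   # TRUE: "small" < "large"

# gl(n, k, labels): 2 levels, each repeated 3 times
g <- gl(2, 3, labels = c("control", "treatment"))
as.character(g)        # "control" three times, then "treatment" three times
```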


10. A matrix is a rectangular (two-dimensional) array whose elements all have the same type. Matrices are a special type of vector: a vector with a dimension attribute.

10.1 m <- matrix(nrow = 2, ncol = 3)  # a 2 x 3 matrix filled with NA

10.1.1 m1 * m2: Performs element-wise multiplication. Both matrices must have the same dimensions.

10.1.2 m1 %*% m2: Performs matrix multiplication. The number of columns in m1 must match the number of rows in m2.

10.2 Other common ways to create a matrix are cbind() and rbind().

10.2.1 m1 <- cbind(x, y)  # column-binding

10.2.2 m2 <- rbind(x, y)  # row-binding
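A sketch of the matrix operations above, using small made-up matrices so the element-wise and matrix products are easy to check by hand.

```r
m  <- matrix(nrow = 2, ncol = 3)   # 2 x 3 matrix filled with NA
m1 <- matrix(1:6, nrow = 2)        # filled column by column: rows 1 3 5 / 2 4 6
m2 <- matrix(1:6, nrow = 2)
elementwise <- m1 * m2             # same dimensions required; squares each entry here

m3 <- matrix(1:6, nrow = 3)        # 3 x 2
product <- m1 %*% m3               # (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
print(product)

x <- c(1, 2, 3)
y <- c(4, 5, 6)
cb <- cbind(x, y)                  # 3 x 2, columns named x and y
rb <- rbind(x, y)                  # 2 x 3, rows named x and y
```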


11. A data frame is technically a list whose components are equal-length vectors, each corresponding to a column of the data.

11.1 Functions for data frames: attributes(), nrow(), ncol(), names(), dimnames(), colnames(), rownames(), row.names(), etc.


12. Matrices and data frames are very similar. However, matrices are extensions of vectors, whereas data frames are extensions of lists.

12.1 All elements of a matrix must have the same type. Therefore, when your data mix different types, use a data frame.

12.2 One example:

m1 <- matrix(1:25, 5, 5)

df1 <- as.data.frame(m1)

# object.size() reports how much memory an object takes up

print(paste("The size of df1 is", object.size(df1), "bytes and the size of m1 is", object.size(m1), "bytes"))
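A sketch of why mixed column types call for a data frame rather than a matrix. The id/name/score columns are hypothetical example data.

```r
# A data frame can mix column types (hypothetical example data)
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 score = c(90.5, 82, 77))
nrow(df)                 # 3
ncol(df)                 # 3
names(df)                # "id" "name" "score"
typeof(df)               # "list": a data frame is a list of columns
sapply(df, class)        # each column keeps its own class

# A matrix cannot: cbind() coerces everything to a single type
m <- cbind(id = 1:3, name = c("a", "b", "c"))
typeof(m)                # "character"
```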

=====================

II. Basic Operators

1. Use <- or = to assign values to variables; <- is the recommended style.

2. R is case sensitive. Variable x and X are different.

3. Exponentiation can be written in two ways: 2**3 or 2^3.

4. Integer division: 3 %/% 5 gives 0, whereas 3/5 gives 0.6.

5. Modulus (remainder): 5 %% 3 gives 2.

6. Scientific notation: 2.54e5 is the same as 2.54 * 10^5; 7456.3e-2 is the same as 7456.3 * 10^(-2).
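The arithmetic operators above, checked in a quick sketch. Floating-point results are compared with all.equal() rather than == to avoid rounding surprises.

```r
p1 <- 2^3           # 8
p2 <- 2**3          # 8: ** is accepted and translated to ^
q  <- 3 / 5         # 0.6: ordinary division
qi <- 3 %/% 5       # 0: integer division
r  <- 5 %% 3        # 2: remainder
s1 <- 2.54e5        # 254000
s2 <- 7456.3e-2     # 74.563

# R is case sensitive: X and x are two different variables
X <- 10
x <- 1
```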


7. R relational operators compare values and return logical values (TRUE or FALSE).

7.1 < : Less than

7.2 > : Greater than

7.3 <= : Less than or equal to

7.4 >= : Greater than or equal to

7.5 == : Equal to

7.6 != : Not equal to


8. R logical operators perform logical operations on values.

8.1. x & y : Element-wise logical AND (compares each element of x with the corresponding element of y)

8.2. x && y : Scalar logical AND with short-circuiting (y is evaluated only when needed); x and y should each be a single value, and since R 4.3.0 longer inputs raise an error

8.3. x | y : Element-wise logical OR (compares each element of x with the corresponding element of y)

8.4. x || y : Scalar logical OR with short-circuiting (y is evaluated only when needed); the same single-value rule applies

8.5. !x : Logical NOT (negates the logical value of x)
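A sketch contrasting relational operators, element-wise & and |, and the short-circuiting scalar && on small example vectors.

```r
vals <- c(1, 5, 10)
ge5 <- vals >= 5             # FALSE TRUE TRUE
eq5 <- vals == 5             # FALSE TRUE FALSE

x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
and_ew <- x & y              # TRUE FALSE FALSE (element by element)
or_ew  <- x | y              # TRUE TRUE TRUE
not_x  <- !x                 # FALSE TRUE FALSE

# && and || expect single values and short-circuit
a <- 5
in_range <- (a > 0) && (a < 10)               # TRUE
skipped  <- FALSE && stop("not evaluated")    # FALSE; stop() is never called
```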

=====================

III. Conditional Statements and Loops

1. if, else: check a condition

2. for: loop a fixed number of times

3. while: loop while a condition is TRUE

4. break: exit a loop

5. next: skip to the next iteration  # the same as "continue" in Python

6. return: exit a function, returning a value

7. stop(): halt execution and raise an error

8. ifelse(test, yes, no): apply conditional logic element-wise across vectors

9. any(): check whether at least one element of a logical vector is TRUE

10. all(): check whether all elements of a logical vector are TRUE
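A sketch that exercises each control-flow tool listed above. classify() is a hypothetical helper written only for illustration.

```r
# classify() is a hypothetical helper showing if / else, return() and stop()
classify <- function(n) {
  if (n < 0) {
    stop("n must be non-negative")
  } else if (n %% 2 == 0) {
    return("even")
  } else {
    return("odd")
  }
}

# for with next and break: adds the odd numbers 1, 3, 5, 7
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # leave the loop once i exceeds 7
  total <- total + i
}

# while: keep doubling until the value reaches at least 100
k <- 1
while (k < 100) {
  k <- k * 2
}

# Vectorised conditions
v <- c(-2, 3, 0, 5)
signs   <- ifelse(v > 0, "pos", "non-pos")
any_big <- any(v > 4)     # TRUE
all_ok  <- all(v > -3)    # TRUE
```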

=====================

IV. Basic Functions

1. myfunc <- function(n) { n * n }  # a function returns the value of its last evaluated expression; writing return(n * n) makes this explicit

2. ?round  # opens the help page for round(); help(round) does the same

3. methods(): lists all available methods for a given generic function or class.

4. args(): shows the arguments of a function, helping you understand what inputs the function expects.

5. Built-in functions

5.1 c(): combine objects into a vector

5.2 cbind(): combine objects as columns

5.3 class(): the class of an object

5.4 data(): load specified data sets, or list the available data sets

5.5 length(): the length of a vector

5.6 ls(): list the existing variables in the current environment; objects() does the same

5.7 mode(): get or set the type or storage mode of an object

5.8 paste(): concatenate vectors after converting them to character

5.9 print(): print its argument

5.10 rbind(): combine objects as rows

5.11 rm(): delete variables; remove() is equivalent. rm(x) removes the variable x; rm(list = ls()) removes all variables in the current environment

5.12 str(): compactly display the internal structure of an R object

5.13 summary(): a generic function that produces summaries of data objects and of the results of various model-fitting functions

5.14 typeof(): the (R-internal) type or storage mode of any object

5.15 UseMethod(): called inside a generic function to dispatch to the method matching the class of the object
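A sketch of implicit versus explicit return values and of a minimal S3 generic built with UseMethod(). The names square, describe and the "dog" class are hypothetical, chosen only for illustration.

```r
# A function returns the value of its last evaluated expression
square <- function(n) {
  n * n
}
# The same function with an explicit return()
square_explicit <- function(n) {
  return(n * n)
}

# A minimal S3 generic; describe() and the "dog" class are hypothetical names
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) paste("an object of class", class(x)[1])
describe.dog <- function(x, ...) paste("a dog named", x$name)

rex <- structure(list(name = "Rex"), class = "dog")
describe(rex)       # dispatches to describe.dog(): "a dog named Rex"
describe(42)        # falls back to describe.default(): "an object of class numeric"

args(square)                  # shows the argument list of square()
str(list(a = 1, b = "x"))     # compact structure of an object
```

Dispatch is decided by class(x): because rex carries the class "dog", UseMethod() picks describe.dog(), while a plain number has no matching method and falls through to describe.default().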


6. Packages are collections of R functions that are ready to use.

6.1 library(): loads a package; throws an error if the package is missing. Typically used in scripts where package availability is assumed.

6.2 require(): also loads a package, but returns FALSE (with a warning) if the package is not available. Typically used inside functions or conditional statements where package availability is optional or needs to be handled.

6.3 search(): shows the search path, including the packages currently attached

6.4 install.packages(): installs a package (from CRAN by default).
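A sketch of the library()/require() difference. The package name notARealPackage123 is a hypothetical name for a package assumed not to be installed, and install.packages() is left commented out because it needs internet access.

```r
# stats ships with R, so require() returns TRUE
ok_stats <- require(stats)

# "notARealPackage123" is a hypothetical, not-installed package
ok_fake <- require("notARealPackage123", character.only = TRUE, quietly = TRUE)
if (!ok_fake) {
  message("notARealPackage123 is not available; continuing without it")
}

# library() signals an error instead, which tryCatch() can intercept
result <- tryCatch(library("notARealPackage123", character.only = TRUE),
                   error = function(e) "library() raised an error")

"package:stats" %in% search()   # TRUE: stats is on the search path

# install.packages("ggplot2")   # installs from CRAN; needs internet access
```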

=====================

V. File Input and Output

1. read.table() reads tabular data from a text file such as .txt; read.csv() does the same for .csv files.

2. Common parameters for read.table() or read.csv(): header = TRUE (the first row contains the column names); sep = "\t" (tab-delimited) or sep = "," (comma-delimited).

3. write.table(), write.csv() to export data into a file.

4. source() runs the code in an .R file, making the functions and objects it defines available in the current session.

5. The source() function is often used to modularize code by keeping different functions and scripts in separate files and loading them when needed.

6. getwd(): get the current working directory, in essence where you are now

7. setwd(): change the working directory

8. dir(): list all files and folders (in the working directory by default)

9. ls(): list the existing variables

10. Saving graphs as jpeg, png or pdf. One example:

jpeg(filename = "plot.jpeg")  # open a JPEG graphics device

hist(xxx)  # xxx: the data to plot

dev.off()  # close the device, which writes the file
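A self-contained sketch of the read/write, source() and save-a-plot steps above. It writes only to temporary files from tempfile(), and the data frame and the add_one() helper are hypothetical examples. A PDF device is used instead of jpeg() because it does not depend on bitmap support being available.

```r
# Hypothetical example data
df <- data.frame(id = 1:3, score = c(90.5, 82, 77))

# CSV round trip
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)   # row.names = FALSE: no extra index column
df_csv <- read.csv(csv_path, header = TRUE)

# Tab-delimited text round trip
txt_path <- tempfile(fileext = ".txt")
write.table(df, txt_path, sep = "\t", row.names = FALSE)
df_txt <- read.table(txt_path, header = TRUE, sep = "\t")

# source(): run an .R file so its definitions become available
script <- tempfile(fileext = ".R")
writeLines("add_one <- function(x) x + 1", script)   # add_one() is hypothetical
source(script)

# Save a plot: open the device, draw, then close it
pdf_path <- tempfile(fileext = ".pdf")
pdf(file = pdf_path)
hist(c(1, 2, 2, 3, 3, 3))
invisible(dev.off())
```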

=====================

VI. Probability and Statistics Functions

1. For each probability distribution, R provides four related functions, named with a one-letter prefix followed by the distribution name (e.g. dnorm, pnorm, qnorm, rnorm for the normal distribution):

1.1 d for density

1.2 r for random number generator

1.3 p for cumulative distribution function

1.4 q for quantile function

1.5 Reference: https://www.stat.umn.edu/geyer/old/5101/rlook.html


2. The sample() function draws randomly from a specified set of (scalar) objects, allowing you to sample from arbitrary distributions, e.g. sample(1:10, 4).
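A sketch of the d/p/q/r naming pattern for the normal distribution, plus sample(). The seed, mean, sd and coin probabilities are arbitrary example choices.

```r
# d, p, q, r functions for the standard normal distribution
dens <- dnorm(0)          # density at 0: 1 / sqrt(2 * pi), about 0.3989
cdf  <- pnorm(1.96)       # P(Z <= 1.96), about 0.975
q975 <- qnorm(0.975)      # quantile: the inverse of pnorm, about 1.96
set.seed(42)              # make the random draws reproducible
z <- rnorm(5, mean = 10, sd = 2)   # 5 random draws from N(10, 2^2)

# sample(): draw from an arbitrary set of values
s <- sample(1:10, 4)      # 4 distinct values, without replacement by default
coin <- sample(c("H", "T"), 10, replace = TRUE, prob = c(0.7, 0.3))  # a biased coin
```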

=====================

VII. Base Plotting Functions

1. Some of the key base plotting functions

1.1 hist(): histogram

1.2 barplot(): bar chart

1.3 boxplot(): box-and-whisker plot

1.4 plot(): a generic plot whose output depends on the class of the input

1.5 lines(): add lines to an existing plot (connecting the given points)

1.6 points(): add points

1.7 text(): add text labels to a plot at x, y coordinates

1.8 title(): add titles

1.9 mtext(): add arbitrary text to the margins

1.10 axis(): add axis ticks/labels


2. Some important graphical parameters

2.1 pch: the plotting symbol (plotting character)

2.2 lty: the line type; solid, dashed, ...

2.3 lwd: the line width; lwd = 2

2.4 col: color; col = "red"

2.5 xlab: x-axis label; xlab = "units"

2.6 ylab: y-axis label; ylab = "price"


3. plot(x, y, type = "s", ...): different values for type:

3.1 "p" - points (default)

3.2 "l" - lines

3.3 "b" - both points and lines

3.4 "c" - empty points joined by lines

3.5 "o" - overplotted points and lines

3.6 "s" and "S" - stair steps ("s" steps horizontally first, "S" vertically first)

3.7 "h" - histogram-like vertical lines

3.8 "n" - no points or lines (an empty plot, e.g. for adding elements later)
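A sketch combining plot() types, the graphical parameters and the add-on functions (lines, points, text, title, mtext). It draws into a temporary PDF so it also runs without a screen; the data and label text are arbitrary.

```r
# Draw into a temporary PDF so the sketch also runs without a screen
pdf_file <- tempfile(fileext = ".pdf")
pdf(file = pdf_file)

x <- 1:10
y <- x^2
plot(x, y, type = "b", pch = 16, col = "red", lty = 2, lwd = 2,
     xlab = "x", ylab = "x squared")       # points joined by dashed lines
lines(x, 10 * x, col = "blue")             # add a second line
points(5, 25, pch = 4)                     # mark one point with a cross
text(3, 80, labels = "y = x^2")            # text at coordinates (3, 80)
title(main = "type, pch, lty, lwd and col")
mtext("added with mtext()", side = 3)      # text in the top margin

plot(x, y, type = "s")                     # the same data as stair steps
invisible(dev.off())
```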


4. par() sets or queries global graphics parameters.

4.1 Before making any changes, save the current settings: oldpar <- par(no.readonly = TRUE)

4.2 las: the orientation of axis labels on the plot

4.3 bg: the background color

4.4 mar: the margin size

4.5 oma: the outer margin size

4.6 mfrow = c(nr, nc): draw subsequent plots in an nr x nc grid, filled row by row

4.7 mfcol = c(nr, nc): the same grid, filled column by column

4.8 At the end, restore the settings with par(oldpar). If oldpar was saved with plain par(), restoring it produces warnings about read-only parameters, which can be ignored.
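A sketch of the save, change, restore pattern for par(), laying out a histogram and a boxplot side by side. The random data and the chosen margin values are arbitrary, and a temporary PDF device is assumed so it runs headless.

```r
pdf_file <- tempfile(fileext = ".pdf")
pdf(file = pdf_file)

oldpar <- par(no.readonly = TRUE)                     # save only the settable parameters
par(mfrow = c(1, 2), las = 1, mar = c(4, 4, 2, 1))    # 1 x 2 grid, horizontal labels
grid_used <- par("mfrow")

set.seed(1)
vals <- rnorm(100)
hist(vals, main = "hist()")                # left panel
boxplot(vals, main = "boxplot()")          # right panel

par(oldpar)                                # restore without read-only warnings
grid_restored <- par("mfrow")
invisible(dev.off())
```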

=====================

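VIII. Programming Tools

1. R & RStudio:

https://posit.co/download/rstudio-desktop/

2. The Comprehensive R Archive Network (CRAN):

https://cran.r-project.org/

3. Anaconda: a large Python/R distribution for data science and machine learning, bundling many scientific computing packages and tools.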

https://www.anaconda.com/download/success

https://docs.anaconda.com/anaconda/install/windows/

=====================

IX. Books and References

1. R documentation:

https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/Normal

2. R textbook: Learning R by Richard Cotton

3. Markdown cheat sheet (useful for R Markdown):

https://nestacms.com/docs/creating-content/markdown-cheat-sheet

4. LaTeX Primer:

https://www.tug.org/twg/mactex/tutorials/ltxprimer-1.0.pdf

5. LaTeX2e: an unofficial reference manual

http://tug.ctan.org/info/latex2e-help-texinfo/latex2e.pdf


ingenieur
Reading without taking notes is not really reading.