author: Parash Upreti date: autosize: true
Disclaimer: Most of this material may seem basic, but I recommend quickly trying the code to refresh rusty skills.
This is a very basic R tutorial and assumes that you have little or no previous R knowledge. You can easily follow the slides with a local installation of R. Topics here include:
- Getting Started with R
- Working on R
- Workspace
- Commenting R Code
- Working Directory
- Working with data
- Vectors, lists and matrices
- R Data Structures
- Logical Operators and Truth values
- Programming in R
- Functions and packages
- R is software for statistical and mathematical computing. It is available free of cost here [https://cran.r-project.org/bin/windows/base/].
- The popular IDE for R is RStudio, which can also be downloaded for free here [https://www.rstudio.com/products/RStudio/]. Choose the desktop version.
- In RStudio, you will be working in either the console or the script window. Input can be entered in both windows, but the console is where your results are displayed.
- RStudio is nicer since you can see two additional windows, which can be helpful in many ways.
Do the following: type the lines below in your R script and hit Ctrl + Enter (there are other ways to run code in R):
1+3*33**5
3^5
As you can see, R has an interpreted command line. It can easily perform basic arithmetic operations. Try some other calculations. (Note: you can also run code directly in the console.)
You can also assign values to a variable. We will use "<-" to assign values. ("=" also works, but we try to avoid it.)
x <- 7
x
[1] 7
y <- 3^2
x + y
[1] 16
If you check the Environment tab in the window to your right (also known as the workspace), you will see x and y listed as values. You can list what is in your workspace with ls().
- You can alter the workspace by reassigning a value
x <- 1000
x <- x * 3
x
[1] 3000
- You can also remove a particular value with
rm(x)
- OR
remove(y)
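The workspace steps above can be tried in one go. This is only a minimal sketch: the variable names `a` and `b` are made up for illustration, and `exists()` is a base R function that the slides do not use.

```r
# Create two illustrative variables (names are hypothetical)
a <- 10
b <- a * 3

# exists() reports whether a name is defined; inherits = FALSE
# restricts the search to the current environment
a_before <- exists("a", inherits = FALSE)

# Remove one variable and check again
rm(a)
a_after <- exists("a", inherits = FALSE)

print(c(before = a_before, after = a_after))
```

Note that `b` keeps its value after `a` is removed, because R stored the result of `a * 3`, not a link to `a`.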
Commenting is the most important part of writing code. Comments can give instructions or explain what a particular line of code does.
- In R, you write a comment by beginning it with #
# The following code adds two numbers
3+5 # 3 (first number) comes from x values and 5 (second) from y
[1] 8
The code runs with the comment in place (the comment is ignored by R).
- If you need to clear your console, you can press Ctrl + L. It clears the text from the console but does not remove anything from the workspace.
The working directory is the location on your hard drive that R is currently accessing. It can be changed at any time. In RStudio, you can see it in the bottom right window under the Files tab.
#getting current working directory
getwd()
[1] "C:/Users/Machine/Documents/R/R Tutorial"
- Set the working directory with:
setwd("location/on/your/computer") # replace with a folder path on your machine
- You can also list the files in the working directory with:
dir()
[1] "Data manupulation Presentation.Rmd"
[2] "Data_manupulation_Presentation.html"
[3] "R Tutorial.Rproj"
[4] "R_Tutorial Presentation-figure"
[5] "R_Tutorial Presentation.html"
[6] "R_Tutorial Presentation.Rpres"
[7] "R_Tutorial_Presentation.html"
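A safe way to experiment with the working directory is to switch to a temporary folder and then switch back. The sketch below assumes nothing about your machine: `tempdir()` is a base R function returning a per-session scratch folder, and the file name `demo_file.txt` is hypothetical.

```r
# Remember where we started so we can return afterwards
old_dir <- getwd()

# tempdir() is a per-session scratch folder that always exists
setwd(tempdir())

# Create a small file (hypothetical name) and confirm dir() lists it
writeLines("hello", "demo_file.txt")
listed <- "demo_file.txt" %in% dir()
print(listed)

# Clean up and restore the original working directory
invisible(file.remove("demo_file.txt"))
setwd(old_dir)
```

Saving the old directory first means the rest of your session is unaffected by the experiment.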
Data in R is in vector form (for now). Concatenate (c) numbers as entries of a vector:
x <- c(1,3,7,2,9,5)
y <- c(12,9,11,14,7)
Check some basic information about your data
length(x)
[1] 6
length(c(x,y))
[1] 11
- Select a particular data point in your vector
x[5]
[1] 9
- Or drop a particular value (this returns a copy; y itself is unchanged)
y[-2]
[1] 12 11 14  7
- Change a value or even add extra data points
y[2] <- 10
y[6] <- 13 #earlier y only had 5 data points
More on this later, but here is an example of what can be done with vectors
xycol <- cbind(x,y) # x and y have to be same length
xycol
     x  y
[1,] 1 12
[2,] 3 10
[3,] 7 11
[4,] 2 14
[5,] 9  7
[6,] 5 13
xyrow <- rbind(x,y)
xyrow
  [,1] [,2] [,3] [,4] [,5] [,6]
x    1    3    7    2    9    5
y   12   10   11   14    7   13
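The vector operations above (negative indexing, replacing a value, and growing a vector) are easy to mix up, so here is a small self-contained sketch. The vector `v` and its values are chosen only for illustration.

```r
# Illustrative vector (values chosen for this sketch)
v <- c(12, 9, 11, 14, 7)

# Negative indexing returns a copy without that position;
# v itself is not modified
v_dropped <- v[-2]
len_before <- length(v)

# Assigning one position past the end grows the vector by one
v[6] <- 13
len_after <- length(v)

print(v_dropped)
print(c(before = len_before, after = len_after))
```

To make a deletion permanent you would reassign the result, for example `v <- v[-2]`.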
Check to see what list does
list(4,3)
first_list <- list(xcol = x, ycol = y)
first_list
Notice the difference between the following two:
first_list[2]
$ycol
[1] 12 10 11 14  7 13
first_list[[2]]
[1] 12 10 11 14  7 13
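The difference between single and double brackets is easier to see by checking the class of each result. This is a minimal sketch with a made-up list `demo_list`; `class()` and `identical()` are base R functions.

```r
# Illustrative list (names are made up for this sketch)
demo_list <- list(xcol = c(1, 3, 7), ycol = c(12, 10, 11))

# Single brackets keep the list wrapper: the result is a list of length 1
sub_list <- demo_list[2]
print(class(sub_list))

# Double brackets extract the element itself: a numeric vector
element <- demo_list[[2]]
print(class(element))

# The $ operator is a by-name shortcut for double brackets
same <- identical(demo_list$ycol, demo_list[["ycol"]])
print(same)
```

In practice, use double brackets (or `$`) when you want to compute with the contents, and single brackets when you want a smaller list.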
X = matrix(c(1,2,4,2,2,3,2,1,2,4,2,4,1,3,4), ncol = 5, byrow = TRUE)
X
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    4    2    2
[2,]    3    2    1    2    4
[3,]    2    4    1    3    4
By default, a matrix is filled column by column; byrow = TRUE fills it row by row instead. You also need to specify the number of columns (or rows). Like vectors, matrix elements can be selected by index:
X[ ,3] #First entry is row and second is column index
[1] 4 1 1
X[2,3]
[1] 1
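To see what the fill order means in practice, the sketch below builds the same six numbers into a matrix both ways. The names `by_col` and `by_row` are illustrative only.

```r
vals <- 1:6

# Default: values run down each column in turn
by_col <- matrix(vals, ncol = 3)

# byrow = TRUE: values run across each row in turn
by_row <- matrix(vals, ncol = 3, byrow = TRUE)

print(by_col)
print(by_row)

# First row of each matrix
first_row_col <- by_col[1, ]   # 1 3 5
first_row_row <- by_row[1, ]   # 1 2 3
```

Both matrices have the same dimensions; only the placement of the values differs.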
The table command tabulates how many times each value occurs
table(c(1,2,4,2,2,3,2,1,2,4,2,4,1,3,4))

1 2 3 4
3 6 2 4
prop.table(table(X)) # converts the counts into proportions
X
        1         2         3         4
0.2000000 0.4000000 0.1333333 0.2666667
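The same two commands work on non-numeric data too. Here is a minimal sketch using a made-up character vector `obs`. It also shows that a single count can be looked up by name.

```r
# Illustrative observations (made up for this sketch)
obs <- c("a", "b", "a", "c", "a", "b")

counts <- table(obs)          # how often each value occurs
props <- prop.table(counts)   # counts divided by the total

print(counts)
print(round(props, 2))

# Look up the count for one category by its name
a_count <- as.integer(counts["a"])
print(a_count)
```

The proportions always sum to 1, which is a quick check that the table covers every observation.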
R data comes in the following formats:
- Numeric/Integers: Numeric values including integers and decimals
- Strings: Characters (unless they are stored as factors)
- Factors: Levels of classification, e.g. male/female, red/yellow/blue, etc.
- Boolean: TRUE/FALSE (treated as 1 for TRUE and 0 for FALSE in arithmetic). You can add/subtract Booleans in R
- Date/Time: Self-explanatory. Try Sys.Date() and Sys.time()
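One way to explore these formats is to ask R for the class of a value. The sketch below uses `class()`, a base R function not shown in the slides, on one made-up example of each type.

```r
# class() reports how R stores a value
num_class  <- class(3.14)        # decimal number
int_class  <- class(2L)          # the L suffix marks an integer
chr_class  <- class("red")       # string
fct_class  <- class(factor(c("male", "female", "male")))
lgl_class  <- class(TRUE)        # Boolean
date_class <- class(Sys.Date())  # today's date

print(c(num_class, int_class, chr_class, fct_class, lgl_class, date_class))

# Booleans behave as 1 and 0 in arithmetic, so sum() counts the TRUEs
true_count <- sum(c(TRUE, FALSE, TRUE))
print(true_count)
```

Counting TRUE values with `sum()` is a common idiom once you start using logical comparisons on vectors.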
R can perform matrix operations. Try the following in your console
x<-1:3
y<-4:6
z<-7:9
M<-cbind(x,y,z)
N<-matrix(c(10,11,12,50,51,52),3,2,byrow=TRUE)
Verify the following:
dim(N) #Returns the dimensions of N.
dim(N)[1] #Number of rows in N.
dim(N)[2] #Number of cols in N.
Try the following code. Comments are provided as hints
M*M #Elementwise multiplication
M%*%N #Matrix multiplication
M^2 #Square each element
N+7 #Add 7 to each element of N
M%*%M #Square M using matrix multiplication
t(M) #Transpose of M
diag(M) #Returns the diagonal of M as a vector.
diag(3) #Creates the 3x3 identity matrix.
Try creating more matrices and try other operations
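The difference between `*` and `%*%` is the one most worth checking by hand. The sketch below rebuilds M as defined above and compares a single entry of each result. The names `elementwise` and `matprod` are illustrative.

```r
M <- cbind(x = 1:3, y = 4:6, z = 7:9)
N <- matrix(c(10, 11, 12, 50, 51, 52), 3, 2, byrow = TRUE)

# Elementwise: each entry is multiplied by itself
elementwise <- M * M
print(elementwise[1, 2])   # 4 * 4 = 16

# Matrix product: row 1 of M times column 2 of M
matprod <- M %*% M
print(matprod[1, 2])       # 1*4 + 4*5 + 7*6 = 66

# A 3x3 matrix times a 3x2 matrix gives a 3x2 matrix
print(dim(M %*% N))
```

For `%*%` the number of columns on the left must equal the number of rows on the right, which is why `M %*% N` works here but `N %*% M` would not.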
What does the following command do?
M^(-1)
What does the following "solve" command do?
A <- matrix(c(2,3,4,2), nrow = 2)
solve(A)
- Hint: Try the solve command on a non-square matrix. Try it again on matrix M
More solve:
solve(A)%*%A
B = solve(A,diag(2)) #remember what diag(2) does?
B #What is B?
- More hint: solve is solving something. Matrix multiply A and B to find out. It shouldn't be that hard.
A^(-1) is not the same as solve(A): A^(-1) takes the reciprocal of each element, while solve(A) returns the matrix inverse.
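A quick numerical check makes the distinction concrete. This sketch reuses the matrix A from above and uses `all.equal()`, a base R function for comparing numbers up to floating-point tolerance. The names `A_inv` and `A_recip` are illustrative.

```r
A <- matrix(c(2, 3, 4, 2), nrow = 2)

A_inv   <- solve(A)   # matrix inverse
A_recip <- A^(-1)     # reciprocal of each element

# A true inverse multiplied by A gives the identity matrix
inv_ok <- isTRUE(all.equal(A_inv %*% A, diag(2)))

# The elementwise reciprocal does not
recip_ok <- isTRUE(all.equal(A_recip %*% A, diag(2)))

print(c(solve = inv_ok, reciprocal = recip_ok))
```

Comparing with `all.equal()` rather than `==` avoids false mismatches caused by tiny rounding errors in the inverse.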
These correspond to true/false statements.
- Equal == [careful here, double equal sign]
- Not equal !=
- Greater than / greater than or equal > / >=
- Less than / less than or equal < / <=
Run the code and see the results.
3==7
3!=7
3<7
3<=7
x=1:10
x==7
x<=7
x[x<=7] #Understand this line of code please
- And [&]
- Or [|]
- Not [!]
Does the term "truth table" ring a bell?
TRUE & FALSE
[1] FALSE
TRUE | FALSE
[1] TRUE
!TRUE
[1] FALSE
P = c(TRUE, TRUE, FALSE, FALSE)
Q = c(TRUE, FALSE, FALSE, TRUE)
cbind(P, Q, "P&Q" = P&Q, "P_or_Q" = P|Q, "Not_P" = !P)
         P     Q   P&Q P_or_Q Not_P
[1,]  TRUE  TRUE  TRUE   TRUE FALSE
[2,]  TRUE FALSE FALSE   TRUE FALSE
[3,] FALSE FALSE FALSE  FALSE  TRUE
[4,] FALSE  TRUE FALSE   TRUE  TRUE
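The comparison and logical operators above are most useful for picking elements out of a vector. A comparison on a vector returns one TRUE or FALSE per element, and putting that result inside square brackets keeps only the TRUE positions. The names `mask`, `kept` and `both` below are illustrative.

```r
x <- 1:10

# One TRUE/FALSE per element of x
mask <- x <= 7
print(mask)

# Keep only the elements where the mask is TRUE
kept <- x[mask]
print(kept)

# Conditions can be combined with & before subsetting
both <- x[x > 3 & x <= 7]
print(both)
```

This is exactly what `x[x<=7]` does in a single step: the inner comparison builds the mask and the outer brackets apply it.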
If-else statements in R
if(5>1){print('Yes, it is.')}
[1] "Yes, it is."
if(5>1){print('Yes, it is.')}else{print('No, it is not')}
[1] "Yes, it is."
if(5==1){print('Yes, it is.')}else{print('No, it is not')}
[1] "No, it is not"
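Longer conditions are usually written across several lines. The sketch below is one possible layout, using a made-up variable `n` and an `else if` branch that the slides do not cover.

```r
n <- -2   # illustrative value; try changing it

if (n > 0) {
  label <- "positive"
} else if (n == 0) {
  label <- "zero"
} else {
  label <- "negative"
}

print(label)
```

When typing this at the console, keep each `else` on the same line as the closing brace before it. Otherwise R treats the `if` as finished and reports an error at the `else`.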
We have already used many functions available in R.
- Functions are in the form function(), such as cbind() or matrix()
- Such functions are available in base R, integrated in plain R
- There are many statistical functions in R such as mean(), median(), etc. (more on this later)
- You can create your own function as well. For example:
operation<- function(x,y){
sumops= x+y
difops= x-y
prodops=x*y
quotops=x/y
list(sum =sumops, diff = difops, prod = prodops, quot = quotops)
}
operation(3,22)$sum
[1] 25
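Because the function returns a named list, it is often handier to store the result once and then pick out the pieces. The sketch below repeats the definition in a shorter form so that it runs on its own. The variable name `result` is illustrative.

```r
# Same idea as the function above, written compactly
operation <- function(x, y) {
  list(sum = x + y, diff = x - y, prod = x * y, quot = x / y)
}

# Store the whole result, then access parts by name
result <- operation(3, 22)
print(names(result))
print(result$diff)   # 3 - 22
print(result$prod)   # 3 * 22
```

The value of the last expression in a function body is what the function returns, which is why no explicit `return()` is needed here.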
R is equipped with many statistical functions. Other functions will be introduced as we move forward
normal_data = rnorm(1000,100,15) #sample of size 1000 from a normal distribution with mu=100 and sigma=15
mean(normal_data) #Computes the sample mean.
[1] 99.82245
sd(normal_data) #Computes the sample st dev.
[1] 15.12774
sum(normal_data) #Computes the sum of the elements of normal_data.
[1] 99822.45
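Since rnorm() draws random numbers, your mean and standard deviation will differ slightly from the values printed above. If you want repeatable results, one option is to fix the random seed first with `set.seed()`, a base R function not used in the slides. The seed value 42 below is arbitrary.

```r
# Fixing the seed makes the random draws repeatable
set.seed(42)
first <- rnorm(5, mean = 100, sd = 15)

# Resetting to the same seed reproduces the same draws
set.seed(42)
second <- rnorm(5, mean = 100, sd = 15)

print(identical(first, second))
```

Without the second `set.seed()` call, the two samples would almost certainly differ.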
Since R is open source software and many users in the community contribute to it, it keeps growing. R packages are developed by a variety of people in the community, academia, industry, and software companies, and are stored in CRAN repositories, on GitHub, or on other websites. R packages can be thought of as collections of functions created by others that are not already available in base R.
So they have to be installed on the machine using install.packages("package_name") (only once) and then loaded at the start of every session using library(package_name)
Examples of R packages:
- ggplot2
- e1071
We will discuss packages and their use in the future
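The install-once, load-every-session pattern can be sketched as follows. This is one possible approach rather than a required one: `requireNamespace()` is a base R function that checks whether a package is installed without loading it, and the package name is just the example from the list above.

```r
pkg <- "ggplot2"   # example package name from the list above

# TRUE if the package is already installed on this machine
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  # Load it for this session; character.only lets us pass the name as a string
  library(pkg, character.only = TRUE)
} else {
  message("Run install.packages(\"", pkg, "\") once, then call library(", pkg, ").")
}
```

The sketch deliberately does not install anything for you, since installation needs an internet connection and only has to happen once.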
- Using dataset
- Dataset available in R base
- Plots