Basic R

Jake Feldman

Data Types and Structures

-Data comes in many different forms

-Some of the common types include numeric, character, and logical values

-Data structures (vectors, factors, data frames) store multiple pieces of info

Creating Numeric Variables

-Variables are used to store or keep track of information

-When you read data into R, you’ll need to use a variable to store this info

#We use the = operator to set a variable equal to something.
#Now the variable a, for example, will be equal to 5 until
#I change it
a=5
b=4.2
c=5.432
#Running the name of the variable will show you what it is equal to
a
[1] 5
#The class command shows the type of the variable
class(c)
[1] "numeric"
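-A minimal sketch of the points above (the variable name total is hypothetical, not from the slides): a variable keeps its value until it is reassigned, and whole numbers typed without a decimal point are still stored as numeric.

```r
# Assign, then reassign: the variable holds the most recent value
a = 5
a = a + 1      # a is now 6
a
# Whole numbers are stored as "numeric" too
class(a)
# The arrow <- is an equivalent and very common way to assign in R
total <- a + 4.2
total
```

-Either = or <- works for assignment; these slides use = throughout.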

-In order to create variables you need to run the line of code where they are defined or manipulated

-I will show you how to run code after this introduction (this is one advantage of RStudio)

Creating Character Variables

#Notice the use of quotes (Can be single or double) to 
#create a string variable
e="TRUE"
#You should think of this variable as 4 letters
e
[1] "TRUE"
#Get the type 
class(e)
[1] "character"

Creating Logical Variables

#TRUE and FALSE need to be all caps and have no quotes.
v=TRUE
v
[1] TRUE
class(v)
[1] "logical"
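-To make the difference between the text "TRUE" and the logical TRUE concrete, here is a small sketch that puts the two side by side. It only uses the variables e and v defined the same way as on the slides.

```r
e = "TRUE"        # character: the four letters T, R, U, E
v = TRUE          # logical: the truth value itself
nchar(e)          # number of characters in the text: 4
class(e)          # "character"
class(v)          # "logical"
identical(e, v)   # FALSE: same spelling, different types
```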

A Few Notes

-Make variable names meaningful - makes code more readable.

-Variables are case sensitive.

-Variable names cannot contain spaces, but you can use numbers, ‘.’, and ‘_’ (a name cannot start with a number or ‘_’).

-The # marks a comment. Comments tell you and others what you are trying to accomplish. All of your code should be commented.

-Commands in the editor are separated by a semicolon or a new line. We will generally use the latter.
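-The notes above can be sketched in a few lines. All variable names here (height, Height, student.count, student_count2, x, y) are made up for illustration.

```r
# Names are case sensitive: these are two different variables
height = 152
Height = 171.5
height == Height       # FALSE
# Dots, underscores, and digits are allowed inside a name
student.count = 3
student_count2 = 3
# Two commands on one line, separated by a semicolon
x = 1; y = 2
x + y                  # 3
```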

Creating Vectors

#Vectors store data that is all of the same type in a single dimension;
#think of a single row or column of data. The c() means combine and is how
#we create a vector
colors= c("yellow", "green", "red")
colors
[1] "yellow" "green"  "red"   
class(colors)
[1] "character"
#Trying to create a vector with different data types
weirdColors = c(5, "yellow")
weirdColors
[1] "5"      "yellow"
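-The weirdColors result above happens because a vector can hold only one type, so R converts every element to the most general type present. A short sketch of that conversion with a few more combinations:

```r
# Mixing types: everything is converted to the most general type present
class(c(5, "yellow"))    # "character": the 5 becomes "5"
class(c(5, TRUE))        # "numeric":   TRUE becomes 1
class(c(TRUE, "yes"))    # "character": TRUE becomes "TRUE"
# length() reports how many elements a vector holds
colors = c("yellow", "green", "red")
length(colors)           # 3
```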

Factors

#Create a factor. Notice the input is a vector
crazyColors = factor(c("yellow", "red", "yellow", "green", "red", "yellow"))
#Printing a factor gives you the various categories
crazyColors
[1] yellow red    yellow green  red    yellow
Levels: green red yellow
#The levels command gives us the labels in a vector
levels(crazyColors)
[1] "green"  "red"    "yellow"

Factors Continued

#Factors store a vector as well as the distinct elements of the vector
#as categories or labels
crazyColors = factor(c("yellow", "red", "yellow", "green", "red", "yellow"))

#Get the number of different categories using the nlevels command
nlevels(crazyColors)
[1] 3
#Get the counts under each label
table(crazyColors)
crazyColors
 green    red yellow 
     1      2      3 
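-Two more things that may help when reading factor output, shown as a sketch. The variable trafficColors and its level order are hypothetical choices for illustration.

```r
crazyColors = factor(c("yellow", "red", "yellow", "green", "red", "yellow"))
# Internally each element is stored as the position of its level
# (green = 1, red = 2, yellow = 3)
as.integer(crazyColors)      # 3 2 3 1 2 3
# Levels are sorted alphabetically by default; the levels argument
# lets you choose the order yourself
trafficColors = factor(c("yellow", "red", "yellow", "green", "red", "yellow"),
                       levels = c("red", "yellow", "green"))
levels(trafficColors)        # "red" "yellow" "green"
# table() reports the counts in the level order
table(trafficColors)
```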

Creating Data Frames

#Data frames let us store data in 2D (rows and columns) like an Excel
#spreadsheet.  Here I show how to create one from scratch.
QBA = data.frame(names=c("Jake", "Jonny", "Jill"), 
  height = c(152, 171.5,165), fromCali = c(TRUE,TRUE, FALSE) )
QBA
  names height fromCali
1  Jake  152.0     TRUE
2 Jonny  171.5     TRUE
3  Jill  165.0    FALSE
#Use str() to see the structure of the data frame
str(QBA)
'data.frame':   3 obs. of  3 variables:
 $ names   : Factor w/ 3 levels "Jake","Jill",..: 1 3 2
 $ height  : num  152 172 165
 $ fromCali: logi  TRUE TRUE FALSE
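-A note on the str() output above, with a sketch: since R 4.0.0, data.frame() keeps text columns as character by default, so names would show as chr rather than Factor. Passing stringsAsFactors = TRUE, as assumed below, reproduces the Factor column shown on this slide.

```r
QBA = data.frame(names = c("Jake", "Jonny", "Jill"),
                 height = c(152, 171.5, 165),
                 fromCali = c(TRUE, TRUE, FALSE),
                 stringsAsFactors = TRUE)   # store text columns as factors
# Number of rows, number of columns, and both at once
nrow(QBA)            # 3
ncol(QBA)            # 3
dim(QBA)             # 3 3
# The type of every column in one call
sapply(QBA, class)
```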

Importance of Data Types

-When working with a data frame the first command you should run is the str() command
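-One reason to run str() first, as a sketch: a column that looks numeric may actually be stored as text, and arithmetic on it will fail. The data frame roster below is hypothetical, and the $ used to pick out a column belongs to the indexing material covered next lecture.

```r
# The heights were typed with quotes, so R stores them as text
roster = data.frame(names = c("Jake", "Jonny"),
                    height = c("152", "171.5"),
                    stringsAsFactors = FALSE)
str(roster)                        # height shows as chr, not num
# roster$height + 1 would stop with an error at this point
# Convert the text to numbers before doing arithmetic
fixed = as.numeric(roster$height)
class(fixed)                       # "numeric"
mean(fixed)                        # 161.75
```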

Arithmetic Operations

# Addition
5+3
[1] 8
c(3,3) + c(1,1)
[1] 4 4
a=5
b=3
c=a+b
c
[1] 8
#Subtraction
5-3
[1] 2
c(3,3) - c(1,1)
[1] 2 2
#Multiplication
5*3
[1] 15
c(3,3) * c(1,1)
[1] 3 3
#Division
5/3
[1] 1.666667
c(3,3) / c(1,1)
[1] 3 3
#Raising to a power
5^3
[1] 125
c(3,3) ^ c(1,1)
[1] 3 3
#Getting remainder
5%%3
[1] 2
c(3,3) %% c(1,1)
[1] 0 0
#Getting quotient
5%/%3
[1] 1
c(3,3) %/% c(1,1)
[1] 3 3
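-A short sketch tying the last two operators together: the quotient and remainder always rebuild the original number. It also shows that arithmetic on vectors works element by element, with a shorter vector reused as needed. The names total, size, quotient, and remainder are made up for illustration.

```r
total = 17
size = 5
quotient = total %/% size               # 3 whole groups of 5
remainder = total %% size               # 2 left over
quotient * size + remainder == total    # TRUE
# A single number is applied to every element of the vector
c(10, 20, 30) * 2                       # 20 40 60
# A shorter vector is repeated to match the longer one
c(1, 2, 3, 4) + c(10, 20)               # 11 22 13 24
```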

Boolean Operations

-This syntax will be most useful when we start writing SQL queries

#Greater than
5>3
[1] TRUE
c(3,3) > c(1,1)
[1] TRUE TRUE
#Greater than or equal
5>=3
[1] TRUE
c(3,3) >= c(1,1)
[1] TRUE TRUE
#Less than
5<3
[1] FALSE
c(3,3) < c(1,1)
[1] FALSE FALSE
#Less than or equal
5<=3
[1] FALSE
c(3,3) <= c(1,1)
[1] FALSE FALSE
#Equal
5==3
[1] FALSE
c(3,3) == c(1,1)
[1] FALSE FALSE
#Not equal
5!=3
[1] TRUE
c(3,3) != c(1,1)
[1] TRUE TRUE
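-Comparisons produce logical vectors, which can be stored, counted, and combined. The sketch below assumes a hypothetical vector heights, and uses & (and) and | (or), which are not shown on the slides above.

```r
heights = c(152, 171.5, 165)
# A comparison returns one TRUE/FALSE per element
tall = heights > 160              # FALSE TRUE TRUE
class(tall)                       # "logical"
# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUEs
sum(tall)                         # 2
# Both conditions must hold (and)
heights > 160 & heights < 170     # FALSE FALSE TRUE
# At least one condition must hold (or)
heights < 155 | heights > 170     # TRUE TRUE FALSE
```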

Take Homes

-Pieces of data/info come in different formats, which we need to be conscious of.

-Variables help us both store and keep track of data.

-We can use variables to store

-We can do basic arithmetic on variables.

How Does This Help Me???

-We are building up the necessary tools to create something meaningful

-Creating and manipulating variables will be at the core of all the code we write

Next Lecture

-Slicing and indexing data frames

-How do I pick out specific parts (columns or rows)?

-How do I change specific parts?

-How do I access/change column names?

-Reading in our first data set.