R has several built-in data types. Each one has specific properties…
x <- 5 # Numeric data type
print(x)
y <- "Hello, World!" # Character data type
print(y)
z <- TRUE # Logical data type
print(z)
w <- factor(c("A", "B", "C", "A")) # Factor data type
print(w)
my_date <- as.Date("2022-03-01") # Date data type (a calendar date, no time of day)
print(my_date)
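To check which type R has assigned to a value, you can call class() or typeof() on it. The sketch below uses the same kinds of values as above; the variable names are illustrative only.

```r
# Illustrative values (names are made up for this sketch)
num_val  <- 5
chr_val  <- "Hello, World!"
lgl_val  <- TRUE
fct_val  <- factor(c("A", "B", "C", "A"))
date_val <- as.Date("2022-03-01")

class(num_val)   # "numeric"
class(chr_val)   # "character"
class(lgl_val)   # "logical"
class(fct_val)   # "factor"
class(date_val)  # "Date"

typeof(num_val)  # "double": plain numbers are stored as doubles by default
levels(fct_val)  # "A" "B" "C": the distinct categories of the factor
```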
Variables/data can be grouped into different structures: vectors, lists, matrices, and dataframes.
Try these out:
my_vector <- c(1, 2, 3)
my_vector2 <- c(1, "words", 3)
my_list <- list(1, "words", 3)
print(my_vector2); print(my_list)
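The difference between the two printouts comes from type coercion: a vector holds a single type, while a list keeps each element as it is. The sketch below checks this with class() and also shows a small matrix, which is listed above but not demonstrated. All names here are illustrative.

```r
mixed_vector <- c(1, "words", 3)     # a vector holds one type, so the numbers become text
mixed_list   <- list(1, "words", 3)  # a list keeps each element's own type

class(mixed_vector)     # "character"
class(mixed_list[[1]])  # "numeric"
class(mixed_list[[2]])  # "character"

# A matrix is a vector arranged in rows and columns (filled column by column by default)
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
dim(my_matrix)          # 2 3
my_matrix[2, 3]         # 6 (row 2, column 3)
```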
Elements in data structures are ordered numerically, even if they are text, and are read left to right and top to bottom. An element's index is its position in the data structure.
If an element is in the first position, its index is one. If an element is in the second position, its index is two, and so on. This might sound obvious, but some languages start indexing at zero (e.g., Python, JavaScript).
Try getting the value of an index:
test_index = c(99, 500, 123, 2.5)
test_index[1]; test_index[4]; test_index[5]
Try getting the index of a particular value:
which(test_index == 99)
which(test_index == 456)
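Two of the calls above return something other than a value from the vector. A minimal sketch of what to expect, using an illustrative copy of the same numbers:

```r
test_values <- c(99, 500, 123, 2.5)  # illustrative copy of the vector above

test_values[5]                       # NA: there is no fifth element
which(test_values == 456)            # integer(0): no element matches
length(which(test_values == 456))    # 0, a handy way to test for "not found"
456 %in% test_values                 # FALSE
which(test_values > 100)             # 2 3: which() also works with other comparisons
```

R does not raise an error in either case, so it is worth checking for NA or a zero-length result before using the output.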
Dataframes are tables that hold collections of potentially different data types and have column names to identify what the data are.
people = c("Alex", "Barb", "Carl") # col 1
ages = c(19, 29, 39) # col 2
df = data.frame(people, ages)
names(df) = c("Name", "Age")
print(df)
View(df)
summary(df)
dim(df) # rows by columns
Columns in dataframes are accessed using ‘$’:
df$Age
Individual elements are accessed using their row and column position/index. The first number is the row and the second number is the column.
df[1,1]; df[1,2]; df[2,1]
df$Age[1]
Multiple values can also be accessed:
df[1:2, 1]
df$Name[1:2]
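Beyond numeric positions, the same square-bracket syntax accepts column names and logical conditions. This is a small sketch on a standalone dataframe (people_df is an illustrative name, built with the same values as above):

```r
people_df <- data.frame(Name = c("Alex", "Barb", "Carl"),
                        Age = c(19, 29, 39),
                        stringsAsFactors = FALSE)  # keep names as plain text on older R versions

people_df[2, ]                      # the whole second row (leave the column blank)
people_df[, "Age"]                  # the Age column selected by name: 19 29 39
people_df[people_df$Age > 20, ]     # only the rows where Age is above 20
people_df$Name[people_df$Age > 20]  # "Barb" "Carl"
```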
With this, you can change specific values:
df$Age[3] = 99
df$Name[1] = "Alexander"
print(df)
New columns can be added:
df$AgeSquared <- df$Age * df$Age
df$EyeColor <- c("blue", "green", "brown")
print(df)
Column values can be changed:
df$EyeColor[1] <- "orange"
df[1, 4] <- "red"
Columns can be removed. There are many different ways of doing this, so use whichever you find clearest. Be careful: it is easy to make a mistake and accidentally overwrite your data with the wrong values. I almost always make a copy first.
df2 = subset(df, select = -c(EyeColor))
View(df2)
df3 = df[ -c(4) ] # equivalent: df3 = df[ c(1,2,3) ]
View(df3)
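Another common way to drop a column is to assign NULL to it. The sketch below does this on a copy so that the original stays untouched; original_df and df_copy are illustrative names.

```r
original_df <- data.frame(Name = c("Alex", "Barb", "Carl"),
                          Age = c(19, 29, 39),
                          EyeColor = c("blue", "green", "brown"))

df_copy <- original_df    # work on a copy so the original stays intact
df_copy$EyeColor <- NULL  # assigning NULL drops the column

names(df_copy)            # "Name" "Age"
names(original_df)        # "Name" "Age" "EyeColor"
```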
R comes with several built-in datasets:
data()
data(mtcars)
data(USArrests)
data(ChickWeight)
These are helpful when practicing working with data and running analyses.
chick_data <- ChickWeight # avoid naming a variable "data", which is also the name of the data() function
View(chick_data)
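Besides View(), a few base functions give a quick overview of a dataset in the console. A minimal sketch with ChickWeight (the variable name chicks is illustrative):

```r
chicks <- ChickWeight  # copy of the built-in dataset

head(chicks)     # first six rows
str(chicks)      # column names, types, and a preview of the values
dim(chicks)      # 578 4 (rows by columns)
names(chicks)    # "weight" "Time" "Chick" "Diet"
summary(chicks)  # per-column summary statistics
```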