Data Types in R

R has several built-in data types. Each one has specific properties…

x <- 5 # Numeric data type
print(x)
y <- "Hello, World!" # Character data type
print(y)
z <- TRUE # Logical data type
print(z)
w <- factor(c("A", "B", "C", "A")) # Factor data type
print(w)
date <- as.Date("2022-03-01") # Date data type (a calendar date, with no time of day)
print(date)
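
To check which type R has assigned to a value, class() reports it. The sketch below redefines the values above so it runs on its own; the name d is only illustrative.

```r
# Redefine the example values so this block is self-contained
x <- 5
y <- "Hello, World!"
z <- TRUE
w <- factor(c("A", "B", "C", "A"))
d <- as.Date("2022-03-01")   # "d" is an illustrative name

class(x)   # "numeric"
class(y)   # "character"
class(z)   # "logical"
class(w)   # "factor"
class(d)   # "Date"

levels(w)  # "A" "B" "C": the distinct categories stored in the factor
```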

Variables and data can be grouped into different structures: vectors, lists, matrices, and dataframes.

  • Vector = sequence of values of one data type (like a data column)
  • List = ordered collection of anything, even different data types
  • Matrix = 2D collection (rows and columns) of values of one data type, usually numbers
  • Data frame = list of equal-length vectors (like a list with all of the columns in your data file!)

Try these out:

my_vector <- c(1, 2, 3)
my_vector2 <- c(1, "words", 3)
my_list <- list(1, "words", 3)
print(my_vector2); print(my_list)
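
The printed output hints at the difference: a vector converts mixed values to one common type, while a list keeps each element as it was. A minimal sketch that makes this explicit, plus a small matrix for comparison (my_matrix is an illustrative name, not part of the exercise above):

```r
my_vector2 <- c(1, "words", 3)
my_list <- list(1, "words", 3)

class(my_vector2)     # "character": the numbers were converted to text
my_vector2[1]         # "1"
class(my_list[[1]])   # "numeric": the list kept the original type
class(my_list[[2]])   # "character"

# A matrix with 2 rows and 3 columns; values fill column by column by default
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
dim(my_matrix)        # 2 3
my_matrix[2, 3]       # 6 (row 2, column 3)
```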

Accessing Data Structures

Elements in data structures are numbered by position, even if they are text, and are read left-to-right and top-to-bottom. An element’s index is its position in the data structure.

If an element is in the first position, its index is one. If an element is in the second position, its index is two, and so on. This might sound obvious, but some languages start indexing at zero (e.g., Python, JavaScript).

Try getting the value of an index:

test_index = c(99, 500, 123, 2.5)
test_index[1]; test_index[4]; test_index[5]

Try getting the index of a particular value:

which(test_index == 99)
which(test_index == 456)
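
Two of the results above are worth a closer look: asking for a position that does not exist returns NA, and searching for a value that is not present returns an empty result rather than an error. A short sketch of how one might check for both cases:

```r
test_index <- c(99, 500, 123, 2.5)

test_index[5]                      # NA: there is no fifth element
is.na(test_index[5])               # TRUE

which(test_index == 456)           # integer(0): no element equals 456
length(which(test_index == 456))   # 0

456 %in% test_index                # FALSE: a direct yes/no membership test
99 %in% test_index                 # TRUE
```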

Dataframes in R

Dataframes are tables that hold collections of potentially different data types and have column names to identify what the data are.

  • The data within each column are the same type
  • But the data types can differ from one column to the next

Try creating and working with the dataframe below:

people = c("Alex", "Barb", "Carl") # col 1
ages = c(19, 29, 39)  # col 2
df = data.frame(people, ages)
names(df) = c("Name", "Age")
print(df)
View(df)
summary(df)
dim(df) # rows by columns
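
To see the data type of each column, sapply() with class() or str() can help. The sketch below rebuilds the same dataframe in one step; stringsAsFactors = FALSE is set explicitly as an assumption, so that the text column stays character on older R versions too.

```r
people <- c("Alex", "Barb", "Carl")
ages <- c(19, 29, 39)

# Column names are given directly instead of renaming with names() afterwards
df <- data.frame(Name = people, Age = ages, stringsAsFactors = FALSE)

sapply(df, class)   # Name is "character", Age is "numeric"
str(df)             # compact overview: dimensions, column types, first values
nrow(df)            # 3
ncol(df)            # 2
```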

Accessing Dataframes

Columns in dataframes are accessed by name using ‘$’:

df$Age

Individual elements are accessed using their row and column position/index. The first number is the row and the second number is the column:

df[1,1]; df[1,2]; df[2,1]
df$Age[1]

Multiple values can also be accessed at once by giving a range of positions:

df[1:2, 1]
df$Name[1:2]
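
Positions are not the only option. As a hedged sketch of a few common alternatives, columns can also be selected by name inside the brackets, a whole row can be taken by leaving the column blank, and rows can be picked with a condition. The dataframe is rebuilt here so the block runs on its own.

```r
df <- data.frame(Name = c("Alex", "Barb", "Carl"),
                 Age = c(19, 29, 39),
                 stringsAsFactors = FALSE)

df[1:2, "Name"]      # same result as df[1:2, 1]
df[["Age"]]          # same result as df$Age
df[2, ]              # the whole second row (column left blank)
df[df$Age > 20, ]    # only the rows where Age is greater than 20
```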

With this, you can change specific values:

df$Age[3] = 99
df$Name[1] = "Alexander"
print(df)

Modifying Dataframes

New columns can be added:

df$AgeSquared <- df$Age * df$Age
df$EyeColor <- c("blue", "green", "brown")
print(df)

Column values can be changed:

df$EyeColor[1] <- "orange"
df[1, 4] <- "red"

Columns can be removed in several ways; use whichever you find clearest. Be careful when doing this: it is easy to make a mistake and overwrite your data with the wrong values, so I almost always work on a copy.

df2 = subset(df, select = -c(EyeColor))
View(df2)
df3 = df[ -c(4) ]   # drop column 4; same result as df3 = df[ c(1,2,3) ]
View(df3)
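
One way to follow the make-a-copy advice is sketched below. The name df_backup is illustrative, and assigning NULL to a column is just one more of the many removal options, shown here because it changes the dataframe in place.

```r
df <- data.frame(Name = c("Alex", "Barb", "Carl"),
                 Age = c(19, 29, 39),
                 EyeColor = c("blue", "green", "brown"),
                 stringsAsFactors = FALSE)

df_backup <- df        # keep an untouched copy before changing anything

df$EyeColor <- NULL    # removes the column from df only

names(df)              # "Name" "Age"
names(df_backup)       # "Name" "Age" "EyeColor": the copy is unchanged
```

If the removal turns out to be a mistake, df can be restored with df <- df_backup.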

Built-In Datasets from R

R comes with several built-in datasets:

data()
data(mtcars)
data(USArrests)
data(ChickWeight)

These are helpful for practicing working with data and running analyses.

chicks <- ChickWeight  # a name other than "data" avoids confusion with the data() function
View(chicks)
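
View() needs an interactive session such as RStudio. A minimal sketch of console-friendly ways to inspect the same dataset (chicks is an illustrative name):

```r
chicks <- ChickWeight   # copy the built-in dataset into a variable

head(chicks)            # first six rows
dim(chicks)             # 578 4 (rows by columns)
names(chicks)           # "weight" "Time" "Chick" "Diet"
str(chicks)             # column types and a preview of the values
summary(chicks$weight)  # basic statistics for one column
```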