The R Novices badge; learn it before you earn it.

print(summary)

[1] "If you know R only as a letter from the alphabet, you'll be surprised to learn it is an entire language. That is, a programming language for statistical computing and graphics. In the We R Champions masterclass series, the We R Novices masterclass lays the foundation for the other masterclasses. You'll not only learn the R syntax and a bit of vocabulary, but also its primary data science dialect, the 'tidyverse'. Although we understand there are few things more exciting than R's syntax, you'll never grasp R without also touching upon the utterly boring things it enables you to do, such as interactive visualizations. Yawn. By the end of the masterclass we'll give you a sneak preview of what's to come in the rest of this series."

1 Welcome

1.1 Academic countdown

Our online version of an academic quarter. We’ll take five minutes for everyone to get in and get settled. Are you ready?

All done? Familiarize yourself with the document.

1.2 Expectations

1.3 Prerequisites

None.

But of course, I successfully…

Finished the We R Preppers guide.

1.4 Reading guide

Sections.

A plenary activity, because we’re all in this together.
A demonstration that integrates everything you’re about to learn.
A self-guided activity, including resources and exercises.
A break, for you to grab another coffee.

Text boxes.

Yeh. Tells you what you can expect to find in a chapter’s resources. Nah. And what not.

Base versus tidy. Explains key differences between base R and the tidyverse.

Give me more. Adds some more advanced information, for those who can’t get enough.

2 R

Yeh

All the basics of R that help you understand the language and let you work with the tidyverse.

Nah

Everything you don’t (necessarily) need when working with the tidyverse, such as…

Indexing with $, [], and [[]].
Procedural logic, such as for loops and if statements.
Writing functions.
Simulations.

Exactly, we don’t cover the things that most R learners find most difficult. Many of you will probably never miss them, but because they can be quite powerful for some of you, we’ll cover them in the We R Programmers masterclass.

2.1 Demonstration

The demonstration gives you the opportunity to see everything you’ll learn in harmony, before we break it apart into all the various building blocks that you can explore on your own.

# This script celebrates our first lines of R code

# Create cake object, which is added to environment tab
cake <- 3.14  
cake

# Create slices vector
slices <- c(cake / 3, cake / 3, cake / 3)  # a vector object
slices

# Plot slices with pie function
pie(x = slices)  # warning: pie charts are ridiculous, only use them for celebrations

# Let's do it more efficiently
?"rep"
slices <- rep(x = cake / 3, times = 3)
slices

# Let's separate 'ingredients' from 'recipe'
n_participants <- 20
slice <- cake / n_participants
slices <- rep(x = slice, times = n_participants)
pie(x = slices)

# Now, let's give this cake a bit more body
library("plotrix")  # first run install.packages("plotrix") once (we like to leave it out of our script)
plotrix::pie3D(x = slices, explode = .1, main = "There's cake for everyone!")

2.2 Resources

2.2.1 Projects

RStudio

Going to use R for a new project?

Create a project
Create a script
Comment the script
Write the script
Save the script

2.2.2 Functions

It is absolutely key to understand the idea of a function. Luckily, the idea is straightforward. You put something in a function, and you get out something else.

Functions always look like this: function(argument_name_1 = input_1, argument_name_2 = input_2). The argument names tell you what you need to input, and are separated by commas. Below, we dissect the mean, print, length, and round functions.

Argument names

The mean function requires one primary argument, named x. The x argument takes a vector with data, of which the function computes the mean.

some_data <- c(1, 4, 4, 2.356, 25)  # create vector with data
mean(x = some_data)
## [1] 7.2712

The length and round functions also take x as their first argument, and the round function takes an additional argument.

length(x = some_data)
## [1] 5

round(x = some_data, digits = 3)
## [1]  1.000  4.000  4.000  2.356 25.000

Seriously busy persons can leave the argument names out, as R will nevertheless interpret the input.

mean(some_data)
## [1] 7.2712

However, seriously busy persons need to take care of the order in which the arguments appear, which we’ll show you in a bit.

Argument defaults

Often, functions have default arguments. If you don’t give an input for such arguments, the function will use the default.

mean(x = some_data)
## [1] 7.2712

mean(x = some_data, na.rm = FALSE, trim = 0)  # the mean function uses these default arguments, but you can change them, see below
## [1] 7.2712

Of course, you may change these defaults. Below, we set the na.rm argument to TRUE in order to remove missing values (called NA’s).

some_data <- c(1, 4, 4, NA, 2.356, 25)  # create vector with (missing) data
mean(x = some_data)
## [1] NA

mean(x = some_data, na.rm = TRUE)
## [1] 7.2712

Here’s an example with the print function. The print function prints any R object to your console. Conveniently, most of the time you won’t need to explicitly call the print function.

print(x = some_data)  # explicit call
## [1]  1.000  4.000  4.000     NA  2.356 25.000

some_data  # implicit call
## [1]  1.000  4.000  4.000     NA  2.356 25.000

print(x = some_data, digits = 2)  # explicit call with additional arguments
## [1]  1.0  4.0  4.0   NA  2.4 25.0
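If you want to check which defaults a function uses without opening its help file, you can inspect its arguments. This is a minimal sketch; args() and formals() are base R functions that are not used elsewhere in this masterclass, and mean.default is the method that mean() runs for ordinary vectors.

```r
# Show the arguments of the default mean method, including their defaults
args(mean.default)

# Extract the arguments and their defaults as a named list
defaults <- formals(mean.default)
names(defaults)  # the argument names, in the order in which they are defined
defaults$trim    # the default for trim
defaults$na.rm   # the default for na.rm
```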

Argument order

When you name the arguments, the order in which you input the arguments is not important. Let’s trim 20% of the observations from each end of the input, before the mean is computed.

mean(x = some_data, trim = .2, na.rm = TRUE)
## [1] 3.452

mean(na.rm = TRUE, x = some_data, trim = .2)
## [1] 3.452

However, if you do not name the arguments, the order does matter!

mean(some_data, .2, TRUE)  # works like a charm
mean(TRUE, some_data, .2)  # fails miserably

Give me more. In R, functions can get funky. For instance, some functions have an ellipsis—or ...—argument. It signals that the function accepts any number of additional arguments. Check the help file for the mean function for instance (?"mean"). Also, some functions take other functions as an argument. Yes, Matryoshka functions, why not. One day you’ll learn why this is all very powerful.
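To make these two ideas a bit more concrete, here is a small sketch using only base R. The paste and sapply functions are our own picks for illustration; they are not covered elsewhere in this masterclass.

```r
# paste() has an ellipsis argument: it accepts any number of inputs
paste("We", "R", "Novices")

# sapply() takes another function as its FUN argument
# and applies that function to every element of X
some_numbers <- c(1, 4, 9)
roots <- sapply(X = some_numbers, FUN = sqrt)
roots
```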

2.2.3 Packages

Functions live in packages. Base R comes with a bunch of pre-installed packages, such as the base package, and you can install additional packages and thus extend the functionality of R. There are currently 16618 available. Crazy! And even crazier, we’ll cover most of them in this masterclass.

No, of course we won’t.

Flattening the curve? Not quite. (source)

Installing and loading packages

Packages must first be installed and can then be loaded. As soon as the package is loaded, all its functions become readily available.

install.packages("package_name_here")  # installs a package
library("package_name_here")  # loads a package
library(package_name_here)  # also loads a package
remove.packages("package_name_here")  # removes a package
update.packages()  # updates all packages
installed.packages()  # see which packages are installed
available.packages()  # see which packages are available

Installing a package generally only needs to be done once on each computer (or R installation) you work on. Loading a package, on the other hand, needs to be done every time you restart R, for instance when you reopen RStudio.
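Not sure whether a package is installed or loaded? The sketch below shows one possible way to check, using only base R. The package name "plotrix" is just an example; replace it with the package you need.

```r
# Is the package installed? (assumption: "plotrix" is only an example name)
is_installed <- "plotrix" %in% rownames(installed.packages())
is_installed  # TRUE if installed, FALSE if you still need install.packages()

# Which packages are loaded in the current session?
search()
```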

Give me more. Importantly, as anyone can contribute a package, functions from different packages may share the same name. Therefore, when loading a package, R ‘masks’ functions from previously loaded packages if those contain functions with an identical name. To ensure that you’re calling the right function, there are three options:

Load packages in a smart order. If you need packages education and fish, which both have the function school, but you only need to use that function from the education package: load package fish before you load package education. You may run search() to see in which order R searches packages for functions.
Explicitly call the required package. When you use a function, you can tell R directly in which package it must find it. It’s simple: call package::function() rather than function().
Don’t install additional packages. But only do this if you like life to be utterly boring.

Here, we choose the second option. It’s pedagogically sound and you will never make a mistake. For convenience, we’ll only call the package if it is not part of base R (that is, if we used the library function to load it).

base::mean()  # the mean function is in the base package, which is pre-installed
mean()  # so we'll just call it like this
tibble::tibble()  # tibble is in the tibble package from the tidyverse collection, so we'll call it like this
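Masking is easiest to understand when you see it happen. In the sketch below we deliberately create our own (hypothetical and rather useless) sd function, which masks the sd function from the pre-installed stats package. Calling stats::sd() still gives the real standard deviation.

```r
# Hypothetical masking: our own sd function hides the one from the stats package
sd <- function(x) "Not the standard deviation you were looking for"

some_numbers <- c(2, 4, 4, 6)
sd(some_numbers)         # calls our own function
stats::sd(some_numbers)  # explicitly calls the function from the stats package

rm(sd)                   # remove our own function again
sd(some_numbers)         # now sd() refers to stats::sd once more
```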

If you don’t want to understand this in depth (you really don’t), don’t look here, and definitely don’t look here.

2.2.4 Help (inside R)

Cheat sheets

We love cheat sheets. Find them online or download them directly from your RStudio menu: Help > Cheatsheets. However, you’ll have to accept that especially the tidyverse dialect (more on that in the next chapter) is developed at a higher pace than the cheat sheets are updated.

Vignettes

We love vignettes. They are not as complete as help files, but provide an excellent start for getting to know a package and some of its most important functions. Unfortunately, many packages don’t come with a vignette.

vignette("dplyr")

Examples

Some functions come with an example. It can be useful to see how a function can be used.

example("mean")

Help

R has a help function with a convenient shortcut. Use it to get help on functions and packages. Be warned, whereas the shortcut is convenient, R’s help pages are generally not.

# Find function or package
apropos("mean")  # find functions with 'mean' in their name
find("mean")  # find package that contains the function 'mean'

# Get help on a function
help("help")  # get help on the help function
?"help"  # shortcut to get help on the help function
?"?"  # shortcut to get help on the shortcut for the help function
?"="  # you can even get help on operators
?help  # also works most of the time, but not always (try ?for)
??"mean"  # find vignettes, code demonstration, help pages

# Get help on a package
help("dplyr")  # r documentation for dplyr
?"dplyr"  # shortcut
help(package = "dplyr")  # help pages for all functions in dplyr
library(help = "dplyr")  # general information on dplyr

Seized by despair?

????"mean"

Give me more. The sos package provides advanced search functionality.

library("sos")
???"mean"
???"linear regression"  # too many results: use a more restrictive search term
???"repeated measures"(maxPages = 1)

2.2.5 Assignment

You can store and reuse pretty much anything in R, by using the assignment operator. You’ll use it a lot. There are various ways you can assign values to an object, but luckily, you only need one.

# Assignment operators
a <- 1  # this is the best, forget the rest
1 -> a
a = 1
assign(x = "a", value = 1)

Remember that we do use the = operator, but only when specifying the arguments in a function call.

We can assign all kinds of things to an object, although it only makes sense if we want to store and reuse it.

a <- 1
b <- mean(some_data, na.rm = TRUE)
c <- help("dplyr")  # doesn't make much sense

For clarity, use spaces around the assignment operator.

a<-1  # would this mean...
a <- 1  # ...this...
a < -1  # ...or this?

Give me more. Object names can be numbers, even sentences. Think twice before using them.

1 <- 2  # fails, thankfully
`1` <- 2  # backticks do the magic
1 - `1`  # 1 - 2
`This Variable Name is Publication Ready!` <- 2

2.2.6 Data types and structures

R can handle several different data types and structures. Below, we summarize the types and structures that come with base R. You’ll recognize the data frame as the data structure that you use for data analysis. Nonetheless, it’s good to know that other structures exist too.

Importantly, notice that data types are nested in data structures, and some data structures are nested in other data structures. Vectors hold scalars, data frames and matrices hold vectors, and lists can hold anything.

Scalar

The three most basic data types are numeric, character, and logical. The first comes in two flavors, integer (e.g., 1L or 312L) and double (e.g., 1 or 3.14). A character can be anything from "R" to "Some long questionnaire input.". A logical can be either TRUE or FALSE.

The data structure that holds a single value, whether numeric, character, or logical, is referred to as a scalar.

s_int <- 1L  # numeric (integer)
s_dbl <- 1  # numeric (double)
s_chr <- "R"  # character (some call it a string)
s_lgl <- TRUE  # logical

TRUE and FALSE can be abbreviated to T and F, but this is not recommended, because T and F are ordinary objects that can be overwritten. Also, in computations TRUE counts as 1 and FALSE as 0.

2 * s_lgl
## [1] 2
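If you’re unsure which type you’re dealing with, you can simply ask R. The sketch below uses the base R functions typeof, length, is.vector, and sum, which are our own additions for illustration.

```r
typeof(1L)    # "integer"
typeof(1)     # "double"
typeof("R")   # "character"
typeof(TRUE)  # "logical"

# A scalar is simply a vector of length one
length(3.14)
is.vector(3.14)

# Because TRUE counts as 1 and FALSE as 0, sum() counts the TRUEs
answers <- c(TRUE, FALSE, TRUE, TRUE)
n_true <- sum(answers)
n_true
```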

Vector

Use the c() function to combine scalars into a vector.

v_dbl <- c(s_dbl, 2, 3, 4)  # yes, you can simply insert the previously created numeric scalar
v_chr <- c("I", "love", s_chr, "!")  # or insert the previously created character scalar
print(v_chr)
## [1] "I"    "love" "R"    "!"

Read this if you want to get to the bottom of vectors.

Factor

The data types you’ll want to use most are numeric and categorical. So where is categorical? It’s here, it’s factor. The factor type is a special kind of integer vector that can hold categorical data with different levels. Here, we create a nominal factor vector.

v_sex <- c("male", "female", "female")
fct_sex <- factor(x = v_sex, levels = c("female", "male", "other"))
print(fct_sex)
## [1] male   female female
## Levels: female male other

Likewise, here we create an ordinal factor vector.

v_lev <- c("student", "professor", "student")
fct_lev <- factor(v_lev, levels = c("student", "phd_candidate", "professor"), ordered = TRUE)  # the argument is named ordered
print(fct_lev)
## [1] student   professor student  
## Levels: student < phd_candidate < professor
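To see what makes a factor special, it helps to peek under the hood. The sketch below recreates both factors from above and inspects them with the base R functions levels, as.integer, and table (our own picks for illustration).

```r
fct_sex <- factor(x = c("male", "female", "female"), levels = c("female", "male", "other"))
levels(fct_sex)      # all possible categories
as.integer(fct_sex)  # the underlying integer codes (positions in the levels)
table(fct_sex)       # counts per level, including the unused level "other"

fct_lev <- factor(c("student", "professor", "student"),
                  levels = c("student", "phd_candidate", "professor"),
                  ordered = TRUE)
fct_lev < "professor"  # comparisons like this only work for ordered factors
```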

Data frame

Data frames are perfect for data analysis. You probably don’t want to use variable_a and so forth; pick a variable name that makes sense.

df <- data.frame(variable_a = v_dbl, variable_b = v_chr)  # here, we insert two previously created vectors
print(df)
##   variable_a variable_b
## 1          1          I
## 2          2       love
## 3          3          R
## 4          4          !

Matrix

What are matrices good for? Think of correlation matrices or adjacency matrices (used for specifying the links in a network).

m_dbl <- matrix(data = v_dbl, nrow = 2, ncol = 2)  # here, we insert a previously created vector
m_dbl_byrow <- matrix(data = v_dbl, nrow = 2, ncol = 2, byrow = TRUE)
print(m_dbl)
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

print(m_dbl_byrow)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
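As an illustration of a correlation matrix, the sketch below uses the cars data set that ships with R (our own choice of example data) and the cor function from the pre-installed stats package.

```r
# cars is a small data set with two numeric variables: speed and dist
m_cor <- cor(cars)
m_cor
is.matrix(m_cor)  # the result is a matrix
dim(m_cor)        # number of rows and columns
```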

Array

An array generalizes the matrix to more than two dimensions; think of it as a stack of multiple matrices. Are you likely going to use it? No.
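Curious anyway? Here is a minimal sketch that stacks two 2 x 2 matrices in one array, using the base R array function. The object name a_int is ours.

```r
a_int <- array(data = 1:8, dim = c(2, 2, 2))  # rows, columns, layers
a_int
dim(a_int)
```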

List

lst <- list(data_a = s_lgl, data_b = v_chr, data_c = m_dbl, data_d = df, data_e = list(v_dbl))  # here, we insert various previously created data structures
print(lst)
## $data_a
## [1] TRUE
## 
## $data_b
## [1] "I"    "love" "R"    "!"   
## 
## $data_c
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## $data_d
##   variable_a variable_b
## 1          1          I
## 2          2       love
## 3          3          R
## 4          4          !
## 
## $data_e
## $data_e[[1]]
## [1] 1 2 3 4

Indeed, if you want to get funky, you can even nest a list in a list. Call it a tribute to Droste.

2.2.7 Coercion

Conveniently, we can force one data type into another if we need to. This is called coercion.

as.integer(c("3.14", "b"))
## Warning: NAs introduced by coercion
## [1]  3 NA

as.double(c("-1", "b"))
## Warning: NAs introduced by coercion
## [1] -1 NA

as.character(c(-1, NA))
## [1] "-1" NA

as.logical(c(0, 1, 2))
## [1] FALSE  TRUE  TRUE

as.factor(c("R", "JASP", "jamovi"))
## [1] R      JASP   jamovi
## Levels: jamovi JASP R
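One classic surprise when coercing: because a factor is a special kind of integer vector, coercing it to numeric returns the underlying integer codes rather than the values you see printed. A small sketch, with fct_num as our own example object:

```r
fct_num <- factor(c("10", "20", "10"))
as.numeric(fct_num)                # returns the integer codes 1 2 1, not the values
as.numeric(as.character(fct_num))  # returns the values 10 20 10
```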

Give me more. Coercion is performed automatically when we try to create a vector with different data types. All values are then coerced to the type with the highest rank, where logical < integer < double < character. This is illustrated below.

c(TRUE, 33L, 3.14, "R")
## [1] "TRUE" "33"   "3.14" "R"

c(TRUE, 33L, 3.14)
## [1]  1.00 33.00  3.14

c(TRUE, 33L)
## [1]  1 33

2.2.8 Operators

There are many operators in R. Here are a few that may come in handy.

# Arithmetic operators
1 + 2
## [1] 3

1 - 2
## [1] -1

1 * 2
## [1] 2

1 / 2
## [1] 0.5

1^2
## [1] 1

sqrt(1)  # oops, a function, the square root
## [1] 1


# Relational operators
1 == 2
## [1] FALSE

1 != 2
## [1] TRUE

1 < 2
## [1] TRUE

1 > 2
## [1] FALSE

1 <= 2
## [1] TRUE

1 >= 2
## [1] FALSE


# Logical operators
!TRUE
## [1] FALSE

TRUE & FALSE
## [1] FALSE

TRUE | FALSE
## [1] TRUE

Give me more. Modulo operations and integer divisions can also be really useful.

# Arithmetic operators
10 %% 3  # modulo operation (returns remainder 1)
10 %/% 3  # integer division (returns quotient 3)
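Where would you use these? Two typical cases, sketched with made-up numbers: converting minutes into hours and minutes, and checking whether numbers are even.

```r
total_minutes <- 135
hours <- total_minutes %/% 60   # whole hours
minutes <- total_minutes %% 60  # remaining minutes
c(hours, minutes)

# An even number leaves no remainder when divided by 2
c(1, 2, 3, 4) %% 2 == 0
```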

2.2.9 Into the void

In data analysis, you’ll be mostly concerned with NA, used to indicate missing values. NA is used regardless of the type of data (character, logical, numeric). However, there’s more emptiness in R than just missing values.

NA  # NA stands for Not Available; indicates a missing element
## [1] NA

0 / 0  # NaN stands for Not a Number; indicates an incorrect computation
## [1] NaN

1 / 0  # Inf stands for Infinite; indicates an infinite element
## [1] Inf

c()  # NULL; indicates the absence of a vector or an empty vector
## NULL

You don’t need meditation to find the void:

void <- c(NA, NaN, Inf, NULL)
print(void)  # NULL pertains to vectors and is thus (silently) removed
## [1]  NA NaN Inf

is.na(void)  # both NA and NaN are considered missing elements
## [1]  TRUE  TRUE FALSE

is.nan(void)
## [1] FALSE  TRUE FALSE

is.infinite(void)
## [1] FALSE FALSE  TRUE

is.null(void)  # the vector void is not empty
## [1] FALSE
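Because TRUE counts as 1 and FALSE as 0, is.na combines nicely with sum and mean to count missing values. A minimal sketch, reusing the vector with a missing value from the functions section:

```r
some_data <- c(1, 4, 4, NA, 2.356, 25)
n_missing <- sum(is.na(some_data))      # number of missing values
prop_missing <- mean(is.na(some_data))  # proportion of missing values
n_missing
prop_missing
```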

“Youth always tries to fill the void, an old man learns to live with it.” — Mark Z. Danielewski

2.3 Exercises

A good start

Make a habit of working in R projects.

Instruction

Create an R project and a script to work on these exercises.

Tips

Use efficient and clear names, and make sure to keep your scripts in the project folder.
Using R for different projects? Create different R projects, such that you can easily switch between projects. Attention supertaskers, you may even open multiple R projects in separate windows.

Data frame

Doing data analysis in R, you’ll work with data frames all the time. This is an exercise you should focus on.

Instruction

Create a data frame with five variables that each contain five observations. One logical variable, one double (numeric) variable, one character variable, one nominal categorical variable, and one ordinal categorical variable.

For each type, think of a variable that matches the type (e.g., sex as a nominal categorical variable), and give it a name that clearly communicates it.

Let’s make it look like real data and make sure each variable contains a missing value.

Tips

Data frames contain vectors, vectors contain scalars. Check scalars if you want to understand variable types. You’ll find all of it in the data types and structures section.
Categorical variables are represented by factors. Find them in the data types and structures section.
Missing values are indicated with NA. Check the into the void section.

Example

Here’s an example. Note that the eye_color variable was created as a character variable, the sex variable was created as a nominal variable, and the level variable was created as an ordinal variable.

##   body_length eye_color    sex         level control_condition
## 1       178.0  sky_blue   male       student              TRUE
## 2       163.1     brown female     professor              TRUE
## 3       191.8      blue female       student             FALSE
## 4       180.0  sea_blue   male phd_candidate             FALSE
## 5       175.5  greenish  other phd_candidate             FALSE

Fix it?

These exercises can be hard, but are important if you want to learn R. It doesn’t matter if you can’t do them during the masterclass, but make sure to work on them afterwards.

Instruction

Check and run the examples below.

Can you see what’s going on?
Do you think it does what it’s supposed to do? Why or why not? ‘It depends’ can also be an answer.
If you think it doesn’t do what it’s supposed to do, can you fix it?
Stuck on one? Don’t waste all your time on it and continue.

Tips

Work through the resources in the previous sections to find the answers.

# 1
s <- "Is R case-sensitive?"
print(S)

# 2
r <- n <- o <- v <- i <- c <- e <- 1

# 3
r < -1

# 4
v <- c(1, 2, 3, NA)
mean(v)

# 5
v <- c(1, 2, 3, "NA")
mean(v)

# 6
v <- c("a", "b", "c", "d")
mean(v)

# 7
v <- c(TRUE, FALSE, TRUE, FALSE)
mean(v)

# 8
mean <- c(1, 2, 3, 4)
mean(mean)

# 9
v <- c("1", "2", "3", "NA")
as.numeric(v)

# 10
TRUE != FALSE & (TRUE == !TRUE | TRUE >= FALSE)

# 11
NA == NA

# 12
?"?"

# 13
????"mean"

Bug hunt

Instruction

Run the following line of code. What happens? Why? How did you notice?

data.frame(sex = factor(x = c("male", "female", "female"), levels = c("female", "male", "other"))

Tips

Don’t know how to see that something is wrong? Give special attention to the > and + signs in the left margin of the Console panel.

Solutions

You wish it were that easy.

Thankfully, you’re not the only one learning R. Ask and help each other. Use Piazza. Consult us. R is not easy, but we’re 100% sure you’ll grasp it when you put effort into it!

2.4 Break

xkcd.com

2.5 Discussion

Lingering questions or concerns? Use Piazza to follow up on our plenary discussion or post a new question.

3 Tidyverse

Yeh

Some of the tidyverse dialect’s distinctive syntax and vocabulary.

Nah

All the fun stuff you can do with your newly obtained knowledge; it will be covered in the other masterclasses. Check the sneak preview!

3.1 Demonstration

Again, let’s first go through a demonstration script. The data we read into R in this script is from a paper published in 2011 that aimed to show how flexibility in data collection, analysis, and reporting can dramatically increase false-positive rates (Simmons, Nelson, & Simonsohn, 2011). They demonstrate how flexibility on these different aspects can result in statistically significant evidence for a false hypothesis. Here we only read in their data, but in the next masterclass—We R Analysts—you will work with this data in the exercises yourself!

# Load the tidyverse collection of packages
library("tidyverse")

# Read in and view data
study_1 <- readr::read_delim("FalsePositive_Data_in_/Study 1 .txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)
study_1
tibble::view(study_1)
tibble::glimpse(study_1)

# Use the pipe operator %>%
study_1 %>% tibble::glimpse()

# %>% is more useful when performing sequences of operations
x <- c(2, 3, 7, 6, 9, 4)
x %>%
  range() %>%
  sum() %>%
  sqrt()

# Subset data: columns
study_1 %>% dplyr::select(political)
study_1 %>% dplyr::select(contains("m"))

# Subset data: rows
study_1 %>% dplyr::slice(c(5:10)) # by their position
study_1 %>% dplyr::filter(dad >= 60) # by a certain criterion
study_1 %>% dplyr::filter(cond == "control")

3.2 Sneak preview

This masterclass covers the R basics and the tidyverse dialect. These are the foundations of much more exciting visualizations and computations, which will be covered in the next masterclasses. The following code gives a sneak preview of what’s to come.

# Load additional packages and data
library("tidymodels")
data("gss")

# View data
gss %>% print()

# Transform data (We R Transformers)
gss %>%
  dplyr::mutate(income = stringr::str_extract(income, "[\\d]{1,5}")) %>%
  dplyr::mutate(income = as.numeric(income)) %>%
  dplyr::mutate(weighted_income = income / hours)

# Explore and visualize data (We R Visualizers)
p <- gss %>%
  ggplot2::ggplot(aes(x = year, y = hours, color = class, size = weight, shape = sex)) +
  ggplot2::geom_point()
p

# Create standard error function (We R Programmers)
se <- function(x) {
  sd(x) / sqrt(length(x))
}

# Summarize data (We R Analysts)
gss %>%
  dplyr::group_by(class, sex) %>%
  dplyr::summarize(hours_mean = mean(hours),
                   hours_se = se(hours))

# Perform simple regression (We R Analysts)
lm_gss <- gss %>% stats::lm(hours ~ sex + class, data = .)
lm_gss %>% broom::tidy()
lm_gss %>% broom::glance()
lm_gss %>% broom::augment()

# Check assumptions (We R Analysts)
library("ggfortify")
lm_gss %>%
  ggplot2::autoplot() +
  ggplot2::theme_minimal()

# Publication-ready visualization (We R Visualizers)
p_violin <- gss %>%
  ggplot2::ggplot(aes(x = class, y = hours, linetype = sex)) +
  ggplot2::geom_violin() +
  ggplot2::theme_classic() +
  ggplot2::labs(x = "Socioeconomic Class",
                y = "No. of Hours Worked",
                linetype = "Sex",
                title = "You'll Figure It Out",
                subtitle = "These violins look more like flutes.")
p_violin

# Save the plot (recent ggplot2 versions no longer allow adding ggsave() with +)
ggplot2::ggsave(filename = "gss.pdf",
                plot = p_violin,
                width = 15,
                height = 15,
                units = "cm",
                dpi = 300)

# Interactive visualization (as promised!)
library("plotly")
p %>% plotly::ggplotly()  # created from the previous visualization

# And a newly created 3d scatter plot to whet your appetite
gss %>%
  plotly::plot_ly(x = ~year, y = ~hours, z = ~class, color = ~class, symbol = ~sex, symbols = c(15, 16)) %>%
  plotly::add_markers()

3.3 Resources

3.3.1 Tidyverse packages

You can’t be introduced to the tidyverse without loading the tidyverse package.

install.packages("tidyverse")  # installs the tidyverse collection
library("tidyverse")  # loads the tidyverse collection
tidyverse_update()  # updates the tidyverse collection
tidyverse_packages()  # shows the tidyverse packages

A collection of R packages

Although to be honest, the tidyverse is not really a package, but rather a collection of packages designed for data science. Installing it installs all the packages in the collection, and loading it loads the core packages (others, such as haven and readxl, must still be loaded separately). We’ll use several of the included packages throughout the masterclasses, including:

readr for reading in data
tibble for the tidyverse version of a data frame
dplyr for data transformation
ggplot2 for data visualization

An R dialect

The tidyverse is currently the most popular R dialect, and is being developed at breakneck speed. Reading and writing tidyverse code is to reading and writing base R code, as being read to by your older sister taking drama classes is to trying to understand your grumpy mumbling younger brother looking for missing Lego pieces.

A single-purpose-single-method philosophy

Whereas base R allows you to do virtually anything, and in annoyingly many different ways, the tidyverse is specifically built for data science, and tries to give you a single ‘best’ method for performing an action. That means that the tidyverse is much more restricted than base R, but it makes the unmatched power of the R language (which makes paid and proprietary software like SPSS and Stata cry with embarrassment) much more accessible.

Base versus tidy. When starting to learn R, it can be hard to distinguish base R functions from tidyverse functions. Don’t worry, you’ll learn to distinguish them simply by using R. Passionate bird watchers may have an advantage though; they know how to spot subtle differences:

Tidyverse functions are always lowercase and use no special characters other than underscores, such as as_factor. There is no such uniformity in base R functions. If you spot a beautifully feathered as.factor defending its nest, you can be sure its songs won’t contain a single tidy verse.
Although the pipe operator %>% (more on that below) is not tied to the tidyverse, you won’t see it being used a lot in base R.

3.3.2 Tibble

The tibble is the tidyverse version of a data frame. On the surface the differences are small, and some people don’t bother giving them different names. Creating a tibble works very similarly to creating a data frame (but compare how both are printed).

tb <- tibble::tibble(variable_a = v_dbl, variable_b = v_chr)
print(tb)
## # A tibble: 4 x 2
##   variable_a variable_b
##        <dbl> <chr>     
## 1          1 I         
## 2          2 love      
## 3          3 R         
## 4          4 !

If you already have a data frame, you can coerce it into a tibble.

tb <- df %>% tibble::as_tibble()

3.3.3 Pipe

Tidyverse functions gratefully exploit the pipe operator: %>%. The pipe operator moves the object to its left into the first argument of the function on its right. Let’s compare the syntax for computing the mean of a vector with and without the use of the pipe.

some_data <- c(1, 4, 4, 2.356, 25)
mean(x = some_data)  # without pipe operator
## [1] 7.2712

some_data %>% mean()  # with pipe operator
## [1] 7.2712

Now, let’s create the vector c(1, 2, 3) using the pipe operator:

1 %>% c(2) %>% c(3)
## [1] 1 2 3

Here, 1 %>% c(2) evaluates to c(1, 2), such that we get c(1, 2) %>% c(3), which in turn evaluates to c(c(1, 2), 3). Think this through!

You’re right to think that in this example the pipe operator only complicates things. But now let’s do a series of transformations on a data frame or tibble, first in base R and then in the tidyverse. You don’t need to understand what it does (though you may try if you want), but notice that while the results are very similar, the syntax is very different. The example uses the iris data set that is readily available in R.

# base R transformations without using the pipe operator
iris_2 <- iris[iris[, 1] < 5, c(3, 5)]
iris_2[, 1] <- log(iris_2[, 1])
by(iris_2, list(iris_2$Species), function(x) mean(x$Petal.Length))
## : setosa
## [1] 0.3384166
## ------------------------------------------------------------ 
## : versicolor
## [1] 1.193922
## ------------------------------------------------------------ 
## : virginica
## [1] 1.504077

# piped tidyverse transformations
iris %>%
  dplyr::filter(Sepal.Length < 5) %>%
  dplyr::select(Petal.Length, Species) %>%
  dplyr::mutate(Petal.Length.Log = log(Petal.Length)) %>%
  dplyr::group_by(Species) %>%
  dplyr::summarize(average = mean(Petal.Length.Log))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 2
##   Species    average
##   <fct>        <dbl>
## 1 setosa       0.338
## 2 versicolor   1.19 
## 3 virginica    1.50

First, the tidyverse syntax explicitly communicates the function of the code (e.g., filter, summarize). Second, the pipe operator enables you to read the functions sequentially (that is, object %>% function() %>% function()), whereas without the pipe operator functions are often nested within functions (that is, function(function(object))). As a result, the tidyverse code actually reads like a verse: first filter specific rows, then select specific columns, then mutate one column into a new column, then group_by one of the selected columns to summarize the newly created one.
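The difference between nested and piped code can be sketched with the small vector from the demonstration. We load the magrittr package here because it provides the %>% operator on its own; the object names nested and piped are ours.

```r
library("magrittr")  # provides the %>% operator

x <- c(2, 3, 7, 6, 9, 4)

# Nested: read from the inside out
nested <- sqrt(sum(range(x)))

# Piped: read from left to right
piped <- x %>%
  range() %>%
  sum() %>%
  sqrt()

identical(nested, piped)  # both approaches give the same result
```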

You may safely skip this box; we’ll remind you of it when you need it.

Base versus tidy. We’ll continue to use lots of base R functions. The pipe operator works fine with most of those functions, but there are two things you must know.

Piping with a placeholder

The pipe operator moves the object to its left into the first argument of the function on its right. Whereas in tidyverse functions the first argument is reserved for the data (stored in a tibble), in many base R functions it is not. Long story short, you can use the pipe operator with base R’s statistical functions, but often you must use the magical . placeholder to specify the location of the data argument.

sleep %>% infer::t_test(formula = extra ~ group)  # tidyverse's t-test
sleep %>% infer::t_test(x = ., formula = extra ~ group)  # identical t-test with explicit placeholder at the data argument
sleep %>% stats::t.test(formula = extra ~ group, data = .)  # base R's t-test with placeholder at the data argument
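The same placeholder trick should work for other base R modelling functions. As a sketch, assuming magrittr is installed: lm() expects the formula first and the data second, so the placeholder sends the sleep data to the data argument. The name fit is just our illustrative choice.

```r
library(magrittr)

# lm()'s first argument is the formula, so we point the pipe at `data` with the . placeholder
fit <- sleep %>% stats::lm(extra ~ group, data = .)

stats::coef(fit)  # the intercept and the difference between the two groups
```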

Piping a vector out of a tibble

Tidyverse functions often work with tibbles, whereas some base R functions request vectors. Before piping a tibble into a function that requests a vector, you can pull out the vector.

res_lm %>%
  broom::augment() %>%
  dplyr::pull(.std.resid) %>%
  stats::shapiro.test()

Give me more. Also, the magrittr package has another special pipe operator %$% that allows you to use the variable names from your tibble, as if they were vectors.

library("magrittr")
res_lm %>%
  broom::augment() %$%
  stats::shapiro.test(.std.resid)
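Because res_lm comes from the demonstration script, here is a self-contained sketch of both approaches on the built-in sleep data instead. It assumes dplyr and magrittr are installed; the object names res_pull and res_expose are ours.

```r
library(magrittr)

# option 1: pull the vector out of the data first
res_pull <- sleep %>%
  dplyr::pull(extra) %>%
  stats::shapiro.test()

# option 2: expose the columns with %$% and refer to one by name
res_expose <- sleep %$%
  stats::shapiro.test(extra)

all.equal(res_pull$statistic, res_expose$statistic)  # compare the two test statistics
```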

3.3.4 Evaluation

Give me more. Another key characteristic of the tidyverse is the use of tidy evaluation. This is a very advanced topic that we don’t want you to understand. To quote Thomas Gray (1742): “Where ignorance is bliss, ’tis folly to be wise.”

Base versus tidy. Speaking of quotes, because of tidy evaluation (really, to us it sounds like spaghetti too), in the tidyverse we refer to a variable without using quotation marks (such as group), whereas in base R we would use quotation marks (such as "group"). This might be helpful when trying to distinguish tidyverse from base R solutions that you find on the internet.

sleep %>% dplyr::select(group)  # tidyverse syntax to select the group variable from the sleep data
sleep[, "group", drop = FALSE]  # base R syntax to do the same (drop = FALSE keeps the result a data frame instead of a vector)
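One practical detail when comparing the two styles is what comes back. The sketch below (object names are our own, and dplyr and magrittr are assumed to be installed) checks the class of each result: select() returns a data frame, whereas base R's bracket indexing drops a single column to a plain vector unless you ask it not to.

```r
library(magrittr)

tidy_col <- sleep %>% dplyr::select(group)  # unquoted name, returns a data frame
base_col <- sleep[, "group", drop = FALSE]  # quoted name, drop = FALSE keeps a data frame
base_vec <- sleep[, "group"]                # quoted name, a single column is dropped to a vector

class(tidy_col)
class(base_col)
class(base_vec)
```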

3.3.5 Read in data

The tidyverse contains various packages for reading data into R.

readr for reading in text files (e.g., .txt and .csv)
haven for reading in SPSS, Stata, and SAS files
readxl for reading in Excel files (i.e., .xls and .xlsx)

You may read in your data in two ways.

Coded

readr::read_csv("file.csv")  # comma delimited files
readr::read_csv2("file2.csv")  # semi-colon delimited files
readr::read_delim("file.txt", delim = "|")  # files with any delimiter (e.g., |)
haven::read_sav("file.sav")  # SPSS
haven::read_dta("file.dta")  # Stata
haven::read_sas("file.sas7bdat")  # SAS
readxl::read_excel("file.xlsx")  # Excel
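If you have no file at hand, you can try the coded route with a throwaway file. This is a minimal sketch, assuming readr is installed: it writes three rows of the built-in iris data to a temporary .csv (a stand-in for your own file) and reads it back, telling readr up front that Species should be a factor.

```r
# write a tiny example file to a temporary location (stand-in for your own file)
path <- tempfile(fileext = ".csv")
readr::write_csv(utils::head(datasets::iris, 3), path)

# read it back: Species is declared a factor, the other column types are guessed
dat <- readr::read_csv(
  path,
  col_types = readr::cols(Species = readr::col_factor())
)

dat
```

Declaring the column types in the code, rather than fixing them afterwards, keeps the import step reproducible.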

Visualized

In RStudio, you may also go to the Environment tab and hit the Import Dataset dropdown menu. Choose the From Text (readr), From Excel, From SPSS, From SAS, or From Stata option. You’ll be guided through the process and able to preview your choices. Fancy pancy!

Note that during this process R guesses the type of each variable, but in the data preview panel you can manually change it, by clicking the dropdown menu next to the variable name of interest and selecting the desired type. If you change the type to factor, you’ll be asked to provide a list with the levels of the factor. For instance, if the factor consists of two values, 1 for females and 0 for males, simply enter 1, 0 in the dialogue window.

Finally, as we love reproducibility, we force you to copy the generated code to your script. Of course, you should just pretend that you wrote it all by yourself.

Give me more. If you work with big data, vroom is a fast alternative to readr. Other packages can be used for web scraping, databases, APIs, and whatnot.

3.3.6 View data

We will show some examples using the built-in data set in R called iris (as in the flower).

?iris  # explanation of this data
data("iris")  # load iris data
iris <- iris %>% dplyr::as_tibble()  # make it a tibble

There are various ways to take a look at your data; here are a couple.

iris %>% tibble::glimpse()  # get a concise overview of the data
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4…
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1…
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0…
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, …

iris %>% base::print()  # view the first ten rows of the data
iris %>% base::print(n = Inf)  # view the whole shebang in the console (n determines the number of rows)
iris %>% tibble::view()  # view the whole shebang in a spreadsheet-style data viewer
iris %>% base::names()  # print the column names (i.e., the variable names in a data set)

For those of you that think that Petal and Sepal are two forgotten Marvel characters, they’re not. And if you think below is a picture of an iris, it’s not.

(Image source: wikipedia.com)

3.3.7 Subset data

The dplyr package in the tidyverse collection contains three useful functions for subsetting your data:

select for subsetting columns
filter for subsetting rows matching certain conditions
slice for subsetting rows using their positions

Select columns

You can select columns in a great many ways. Various helper functions can help you to select columns that satisfy a certain condition, such as columns with numeric values.

iris %>% dplyr::select(Species)  # select one variable
iris %>% dplyr::select(Petal.Length, Species)  # select more variables
iris %>% dplyr::select(Petal.Length:Species)  # select all variables from Petal.Length to Species
iris %>% dplyr::select(-Species)  # select all but one variable
iris %>% dplyr::select(starts_with("Petal"))  # select variables whose names start with Petal
iris %>% dplyr::select(where(is.numeric))  # select numerical variables
iris %>% dplyr::select(where(is.numeric) & contains("S", ignore.case = FALSE))  # select numerical variables whose names contain a capital S (contains() ignores case by default)
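A detail worth knowing about the helper functions: contains() (like starts_with()) ignores case unless told otherwise. The sketch below, assuming dplyr and magrittr are installed, shows the difference with a lowercase p; the names loose and strict are ours.

```r
library(magrittr)

# default: case is ignored, so "p" also matches the capital P in Petal
loose <- iris %>%
  dplyr::select(dplyr::contains("p")) %>%
  names()

# strict: only a lowercase p counts
strict <- iris %>%
  dplyr::select(dplyr::contains("p", ignore.case = FALSE)) %>%
  names()

loose   # all five column names
strict  # only Sepal.Length, Sepal.Width, and Species
```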

Slice rows

Selecting rows by their row numbers is easy. Check the help file of the slice function for more interesting options.

iris %>% dplyr::slice(2:5)  # keep rows 2 to 5
iris %>% dplyr::slice(-(2:5))  # keep everything but rows 2 to 5
iris %>% dplyr::slice_sample(n = 5)  # keep 5 randomly selected rows
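Two of those other options, as a small sketch. We assume dplyr 1.0.0 or later (the version that introduced these helpers) and magrittr; the object names are ours.

```r
library(magrittr)

first_three <- iris %>% dplyr::slice_head(n = 3)           # the first 3 rows
longest <- iris %>% dplyr::slice_max(Petal.Length, n = 1)  # the row(s) with the largest petal length

first_three
longest
```

Note that slice_max() keeps ties by default, so n = 1 may return more than one row.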

Filter rows

Rows can also be filtered by matching a certain condition. Finally, a good use for the relational operators that you learned in the R chapter.

iris %>% dplyr::filter(Sepal.Length < 6)  # filter rows where the sepal length is smaller than 6
iris %>% dplyr::filter(Species == "versicolor")  # filter rows of the versicolor species
iris %>% dplyr::filter(Species == "setosa" & Petal.Length > 1.5)  # filter rows of the setosa species where the petal length is larger than 1.5
iris %>% dplyr::filter(Sepal.Width > mean(Sepal.Width))  # filter rows where the sepal width is bigger than the mean
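The three subsetting functions combine naturally in one pipe. A minimal sketch, assuming dplyr and magrittr are installed (the name small_setosa is ours): first filter rows on two conditions, then select a few columns, then slice the first three rows of what is left.

```r
library(magrittr)

small_setosa <- iris %>%
  dplyr::filter(Species == "setosa", Sepal.Length < 5) %>%  # a comma between conditions works like &
  dplyr::select(Species, dplyr::starts_with("Sepal")) %>%   # keep Species and the two sepal columns
  dplyr::slice(1:3)                                         # keep the first three remaining rows

small_setosa
```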

3.4 Exercises

Tibble

Instruction

Recreate the data frame from the exercises of the previous section, but this time make it a tidy tibble.

Tips

The tibble function works much like the data.frame function.
Coercion works too.

Nest <> pipe

Instruction

Functions can be nested and piped. In the following examples, determine whether the functions are nested or piped, and write them the other way.

Tips

If you don’t see the distinctive pipe operator, it’s probably not piped ;)
In a pipe, the data—such as a tibble or vector—always goes first.
We only use arithmetic functions as examples, like the sum, mean, and square root. See the Help (inside R) section from the R chapter for what to do if you want to know more about these functions.

First run this bit of code to create the two example vectors.

x <- c(9, 1, 9, 6, 7)
y <- c(9, 3, 1, 10, 9)

Exercises:

# 1
mean(x)

# 2
y %>%
  sqrt()

# 3
sqrt(max(x))

# 4
(x - y) %>% 
  median() %>% 
  abs() %>%
  sqrt()

# 5
sqrt(sum((x - y)^2))

Tidyverse

Instruction

Explore the tidyverse website and its various packages.

Tips

The tidyverse’s package websites, such as the one for dplyr, all have the same structure. Notice that the landing page gives you a summary of the package, usage examples, and cheat sheets. The reference tab gives you links to all functions in the dplyr package, and their documentation.

Read in data

Instruction

Read in the data we used in the Demonstration script above. You can download the data for Study 1 of the paper here. Open the .zip file, and find the data file called Study 1 .txt. Read this data into R.

Alternatively, read in some of your own data.

Tips

A text file called Codebook.txt provides a short description of each variable.
In the resources you will find two ways to read in data. Try to do it in a way that is reproducible.

Subset data

Instruction

Sometimes you want to select only a part of your data. Try to:

Select the columns indicating the experimental condition participants were assigned to, and participants’ age in years.
Select all non-numeric columns.
Select both the tenth and the fifteenth row.
Select only participants whose age in years is higher than the mean age in years.

Tips

Find an example line of code in the demonstration or resources that is similar to the selection you intend to make, and try to adapt the code to make it work in your case.

Spot the species

It’s helpful to be able to distinguish a base R function from a tidyverse function. However, it’s certainly not necessary in most cases, so you may safely skip this exercise if you don’t feel like doing it.

Instruction

Below are a couple of functions. Can you distinguish the base R species from the tidyverse species?

Tips

Remember that some functions live in multiple packages.
Remember that you can use your help operator ? binoculars to get a closer look.

# 1
is.single()  # function cannot be used on persons

# 2
as_tibble()

# 3
numeric_version()

# 4
list2DF()

# 5
table()

Solutions

Learning R is a journey, not a destination

3.5 Break

(Image source: xkcd.com)

3.6 Discussion

Lingering questions or concerns? Use Piazza to follow up on our plenary discussion or post a new question.

4 Wrap Up

4.1 Clear your workspace

Bugs hate a clean desk policy. That’s why you shouldn’t save—nor restore—your workspace. RStudio does it by default, but we can tell it not to. Do save your scripts though! Those will help you reproduce exactly what you did the last time.

Go to RStudio’s Preferences, hit the General tab, change the settings below, and hit Apply.

Remove the tickmark at Restore .RData into workspace at startup.
Set Save workspace to .RData on exit to Never.

Marie Kondo will be proud of you.

Give me more. Here are some more advanced options. You won’t need these most of the time, especially if you’re already not saving and restoring your workspace.

rm(list = ls())  # clear environment
rm(list = ls(all.names = TRUE))  # clear environment including hidden objects
cat("\014")  # clear console
dev.off(dev.list()["RStudioGD"])  # clear plots
gc()  # free up memory with the garbage collector (R collects garbage automatically, thus it's only needed if you can't wait and want to directly free up the memory after having removed a large object)
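If wiping everything feels too drastic, here is a gentler sketch in base R. The object scratch_value is hypothetical and only exists to have something to remove; graphics.off() is a base R way of closing plot devices that does not depend on RStudio.

```r
scratch_value <- 42  # hypothetical object, only here to have something to remove

rm(scratch_value)                          # remove one object by name instead of wiping everything
exists("scratch_value", inherits = FALSE)  # check whether the object is still around

graphics.off()  # close all open graphics devices; harmless if none are open
```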

4.2 Help (outside R)

Take-away: (1) research your question (2) make it specific (3) most problems have been solved before: find and learn (4) if not: make sure others will learn from it.

Where and how to search

Use your favorite general purpose search engine or the R search engine RSeek
- try to be specific in your search query
- add terms like ‘r’, ‘tidyverse’ or the specific package you need help for
Search Stack Overflow
- [tag]: r
- -exclude
- “exact phrase”
- combined: r tidyverse “exact phrase” -exclude
Search the RStudio Community

Where to ask

Ask your fellow students on Piazza
Ask the RStudio Community
Ask the world on Stack Overflow

How to ask

Tips from Stack Overflow: 1, 2
Add a reproducible example with reprex
Show what you’ve done to solve it yourself
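To illustrate what a good reproducible example contains, here is a minimal sketch: it loads the package it needs, uses a small built-in data set instead of your private data, and ends with the step your question is about. The question itself is made up, and dplyr and magrittr are assumed to be installed. With the reprex package installed, you can copy code like this to your clipboard and run reprex::reprex() to render it together with its output, ready to paste into your post.

```r
library(magrittr)

# small, built-in data that everyone can access
sleep_small <- utils::head(datasets::sleep, 4)

# the step the (hypothetical) question is about
sleep_small %>%
  dplyr::summarize(average = mean(extra))
```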

4.3 What’s next?

If learning R were mountaineering, you made it to base camp. Barely. At base camp, there are three things you can do. First, you can descend back into the valley, order a flat white with oat milk in the nearest coffee bar, and marvel at your former aspirations of climbing that mountain. Second, overtaken by hubris, you can disdainfully pass base camp, and take the first steep turn towards the top. Third, you can take a bit of a rest, absorb everything you’ve seen during the past climb, and start exploring base camp with your fellow mountaineers.

So, what’s next is up to you!

The valley

Okay, we failed. But you should nevertheless try the great projects below. They use R in the background, but have very friendly interfaces. They are like diet R; not as versatile as writing your own scripts, but easy to use, free and open source, and accompanied by great textbooks.

Software: JASP. Book: Learning statistics with JASP.
Software: jamovi. Book: Learning statistics with jamovi.

Also, our way of teaching might just not be your way of learning. You can find many other approaches online, for instance on DataCamp or Coursera.

The top

Check out the syllabus to see the other masterclasses in this series. And check the graphics you can create with R and what you can do with R Markdown, fancy right!

Can’t wait for the next masterclass? You can explore the following resources on your own.

Free resources
- The book R for Data Science, by Hadley Wickham and Garrett Grolemund, is a great resource.
- More resources for learning the tidyverse.
- Have a look at RStudio Education.
- Download, print, and familiarize yourself with RStudio’s cheat sheets. Pick the ones you need and put them in clear plastic sleeves. You’ll love to have them on your desk!
Paid resources
- If you like Andy Field’s books, you might like his Discovering Statistics using R book.
- If you’re coming from Stata, you might like the R for Stata Users book.
- If you’re coming from SAS or SPSS, you might like the R for SAS and SPSS Users book.

But. You’ll probably first want to practice your newly obtained skills. There’s a lot to do at base camp!

Base camp

Great idea, it’s time to practice, practice, practice, practice, and conquer the learning curve! We’ll give you some tips and resources.

Keep practicing, without delay
- If you don’t start practicing this week, you’ll quickly forget, and picking it up again is going to be much harder.
- Check the take home exercises for suggestions.
Get help
- Don’t forget you can consult your fellow students on Piazza.
Read the news
- Follow #rstats.
- Read the tidyverse blog.
- Read R blogs from all around the world.

4.4 Take home exercises

Reserve time

If you haven’t done so already, reserve a fixed time and day of the week for learning R. Mark it in your calendar. Now.

Finish the exercises

Continue with the exercises, and use Piazza to ask questions.

Practice together

Sure, solo mountaineers exist, but there’s a reason most form a group. Find a colleague, practice together.

Read in your data

Finished all the exercises? Read in your own data, and play around with it.

5 Colophon

Created with R Markdown and generated on November 19, 2020.

Reproducibility receipt

Session information


─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.2 (2020-06-22)
 os       macOS Catalina 10.15.7      
 system   x86_64, darwin17.0          
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Amsterdam            
 date     2020-11-19                  

─ Packages ───────────────────────────────────────────────────────────────────
 package      * version    date       lib source        
 assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.0.0)
 backports      1.1.10     2020-09-15 [1] CRAN (R 4.0.2)
 BiocManager    1.30.10    2019-11-16 [1] CRAN (R 4.0.0)
 brew         * 1.0-6      2011-04-13 [1] CRAN (R 4.0.0)
 broom        * 0.7.2      2020-10-20 [1] CRAN (R 4.0.2)
 cellranger     1.1.0      2016-07-27 [1] CRAN (R 4.0.0)
 class          7.3-17     2020-04-26 [1] CRAN (R 4.0.2)
 cli            2.1.0      2020-10-12 [1] CRAN (R 4.0.2)
 clipr          0.7.0      2019-07-23 [1] CRAN (R 4.0.0)
 codetools      0.2-16     2018-12-24 [1] CRAN (R 4.0.2)
 colorspace     1.4-1      2019-03-18 [1] CRAN (R 4.0.0)
 crayon         1.3.4      2017-09-16 [1] CRAN (R 4.0.0)
 crosstalk      1.1.0.1    2020-03-13 [1] CRAN (R 4.0.0)
 data.table     1.13.0     2020-07-24 [1] CRAN (R 4.0.2)
 DBI            1.1.0      2019-12-15 [1] CRAN (R 4.0.0)
 dbplyr         2.0.0      2020-11-03 [1] CRAN (R 4.0.2)
 desc           1.2.0      2018-05-01 [1] CRAN (R 4.0.0)
 details      * 0.2.1      2020-01-12 [1] CRAN (R 4.0.0)
 dials        * 0.0.9      2020-09-16 [1] CRAN (R 4.0.2)
 DiceDesign     1.8-1      2019-07-31 [1] CRAN (R 4.0.0)
 digest         0.6.25     2020-02-23 [1] CRAN (R 4.0.0)
 dplyr        * 1.0.2      2020-08-18 [1] CRAN (R 4.0.2)
 ellipsis       0.3.1      2020-05-15 [1] CRAN (R 4.0.0)
 evaluate       0.14       2019-05-28 [1] CRAN (R 4.0.0)
 fansi          0.4.1      2020-01-08 [1] CRAN (R 4.0.0)
 farver         2.0.3      2020-01-16 [1] CRAN (R 4.0.0)
 forcats      * 0.5.0      2020-03-01 [1] CRAN (R 4.0.0)
 foreach        1.5.0      2020-03-30 [1] CRAN (R 4.0.0)
 fs             1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
 furrr          0.1.0      2018-05-16 [1] CRAN (R 4.0.0)
 future         1.19.1     2020-09-22 [1] CRAN (R 4.0.2)
 generics       0.1.0      2020-10-31 [1] CRAN (R 4.0.2)
 ggimage        0.2.8      2020-04-02 [1] CRAN (R 4.0.0)
 ggplot2      * 3.3.2      2020-06-19 [1] CRAN (R 4.0.0)
 ggplotify      0.0.5      2020-03-12 [1] CRAN (R 4.0.0)
 git2r        * 0.27.1     2020-05-03 [1] CRAN (R 4.0.0)
 globals        0.13.0     2020-09-17 [1] CRAN (R 4.0.2)
 glue           1.4.2      2020-08-27 [1] CRAN (R 4.0.2)
 gower          0.2.2      2020-06-23 [1] CRAN (R 4.0.2)
 GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.0.0)
 gridGraphics   0.5-0      2020-02-25 [1] CRAN (R 4.0.0)
 gtable         0.3.0      2019-03-25 [1] CRAN (R 4.0.0)
 haven          2.3.1      2020-06-01 [1] CRAN (R 4.0.2)
 hexbin         1.28.1     2020-02-03 [1] CRAN (R 4.0.0)
 hexSticker   * 0.4.7      2020-06-01 [1] CRAN (R 4.0.0)
 highr          0.8        2019-03-20 [1] CRAN (R 4.0.0)
 hms            0.5.3      2020-01-08 [1] CRAN (R 4.0.0)
 htmltools      0.5.0      2020-06-16 [1] CRAN (R 4.0.2)
 htmlwidgets    1.5.1      2019-10-08 [1] CRAN (R 4.0.0)
 httr           1.4.2      2020-07-20 [1] CRAN (R 4.0.2)
 infer        * 0.5.3      2020-07-14 [1] CRAN (R 4.0.2)
 ipred          0.9-9      2019-04-28 [1] CRAN (R 4.0.0)
 iterators      1.0.12     2019-07-26 [1] CRAN (R 4.0.0)
 jsonlite       1.7.1      2020-09-07 [1] CRAN (R 4.0.2)
 knitr          1.30       2020-09-22 [1] CRAN (R 4.0.2)
 labeling       0.3        2014-08-23 [1] CRAN (R 4.0.0)
 lattice        0.20-41    2020-04-02 [1] CRAN (R 4.0.2)
 lava           1.6.8      2020-09-26 [1] CRAN (R 4.0.2)
 lazyeval       0.2.2      2019-03-15 [1] CRAN (R 4.0.0)
 lhs            1.1.0      2020-09-29 [1] CRAN (R 4.0.2)
 lifecycle      0.2.0      2020-03-06 [1] CRAN (R 4.0.0)
 listenv        0.8.0      2019-12-05 [1] CRAN (R 4.0.0)
 lubridate      1.7.9      2020-06-08 [1] CRAN (R 4.0.2)
 magick         2.4.0      2020-06-23 [1] CRAN (R 4.0.2)
 magrittr       1.5        2014-11-22 [1] CRAN (R 4.0.0)
 MASS           7.3-53     2020-09-09 [1] CRAN (R 4.0.2)
 Matrix         1.2-18     2019-11-27 [1] CRAN (R 4.0.2)
 modeldata    * 0.1.0      2020-10-22 [1] CRAN (R 4.0.2)
 modelr         0.1.8      2020-05-19 [1] CRAN (R 4.0.2)
 munsell        0.5.0      2018-06-12 [1] CRAN (R 4.0.0)
 nnet           7.3-14     2020-04-26 [1] CRAN (R 4.0.2)
 parsnip      * 0.1.4      2020-10-27 [1] CRAN (R 4.0.2)
 pillar         1.4.6      2020-07-10 [1] CRAN (R 4.0.2)
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.0.0)
 plotly         4.9.2.1    2020-04-04 [1] CRAN (R 4.0.2)
 plotrix      * 3.7-8      2020-04-16 [1] CRAN (R 4.0.2)
 plyr           1.8.6      2020-03-03 [1] CRAN (R 4.0.0)
 png            0.1-7      2013-12-03 [1] CRAN (R 4.0.0)
 pROC           1.16.2     2020-03-19 [1] CRAN (R 4.0.0)
 prodlim        2019.11.13 2019-11-17 [1] CRAN (R 4.0.0)
 purrr        * 0.3.4      2020-04-17 [1] CRAN (R 4.0.0)
 R6             2.4.1      2019-11-12 [1] CRAN (R 4.0.0)
 RColorBrewer   1.1-2      2014-12-07 [1] CRAN (R 4.0.0)
 Rcpp           1.0.5      2020-07-06 [1] CRAN (R 4.0.2)
 readr        * 1.4.0      2020-10-05 [1] CRAN (R 4.0.2)
 readxl         1.3.1      2019-03-13 [1] CRAN (R 4.0.0)
 recipes      * 0.1.15     2020-11-11 [1] CRAN (R 4.0.2)
 reprex         0.3.0      2019-05-16 [1] CRAN (R 4.0.0)
 rlang          0.4.8      2020-10-08 [1] CRAN (R 4.0.2)
 rmarkdown      2.4        2020-09-30 [1] CRAN (R 4.0.2)
 rpart          4.1-15     2019-04-12 [1] CRAN (R 4.0.2)
 rprojroot      1.3-2      2018-01-03 [1] CRAN (R 4.0.0)
 rsample      * 0.0.8      2020-09-23 [1] CRAN (R 4.0.2)
 rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.0.2)
 rvcheck        0.1.8      2020-03-01 [1] CRAN (R 4.0.0)
 rvest          0.3.6      2020-07-25 [1] CRAN (R 4.0.2)
 scales       * 1.1.1      2020-05-11 [1] CRAN (R 4.0.0)
 sessioninfo  * 1.1.1      2018-11-05 [1] CRAN (R 4.0.0)
 showtext       0.9        2020-08-13 [1] CRAN (R 4.0.2)
 showtextdb     3.0        2020-06-04 [1] CRAN (R 4.0.0)
 sos          * 2.0-0      2017-07-03 [1] CRAN (R 4.0.0)
 stringi        1.5.3      2020-09-09 [1] CRAN (R 4.0.2)
 stringr      * 1.4.0      2019-02-10 [1] CRAN (R 4.0.0)
 survival       3.2-7      2020-09-28 [1] CRAN (R 4.0.2)
 sysfonts       0.8.1      2020-05-08 [1] CRAN (R 4.0.0)
 tibble       * 3.0.4      2020-10-12 [1] CRAN (R 4.0.2)
 tidymodels   * 0.1.1      2020-07-14 [1] CRAN (R 4.0.2)
 tidyr        * 1.1.2      2020-08-27 [1] CRAN (R 4.0.2)
 tidyselect     1.1.0      2020-05-11 [1] CRAN (R 4.0.0)
 tidyverse    * 1.3.0      2019-11-21 [1] CRAN (R 4.0.2)
 timeDate       3043.102   2018-02-21 [1] CRAN (R 4.0.0)
 tune         * 0.1.1      2020-07-08 [1] CRAN (R 4.0.2)
 utf8           1.1.4      2018-05-24 [1] CRAN (R 4.0.0)
 vctrs          0.3.4      2020-08-29 [1] CRAN (R 4.0.2)
 viridisLite    0.3.0      2018-02-01 [1] CRAN (R 4.0.0)
 withr          2.3.0      2020-09-22 [1] CRAN (R 4.0.2)
 workflows    * 0.2.1      2020-10-08 [1] CRAN (R 4.0.2)
 xfun           0.18       2020-09-29 [1] CRAN (R 4.0.2)
 xml2           1.3.2      2020-04-23 [1] CRAN (R 4.0.0)
 yaml           2.2.1      2020-02-01 [1] CRAN (R 4.0.0)
 yardstick    * 0.0.7      2020-07-13 [1] CRAN (R 4.0.2)

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

License

We R Novices by Alexander Savi & Simone Plak is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. An Open Educational Resource. Approved for Free Cultural Works.

We R Novices

Part of the We R Champions masterclass series

Alexander Savi, Amsterdam Center for Learning Analytics

Simone Plak, Amsterdam Center for Learning Analytics

October 20, 2020
