Welcome to R

STA 199

Today

By the end of today you will…

  • begin to know your way around RStudio
  • be able to define the terms package, data frame, variable, function, and argument
  • use the function glimpse()

Getting started

Clone the ae1 repo from the GitHub organization

R as a calculator

  • Use R as a calculator by typing the following into the console:
5 * 5 + 10

x = 3
x + x^2

x = 1:10
x * 7

In the last two examples, we save a value as the object “x”.

We can “print” x to the screen by typing the name of the object (“x”) in the console or in a code chunk.
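As a minimal sketch of saving and printing, the lines below reuse the object x from the calculator examples. The object y is a name introduced here for illustration only.

```r
# Save the integers 1 through 10 as the object x
x = 1:10

# Typing the name of an object prints it to the screen
x

# print() does the same thing explicitly
print(x)

# x * 7 does not change x; save the result under a new name to keep it
y = x * 7
y
```

Note that x still holds 1 through 10 afterwards; only y holds the multiplied values.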

Tour of RStudio

  • environment
  • R functions
  • loading and viewing a data frame

Load a package

library(tidyverse) 

Load data

roster = read_csv("data/roster.csv")
survey = read_csv("data/survey.csv")

Question: Which objects store the data in the code chunk above? Can you print them to the screen?

Create a new code chunk with CMD+OPTION+I (Mac) or CTRL+ALT+I (Windows/Linux).

So far we’ve seen two functions: library() and read_csv(). In R, a function’s name is followed by parentheses. A function takes inputs, called arguments, and often (but not always) returns an output. To learn more about a function, check its documentation with ?, e.g. ?library.
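To make "function" and "argument" concrete, here is a small sketch using two base R functions, sqrt() and round(). They are chosen only for illustration and are not part of the ae1 exercise; the object rounded is a hypothetical name.

```r
# sqrt() takes one argument and returns its square root
sqrt(16)

# round() takes two arguments here: the number to round and
# the number of decimal places, matched by the name digits
rounded = round(3.14159, digits = 2)
rounded

# To read the documentation for round(), run this in the console:
# ?round
```

Naming an argument, as in digits = 2, makes it clear which input is which.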

Demos

Let’s glimpse the data frame.

glimpse(survey)
Rows: 12
Columns: 5
$ name                 <chr> "A", "Appa", "Bumi", "Soka", "Katara", "Suki", "Z…
$ email                <chr> "the-last-Rbender@duke.edu", "yip-yip-appa@duke.e…
$ bender               <chr> "Airbender", "Airbender", "Earthbender", "None", …
$ previous_programming <chr> "No", "No", "No", "Somewhat", "Yes", "Yes", "Yes"…
$ year                 <dbl> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4

To look at the entire data frame, we can use view().

view(survey)

View the roster data in the console

Terminology: “columns” of a data frame are called variables, whereas “rows” are called observations.
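The terminology can be checked on a tiny, made-up data frame. The object pets and its values are hypothetical and are not part of the course data; ncol(), nrow(), and names() are base R functions.

```r
# A small, made-up data frame with 2 variables and 3 observations
pets = data.frame(
  name = c("Rex", "Tom", "Polly"),
  age = c(3, 5, 2)
)

ncol(pets)   # number of variables (columns)
nrow(pets)   # number of observations (rows)
names(pets)  # the variable names
```

The same three functions work on any data frame, including ones loaded with read_csv().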

Question: How many variables are in the data frame survey? How many observations? What about the data frame roster?

Why must I input specific email formats?

roster |>
  left_join(survey, by = "email")
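One likely reason, suggested by the join above: left_join() only matches rows whose key values are exactly equal. The sketch below uses hypothetical toy data (roster_toy, survey_toy, and their emails are invented, not the course files) and assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data, not the course roster or survey
roster_toy = data.frame(
  email = c("a@duke.edu", "b@duke.edu"),
  section = c(1, 2)
)

survey_toy = data.frame(
  # the second email differs in case and has a trailing space
  email = c("a@duke.edu", "B@duke.edu "),
  year = c(1, 3)
)

# Keep every row of roster_toy and attach survey columns
# wherever the email matches exactly
joined = roster_toy |>
  left_join(survey_toy, by = "email")

joined
```

In this toy example the second roster row gets NA for year, because "b@duke.edu" and "B@duke.edu " are not identical strings.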