library(tidyverse)
library(tidymodels)
HW 5: Sleep case study and beyond
Getting Started
Go to the Github Organization page and open your
hw5-username
repoClone the repository, open a new project in RStudio. It contains the starter documents you need to complete the homework assignment.
Packages
Data
Load the data:
= read_csv("https://sta101-fa22.netlify.app/static/labs/data/sleep.csv") sleep
The data we are working with today comes from Herbie Lee. Please read information about the data set here, where you will also find a codebook.
Exercises
Investigating sleep
Find the mean time it takes the participant to fall asleep. Check whether or not you can use central limit theorem to construct a 95% confidence interval to report with your estimate. For purposes of this exercise, you may consider samples of time to sleep as random. If you can use CLT, please do so to construct and report the confidence interval. Otherwise, construct a bootstrap confidence interval with
set.seed(5)
and10000
reps. Interpret your interval in context.Does the participant take more than 35 minutes on average to fall asleep? Perform a hypothesis test to investigate. What is the null? What is the alternative? Use
set.seed(2)
andreps=5000
to simulate under the null.
Visualize the null distribution and shade the p-value. Be sure to label the x-axis. Make a conclusion based on significance level. Given your response from the previous exercise, does this result surprise you? Why or why not? What is the probability of rejecting the null if it’s actually true? What type of error is this (type 1 or type 2)?
- Do calcium-magnesium supplements reduce median time to fall asleep? Construct a hypothesis test to investigate. To begin, state the null and alternative hypothesis in words and in statistical notation. Use simulation-based inference to generate
5000
samples from the null distribution withset.seed(5)
. Compute and state your observed statistic. Finally, compute the p-value and report it as well as significance at the alpha = 0.05 level and state your conclusion in context.
- The more time the individual spent lying in bed, awake, and trying to sleep, the worse their overall quality of sleep. To begin investigating quality of sleep, create a new column
timeAwake
that shows the total number of minutes the individual spent in bed but not asleep. WhentimeAwake
is large, sleep is awful and interrupted. Save your data frame with the new column assleep2
.
Example: TTS
is the time to fall asleep, TBT
is total bed time, i.e. total time spent, lying in bed and trying to sleep.
So if the participant gets into bed at 9:00pm and then falls asleep at 9:30pm and wakes up at 1:00am and lies awake for an hour only to finally sleep until 6:00am and gets out of bed then
TTS
is 30 minutes,TBT
is 9 hours,- Total Sleep Time (
TST
) is 3.5h + 4h = 7.5 hours and the total time awake is 1.5 hours.
- Hypothesis: Most nights, the participant spends over fifty minutes lying in bed awake. Construct a hypothesis test to investigate. To begin, state the null and alternative hypothesis in words and in statistical notation. Use simulation-based inference to generate
5000
samples from the null distribution withset.seed(8)
. Compute and state your observed statistic. Finally, compute the p-value and report it as well as significance at the alpha = 0.01 level and state your conclusion in context.
Beyond this class: pick a package
- In this exercise, you will find a new package that you haven’t used in class and learn how to use it. Being able to find new packages, install them, read the documentation and finally use the package is an essential skill beyond the classroom.
To begin,
Pick a package. You can choose one from the list below, or venture into the great unknown and find another online. If you have trouble getting a package to work, ask for help on slack or come to office hours.
Install the package. Be sure to do this in the Console, not in your quarto document because you do not want to keep re-installing every time you render the document.
Depending on where the package comes from, how you install the package differs:
If the package is on CRAN (Comprehensive R Archive Network), you can install it with
install.packages
.If the package is only on Github (most likely because it is still under development), you need to use the
install_github
function, click here for more details.Load the package. Regardless of how you installed the package you can load it with the
library
function.Do something with the package. Typically, simpler is better. The goal is for you to read and understand the package documentation to carry out a simple task.
Finally, write a short 3-4 sentence statement (at the beginning of your solution to this exercise) and include:
The name of the package you use and whether it is from CRAN or GitHub
A short description of what the package does (in your own words)
A short description of what you do with the package.
Sample list of packages on CRAN:
package name | description |
---|---|
ape | Manipulate, plot and interact with phylogenetic trees and models. Comes with sample data |
astrodatR | Astronomy datasets |
cowsay | Allows printing of character strings as messages/warnings/etc. with ASCII animals, including cats, cows, frogs, chickens, ghosts, and more |
babynames | US Baby Names 1880-2015 |
dragracer | These are data sets for the hit TV show, RuPaul’s Drag Race. Data right now include episode-level data, contestant-level data, and episode-contestant-level data |
datapasta | RStudio addins and R functions that make copy-pasting vectors and tables to text painless |
DiagrammeR | Graph/Network Visualization |
janeaustenr | Full texts for Jane Austen’s 6 completed novels, ready for text analysis. These novels are “Sense and Sensibility”, “Pride and Prejudice”, “Mansfield Park”, “Emma”, “Northanger Abbey”, and “Persuasion” |
ggimage | Supports image files and graphic objects to be visualized in ‘ggplot2’ graphic system |
gganimate | Create easy animations with ggplot2 |
gt | Easily Create Presentation-Ready Display Tables |
leaflet | Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library |
praise | Build friendly R packages that praise their users if they have done something good, or they just need it to feel better |
plotly | Create interactive web graphics from ggplot2 graphs and/or a custom interface to the JavaScript library plotly.js inspired by the grammar of graphics |
suncalc | R interface to suncalc.js library, part of the SunCalc.net project, for calculating sun position, sunlight phases (times for sunrise, sunset, dusk, etc.), moon position and lunar phase for the given location and time |
schrute | The complete scripts from the American version of the Office television show in tibble format |
statebins | The cartogram heatmaps generated by the included methods are an alternative to choropleth maps for the United States and are based on work by the Washington Post graphics department in their report on “The states most threatened by trade” |
ttbbeer | An R data package of beer statistics from U.S. Department of the Treasury, Alcohol and Tobacco Tax and Trade Bureau (TTB) |
ttbbeer | An R data package of beer statistics from U.S. Department of the Treasury, Alcohol and Tobacco Tax and Trade Bureau (TTB) |
ukbabynames | Full listing of UK baby names occurring more than three times per year between 1996 and 2015, and rankings of baby name popularity by decade from 1904 to 1994 |
wesanderson | Color palettes from Wes Anderson films |
scatterplot3d | Create 3D plots |
Assessment
A research team at Duke University is creating an intro data science assessment. They are hopeful to have students in an intro data science course take the assessment to better pin down how students think about and reason through data science assessment questions. If you are able and willing, please consider participating in this research.
Participation is completely optional, and has no impact on your academic standing in STA199.
Rubric
Ex 1: 5 pts.
Ex 2: 10 pts.
Ex 3: 8 pts.
Ex 4: 2 pts.
Ex 5: 8 pts.
Ex 6: 12 pts.
Workflow and formatting - 5 pts
Submission
- Go to http://www.gradescope.com and click Log in in the top right corner.
- Click School Credentials Duke Net ID and log in using your Net ID credentials.
- Click on your STA 199 course.
- Click on the assignment, and you’ll be prompted to submit it.
- Mark all the pages associated with exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”). If you do not do this, you will be subject to lose points on the assignment.
- Select all pages of your PDF submission to be associated with the “Workflow & formatting” question.