HW 3 - Do you even lift?
Getting Started
Go to the GitHub organization page and open your `hw3-username` repo. Clone the repository and open it as a new project in RStudio. It contains the starter documents you need to complete the homework assignment.
Packages
library(tidyverse)
library(tidymodels)
Data
Today, we will be working with data from www.openpowerlifting.org. The data were sourced from TidyTuesday and contain international powerlifting records from various meets. At each meet, each lifter gets three attempts at lifting the maximum weight on each of three lifts: the bench press, the squat, and the deadlift. The TidyTuesday data dictionary for this dataset is reproduced below:
Dictionary
variable | class | description |
---|---|---|
name | character | Individual lifter name |
sex | character | Binary gender (M/F) |
event | character | The type of competition that the lifter entered. Values are as follows: - SBD: Squat-Bench-Deadlift, also commonly called “Full Power”. - BD: Bench-Deadlift, also commonly called “Ironman” or “Push-Pull” - SD: Squat-Deadlift, very uncommon. - SB: Squat-Bench, very uncommon. - S: Squat-only. - B: Bench-only. - D: Deadlift-only. |
equipment | character | The equipment category under which the lifts were performed. Values are as follows: - Raw: Bare knees or knee sleeves. - Wraps: Knee wraps were allowed. - Single-ply: Equipped, single-ply suits. - Multi-ply: Equipped, multi-ply suits (includes Double-ply). - Straps: Allowed straps on the deadlift (used mostly for exhibitions, not real meets). |
age | double | The age of the lifter on the start date of the meet, if known. |
age_class | character | The age class in which the lifter falls, for example 40-45 |
division | character | Free-form UTF-8 text describing the division of competition, like Open or Juniors 20-23 or Professional. |
bodyweight_kg | double | The recorded bodyweight of the lifter at the time of competition, to two decimal places. |
weight_class_kg | character | The weight class in which the lifter competed, to two decimal places. Weight classes can be specified as a maximum or as a minimum. Maximums are specified by just the number, for example 90 means “up to (and including) 90kg.” Minimums are specified by a + to the right of the number, for example 90+ means “above (and excluding) 90kg.” |
best3squat_kg | double | Maximum of the first three successful attempts for the lift. Rarely may be negative: that is used by some federations to report the lowest weight the lifter attempted and failed. |
best3bench_kg | double | Maximum of the first three successful attempts for the lift. Rarely may be negative: that is used by some federations to report the lowest weight the lifter attempted and failed. |
best3deadlift_kg | double | Maximum of the first three successful attempts for the lift. Rarely may be negative: that is used by some federations to report the lowest weight the lifter attempted and failed. |
place | character | The recorded place of the lifter in the given division at the end of the meet. Values are as follows: - Positive number: the place the lifter came in. - G: Guest lifter. The lifter succeeded, but wasn’t eligible for awards. - DQ: Disqualified. Note that DQ could be for procedural reasons, not just failed attempts. - DD: Doping Disqualification. The lifter failed a drug test. - NS: No-Show. The lifter did not show up on the meet day. |
date | double | ISO 8601 Date of the event |
federation | character | The federation that hosted the meet. (limited to IPF for this data subset) |
meet_name | character | The name of the meet. The name is defined to never include the year or the federation. For example, the meet officially called 2019 USAPL Raw National Championships would have the MeetName Raw National Championships. |
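
A few of the conventions in the dictionary are easy to trip over: negative `best3*_kg` values record failed attempts, `weight_class_kg` is text because of the `+` suffix, and `place` mixes ranks with codes. The following is a minimal sketch of how those conventions could be handled, using a made-up `toy` tibble rather than the real data; the helper column names (`squat_made`, `class_is_minimum`, `class_limit_kg`, `place_num`) are hypothetical and not part of the dataset.

```r
library(tidyverse)

# Made-up rows that mimic the dictionary's conventions (not real records)
toy <- tibble(
  name = c("A", "B", "C", "D"),
  best3squat_kg = c(182.5, -200, NA, 240),
  weight_class_kg = c("83", "93", "120+", "120"),
  place = c("1", "DQ", "NS", "DD")
)

toy_clean <- toy %>%
  mutate(
    # a negative value reports a failed attempt, not a successful lift
    squat_made = !is.na(best3squat_kg) & best3squat_kg > 0,
    # "120+" means above 120 kg; a bare number is an upper limit
    class_is_minimum = str_detect(weight_class_kg, "\\+$"),
    class_limit_kg = parse_number(weight_class_kg),
    # place is character because it mixes ranks with codes such as DQ or DD
    place_num = suppressWarnings(as.integer(place))
  )

toy_clean
```

Keeping the original columns and adding new ones, as here, makes it easy to check each conversion against the raw value.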
Exercises
For all of the following exercises, include units in your axis labels, e.g. “Bench press (lbs)”, “Bench press (kg)”, or “Age (years)”. This is good practice.
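
As an illustration of the labeling convention, here is a small sketch that sets axis labels with units through `labs()`. The data frame `toy` and its values are made up for the example and do not come from the powerlifting data.

```r
library(tidyverse)

# Made-up values for illustration only
toy <- tibble(
  age = c(19, 24, 31, 38, 45, 52),
  bench_kg = c(95, 120, 132.5, 130, 117.5, 105)
)

p <- ggplot(toy, aes(x = age, y = bench_kg)) +
  geom_point() +
  labs(
    x = "Age (years)",
    y = "Bench press (kg)",
    title = "Bench press by age (made-up data)"
  )

p
```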
- Let’s begin by taking a look at the squat powerlifting records. First, remove any observations that are negative for squat. Next, create a new column called `best3_squat_lbs` that converts the record from kg to lbs (you may have to google the conversion). Save your data frame as `ipf_squat`.
- Using `ipf_squat`, create a scatter plot to investigate the relationship between squat (in lbs) and age. Age should be on the x-axis. Add a linear trend-line. Remove the standard error. Be sure to label all axes and give the plot a title. Comment on what you observe.
- Write down the full linear model to predict squat (in lbs) from age in \(x\), \(y\), \(\beta\) notation. What is \(x\)? What is \(y\)? Next, fit the linear model using the `ipf_squat` data frame. Rewrite your previous equation, replacing each \(\beta\) with its numeric estimate. This is called the “fitted” linear model. Interpret each estimate of \(\beta\). Are the interpretations reasonable?
- Building on your `ipf_squat` data frame, create a new column called `age2` that takes the age of each lifter and squares it. Save your data frame with an appropriate name. Next, plot squat in lbs vs `age2` and add a linear best fit line. Does this model look like it fits the data better? Is this still a linear model?
- One metric to assess the fit of a model is the correlation squared, also known as \(R^2\). Fit the age\(^2\) model and save the object as `age2Fit`. Then report the \(R^2\). Compare the \(R^2\) of the age\(^2\) model to that of the model from exercise 2. Which model do you prefer?
- If you were to add body weight as a second predictor to the age\(^2\) model, would \(R^2\) increase or decrease? Explain.
- Starting with the original `ipf` data frame, filter and mutate the data as we did in exercise 1, but this time filtering for `best3bench_kg` \(> 0\) and creating a `best3_bench_lbs` variable, a `bodyweight_lbs` variable, and a `sex` variable that is a factor rather than a character. Fit an interaction effects model with bodyweight (in lbs) and sex as predictors of best bench press (in lbs). Write down the fitted model equation only, replacing each \(\hat{\beta}\) with its fitted estimate. Interpret each \(\hat{\beta}\).
- Visualize the interaction effects model we built in exercise 5. Hint: there should be two lines with different slopes. Bodyweight should be on the x-axis. Add a linear trend-line. Be sure to label all axes and give the plot a title. Comment on what you observe.
- Do lifters who fail a drug test perform better or worse at bench press than other lifters? Does this vary across sexes? We’ll answer this question in two parts. First, remove all observations from the data frame that have `NA` listed under bench press. Next, create a new column called `doping_status` that takes value `doping` if the lifter failed a drug test and `not doping` otherwise. Save this data frame as `ipf_dope`.
- Using `ipf_dope` from the previous exercise, compute the 5%, 50%, 95% quantiles for bench press across both `sex` and `doping_status`. You can use either bench press in kg or lbs here. With this information, answer the question “Do lifters who fail a drug test perform better or worse at bench press than other lifters? Is this consistent across sex and quantiles?”
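
The exercises above share a common pattern: filter, convert units, fit a model, then summarise by group. The sketch below walks through that pattern on a small made-up tibble so the mechanics can be checked by hand. It is not a solution to any exercise: the data in `toy` are invented, the conversion factor of roughly 2.20462 lbs per kg is a standard approximation, and object names such as `toy_bench`, `bench_fit`, and `bench_quantiles` are hypothetical.

```r
library(tidyverse)
library(tidymodels)

# Made-up lifters for illustration only (not real openpowerlifting records)
toy <- tibble(
  sex = c("F", "F", "F", "M", "M", "M", "M", "F"),
  age = c(22, 35, 48, 19, 27, 41, 56, 30),
  best3bench_kg = c(60, 72.5, 55, 110, 140, 125, -100, NA),
  place = c("1", "2", "DD", "1", "DD", "3", "DQ", "NS")
)

kg_to_lbs <- 2.20462 # 1 kg is approximately 2.20462 lbs

toy_bench <- toy %>%
  # keep positive values only; filter() also drops rows where this is NA
  filter(best3bench_kg > 0) %>%
  mutate(
    best3_bench_lbs = best3bench_kg * kg_to_lbs,
    sex = factor(sex),
    # "DD" is the dictionary's code for a doping disqualification
    doping_status = if_else(place == "DD", "doping", "not doping")
  )

# Fit a one-predictor linear model, then read off the estimates and R^2
bench_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(best3_bench_lbs ~ age, data = toy_bench)

bench_estimates <- tidy(bench_fit)
bench_r2 <- glance(bench_fit)$r.squared

# Quantiles of the response within each sex and doping_status group
bench_quantiles <- toy_bench %>%
  group_by(sex, doping_status) %>%
  summarise(
    q05 = quantile(best3_bench_lbs, 0.05),
    q50 = quantile(best3_bench_lbs, 0.50),
    q95 = quantile(best3_bench_lbs, 0.95),
    .groups = "drop"
  )

bench_quantiles
```

With only a handful of rows per group the quantiles here are not meaningful; the point is the shape of the pipeline, which carries over to the full data frame.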
Wrap up
Reminder:
- All plots should follow the best visualization practices: include an informative title, label axes, and carefully consider aesthetic choices.
- All code should follow the tidyverse style guidelines, including not exceeding the 80 character limit.
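
As a rough illustration of the style reminder, the sketch below formats a short pipeline with one step per line, spaces around operators, and every line under 80 characters. The tibble `lifts` and the column `best3_deadlift_lbs` are made up for the example.

```r
library(tidyverse)

# Made-up values for illustration only
lifts <- tibble(
  name = c("A", "B", "C"),
  best3deadlift_kg = c(220, -250, 180)
)

# One pipeline step per line keeps each line short and easy to scan
lifts_lbs <- lifts %>%
  filter(best3deadlift_kg > 0) %>%
  mutate(best3_deadlift_lbs = best3deadlift_kg * 2.20462)

lifts_lbs
```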
Submission
- Go to http://www.gradescope.com and click Log in in the top right corner.
- Click School Credentials, then Duke Net ID, and log in using your Net ID credentials.
- Click on your STA 199 course.
- Click on the assignment, and you’ll be prompted to submit it.
- Mark the pages associated with each exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”). If you do not do this, you may lose points on the assignment.
- Select all pages of your PDF submission to be associated with the “Workflow & formatting” question.
Rubric
Ex 1: 6 pts.
Ex 2: 7 pts.
Ex 3: 6 pts.
Ex 4: 5 pts.
Ex 5: 8 pts.
Ex 6: 4 pts.
Ex 7: 4 pts.
Ex 8: 5 pts.
Workflow and formatting: 5 pts.