HW 3 - Do you even lift?

Homework
Important

This homework is due Wednesday, March 8 at 5:00pm.

Getting Started

  • Go to the Github Organization page and open your hw3-username repo

  • Clone the repository, open a new project in RStudio. It contains the starter documents you need to complete the homework assignment.

Packages

library(tidyverse)
library(tidymodels)

Data

Today, we will be working with data from www.openpowerlifting.org. This data was sourced from tidy tuesday and contains international powerlifting records at various meets. At each meet, each lifter gets three attempts at lifting max weight on three lifts: the bench press, squat and deadlift. The data dictionary for this dataset from tidytuesday is reproduced below:

Dictionary

variable class description
name character Individual lifter name
sex character Binary gender (M/F)
event character The type of competition that the lifter entered.

Values are as follows:
- SBD: Squat-Bench-Deadlift, also commonly called “Full Power”.
- BD: Bench-Deadlift, also commonly called “Ironman” or “Push-Pull”
- SD: Squat-Deadlift, very uncommon.
- SB: Squat-Bench, very uncommon.
- S: Squat-only.
- B: Bench-only.
- D: Deadlift-only.
equipment character The equipment category under which the lifts were performed.

Values are as follows:
- Raw: Bare knees or knee sleeves.
- Wraps: Knee wraps were allowed.
- Single-ply: Equipped, single-ply suits.
- Multi-ply: Equipped, multi-ply suits (includes Double-ply).
- Straps: Allowed straps on the deadlift (used mostly for exhibitions, not real meets).
age double The age of the lifter on the start date of the meet, if known.
age_class character The age class in which the filter falls, for example 40-45
division character Free-form UTF-8 text describing the division of competition, like Open or Juniors 20-23 or Professional.
bodyweight_kg double The recorded bodyweight of the lifter at the time of competition, to two decimal places.
weight_class_kg character The weight class in which the lifter competed, to two decimal places.
Weight classes can be specified as a maximum or as a minimum. Maximums are specified by just the number, for example 90 means “up to (and including) 90kg.” minimums are specified by a + to the right of the number, for example 90+ means “above (and excluding) 90kg.”
best3squat_kg double Maximum of the first three successful attempts for the lift.
Rarely may be negative: that is used by some federations to report the lowest weight the lifter attempted and failed.
best3bench_kg double Maximum of the first three successful attempts for the lift.
Rarely may be negative: that is used by some federations to report the lowest weight the lifter attempted and failed.
best3deadlift_kg double Maximum of the first three successful attempts for the lift.
Rarely may be negative: that is used by some federations to report the lowest weight the lifter attempted and failed.
place character The recorded place of the lifter in the given division at the end of the meet.

Values are as follows:
- Positive number: the place the lifter came in.
- G: Guest lifter. The lifter succeeded, but wasn’t eligible for awards.
- DQ: Disqualified. Note that DQ could be for procedural reasons, not just failed attempts.
- DD: Doping Disqualification. The lifter failed a drug test.
- NS: No-Show. The lifter did not show up on the meet day.
date double ISO 8601 Date of the event
federation character The federation that hosted the meet. (limited to IPF for this data subset)
meet_name character The name of the meet.
The name is defined to never include the year or the federation. For example, the meet officially called 2019 USAPL Raw National Championships would have the MeetName Raw National Championshps.

Exercises

For all of the following exercises, you should include units on axes labels, e.g. “Bench press (lbs)” or “Bench press (kg)”. “Age (years)” etc. This is good practice.

  1. Let’s begin by taking a look at the squat powerlifting records. To begin, remove any observations that are negative for squat. Next, create a new column called best3_squat_lbs that converts the record from kg to lbs (you may have to google the conversion). Save your data frame as ipf_squat.
  • Using ipf_squat, create a scatter plot to investigate the relationship between squat (in lbs) and age. Age should be on the x-axis. Add a linear trend-line. Remove the standard error. Be sure to label all axes and give the plot a title. Comment on what you observe.
  1. Write down the full linear model to predict lift squat lbs from age in \(x\), \(y\), \(\beta\) notation. What is \(x\)? What is \(y\)? Next, fit the linear model. Use the ipf_squat data frame. Re-write your previous equation replacing \(\beta\) with the numeric estimates. This is called the “fitted” linear model. Interpret each estimate of \(\beta\). Are the interpretations reasonable?

  2. Building on your ipf_squat data frame, create a new column called age2 that takes the age of each lifter and squares it. Save your data frame with an appropriate name. Next, plot squat in lbs vs age2 and add a linear best fit line. Does this model look like it fits the data better? Is this still a linear model?

  3. One metric to assess the fit of a model is the correlation squared, also known as \(R^2\). Fit the age\(^2\) model and save the object as age2Fit. Subsequently report the \(R^2\). Compare \(R^2\) of the age\(^2\) model to the model from exercise 2. Which model do you prefer?

  • If you were to add body weight as a second predictor to the age\(^2\) model, would \(R^2\) increase or decrease? Explain.
  1. Starting with the original ipf dataframe, filter and mutate the data as we did in exercise 1, but this time filtering for best3bench_kg \(>0\) and creating a best3_bench_lbs variable, a bodyweight_lbs variable, and a sex variable that is a factor rather than a character. Fit an interaction effects model with bodywieght (in lbs) and sex as predictors of best bench press (in lbs). Write down the fitted model equation only, replacing \(\hat{\beta}\) with the fitted estimates. Interpret the \(\hat{\beta}\).

  2. Visualize the interaction effects model we built in exercise 5. Hint: there should be two lines with different slopes. Bodyweight should be on the x-axis. Add a linear trend-line. Be sure to label all axes and give the plot a title. Comment on what you observe.

  3. Do lifters who fail a drug test perform better or worse at bench press than other lifters? Does this vary across sexes? We’ll answer this question in two parts. First, remove all observations from the data frame that have NA listed under bench press. Next, create a new column called doping_status that takes value doping if the lifter failed a drug test and not doping otherwise. Save this data frame as ipf_dope.

Hint

Check the data dictionary at the top to figure out what variables will help you build the doping_status column.

  1. Using ipf_dope from the previous exercise, compute the 5%, 50%, 95% quantiles for bench press across both sex and doping_status. You can use either bench press in kg or lbs here. With this information, answer the question “Do lifters who fail a drug test perform better or worse at bench press than other lifters? Is this consistent across sex and quantiles?”

Wrap up

Reminder:

  • All plots should follow the best visualization practices: include an informed title, label axes, and carefully consider aesthetic choices.
  • All code should follow the tidyverse style guidelines, including not exceeding the 80 character limit.

Submission

  • Go to http://www.gradescope.com and click Log in in the top right corner.
  • Click School Credentials Duke Net ID and log in using your Net ID credentials.
  • Click on your STA 199 course.
  • Click on the assignment, and you’ll be prompted to submit it.
  • Mark all the pages associated with exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”). If you do not do this, you will be subject to lose points on the assignment.
  • Select all pages of your PDF submission to be associated with the “Workflow & formatting” question.

Rubric

  • Ex 1: 6 pts.

  • Ex 2: 7 pts.

  • Ex 3: 6 pts.

  • Ex 4: 5 pts.

  • Ex 5: 8 pts.

  • Ex 6: 4 pts.

  • Ex 7: 4 pts

  • Ex 8: 5 pts

  • Workflow and formatting - 5 pts

Note

The “Workflow & formatting” grade is to assess the reproducible workflow. This includes:

  • linking all pages appropriately on Gradescope

  • putting your name in the YAML at the top of the document

  • committing the submitted version of your .qmd to GitHub

  • Are you under the 80 character code limit? (You shouldn’t have to scroll to see all your code).

  • Pipes %>%, |> and ggplot layers + should be followed by a new line

  • You should be consistent with stylistic choices, e.g. only use 1 of = vs <- and %>% vs |>

  • All binary operators should be surrounded by space. For example x + y is appropriate. x+y is not.