Project proposal

Published

January 1, 2023

Project proposal steps

  1. Find a data set that satisfies the guidelines.

  2. Write about:

    • the source of data
    • when and how it was originally collected (by the curator, not necessarily how you found the data)
    • a brief description of the observations
  3. Choose 1-2 research questions and write a hypothesis for each.

  4. Put your data in the data folder and push it to your repo, then glimpse() the data.

Demo: data in data folder

load data from your computer

  • use the Upload button in the RStudio Files pane

read a CSV from a URL and write it to a file

  • dinosaur_dataset <- read_csv("website-url-here")

  • write_csv(dinosaur_dataset, "data/dinosaur_dataset.csv")

  • Note: read_csv() and write_csv() come from the readr package, which is loaded automatically by library(tidyverse).
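A minimal sketch of the same round trip, assuming readr is installed (it comes with the tidyverse). To keep the example runnable offline, the built-in iris data stands in for a downloaded data set and a temporary folder stands in for your repo's data folder; my_data, data_dir, and csv_path are hypothetical names.

```r
library(readr)  # also attached by library(tidyverse)

# Temporary stand-in for the project's data/ folder, so this sketch
# does not write into your repo. In your project, use "data/" directly.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# In your project: my_data <- read_csv("website-url-here")
# Here the built-in iris data frame plays the role of the downloaded data.
my_data <- iris

# Save a local copy so the analysis no longer depends on the URL
csv_path <- file.path(data_dir, "my_data.csv")
write_csv(my_data, csv_path)

# Read the local copy back, as proposal.qmd would
my_data_local <- read_csv(csv_path, show_col_types = FALSE)
```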

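read in an Excel spreadsheet

  • readxl::read_xlsx() for .xlsx files
  • readxl::read_xls() for older .xls files
  • readxl is installed with the tidyverse but not loaded by library(tidyverse), hence the readxl:: prefix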
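A short sketch, assuming the readxl package is installed. Instead of your own spreadsheet it reads the example workbooks that ship with readxl (located with readxl_example()); swap in your own path, for example a file under data/.

```r
library(readxl)  # installed with the tidyverse, but must be loaded explicitly

# Example workbooks bundled with readxl; replace with your own file path
xlsx_path <- readxl_example("datasets.xlsx")
xls_path  <- readxl_example("datasets.xls")

# List the sheet names before choosing one to read
sheets <- excel_sheets(xlsx_path)

# Modern .xlsx workbook: pick a sheet by name
mtcars_xl <- read_xlsx(xlsx_path, sheet = "mtcars")

# Legacy .xls workbook: same arguments; sheet defaults to the first one
first_sheet_xls <- read_xls(xls_path)
```

readxl also provides read_excel(), which guesses the format from the file; calling read_xlsx() or read_xls() directly skips that guess.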

Ex: Introduction and Data

Data set #1: NC Courage Home-Field Advantage

Our first data set comes from the National Women’s Soccer League (NWSL) GitHub and was sourced from nwslsoccer.com.

The data set contains 78 observations, each a soccer game played by the NC Courage, spanning three seasons: 2017, 2018, and 2019. It has 10 variables; the ones most relevant to our questions are home_team, away_team, and result (the outcome of the game).

Ex: Research question(s)

Does the NC Courage have a home-field advantage? We hypothesize that the NC Courage is more likely to win on its home field than on an opponent’s field.

  • To answer this question, we will use the home_team and result variables.
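As a sketch of how this question could be summarized, the code below compares the win proportion at home and away. It assumes dplyr (attached by library(tidyverse)) and uses only the first eight games visible in the glimpse() output as a toy stand-in for the full courage data; courage_head is a hypothetical name, and with the real data you would start from courage instead.

```r
library(dplyr)  # attached by library(tidyverse)

# First eight games, copied from the glimpse() output of courage
courage_head <- data.frame(
  game_number = 1:8,
  home_team   = c("WAS", "NC", "NC", "BOS", "ORL", "NC", "NC", "CHI"),
  result      = c("win", "win", "win", "win", "loss", "loss", "win", "loss")
)

home_field <- courage_head |>
  # NC Courage is the home side when home_team is "NC"
  mutate(location = if_else(home_team == "NC", "home", "away")) |>
  group_by(location) |>
  # Share of games won at each location (any result other than "win" counts as not won)
  summarize(games = n(), win_prop = mean(result == "win"))

home_field
```

On these eight games the Courage won 3 of 4 at home (0.75) and 2 of 4 away (0.5), far too few games to support a conclusion either way.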

Does winning lead to more winning? When the NC Courage wins a game, does that increase the probability that it wins the very next game?

  • To answer this question, we will use the result and game_number variables.
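One way to make this question concrete is to pair each game with the result of the game before it and compare win proportions. A minimal sketch, again assuming dplyr and using the first eight games from the glimpse() output; grouping by season so that streaks do not carry over between seasons is our assumption, not something the proposal specifies.

```r
library(dplyr)  # attached by library(tidyverse)

# First eight games (all 2017), copied from the glimpse() output of courage
courage_head <- data.frame(
  season      = 2017,
  game_number = 1:8,
  result      = c("win", "win", "win", "win", "loss", "loss", "win", "loss")
)

after_result <- courage_head |>
  arrange(season, game_number) |>
  group_by(season) |>
  # Result of the previous game in the same season (NA for the first game)
  mutate(prev_result = lag(result)) |>
  ungroup() |>
  filter(!is.na(prev_result)) |>
  group_by(prev_result) |>
  # Win proportion in games that followed a win vs. followed a loss
  summarize(games = n(), win_prop = mean(result == "win"))

after_result
```

On these eight games, 3 of the 5 games after a win were wins (0.6), compared with 1 of the 2 games after a loss (0.5).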

Ex: Glimpse

glimpse(courage)
Rows: 78
Columns: 10
$ game_id     <chr> "washington-spirit-vs-north-carolina-courage-2017-04-15", …
$ game_date   <chr> "4/15/2017", "4/22/2017", "4/29/2017", "5/7/2017", "5/14/2…
$ game_number <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ home_team   <chr> "WAS", "NC", "NC", "BOS", "ORL", "NC", "NC", "CHI", "NC", …
$ away_team   <chr> "NC", "POR", "ORL", "NC", "NC", "CHI", "NJ", "NC", "KC", "…
$ opponent    <chr> "WAS", "POR", "ORL", "BOS", "ORL", "CHI", "NJ", "CHI", "KC…
$ home_pts    <dbl> 0, 1, 3, 0, 3, 1, 2, 3, 2, 3, 0, 0, 2, 1, 1, 0, 1, 2, 2, 2…
$ away_pts    <dbl> 1, 0, 1, 1, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 2, 0, 3, 1…
$ result      <chr> "win", "win", "win", "win", "loss", "loss", "win", "loss",…
$ season      <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
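The glimpse() output shows game_date stored as <chr> in month/day/year form, so it will sort as text rather than by date. A minimal sketch of converting it with base R's as.Date(), using the first four dates shown above; lubridate::mdy() is an alternative.

```r
# First four game_date values from the glimpse() output (stored as character)
game_date <- c("4/15/2017", "4/22/2017", "4/29/2017", "5/7/2017")

# Parse month/day/year text; single-digit months and days are accepted
game_date_parsed <- as.Date(game_date, format = "%m/%d/%Y")

# With the full data frame (requires dplyr):
# courage <- courage |> mutate(game_date = as.Date(game_date, format = "%m/%d/%Y"))
```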

Remember

  • list the data sets in your proposal in order of preference

  • all work for the proposal should go into proposal.qmd within your project repo.

  • render to PDF and submit to Gradescope, selecting all pages for the exercise “proposal”

  • add your teammates to the submission in Gradescope when you submit