Project proposal
Project proposal steps
Find a data set that satisfies the guidelines.
Write about:
- the source of data
- when and how it was originally collected (by the curator, not necessarily how you found the data)
- a brief description of the observations
Choose 1-2 research questions and write a hypothesis for each.
Put your data in the data folder and push.
glimpse the data.
Demo: data in data folder
load data from your computer
- upload button in R files pane
read a csv from a URL and write it to a file
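library(tidyverse)  # loads readr, which provides read_csv() and write_csv()
dinosaur_dataset <- read_csv("website-url-here")  # replace with the URL of the data
write_csv(dinosaur_dataset, "data/dinosaur_dataset.csv")  # save a copy in the data folder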
Note
read_csv and write_csv are from the readr package and are loaded automatically with library(tidyverse).
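One way to check that a file was written as intended is to read it back and glimpse it. The sketch below is only an illustration: the toy tibble, the file name toy_dataset.csv, and the use of a temporary folder (instead of your data folder) are assumptions made so the example runs anywhere.

```r
library(readr)
library(dplyr)

# Toy data for illustration only (not a real data set)
toy <- tibble(
  home_team = c("WAS", "NC", "NC"),
  result    = c("win", "win", "win")
)

# Write to a temporary folder; in a project this would be "data/<name>.csv"
path <- file.path(tempdir(), "toy_dataset.csv")
write_csv(toy, path)

# Read the file back and inspect it
toy_back <- read_csv(path, show_col_types = FALSE)
glimpse(toy_back)
```

If the column names and row count match what you wrote, the file is ready to commit and push.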
read in an excel spreadsheet
readxl::read_xlsx()
readxl::read_xls()
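A minimal sketch of reading one sheet from a workbook. It assumes the readxl package is installed and uses the sample file datasets.xlsx that ships with readxl, so no file of your own is needed; for a project you would pass the path to your own spreadsheet instead.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the sheet names, then read one sheet by name
sheets <- excel_sheets(path)
mtcars_sheet <- read_xlsx(path, sheet = "mtcars")

head(mtcars_sheet)
```

The sheet argument also accepts a position (for example sheet = 2) when the sheet name is not known.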
Ex: Introduction and Data
Data set #1: NC Courage Home-Field Advantage
Our first data set comes from the National Women’s Soccer League (NWSL) GitHub and was sourced from nwslsoccer.com.
The dataset contains 78 observations (soccer games) played by the NC Courage spanning three seasons: 2017, 2018, and 2019. There are 10 variables in this dataset. Some of the variables we care about are home_team, away_team, and result (of the game).
Ex: Research question(s):
Does NC Courage have a home-field advantage? We hypothesize that NC Courage is more likely to win on their home field than on another team’s field.
- To answer this question we will use information about the home_team and the result of the game.
Does winning propagate winning? When NC Courage wins a game, does that increase the probability of winning the very next game?
- To answer this question we will use information about the result of the game and the game_number.
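One possible way to pair each game with the previous one is dplyr's lag(), ordering by game_number within a season. This is a sketch on the first eight games from the glimpse output only; courage_head, after_result, and prev_result are illustrative names, and grouping by season is an assumption so that the first game of a season is not paired with the last game of the season before.

```r
library(dplyr)

# First eight games, copied from the glimpse output
courage_head <- tibble(
  season      = rep(2017, 8),
  game_number = 1:8,
  result      = c("win", "win", "win", "win", "loss", "loss", "win", "loss")
)

after_result <- courage_head |>
  group_by(season) |>
  arrange(game_number, .by_group = TRUE) |>
  # Result of the previous game in the same season (NA for the first game)
  mutate(prev_result = lag(result)) |>
  ungroup() |>
  filter(!is.na(prev_result)) |>
  group_by(prev_result) |>
  summarize(
    games    = n(),
    win_prop = mean(result == "win")
  )

after_result
```

Comparing win_prop between the rows for prev_result equal to "win" and "loss" gives a first descriptive look at the question; it is not a formal test.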
Ex: Glimpse
glimpse(courage)
Rows: 78
Columns: 10
$ game_id <chr> "washington-spirit-vs-north-carolina-courage-2017-04-15", …
$ game_date <chr> "4/15/2017", "4/22/2017", "4/29/2017", "5/7/2017", "5/14/2…
$ game_number <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ home_team <chr> "WAS", "NC", "NC", "BOS", "ORL", "NC", "NC", "CHI", "NC", …
$ away_team <chr> "NC", "POR", "ORL", "NC", "NC", "CHI", "NJ", "NC", "KC", "…
$ opponent <chr> "WAS", "POR", "ORL", "BOS", "ORL", "CHI", "NJ", "CHI", "KC…
$ home_pts <dbl> 0, 1, 3, 0, 3, 1, 2, 3, 2, 3, 0, 0, 2, 1, 1, 0, 1, 2, 2, 2…
$ away_pts <dbl> 1, 0, 1, 1, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 2, 0, 3, 1…
$ result <chr> "win", "win", "win", "win", "loss", "loss", "win", "loss",…
$ season <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
Remember
list data sets in your proposal in your order of preference
all work for the proposal should go into proposal.qmd within your project repo
render to PDF and submit to Gradescope, selecting all for the exercise “proposal”
link your teammates in Gradescope when you submit