Lab 4 - Merge Conflicts and Misrepresentation

Lab
Important

This lab is due Monday, February 27th at 5:00pm.

Learning Goals

  • Collaborating on GitHub and resolving merge conflicts
  • Critiquing visualizations that misrepresent data
  • Improving data visualizations to better convey the right message

Getting started

  • Go to the sta199-sp23-1 organization on GitHub. Click on the repo with the prefix lab-04. It contains the starter documents you need to complete the lab.
  • Clone the repo and start a new project in RStudio. See the Lab 0 instructions for details on cloning a repo and starting a new R project.

Bonus Setup Instructions

  1. Find your team and have all team members clone the repo to create a new project on their rstudio container.
  2. Designate one student in each group to take Student Role 0
  3. STUDENT ROLE 0: add *.pdf to a newline in the file .gitignore in your project directory and then save, commit and push the changes you made to that file.
  4. All other students in the group: delete your .gitignore files and pull the changes that were pushed by team member 0.
  5. Now, all students should have .gitignore files in their directory that include the line *.pdf

Why Teams: Rationale

In the real world, data scientists and statisticians often work in research teams. It is a skill to be able to communicate and work together on common projects. Thus, the remaining labs + your project will be team based.

Teams work better when members have a common understanding of the team’s goals and expectations for collaboration. The purpose of this activity is to help your team making a plan for working together during lab and outside of the scheduled lab time.

Each team member will have some ideas about how a team should operate. These ideas may be very different. This is your opportunity to share your thoughts and ideas to promote optimal team function and prevent misunderstandings in the future.

Team Name

Discuss with your group a team name to be called. Your GitHub repos will be created for this team name moving forward. Report your team name to your Lab Leader before moving on.

Instructions

Before completing the lab today, you will complete a team agreement. Discuss each of the items below with all in-person team members. If necessary, also follow up this week with any missing team members.

Have one person act as the recorder and type the team’s decisions in the team-agreement.qmd file.

Be sure the team agrees on an item before it is added to the document.

Once the document is complete, the recorder should render, commit, and push the team agreement to GitHub. All team members can refer to this document throughout the semester.

Team Agreement

Weekly meetings

Identify a 1 - 2 hour weekly block outside of lab where the team can meet to work on assignments. All team members should block off this time on their calendar in case the group needs to meet to finish lab or work on the project.

Meeting “location”

How the team will meet to work together (e.g. in-person, Zoom, Facetime, Google Hangouts). Be sure every member is able to access the virtual meeting space, if needed. If you are unable to find a weekly time when the team can meet, briefly outline a plan to work on assignments outside of lab. Otherwise, you can delete this item.

Primary method of communication

The team’s primary method of communication outside of meetings (e.g. Slack, text messages, etc.)

How should someone notify the other members if they are unable to attend lab or a scheduled team meeting?

By when should everyone have their portion of the lab completed?

Keep in mind your team may want to have time to review the lab before turning it in to make sure it is a cohesive write up.

Any other items the team would like to discuss or plan.

Missing Teammates

If you are missing teammates for today’s lab, it is your responsibility to reach out, say hello, and communicate with them that they must contribute to the above and below questions before submitting lab-04 to Gradescope. You can find their email in the teams repo. The link to the teams repo is located on our website on the teaching team tab.

Resolving merge conflicts

Working in teams includes using a shared GitHub repo for labs and projects. Sometimes things will go swimmingly, and sometimes you’ll run into merge conflicts.  

When you and your teammates work on the lines of code within a document, and both try to push your changes, you will run into issues. Merge conflicts happen when you merge branches that have competing renders, and Git needs your help to decide which changes to incorporate in the final merge.

Our first task today is to walk you through a merge conflict!

If you haven’t yet done so since filling out the Team Agreement, now is a good time to render, commit, and push. Make sure that you render and push all changed documents and your Git pane is completely empty before proceeding.

  • Pushing to a repo replaces the code on GitHub with the code you have on your computer.
  • If a collaborator has made a change to your repo on GitHub that you haven’t incorporated into your local work, GitHub will stop you from pushing to the repo because this could overwrite your collaborator’s work!
  • So you need to explicitly “merge” your collaborator’s work before you can push.
  • If your and your collaborator’s changes are in different files or in different parts of the same file, git merges the work for you automatically when you *pull*.
  • If you both changed the same part of a file, git will produce a **merge conflict** because it doesn’t know how which change you want to keep and which change you want to overwrite.

Git will put conflict markers in your code that look like:

<<<<<<< HEAD 

See also: [dplyr documentation](https://dplyr.tidyverse.org/)   

======= 

See also [ggplot2 documentation](https://ggplot2.tidyverse.org/)  

>>>>>>> some1alpha2numeric3string4

The ===s separate your changes (top) from their changes (bottom).

Note that on top you see the word HEAD, which indicates that these are your changes.

And at the bottom you see some1alpha2numeric3string4 (well, it probably looks more like 28e7b2ceb39972085a0860892062810fb812a08f).

This is the hash (a unique identifier) of the render your collaborator made with the conflicting change.

Your job is to reconcile the changes: edit the file so that it incorporates the best of both versions and delete the <<<, ===, and >>> lines. Then you can stage and render the result.

Merge conflict activity

Setup

  • Clone the repo and open the .qmd file.
  • Assign the numbers 1, 2, 3, and 4 to each of the team members. If your team has fewer than 4 people, some people will need to have multiple numbers. If your team has more than 4 people, some people will need to share some numbers.
  • Update your git settings! Copy and paste git config --global pull.rebase false into your Terminal in RStudio and press enter. (Look for the tab next to the Console, and ask if you can’t find it.)

Let’s cause a merge conflict!

Our goal is to see two different types of merges: first we’ll see a type of merge that git can’t figure out on its own how to do on its own (a merge conflict) and requires human intervention, then another type of where that git can figure out how to do without human intervention.

Doing this will require some tight choreography, so pay attention!

Take turns in completing the exercises, only one member at a time. Others should just watch, not doing anything on their own projects (this includes not even pulling changes!) until they are instructed to. If you feel like you won’t be able to resist the urge to touch your computer when it’s not your turn, we recommend putting your hands in your pockets or sitting on them!

Before starting: everyone on your team should have followed the instructions in the Setup section above. Repeating from earlier: everyone should clone the repo, open the lab4.qmd document, run git config --global pull.rebase false in the Terminal, and know which role number(s) they are.

🛑 Make sure everyone has completed this step before moving on.

Role 1:

  • Change “TEAM NAME” in the YAML to your actual team name.
  • render, commit, push.

🛑 Make sure the previous role has finished before moving on to the next step.

Role 2:

  • Change “TEAM NAME” to some other word.
  • Render, commit, push. You should get an error.
  • Pull. Take a look at the document with the merge conflict.
  • Clear the merge conflict by editing the document to choose the correct/preferred change.
  • Render.
  • Click the Stage checkbox for all files in your Git tab. Make sure they all have check marks, not filled-in boxes.
  • commit and push.

🛑 Make sure the previous role has finished before moving on to the next step.

Role 3:

  • Add a code chunk for Exercise 2 and give it a label.
  • Render, commit, push. You should get an error.
  • Pull. No merge conflicts should occur, but you should see a message about merging.
  • Now push.

🛑 Make sure the previous role has finished before moving on to the next step.

Role 4:

  • Add a code chunk to Exercise 2 and give it a different label.
  • Render, commit, push. You should get an error.
  • Pull. Take a look at the document with the merge conflict. Clear the merge conflict by choosing the correct/preferred change. render, and push.

🛑 Make sure the previous role has finished before moving on to the next step.

Everyone: Pull, and observe the changes in your document.

Tips for collaborating via GitHub

  • Always pull first before you start working.
  • Resolve a merge conflict (render and push) before continuing your work. Never do new work while resolving a merge conflict.
  • Render, commit, and push often to minimize merge conflicts and/or to make merge conflicts easier to resolve.
  • If you find yourself in a situation that is difficult to resolve, ask questions ASAP. Don’t let it linger and get bigger.

Conveying the right message through visualization

Packages

We’ll use the tidyverse package for much of the data wrangling and visualization.

library(tidyverse)

Since the exercises for this week are short it’s possible not every team member will get to commit and push during the workshop. However each team member should review what was created, fix typos, make edits for better presentation, etc. either during or after the workshop, and before the deadline.

Note: Everyone should intellectually contribute to every question. However, no two teammates should work on the same question at the same time below.

Data

In this lab you’ll construct the data set!

Exercises

The following visualization was shared on Twitter as “extraordinary misleading”.

  1. What is misleading about this visualization and how might you go about fixing it?

If you haven’t yet done so, now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.


Everyone now pull

  1. Create a data frame that could be used to recreate this visualization. You may need to guess some of the numbers – just make them as accurate as you can. You should first think about how many rows and columns you’ll need and what you want to call your variables. Then, you can use the tribble() function to create the data frame. To get a sense of how tribble() works, try running the code below in your console and then looking at the structure of the resulting data frame df.
df <- tribble(
  ~date, ~count,
  "1/1/2020", 15,
  "2/1/2020", 20,
  "3/1/2020", 22,
)

Print the first 5 rows of the data frame after creating it. You may name the data frame what you like. We often try and keep data frame names small and informative.

If you haven’t yet done so, now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.


Everyone now pull

  1. Make a visualization that more accurately (and honestly) tells the story.

If you haven’t yet done so, now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.


Everyone now pull

  1. What message is more clear in your visualization than it was in the original visualization? Why is it more clear?

If you haven’t yet done so, now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.


Everyone now pull

  1. What, if any, useful information do these data and your visualization tell us about mask wearing and COVID? Answers should focus only on what the visualization tells.

If you haven’t yet done so, now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.


Everyone now pull

Submission

Once you are finished with the lab, you will your final PDF document to Gradescope.

Warning

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to render and push changes.

You must turn in a PDF file to the Gradescope page by the submission deadline to be considered “on time”.

Make sure your data are tidy! That is, your code should not be running off the pages and spaced properly. See: https://style.tidyverse.org/ggplot2.html

To submit your assignment:

  • Go to http://www.gradescope.com and click Log in in the top right corner.
  • Click School Credentials \(\rightarrow\) Duke NetID and log in using your NetID credentials.
  • Click on your STA 199 course.
  • Click on the assignment, and you’ll be prompted to submit it.
  • Mark all the pages associated with exercise. All the pages of your lab should be associated with at least one question (i.e., should be “checked”). If you do not do this, you will be subject to lose points on the assignment.
  • Select all pages of your .pdf submission to be associated with the “Workflow & formatting” question.

Grading

Component Points
Team Agreement 10
Ex 1 5
Ex 2 5
Ex 3 15
Ex 4 5
Ex 5 5
Workflow & formatting 5
Total 50
Note

The “Workflow & formatting” grade is to assess the reproducible workflow. This includes:

  • linking all pages appropriately on Gradescope
  • putting your team and member names in the YAML at the top of the document
  • committing the submitted version of your .qmd to GitHub
  • Are you under the 80 character code limit? (You shouldn’t have to scroll to see all your code). Pipes %>%, |> and ggplot layers + should be followed by a new line
  • You should be consistent with stylistic choices, e.g. only use 1 of = vs <- and %>% vs |>
  • All binary operators should be surrounded by space. For example x + y is appropriate. x+y is not.