Conditional probability

STA 199

Bulletin

  • this ae is due for grade. Push your completed ae to GitHub within 48 hours to receive credit
  • Click here to make your ae-11 repo
  • team announcement in slack. sign-up for teams by 5:00pm today.
  • homework 02 released

Recap: last time

  • What is our definition of probability?

  • Let \(A\) be an event. What is \(A^C\)? What is pr(\(A\)) + pr(\(A^C\))?

  • Based on the contingency table from last time, which events are disjoint?

Getting started

Clone your ae11-username repo from the GitHub organization.

Today

By the end of today you will

  • be able to define and compute marginal, joint and conditional probabilities
  • identify when events are independent
  • apply Bayes’ theorem to examine COVID test specificity
library(tidyverse)
library(knitr)

Definitions

Let A and B be events.

  • Marginal probability: The probability an event occurs regardless of values of other events
    • P(A)
    • Example: What’s the probability a student in STA199 favors dogs?
  • Joint probability: The probability two or more events simultaneously occur
    • Example: What’s the probability a student is a junior and favors dogs?
    • P(A and B)
  • Conditional probability: The probability an event occurs given the other has occurred
    • P(A|B) or P(B|A)
    • Eample: What is the probability a student is a junior given they favor dogs?
  • Independent events: Knowing one event has occurred does not lead to any change in the probability we assign to another event.
    • P(A|B) = P(A) or P(B|A) = P(B)
    • Example: P(Junior | dogs) = P(junior)

Bayes’ Theorem

In 2020, the FDA enacted emergency authorization for a number of serological tests for COVID-19. From the FDA website:

Serology tests detect the presence of antibodies in the blood from the body’s adaptive immune response to an infection, like COVID-19. They do not detect the virus itself. In the early days of an infection when the body’s adaptive immune response is still building, antibodies may not be detected.

Full details of these tests may be found on the FDA’s website here.

We will define the following events:

  • Pos: The event the Alinity test returns positive.
  • Neg: The event the Alinity test returns negative.
  • Covid: The event a person has COVID
  • No Covid: The event a person does not have COVID

Assume the Abbott Alinity test has an estimated sensitivity of 100%, P(Pos | Covid) = 1, and specificity of 99%, P(Neg | No Covid) = 0.99.

Suppose the prevalence of COVID-19 in the general population is about 2%, P(Covid) = 0.02.

Exercise 1

Use the “hypothetical 10,000” to calculate the probability a person has COVID given they get a positive test result, i.e. P(Covid | Pos).

Covid No Covid Total
Pos
Neg
Total 10000

Exercise 2

Use Bayes’ Theorem to calculate P(Covid|Pos).

Simpson’s paradox

This example comes from Confounding and Simpson’s paradox1 by Julious and Mullee.

The data examines 901 individuals with diabetes and includes the following variables

  • insulin_dep: whether or not the patient has insulin dependent or non-insulin dependent diabetes
  • less_than_40: whether or not the individual is less than 40 years old
  • survival: whether or not the individual survived the length of the study
diabetes = read_csv("https://sta101.github.io/static/appex/data/diabetes.csv")

One might be interested in the mortality associated with each type of diabetes. What’s the marginal probability of survival for insulin independent and insulin dependent diabetes?

# code here

What about the probability conditional on age group?

# code here

Footnotes

  1. Julious, S A, and M A Mullee. “Confounding and Simpson’s paradox.” BMJ (Clinical research ed.) vol. 309,6967 (1994): 1480-1. doi:10.1136/bmj.309.6967.1480↩︎