Assistant Professor | Bloch School of Management

Plan of the day

  • Quiz review
  • Team deliverable review
  • Homework questions
  • Conditional probability

Well done on the quiz!

  • Median: 9

  • Mean: 8.85

  • SD: 0.81

Team deliverable review and grades

Let's talk a little bit about flying, one of my favorite subjects!

Cleaned version of the data…

library(readr)
flying.ds <- read_csv("http://www.drbanderson.com/data/USAirlineAccidents_Cleaned.csv")
head(flying.ds, 5)
## # A tibble: 5 x 6
##    Year Accidents FatalAccidents FatalitiesAboard FlightHours Departures
##   <int>     <int>          <int>            <int>       <dbl>      <dbl>
## 1  1983        22              4               14     6914969    5235262
## 2  1984        13              1                4     7736037    5666076
## 3  1985        17              4              196     8265332    6068893
## 4  1986        21              2                4     9495158    6928103
## 5  1987        32              4              229    10115407    7293025

Ok, so for any given flight (measured by departures), what's the probability of being in an accident?

\(Pr(A) = \frac{\text{Number of favorable outcomes}}{\text{Number of possible outcomes}}\)


Let's start by counting up the total number of accidents, the total number of fatal accidents, and the total number of flights (departures).

library(tidyverse)
flying.df <- as.data.frame(flying.ds) %>%
  summarise(TotalAccidents = sum(Accidents),
            TotalFatalAccidents = sum(FatalAccidents),
            TotalFlights = sum(Departures))
flying.df
##   TotalAccidents TotalFatalAccidents TotalFlights
## 1            891                  65    282694160

\(Pr(A) = \frac{\text{Number of favorable outcomes}}{\text{Number of possible outcomes}}\)

\(Pr(Accident)\) = \(\frac{\text{Total Accidents}}{\text{Total Flights}}\)


pr.accident = round(100 * (flying.df$TotalAccidents/flying.df$TotalFlights), 5)
paste0("There is a ", pr.accident, "%", 
       " probability of being in an accident for any given random flight." )
## [1] "There is a 0.00032% probability of being in an accident for any given random flight."

What about being in a fatal accident?

\(Pr(\text{Fatal Accident})\) = \(\frac{\text{Total Fatal Accidents}}{\text{Total Flights}}\)


pr.fatalaccident = round(100 * (flying.df$TotalFatalAccidents/flying.df$TotalFlights), 7)
paste0("There is a ", format(pr.fatalaccident, scientific = FALSE), "%",
       " probability of being in a fatal accident for any given random flight." )
## [1] "There is a 0.000023% probability of being in a fatal accident for any given random flight."
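Percentages this small are hard to picture. A minimal R sketch that restates them as "one event per N departures", hard-coding the totals from the `flying.df` summary above (the variable names are illustrative, not from the original analysis):

```r
# Totals copied from the summarised flying.df output above
total_accidents       <- 891
total_fatal_accidents <- 65
total_flights         <- 282694160

# Departures per event: the reciprocal of each probability, rounded
flights_per_accident       <- round(total_flights / total_accidents)
flights_per_fatal_accident <- round(total_flights / total_fatal_accidents)

paste0("Roughly one accident for every ",
       format(flights_per_accident, big.mark = ","),
       " departures, and one fatal accident for every ",
       format(flights_per_fatal_accident, big.mark = ","), " departures.")
```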

By way of comparison, you have a…

.002% probability of dying in a car crash

.0007% probability of dying from falling on the stairs/steps

.00001% probability of dying from being bitten by a dog

… in any given year.

Ok, so the probability of being in an accident flying commercial is (fortunately) very small.

The probability of dying in a commercial airline accident is also (yay!) very small.

These two unconditional (baseline) probabilities aren't really the interesting questions, though.

The interesting question is this: if an accident occurs on a given random commercial airline flight, what is the probability of at least one fatality?

To answer this question, we need to move to a discussion of conditional probability.

\(Pr(A|B)\)

A = Event

| = "given that"

B = Condition

\(Pr(\text{Fatal accident} \mid \text{A commercial airline accident happened})\)



Figuring this one out is pretty straightforward, because there cannot be a fatal accident without an accident happening first. So we use our probability formula from before…

cond.pr.accident = round(100 * (flying.df$TotalFatalAccidents/flying.df$TotalAccidents), 2)
paste0("If a commercial airline accident occurs, there is a ", cond.pr.accident, "%",
       " probability of at least one fatality." )
## [1] "If a commercial airline accident occurs, there is a 7.3% probability of at least one fatality."
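Why does dividing total fatal accidents by total accidents give a conditional probability? Using the general definition \(Pr(A|B) = \frac{Pr(A \cap B)}{Pr(B)}\), and the fact that every fatal accident is also an accident (so "fatal accident and accident" is just "fatal accident"), the total number of flights cancels out:

\(Pr(\text{Fatal} \mid \text{Accident}) = \frac{Pr(\text{Fatal} \cap \text{Accident})}{Pr(\text{Accident})} = \frac{Pr(\text{Fatal})}{Pr(\text{Accident})} = \frac{65 / 282694160}{891 / 282694160} = \frac{65}{891} \approx 0.073\)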

So what's so useful about understanding conditional probabilities? Think of them as a problem-solving and deduction tool.

Let's revisit the article from today about why people quit their jobs.

\(Pr(A|B)\)


One of the observations in the article is that job hunting spikes around people's birthdays. So let's assign some values to our formula.

\(A\) = I am job hunting

\(B\) = It is my birthday

\(Pr(A|B)\)


Putting these values back in our formula, we would read \(Pr(A|B)\) as the probability of my job hunting given that it's my birthday.

In the article, they give us a number (kind of) for this, and it's a 12% increase over the baseline rate of job hunting (which we don't have).

So what can we do with this information? Well, using a little bit of Bayesian logic, we can ask the inverse of the conditional probability.

\(Pr(B|A)\)


What is the probability that it is my birthday, given that I am job hunting?

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)


Bayes' theorem works like any other algebraic equation: so long as we know three of the four quantities, we can solve for the fourth.
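A minimal R sketch of "solve for the fourth". `solve_bayes()` is a hypothetical helper written only for illustration (it is not from any package): pass the unknown quantity as `NA`, and it rearranges \(Pr(A|B)Pr(B) = Pr(B|A)Pr(A)\) to solve for it. The example inputs are toy values, not data from the article.

```r
# Hypothetical helper (not from any package): supply exactly three of the
# four quantities in Bayes' theorem and leave the unknown one as NA.
# Every branch is a rearrangement of Pr(A|B) * Pr(B) = Pr(B|A) * Pr(A).
solve_bayes <- function(pr_a_given_b = NA, pr_b_given_a = NA,
                        pr_a = NA, pr_b = NA) {
  known <- c(pr_a_given_b, pr_b_given_a, pr_a, pr_b)
  if (sum(is.na(known)) != 1) {
    stop("Supply exactly three of the four probabilities.")
  }
  if (is.na(pr_a_given_b)) return(pr_b_given_a * pr_a / pr_b)
  if (is.na(pr_b_given_a)) return(pr_a_given_b * pr_b / pr_a)
  if (is.na(pr_a))         return(pr_a_given_b * pr_b / pr_b_given_a)
  pr_b_given_a * pr_a / pr_a_given_b   # remaining case: solve for Pr(B)
}

# Toy values, chosen only to show the mechanics
solve_bayes(pr_b_given_a = 0.5, pr_a = 0.2, pr_b = 0.4)   # solves for Pr(A|B)
solve_bayes(pr_a_given_b = 0.25, pr_a = 0.2, pr_b = 0.4)  # solves for Pr(B|A)
```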

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)


\(Pr(A)\)…This is called our prior, and in this case it represents the baseline probability that a given individual is job hunting. We get this probability from prior research, logic, separate data analysis, and so forth. For our purposes, let's say that the baseline probability that a person is job hunting is 10%.

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)


\(Pr(A|B)\)…The article actually gave us this data (yay!), and it's a 12% increase over the baseline rate. So, with our 10% baseline, the probability of job hunting when it's your birthday is 11.2%.

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)


\(Pr(B)\)…This is the probability that any given day is the person's birthday. Leaving aside leap years, on any given day there is a 0.27% chance (1/365 ≈ 0.0027) that the day is the person's birthday.

So now let's put it together…

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)


\(.112 = \frac{Pr(B|A)(.10)}{.0027}\)

\(.112 \times .0027 = Pr(B|A)(.10)\)


\(.0003024 / .10 = Pr(B|A)\)

\(Pr(B|A) = .003024 \approx 0.30\%\)


For a given person, the probability that it's his or her birthday, given that he or she is job hunting, is about 0.30%. Recall that the probability of a given day being that person's birthday is 0.27%, while the probability that a person is job hunting given that it is his or her birthday (\(Pr(A|B)\)) is 11.2%.
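A quick R check of the birthday arithmetic, using \(Pr(A|B) = 0.112\) (the 12% increase over the assumed 10% baseline) and \(Pr(B) = 0.0027\). The last line is an added observation, not from the article: both conditional probabilities sit exactly 12% above their own baselines, yet they are very different numbers.

```r
# Inputs from the slides; the 10% prior is the assumed baseline
pr_a         <- 0.10          # Pr(A): baseline probability of job hunting
pr_a_given_b <- pr_a * 1.12   # Pr(A|B): 12% increase on birthdays (0.112)
pr_b         <- 0.0027        # Pr(B): chance a given day is the birthday (~1/365)

# Bayes' theorem rearranged to solve for Pr(B|A)
pr_b_given_a <- pr_a_given_b * pr_b / pr_a
sprintf("Pr(B|A) = %.6f (%.2f%%)", pr_b_given_a, 100 * pr_b_given_a)

# Same calculation with the unrounded 1/365 instead of 0.0027
sprintf("Pr(B|A) with 1/365: %.2f%%", 100 * pr_a_given_b * (1 / 365) / pr_a)

# Each conditional divided by its own baseline: both ratios are 1.12
c(pr_a_given_b / pr_a, pr_b_given_a / pr_b)
```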

The single most important thing to remember about conditional probabilities is this…

\(Pr(A|B)\) \(\ne\) \(Pr(B|A)\)
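The flight data from earlier give a concrete case of this asymmetry. A short R sketch, hard-coding the totals from the `flying.df` summary (variable names are illustrative):

```r
# Totals copied from the flying.df summary above
total_accidents       <- 891
total_fatal_accidents <- 65

# Pr(Fatal accident | Accident): share of accidents with at least one fatality
pr_fatal_given_accident <- total_fatal_accidents / total_accidents

# Pr(Accident | Fatal accident): every fatal accident is, by definition, an
# accident, so all 65 fatal accidents count as "favorable" outcomes
pr_accident_given_fatal <- total_fatal_accidents / total_fatal_accidents

round(c(fatal_given_accident = pr_fatal_given_accident,
        accident_given_fatal = pr_accident_given_fatal), 3)
```

Same two events, swapped order: one conditional probability is about 7.3%, the other is 100%.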

Let's go beyond birthdays, though.

We are going to consider the impact of having a bad boss on the probability that an individual will start hunting for a job. Keep \(A\) = job hunting, and now let \(B\) = having a bad boss.

\(Pr(B)\) = ?

Now let's consider \(Pr(A|B)\). We know from prior research (and the article mentions this too) that a bad boss is a top reason why someone leaves a job. It's hard to put an exact number on it, but we can conservatively say that having a bad boss doubles the baseline rate of job hunting.

\(Pr(A|B)\) = ?

So now let's put it together…

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)


What is the probability of having a bad boss given that we see an employee job hunting?
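The slides leave \(Pr(B)\) open, so one way to explore the question is to try several values. A sketch in R that keeps the 10% baseline and the "bad boss doubles job hunting" assumption, and plugs in purely hypothetical values for the share of employees with a bad boss:

```r
# Assumptions carried over from the slides
pr_a         <- 0.10        # Pr(A): baseline probability of job hunting
pr_a_given_b <- 2 * pr_a    # Pr(A|B): a bad boss doubles the baseline rate

# Pr(B) is left open on the slides; these values are purely hypothetical
pr_b <- c(0.10, 0.20, 0.30, 0.40)

# Bayes' theorem rearranged: Pr(B|A) = Pr(A|B) * Pr(B) / Pr(A)
pr_b_given_a <- pr_a_given_b * pr_b / pr_a

data.frame(pr_bad_boss = pr_b, pr_bad_boss_given_hunting = pr_b_given_a)
```

Because \(Pr(A|B)\) is assumed to be exactly twice \(Pr(A)\), the rearranged formula always returns twice \(Pr(B)\), and the assumption only stays coherent while \(Pr(B) \le 0.5\).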

Why would this information be valuable?

As much as I would love to, we don't have the time to go into more depth on conditional probability and Bayes' theorem. What I do want you to remember though is this…

  • Conditional probabilities are extremely valuable in making sense of the contextual factors of a decision.

  • Baseline probabilities can be very misleading, and often don't tell you what you really want to know.

  • Conditional probabilities are also very helpful in scenario analysis and 'what if' conversations.

  • \(Pr(A|B)\) \(\ne\) \(Pr(B|A)\).

TRAINING TIME OUT

Wrap-up

What's next?

  • Next Class: Regression and its assumptions
  • Deliverables: Homework 4 Due 9:00AM Wednesday