Assistant Professor | Bloch School of Management

## Plan of the day

• Quiz review
• Team deliverable review
• Homework questions
• Conditional probability

Well done on the quiz!

• Median: 9

• Mean: 8.85

• SD: 0.81

Let's talk a little bit about flying, one of my favorite subjects!

Cleaned version of the data…

library(readr)
head(flying.ds, 5)
## # A tibble: 5 x 6
##    Year Accidents FatalAccidents FatalitiesAboard FlightHours Departures
##   <int>     <int>          <int>            <int>       <dbl>      <dbl>
## 1  1983        22              4               14     6914969    5235262
## 2  1984        13              1                4     7736037    5666076
## 3  1985        17              4              196     8265332    6068893
## 4  1986        21              2                4     9495158    6928103
## 5  1987        32              4              229    10115407    7293025

Ok, so for any given flight (measured by departures), what's the probability of being in an accident?

$$Pr(A)$$ = $$\frac{\text{# of favorable outcomes}}{\text{# of possible outcomes}}$$

Let's start by counting up the total number of accidents, the total number of fatal accidents, and the total number of flights (departures).

library(tidyverse)
# Collapse the yearly rows into one row of totals
flying.df <- as.data.frame(flying.ds) %>%
  summarise(TotalAccidents = sum(Accidents),
            TotalFatalAccidents = sum(FatalAccidents),
            TotalFlights = sum(Departures))
flying.df
##   TotalAccidents TotalFatalAccidents TotalFlights
## 1            891                  65    282694160

$$Pr(A)$$ = $$\frac{\text{# of favorable outcomes}}{\text{# of possible outcomes}}$$

$$Pr(Accident)$$ = $$\frac{\text{Total Accidents}}{\text{Total Flights}}$$

pr.accident = round(100 * (flying.df$TotalAccidents/flying.df$TotalFlights), 5)
paste0("There is a ", pr.accident, "%",
       " probability of being in an accident for any given random flight." )
## [1] "There is a 0.00032% probability of being in an accident for any given random flight."

What about being in a fatal accident?

$$Pr(\text{Fatal Accident})$$ = $$\frac{\text{Total Fatal Accidents}}{\text{Total Flights}}$$

pr.fatalaccident = round(100 * (flying.df$TotalFatalAccidents/flying.df$TotalFlights), 7)
paste0("There is a ", format(pr.fatalaccident, scientific = FALSE), "%",
" probability of being in a fatal accident for any given random flight." )
## [1] "There is a 0.000023% probability of being in a fatal accident for any given random flight."

By way of comparison, you have a…

.002% probability of dying in a car crash

.0007% probability of dying from falling on the stairs/steps

.00001% probability of dying from being bitten by a dog

… in any given year.

Ok, so the probability of being in an accident flying commercial is (fortunately) very small.

The probability of being in a fatal commercial airline accident is also (yay!) very small.

Those two baseline (unconditional) probabilities really aren't the interesting questions, though.

The interesting question is this…If an accident occurs for a given random commercial airline flight, what is the probability of there being at least one fatality?

To answer this question, we need to move to a discussion of conditional probability.

$$Pr(A|B)$$

A = Event

| = "given that"

B = Condition

$$Pr(\text{Fatal accident } | \text{ A commercial airline accident happened})$$

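Figuring this one out is pretty straightforward, because there cannot be a fatal accident without an accident happening first: every fatal accident is already counted among the accidents. So we use our probability formula from before, with fatal accidents as the favorable outcomes and all accidents (rather than all flights) as the possible outcomes…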

cond.pr.accident = round(100 * (flying.df$TotalFatalAccidents/flying.df$TotalAccidents), 2)
paste0("If a commercial airline accident occurs, there is a ", cond.pr.accident, "%",
" probability of at least one fatality." )
## [1] "If a commercial airline accident occurs, there is a 7.3% probability of at least one fatality."
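The same number falls out of the formal definition of conditional probability, Pr(A|B) = Pr(A and B) / Pr(B). The sketch below is only a quick check, not part of the original analysis: it hard-codes the totals printed earlier, and the variable names are made up for illustration.

```r
# Totals copied from the flying.df output shown earlier
total.accidents       <- 891
total.fatal.accidents <- 65
total.flights         <- 282694160

# Unconditional (baseline) probabilities per departure
pr.b       <- total.accidents / total.flights        # Pr(Accident)
pr.a.and.b <- total.fatal.accidents / total.flights  # Pr(Fatal and Accident): every fatal accident is an accident

# Definition of conditional probability: Pr(A|B) = Pr(A and B) / Pr(B)
pr.a.given.b <- pr.a.and.b / pr.b

# The flight totals cancel, leaving fatal accidents / accidents
round(100 * pr.a.given.b, 2)
## [1] 7.3
```

Because the number of flights cancels out, the conditional probability does not depend on how many flights there were, only on how many accidents turned out to be fatal.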

So what's so useful about understanding conditional probabilities? Well, think about it as a problem-solving and deduction tool.

Let's revisit the article from today about why people quit their jobs.

$$Pr(A|B)$$

One of the observations in the article is that job hunting spikes around people's birthdays. So let's assign some values to our formula.

$$A$$ = I am job hunting

$$B$$ = It is my birthday

$$Pr(A|B)$$

Putting these values back in our formula, we would read $$Pr(A|B)$$ as the probability of my job hunting given that it's my birthday.

In the article, they give us a number (kind of) for this, and it's a 12% increase over the baseline rate of job hunting (which we don't have).

So what can we do with this information? Well, using a little bit of Bayesian logic, we can ask the inverse of the conditional probability.

$$Pr(B|A)$$

What is the probability that it is my birthday, given that I am job hunting?

$$Pr(A|B)$$ = $$\frac{Pr(B|A)Pr(A)}{Pr(B)}$$

Bayes' theorem works like any other algebraic equation. So long as we know three of its four quantities, we can solve for the fourth.
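For example, to isolate the inverse conditional probability, multiply both sides of the formula by $$Pr(B)$$ and divide by $$Pr(A)$$:

$$Pr(B|A)$$ = $$\frac{Pr(A|B)Pr(B)}{Pr(A)}$$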

$$Pr(A|B)$$ = $$\frac{Pr(B|A)Pr(A)}{Pr(B)}$$

$$Pr(A)$$…This is called our prior, and in this case it represents the baseline probability that a given individual is job hunting. We get this probability from prior research, logic, separate data analysis, and so forth. For our purposes, let's say that the baseline probability that a person is job hunting is 10%.

$$Pr(A|B)$$ = $$\frac{Pr(B|A)Pr(A)}{Pr(B)}$$

$$Pr(A|B)$$…The article actually gave us this data (yay!), and it's a 12% increase over the baseline rate. So, with our 10% baseline, the probability of job hunting when it's your birthday is 11.2% (0.10 × 1.12 = 0.112).

$$Pr(A|B)$$ = $$\frac{Pr(B|A)Pr(A)}{Pr(B)}$$

$$Pr(B)$$…This is the probability that any given day is the person's birthday. Leaving aside leap years, on any given day there is a 0.27% chance (1/365) that it is the person's birthday.

So now let's put it together…

$$Pr(A|B)$$ = $$\frac{Pr(B|A)Pr(A)}{Pr(B)}$$

$$.112$$ = $$\frac{Pr(B|A)(.10)}{(.0027)}$$

$$.112*.0027$$ = $$Pr(B|A)(.10)$$

$$0.0003024/.10$$ = $$Pr(B|A)$$

$$Pr(B|A)$$ = 0.003024 ≈ 0.30%

For a given person, the probability that it's his or her birthday given that he or she is job hunting is about 0.30%. Recall that the probability of a given day being that person's birthday is 0.27%, and the probability of a person job hunting given that it is his or her birthday ($$Pr(A|B)$$) is 11.2%.
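As a rough check on the arithmetic, here is a small sketch in R. The helper `bayes.inverse()` is a hypothetical name (it is not from any package), and the inputs are the values used in the text: a 10% baseline, a 12% bump on birthdays, and 1/365 for the birthday probability.

```r
# Hypothetical helper: Bayes' theorem rearranged to solve for Pr(B|A)
#   Pr(B|A) = Pr(A|B) * Pr(B) / Pr(A)
bayes.inverse <- function(pr.a.given.b, pr.a, pr.b) {
  pr.a.given.b * pr.b / pr.a
}

pr.a         <- 0.10         # baseline probability of job hunting (assumed in the text)
pr.a.given.b <- pr.a * 1.12  # 12% increase over baseline on a birthday = 0.112
pr.b         <- 1 / 365      # probability that today is the person's birthday

pr.b.given.a <- bayes.inverse(pr.a.given.b, pr.a, pr.b)
round(100 * pr.b.given.a, 3)
## [1] 0.307
```

Using the unrounded 1/365 gives roughly 0.31%; rounding $$Pr(B)$$ to .0027 first gives a slightly smaller figure. Either way the result is just 1.12 times the birthday baseline, so learning that someone is job hunting barely moves the probability that today is their birthday.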

The single most important thing to remember about conditional probabilities is this…

$$Pr(A|B)$$ $$\ne$$ $$Pr(B|A)$$

Let's go beyond birthdays though.

We are going to consider the impact of an individual having a bad boss on the probability that he or she will start hunting for a job.

$$Pr(B)$$ = ?

Now let's consider $$Pr(A|B)$$. We know from prior research (and the article mentions this too) that a top reason why someone leaves a job is a bad boss. It's hard to put an exact number on it, but we can conservatively say that having a bad boss doubles the baseline rate of job hunting.

$$Pr(A|B)$$ = ?

So now put it together…

$$Pr(A|B)$$ = $$\frac{Pr(B|A)Pr(A)}{Pr(B)}$$

What is the probability of having a bad boss given that we see an employee job hunting?
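One way to explore this question is a quick 'what if' table in R. This is only a sketch under stated assumptions: the 10% baseline is the value assumed earlier for job hunting, the doubling comes from the text, and the three values for $$Pr(B)$$ are purely hypothetical placeholders, since we don't have a real estimate of how common bad bosses are.

```r
pr.a         <- 0.10                 # baseline probability of job hunting (value assumed earlier)
pr.a.given.b <- 2 * pr.a             # a bad boss doubles the baseline rate
pr.b         <- c(0.10, 0.20, 0.30)  # HYPOTHETICAL shares of employees with a bad boss

# Bayes' theorem rearranged: Pr(B|A) = Pr(A|B) * Pr(B) / Pr(A)
pr.b.given.a <- pr.a.given.b * pr.b / pr.a

# One row per scenario
data.frame(pr.bad.boss = pr.b, pr.bad.boss.given.hunting = pr.b.given.a)
```

Under these assumptions, the probability of a bad boss given that an employee is job hunting is always twice whatever value we assume for $$Pr(B)$$.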

Why would this information be valuable?

As much as I would love to, we don't have the time to go into more depth on conditional probability and Bayes' theorem. What I do want you to remember though is this…

• Conditional probabilities are extremely valuable in making sense of the contextual factors of a decision.

• Baseline probabilities can be very misleading, and often don't tell you what you really want to know.

• Conditional probabilities are also very helpful in scenario analysis and 'what if' conversations.

• $$Pr(A|B)$$ $$\ne$$ $$Pr(B|A)$$.

TRAINING TIME OUT

Wrap-up

## What's next?

• Next Class: Regression and its assumptions
• Deliverables: Homework 4 Due 9:00AM Wednesday