Assistant Professor | Bloch School of Management

Plan of the day

  • Quiz review
  • Team deliverable review
  • Homework questions
  • Conditional probability

Well done on the quiz!

  • Median: 9

  • Mean: 8.85

  • SD: 0.81

Team deliverable review and grades

Let's talk a little bit about flying, one of my favorite subjects!

Cleaned version of the data…

library(tidyverse)  # provides read_csv() and the %>% pipe used below
flying.ds <- read_csv("")
head(flying.ds, 5)
## # A tibble: 5 x 6
##    Year Accidents FatalAccidents FatalitiesAboard FlightHours Departures
##   <int>     <int>          <int>            <int>       <dbl>      <dbl>
## 1  1983        22              4               14     6914969    5235262
## 2  1984        13              1                4     7736037    5666076
## 3  1985        17              4              196     8265332    6068893
## 4  1986        21              2                4     9495158    6928103
## 5  1987        32              4              229    10115407    7293025

Ok, so for any given flight (measured by departures), what's the probability of being in an accident?

\(Pr(A)\) = \(\frac{\text{\# of favorable outcomes}}{\text{\# of possible outcomes}}\)

Let's start by counting up the total number of accidents, the total number of fatal accidents, and the total number of flights (departures).

flying.df <- flying.ds %>%
  summarise(TotalAccidents = sum(Accidents),
            TotalFatalAccidents = sum(FatalAccidents),
            TotalFlights = sum(Departures))
##   TotalAccidents TotalFatalAccidents TotalFlights
## 1            891                  65    282694160

\(Pr(A)\) = \(\frac{\text{\# of favorable outcomes}}{\text{\# of possible outcomes}}\)

\(Pr(Accident)\) = \(\frac{\text{Total Accidents}}{\text{Total Flights}}\)

pr.accident <- round(100 * (flying.df$TotalAccidents/flying.df$TotalFlights), 5)
paste0("There is a ", pr.accident, "%",
       " probability of being in an accident for any given random flight.")
## [1] "There is a 0.00032% probability of being in an accident for any given random flight."

What about being in a fatal accident?

\(Pr(\text{Fatal Accident})\) = \(\frac{\text{Total Fatal Accidents}}{\text{Total Flights}}\)

pr.fatalaccident <- round(100 * (flying.df$TotalFatalAccidents/flying.df$TotalFlights), 7)
paste0("There is a ", format(pr.fatalaccident, scientific = FALSE), "%",
       " probability of being in a fatal accident for any given random flight.")
## [1] "There is a 0.000023% probability of being in a fatal accident for any given random flight."
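To make these tiny percentages easier to picture, here is a minimal sketch that re-expresses both probabilities as "about 1 in N flights". It hard-codes the three totals printed in the `flying.df` summary above (891 accidents, 65 fatal accidents, 282,694,160 departures) so it runs on its own; the variable names are just for illustration.

```r
# Totals copied from the flying.df summary above
total_accidents       <- 891
total_fatal_accidents <- 65
total_flights         <- 282694160

# Pr = favorable outcomes / possible outcomes, per flight
pr_accident       <- total_accidents / total_flights
pr_fatal_accident <- total_fatal_accidents / total_flights

# Re-express each probability as "about 1 in N flights"
one_in_accident <- round(1 / pr_accident)
one_in_fatal    <- round(1 / pr_fatal_accident)

cat("Accident: about 1 in", format(one_in_accident, big.mark = ","), "flights\n")
cat("Fatal accident: about 1 in", format(one_in_fatal, big.mark = ","), "flights\n")
```

With these totals, that works out to roughly 1 accident in every 317,000 flights and roughly 1 fatal accident in every 4.3 million flights.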

By way of comparison, you have a…

0.002% probability of dying in a car crash

0.0007% probability of dying from falling on the stairs/steps

0.00001% probability of dying from being bitten by a dog

… in any given year.

Ok, so the probability of being in an accident flying commercial is (fortunately) very small.

The probability of dying in a commercial airline accident is also (yay!) very small.

These two unconditional probabilities aren't really the interesting questions, though.

The interesting question is this… If an accident occurs on a given random commercial airline flight, what is the probability of at least one fatality?

To answer this question, we need to move to a discussion of conditional probability.


In the notation \(Pr(A|B)\):

A = Event

| = "given that"

B = Condition

\(Pr(\text{Fatal Accident } | \text{ Commercial Airline Accident})\) = ?

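Figuring this one out is pretty straightforward, because there cannot be a fatal accident without an accident happening first. So we use our probability formula from before, but now the possible outcomes are the total accidents rather than the total flights.

\(Pr(\text{Fatal Accident } | \text{ Accident})\) = \(\frac{\text{Total Fatal Accidents}}{\text{Total Accidents}}\)

pr.fatal.given.accident <- round(100 * (flying.df$TotalFatalAccidents/flying.df$TotalAccidents), 2)
paste0("If a commercial airline accident occurs, there is a ", pr.fatal.given.accident, "%",
       " probability of at least one fatality.")
## [1] "If a commercial airline accident occurs, there is a 7.3% probability of at least one fatality."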
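Dividing fatal accidents by accidents is a shortcut for the general definition of conditional probability, \(Pr(A|B) = \frac{Pr(A \text{ and } B)}{Pr(B)}\). Because every fatal accident is also an accident, \(Pr(\text{Fatal Accident and Accident})\) is simply \(Pr(\text{Fatal Accident})\), so dividing the two per-flight probabilities from earlier should land on the same 7.3%. A minimal sketch, with the totals from the summary table hard-coded:

```r
# Totals copied from the flying.df summary above
total_accidents       <- 891
total_fatal_accidents <- 65
total_flights         <- 282694160

# Unconditional (per-flight) probabilities
pr_accident       <- total_accidents / total_flights
pr_fatal_accident <- total_fatal_accidents / total_flights

# Pr(A|B) = Pr(A and B) / Pr(B); every fatal accident is an accident,
# so Pr(Fatal and Accident) is just Pr(Fatal)
pr_fatal_given_accident <- pr_fatal_accident / pr_accident

round(100 * pr_fatal_given_accident, 2)
## [1] 7.3
```

The total number of flights cancels out of the ratio, which is why counting accidents directly gives the same answer.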

So what's so useful about understanding conditional probabilities? Think of them as a tool for problem solving and deduction.

Let's revisit the article from today about why people quit their jobs.


One of the observations in the article is that job hunting spikes around people's birthdays. So let's assign some events to our formula.

\(A\) = Job hunting

\(B\) = It being my birthday


Putting these values back in our formula, we would read \(Pr(A|B)\) as the probability of my job hunting given that it's my birthday.

In the article, they give us a number (kind of) for this, and it's a 12% increase over the baseline rate of job hunting (which we don't have).

So what can we do with this information? Well, using a little bit of Bayesian logic, we can ask the inverse of the conditional probability.


What is the probability that it is my birthday, given that I am job hunting?

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)

Bayes' theorem works like any other algebraic equation: as long as we know three of the four quantities, we can solve for the fourth.
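A minimal sketch of that idea in R. `solve_bayes()` is a hypothetical helper written for illustration, not a function from any package: you pass in the three probabilities you know, leave the unknown one as `NA`, and it rearranges \(Pr(A|B)Pr(B) = Pr(B|A)Pr(A)\) to return the missing one.

```r
# Hypothetical helper: supply exactly three of the four quantities in
# Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B); leave the unknown one as NA
solve_bayes <- function(p_a_given_b = NA, p_b_given_a = NA, p_a = NA, p_b = NA) {
  vals <- c(p_a_given_b, p_b_given_a, p_a, p_b)
  if (sum(is.na(vals)) != 1) {
    stop("Supply exactly three of the four probabilities.")
  }
  # Each branch is the same identity, Pr(A|B) * Pr(B) = Pr(B|A) * Pr(A),
  # solved for whichever quantity is missing
  if (is.na(p_a_given_b)) return(p_b_given_a * p_a / p_b)
  if (is.na(p_b_given_a)) return(p_a_given_b * p_b / p_a)
  if (is.na(p_a))         return(p_a_given_b * p_b / p_b_given_a)
  p_b_given_a * p_a / p_a_given_b  # remaining case: Pr(B) is unknown
}

# Example: Pr(A|B) = 0.5, Pr(A) = 0.25, Pr(B) = 0.25, solve for Pr(B|A)
solve_bayes(p_a_given_b = 0.5, p_a = 0.25, p_b = 0.25)
## [1] 0.5
```

The helper does not check that the inputs are valid probabilities (between 0 and 1 and mutually consistent), so that sanity check is still up to you.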

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)

\(Pr(A)\)…This is called our prior, and in this case it represents the baseline probability that a given individual is job hunting. We get this probability from prior research, logic, separate data analysis, and so forth. For our purposes, let's say that the baseline probability that a person is job hunting is 10%.

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)

\(Pr(A|B)\)…The article actually gave us this data (yay!), and it's a 12% increase over the baseline rate. So, with our 10% baseline, the probability of job hunting when it's your birthday is 11.2% (\(0.10 \times 1.12 = 0.112\)).

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)

\(Pr(B)\)…This is the probability that any given day is the person's birthday. Leaving aside leap years, on any given day there is a 0.27% chance (1/365) that the day is the person's birthday.

So now let's put it together…

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)

\(.112\) = \(\frac{Pr(B|A)(.10)}{(.0027)}\)

\(.112*.0027\) = \(Pr(B|A)(.10)\)

\(0.0003024/.10\) = \(Pr(B|A)\)

\(Pr(B|A)\) = 0.003024 ≈ 0.302%
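The same arithmetic in R, as a quick check. This sketch uses the slide's inputs (a 10% baseline, the 12% birthday increase that gives 11.2%, and a 1/365 chance of a birthday) and computes \(Pr(B|A)\) twice: once with the rounded 0.0027 and once with the unrounded 1/365.

```r
# Inputs from the slides
p_a         <- 0.10          # Pr(A): baseline probability of job hunting (assumed 10%)
p_a_given_b <- p_a * 1.12    # Pr(A|B): 12% increase over baseline = 11.2%
p_b_rounded <- 0.0027        # Pr(B): 1/365, rounded as on the slide
p_b_exact   <- 1 / 365       # Pr(B): unrounded, ignoring leap years

# Bayes' theorem rearranged: Pr(B|A) = Pr(A|B) * Pr(B) / Pr(A)
p_b_given_a_rounded <- p_a_given_b * p_b_rounded / p_a
p_b_given_a_exact   <- p_a_given_b * p_b_exact / p_a

# As percentages
round(100 * c(rounded = p_b_given_a_rounded, exact = p_b_given_a_exact), 3)

# Both "lift" ratios are the same: Pr(B|A)/Pr(B) = Pr(A|B)/Pr(A) = 1.12
c(p_b_given_a_exact / p_b_exact, p_a_given_b / p_a)
```

The rounded input gives about 0.302% and the exact 1/365 gives about 0.307%; either way \(Pr(B|A)\) is tiny next to \(Pr(A|B)\) = 11.2%, even though both describe the same 12% lift.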

For a given person, the probability that it's his or her birthday given that he or she is job hunting is about 0.302%. Recall that the probability of a given day being that person's birthday is 0.27%, and the probability of a person job hunting given that it is his or her birthday (\(Pr(A|B)\)) is 11.2%.

The single most important thing to remember about conditional probabilities is this…

\(Pr(A|B)\) \(\ne\) \(Pr(B|A)\)

Let's go beyond birthdays, though.

We are going to consider the impact of having a bad boss on the probability that an individual will start hunting for a job.

\(Pr(B)\) = ?

Now let's consider \(Pr(A|B)\). We know from prior research (and the article mentions this too) that a bad boss is one of the top reasons why someone leaves a job. It's hard to put an exact number on it, but we can conservatively say that having a bad boss doubles the baseline rate of job hunting.

\(Pr(A|B)\) = ?

So now let's put it together…

\(Pr(A|B)\) = \(\frac{Pr(B|A)Pr(A)}{Pr(B)}\)

What is the probability of having a bad boss given that we see an employee job hunting?
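We don't know \(Pr(B)\) here, which is exactly where a little "what if" analysis helps. A minimal sketch: it keeps the 10% baseline and the "a bad boss doubles it" assumption from above, and tries a handful of hypothetical values for the share of employees with a bad boss. Those \(Pr(B)\) values are made up for illustration, not data.

```r
# Assumptions carried over from the slides
p_a         <- 0.10       # Pr(A): baseline probability of job hunting
p_a_given_b <- 2 * p_a    # Pr(A|B): a bad boss (conservatively) doubles the baseline

# Pr(B) is unknown, so try a few HYPOTHETICAL shares of employees with a bad boss
p_b <- c(0.10, 0.20, 0.30, 0.40, 0.50)

# Bayes' theorem rearranged: Pr(B|A) = Pr(A|B) * Pr(B) / Pr(A)
p_b_given_a <- p_a_given_b * p_b / p_a

# One row per scenario
data.frame(pr_bad_boss = p_b, pr_bad_boss_given_job_hunting = p_b_given_a)
```

Under these assumptions \(Pr(B|A)\) works out to twice \(Pr(B)\). That also caps \(Pr(B)\) at 50%: since \(Pr(A|B)Pr(B) = Pr(A \text{ and } B)\) cannot exceed \(Pr(A) = 0.10\), any larger share of bad bosses would push \(Pr(B|A)\) above 1, a sign that the inputs are inconsistent.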

Why would this information be valuable?

As much as I would love to, we don't have the time to go into more depth on conditional probability and Bayes' theorem. What I do want you to remember, though, is this…

  • Conditional probabilities are extremely valuable in making sense of the contextual factors of a decision.

  • Baseline probabilities can be very misleading, and often don't tell you what you really want to know.

  • Conditional probabilities are also very helpful in scenario analysis and 'what if' conversations.

  • \(Pr(A|B)\) \(\ne\) \(Pr(B|A)\).



What's next?

  • Next Class: Regression and its assumptions
  • Deliverables: Homework 4 Due 9:00AM Wednesday