At first I tried the traditional approach of teaching probability first and statistics afterwards, but it didn’t work. Students had a hard time connecting probabilistic concepts to statistical techniques, which often forced me to cover those concepts all over again.
Because of this, I decided to interleave probability and statistics from the beginning, showing how to estimate each probabilistic object (probabilities, probability mass functions, probability density functions, means, variances, etc.) from data right after its theoretical definition. For example, I covered nonparametric estimation (histograms and kernel density estimation) and parametric estimation (maximum likelihood) immediately after introducing the probability density function. This allowed me to use real-data examples throughout, which is something students had consistently asked for.
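As a minimal sketch of what estimating a probability density function from data can look like, the snippet below computes a histogram, a kernel density estimate, and a Gaussian maximum-likelihood fit. It is an illustration only, not code from the book: the data are simulated rather than real, and the Gaussian kernel, the normal-reference bandwidth rule, and all names (`gaussian_pdf`, `kde`, `mu_hat`, `sigma_hat`) are assumptions made for the example.

```python
import numpy as np

# Simulated data stand in for a real dataset (assumption for illustration).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)


def gaussian_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


# Nonparametric estimate 1: histogram normalized so that its area equals one.
hist_density, bin_edges = np.histogram(data, bins=30, density=True)

# Nonparametric estimate 2: kernel density estimate with a Gaussian kernel.
# The bandwidth uses the normal-reference rule of thumb (an assumed choice).
bandwidth = 1.06 * data.std() * len(data) ** (-1 / 5)


def kde(x):
    """Average of Gaussian kernels centered at each data point."""
    x = np.atleast_1d(x)
    return gaussian_pdf(x[:, None], data[None, :], bandwidth).mean(axis=1)


# Parametric estimate: maximum likelihood under a Gaussian model.
# The ML estimates are the sample mean and the standard deviation
# computed with a 1/n normalization (NumPy's default, ddof=0).
mu_hat = data.mean()
sigma_hat = data.std()

# Evaluate both smooth estimates on a common grid for comparison.
grid = np.linspace(data.min() - 3 * bandwidth, data.max() + 3 * bandwidth, 400)
kde_values = kde(grid)
ml_values = gaussian_pdf(grid, mu_hat, sigma_hat)
```

Plotting `hist_density`, `kde_values`, and `ml_values` against the same axis shows how the three estimates approximate the same underlying density, with the parametric fit relying on the assumed Gaussian model and the other two relying only on the data.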
I also decided to interleave causal inference from the beginning. This can be challenging, as some of the concepts are tricky, but it immediately exposes students to the difficulties of interpreting conditional probabilities and averages in the real world, which I think is worth it.
I couldn’t find any existing material organized in this way, so I wrote my own notes and eventually a book following this philosophy. The materials include a free PDF, Python code for the real-data examples, solutions to the exercises, and supporting videos and slides.