Developing Hierarchical Models for Sports Analytics

An overview of hierarchical models in sports analytics. This webinar walks through a Bayesian workflow for building them with PyMC, using baseball data.


AUTHORED BY

Chris Fonnesbeck

DATE

2023-09-26


Introduction

Decision-making in sports has become increasingly data-driven, with GPS, cameras, and other sensors providing streams of information at high spatial and temporal resolution. While machine learning is a popular approach for turning these data streams into actionable information, Bayesian statistical methods offer a robust alternative. They make it possible to combine multiple data sources, provide a natural means of imputing missing data, and fully account for the various sources of uncertainty in the system. In particular, hierarchical models provide a means of integrating information at multiple scales and adjusting for biases associated with small sample sizes. I will demonstrate a Bayesian workflow for model development using PyMC version 5, from data preparation through to the summarization of estimates and predictions, using baseball data.
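To make the small-sample point concrete, here is a minimal sketch of partial pooling in plain Python. It is not the model from the talk: it swaps in a conjugate Beta-Binomial shortcut with a fixed prior, whereas a full hierarchical model in PyMC would estimate the prior's parameters from the data. The player counts and the `prior_strength` pseudo-count are invented for illustration.

```python
# Hypothetical (home runs, plate appearances) per player; invented numbers.
players = {
    "A": (2, 10),     # very small sample, extreme raw rate
    "B": (20, 500),   # large sample
    "C": (0, 25),     # small sample, no home runs yet
}


def pooled_rate(counts):
    """League-wide home run rate, ignoring who hit what (complete pooling)."""
    total_hr = sum(hr for hr, _ in counts.values())
    total_pa = sum(pa for _, pa in counts.values())
    return total_hr / total_pa


def partial_pool(counts, prior_strength=100.0):
    """Posterior mean rate per player under a Beta prior centred on the pooled rate.

    prior_strength is an assumed pseudo-count of prior plate appearances;
    a hierarchical model would learn it rather than fix it.
    """
    mu = pooled_rate(counts)
    alpha = mu * prior_strength
    beta = (1.0 - mu) * prior_strength
    return {
        name: (alpha + hr) / (alpha + beta + pa)
        for name, (hr, pa) in counts.items()
    }


raw = {name: hr / pa for name, (hr, pa) in players.items()}  # unpooled estimates
shrunk = partial_pool(players)

for name in players:
    print(f"{name}: raw={raw[name]:.3f}  partially pooled={shrunk[name]:.3f}")
```

Each partially pooled estimate is a weighted average of the player's raw rate and the pooled rate, so players with few plate appearances are pulled strongly toward the group while players with many are barely moved.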

About Speaker

Chris is the Principal Quantitative Analyst in Baseball Research & Development for the Philadelphia Phillies. He is interested in computational statistics, machine learning, Bayesian methods, and applied decision analysis. He hails from Vancouver, Canada and received his Ph.D. from the University of Georgia.

Timestamps

00:00:00 Welcome

00:07:24 Presentation begins

00:09:11 Data Science in Baseball

00:09:36 Sabermetrics

00:10:33 Canonical Baseball statistics

00:12:02 Advanced metrics

00:13:03 Ball Tracking technology

00:13:44 Trackman

00:14:08 Hawkeye

00:17:36 Bayesian inference

00:18:58 PyMC

00:19:59 Home run rate estimation

00:23:37 Prior predictive checks

00:25:00 Nuts about MCMC

00:28:14 Posterior predictive sampling

00:28:48 Informative priors

00:31:18 Unpooled Model

00:31:40 Hierarchical Model

00:32:16 Partial pooling

00:32:40 Hyperpriors

00:32:56 Partial Pooling Model

00:34:06 Group Covariate Model

00:36:12 Park Effects

00:38:24 Model Comparison with Expected Log Predictive Density

00:39:08 Leave One Out Cross Validation

00:40:18 Individual covariates

00:42:03 Variable interactions

00:42:27 Gaussian processes

00:43:55 Accelerated Sampling

00:45:13 Out-Of-Sample Prediction

00:47:05 Prediction Model

00:48:38 Workflow steps

00:50:51 Q/A Could you explain the kernel function ...?

00:52:30 Q/A What is the advantage of ...?

00:54:23 Q/A How would you handle categorical variables in the individual ...?

00:56:37 Q/A How is Bayesian analytics bringing value to ...?

01:00:26 Q/A Can you give insights into how you interact ...?

01:01:40 Q/A Do you have recommended ...?

01:03:32 Q/A Any advice if I'm new and want to improve?

01:04:28 Q/A Does it happen that a selected model is not good at ...?

01:06:13 Q/A Could you comment on the usage of Bayesian decision-making...?

01:08:10 Webinar Ends

Slides

Modeling spatial data with Gaussian processes in PyMC

Using Bayesian decision making

PyMC Labs

Intuitive Bayes course

Repository


Work with PyMC Labs

If you are interested in seeing what we at PyMC Labs can do for you, then please email info@pymc-labs.com. We work with companies at a variety of scales and with varying levels of existing modeling capacity. We also run corporate workshop training events and can provide sessions ranging from introduction to Bayes to more advanced topics.