SBC Intro

Martin Modrák

2021/08/24 (updated: 2022-01-26)

This work was supported by ELIXIR CZ research infrastructure project (MEYS Grant No: LM2018131) including access to computing and storage facilities.

Model workflow

  • Prior predictive checks

  • Validate computation ⬅ SBC lives here

  • Evaluate model fit to real data

    • Posterior predictive checks

  • Use the model

SBC Context

  • Simulation-based calibration (SBC) is a tool for checking that you implemented your model correctly.

  • It is just one of the tools for validating your model in a Bayesian workflow.

  • SBC can be run even before you collect data

  • The goal of this tutorial is to show you that with the SBC package, there is little cost to including (some form of) SBC in your everyday modelling workflow.

  • In this tutorial, we'll let you use SBC to find and diagnose a bug.

Two types of problems with a Stan model

  1. A bug in the model

  2. A mismatch between the data and the model

We will focus on the first.

Simulation to the rescue!

How do we separate bugs from model-data mismatch?

  • If we simulate data exactly as the model assumes, any problem has to be a bug.

Stan

data {
  int<lower=0> N;
  vector<lower=0>[N] y;
  vector[N] x;
}
parameters {
  real a;
  real b;
  real<lower=0> sigma;
}
model {
  vector[N] mu = a + b * x;
  a ~ normal(5, 3);
  b ~ normal(0, 1);
  sigma ~ normal(0, 2);
  y ~ lognormal(mu, sigma);
}

R

# N and x are not modelled;
# choose them to match your actual data
N <- 10
x <- rnorm(n = N, mean = 0, sd = 2)
a <- rnorm(n = 1, mean = 5, sd = 3)
b <- rnorm(n = 1, mean = 0, sd = 1)
# abs() of a normal draw gives the half-normal prior on sigma
sigma <- abs(rnorm(n = 1, mean = 0, sd = 2))
mu <- a + b * x
y <- rlnorm(n = N, meanlog = mu, sdlog = sigma)
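
The simulation above produces a single dataset, while SBC needs many of them. One option is to wrap the same steps in a function. The sketch below uses only base R; the function name `simulate_one`, the `x_sd` argument, and the layout of the returned list are illustrative choices, not part of the SBC package API.

```r
# Hypothetical helper (not from the SBC package): one draw from the prior
# followed by one dataset simulated exactly as the Stan model assumes.
simulate_one <- function(N, x_sd = 2) {
  x <- rnorm(n = N, mean = 0, sd = x_sd)           # covariate, not modelled
  a <- rnorm(n = 1, mean = 5, sd = 3)              # a ~ normal(5, 3)
  b <- rnorm(n = 1, mean = 0, sd = 1)              # b ~ normal(0, 1)
  sigma <- abs(rnorm(n = 1, mean = 0, sd = 2))     # half-normal(0, 2)
  mu <- a + b * x
  y <- rlnorm(n = N, meanlog = mu, sdlog = sigma)  # y ~ lognormal(mu, sigma)
  list(
    variables = list(a = a, b = b, sigma = sigma), # true values to recover
    data = list(N = N, x = x, y = y)               # input for the Stan model
  )
}

set.seed(1)
sims <- replicate(20, simulate_one(N = 10), simplify = FALSE)
length(sims)
```

Each element of `sims` keeps the true parameter values next to the simulated data, so a later fit can be compared against the values that generated it.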

How to spot failure?

variable   mean    sd      q5        q95
alpha      0.372   0.224   0.00794   0.737

Is this OK if we simulated

  • alpha = 0.3

  • alpha = 0.007

  • alpha = -30

Diagnostics (divergences, Rhat, ESS)

How to spot success?!
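
As a rough illustration of why a single fit is hard to judge, the sketch below compares each candidate simulated value against the posterior summary shown above. Note that q5 and q95 bound a central 90% interval. The z-score is an ad hoc measure introduced only for this example; it is not a diagnostic from the slides.

```r
# Posterior summary copied from the table above
post_mean <- 0.372
post_sd   <- 0.224
q5        <- 0.00794
q95       <- 0.737

simulated <- c(0.3, 0.007, -30)

# Does the simulated value fall in the central 90% interval [q5, q95]?
in_interval <- simulated >= q5 & simulated <= q95

# Distance from the posterior mean in units of posterior sd
z <- (simulated - post_mean) / post_sd

data.frame(simulated, in_interval, z = round(z, 2))
```

Here 0.3 sits comfortably inside the interval, 0.007 falls just below q5, and -30 is far outside. Even a correct model misses its 90% interval in about 10% of simulations, so the 0.007 case cannot be settled from a single fit.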

Consistency requirement

"In 95% of simulations, the true variable lies within the central 95% posterior credible interval."

  • Variable = parameter or a function of parameter(s)

"In x% of simulations, the true variable lies within the x% posterior credible interval (of any kind)."

SBC for single parameter:

  1. Thin the posterior to get \(S\) (approximately) independent samples.

  2. For each simulation, take the rank of the true value within the samples.

    • Rank: the number of samples smaller than the true value

  3. Across simulations, this rank should be uniformly distributed between \(0\) and \(S\).
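
The three steps can be tried out without Stan on a model whose posterior is known exactly. The sketch below assumes a conjugate normal model (prior mu ~ normal(0, 1), then `n_obs` observations y ~ normal(mu, 1)), chosen only for illustration; it is not the model from these slides. Because the posterior is sampled directly, the draws are already independent and no thinning is needed.

```r
set.seed(2022)
n_sims <- 1000  # number of simulated datasets
S      <- 99    # posterior draws per fit, so ranks take values 0..99
n_obs  <- 5     # observations per dataset

ranks <- replicate(n_sims, {
  mu_true <- rnorm(1, mean = 0, sd = 1)      # draw the true value from the prior
  y <- rnorm(n_obs, mean = mu_true, sd = 1)  # simulate data as the model assumes
  # Exact posterior for this conjugate model:
  # normal(sum(y) / (n_obs + 1), sd = sqrt(1 / (n_obs + 1)))
  post_mean <- sum(y) / (n_obs + 1)
  post_sd   <- sqrt(1 / (n_obs + 1))
  draws <- rnorm(S, mean = post_mean, sd = post_sd)
  sum(draws < mu_true)                       # rank: no. of draws < true value
})

range(ranks)  # ranks stay within 0..S
mean(ranks)   # should be close to S / 2 for uniform ranks

# Coverage of the central 90% interval, read off the ranks
mean(ranks >= 5 & ranks <= 94)
```

With uniform ranks on 0..99, the ranks 5 to 94 make up 90 of the 100 possible values, so the last line should come out close to 0.90. This links the rank check back to the interval statement of the consistency requirement.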

SBC visualisations - Ranks

SBC visualisations - ECDF

a.k.a. "black line outside blue region is a problem"

SBC visualisations - ECDF diff

a.k.a. "black line outside blue region is a problem, rotated for readability"
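
The quantity behind both ECDF views can be computed in a few lines of base R. The sketch below uses stand-in ranks drawn uniformly from 0..S, which is what a correctly implemented model would produce. It does not compute the blue region from the plots, and the exact construction used by the SBC package may differ from this simplified version.

```r
set.seed(42)
S      <- 99
n_sims <- 500
ranks  <- sample(0:S, size = n_sims, replace = TRUE)  # stand-in for SBC ranks

# Evaluate the empirical CDF of the ranks at every possible rank value
rank_values <- 0:S
ecdf_ranks  <- ecdf(ranks)(rank_values)

# CDF of the discrete uniform distribution on 0..S
ecdf_uniform <- (rank_values + 1) / (S + 1)

# Difference between the two curves; it should hover around 0
ecdf_diff <- ecdf_ranks - ecdf_uniform
max(abs(ecdf_diff))
```

Plotting `ecdf_ranks` against `rank_values` gives the staircase of the ECDF view, while plotting `ecdf_diff` removes the diagonal trend, which makes small deviations easier to see.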
