+ - 0:00:00
Notes for current slide
Notes for next slide

Excercise 2

Martin Modrák

2021/08/24 (updated: 2022-01-26)

1 / 17

Install SBC package

If you don't have it already:

install.package("remotes")
remotes::install_github("hyunjimoon/SBC")

The package is already installed if you've ran the setup script.

2 / 17

Overview of the SBC package

3 / 17

Code notes

We expect you to copy-paste code and store it in any way you prefer working with R either:

  • Directly paste to console
  • Create a script file and paste there
  • Paste into chunks in a .Rmd file

Stan models should generally go into a specific Stan file (names given on the slides).

4 / 17

Task 1/6 - a Stan model

A Stan model (already saved in gamma.stan if you ran the setup script, otherwise copy and paste to that file)

data {
int N;
vector<lower=0>[N] y;
}
parameters {
real<lower = 0> shape;
real<lower = 0> scale;
}
model {
y ~ gamma(shape, scale);
shape ~ lognormal(0, 1);
scale ~ lognormal(0, 1.5);
}
5 / 17

Task 2/6 - build a backend

Using cmdstanr:

library(cmdstanr)
library(SBC)
model_gamma <- cmdstan_model("gamma.stan")
backend_gamma <- SBC_backend_cmdstan_sample(
model_gamma, # <- model is first argument
# additional args passed to to model$sample() directly
iter_warmup = 500, iter_sampling = 1000, chains = 2)

Using rstan:

library(rstan)
library(SBC)
model_gamma <- stan_model("gamma.stan", model_name = "gamma_mod")
backend_gamma <- SBC_backend_rstan_sample(
model_gamma, # <- model is first argument
# additional args passed to sampling directly
iter = 1500, warmup = 1000, chains = 2)
6 / 17

Task 3/6 - generating a single simulation

In a format the SBC package undertands...

# N - the number of observed datapoints
single_sim_gamma <- function(N) {
shape <- rlnorm(n = 1, meanlog = 0, sdlog = 1)
scale <- rlnorm(n = 1, meanlog = 0, sdlog = 1.5)
y <- rgamma(N, shape = shape, scale = scale)
list(
# True, unobserved
variables = list(
shape = shape,
scale = scale),
# Observed data, passed directly to Stan
generated = list(
N = N,
y = y)
)
}

Now calling single_sim_gamma(<insert number here>) creates a single dataset. Try it! Do you understand the output?

7 / 17

Question 1/2

Do you see a mismatch between the way single_sim_gamma creates data match the Stan model?

If yes, write down your guess!

Don't spend too much time looking for it, if you don't see it - SBC is going to help us!

8 / 17

Task 4/6 - generating multiple simulations

Here we build a "generator" object that can be used to create multiple simulations and combine them together into an SBC_datasets object.

generator_gamma <- SBC_generator_function(
# Passing single_sim_gamma as a parameter,
# not calling it
single_sim_gamma,
# Additional arguments are passed to the function
# Here we only use N
N = 40)
datasets_gamma <- generate_datasets(
generator_gamma,
n_sims = 10)

It is good practice to start with just a few simulations to quickly check that the model roughly works without wasting too much time.

For this problem 10 simulations should be more than enough.

9 / 17

Task 5/6 - run SBC

results_gamma <- compute_SBC(
datasets_gamma, backend_gamma)

You may get a warning similar to:

- 10 (100%) fits had some steps rejected.
Maximum number of rejections was 5.

Here, we can safely ignore it (there were numerical issues limited to early warmup), but in general inspecting the results further would be warranted.

10 / 17

Task 6/6 - plot results

You can plot the results in various formats:

plot_rank_hist(results_gamma, bins = 10)
plot_ecdf(results_gamma)
plot_ecdf_diff(results_gamma)

Question 2/2

What do the results tell you? What is the problem with the model?

Note: there is a low probability you'll get unlucky and your plots will look completely OK. If that happens, just rerun the dataset generation step and the SBC step.

No idea what the problem is? That's OK. There's a hint on the next slide!

11 / 17

Question 2/2 - Hint 1

Compare the intro to Wikipedia on Gamma distribution with Stan manual on Gamma distribution

Still no idea? Don't worry, this stuff is hard. One more hint on the next slide.

12 / 17

Question 2/2 - Hint 2

Gamma distribution has multiple parametrizations which are we using?

13 / 17

Bonus Task 1

Create a copy of gamma.stan called gamma2.stan. Modify it to match the simulator. (don't know how? there's a hint on the next slide)

Create a new backend (backend_gamma2) with the modified model and rerun SBC with the same simulations:

results_gamma2 <- compute_SBC(
datasets_gamma, backend_gamma2)
14 / 17

Bonus Task 1 - Hint

Transform the parameter the model currently uses into one needed for Stan's parametrization in your model or transformed parameters block. You don't need to change anything else.

15 / 17

Bonus Task 2

z-score is another measure on how extreme the "true" simulated value is with respect to the posterior. It is the difference between the true value and posterior mean standardized by dividing by the standard deviations of the posterior.

z=θtruemeanpostsdpost

You can inspect a detailed report of the "true" simulated values, their rank and z-score in the posterior and summaries of the posterior distribution via results_gamma$stats.

Would the z-score column let you figure out the problem more easily than the plots? Would inspecting the statistics let you see the failure with fewer simulations than using the plots?

16 / 17

Bonus Task 3

Play with N (the number of datapoints) when creating generator_gamma and n_sims (the number of simulations) when creating datasets_gamma.

Which values let you reliably detect the problem? Which values can keep the problem hidden?

17 / 17

Install SBC package

If you don't have it already:

install.package("remotes")
remotes::install_github("hyunjimoon/SBC")

The package is already installed if you've ran the setup script.

2 / 17
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow