Using brms to model reaction times contaminated with errors

· 2021/04/01 · 23 minute read

Nathaniel Haines made a neat tweet showing off his model of reaction times that handles possible contamination with both implausibly short reaction times (e.g., if people make an anticipatory response that is not actually based on processing the stimulus of interest) and implausibly long reaction times (e.g., if their attention drifts away from the task, but they snap back to it after having “zoned out” for a few seconds). Response times that arise from such processes are not actually what we aim to measure in most cognitive tasks - we are instead interested in how people process and respond to a particular stimulus. Therefore, by explicitly modeling these “contamination” response times, we can get better estimates of the decision-making parameters that we actually care about. Such a model often makes more sense than just throwing away a part of the data.

Several people asked if you can do that in brms. This started a vortex of productive procrastination on my side - it sure should be easy to do, right? And while Nathaniel didn’t have brms code ready, I can assure you that, yes, it is possible in brms. It is not completely straightforward, but I’ll show you the code.

Nathaniel was kind enough to provide a bit of feedback on the post (I have no experience with reaction-time data or cogsci in general), but I should repeat that the credit for the idea is his while all errors are mine. The overall idea of using mixtures of “real” and “contaminating” distributions to model responses is however not new - see e.g. Ratcliff & Tuerlinckx 2002.

In this model we will take a shifted lognormal representing the actual decision process and a uniform distribution modelling the contamination. For this to make sense, we need to have some upper limit on the possible contamination times, representing the maximum time we could have observed. In most cases, the limit should be larger than the maximum time we actually observed, although this is not strictly mathematically necessary. We then assume that each trial has a small probability of being contaminated.

Here is how generating a single data point for such a model could look in R code:

shift <- 0.1 # Shortest reaction time possible if not contaminated
mu <- log(0.5)
sigma <- 0.6
mix <- 0.06 # Probability of contamination
upper <- 5 # Maximum time of contamination

if(runif(1) < mix) {
  # Contaminated
  y <- runif(1, 0, upper)
} else {
  # Non-contaminated
  y <- shift + rlnorm(1, mu, sigma)
}

The same could be expressed in math as:

\[ y_i = \begin{cases} u_i & \mathrm{if} \quad z_i = 0 \\ s_i + r_i & \mathrm{if} \quad z_i = 1 \end{cases} \\ u_i \sim Uniform(0, \alpha) \\ \log(r_i) \sim Normal(\mu_i, \sigma) \\ P(z_i = 0) = \theta \]

Where \(\theta\) corresponds to mix, \(\alpha\) to upper and \(s_i\) to shift.

Technically, the non-contaminated signal is allowed to take values larger than upper. In practice we would however usually want upper to be large enough that larger values do not really occur.

There is one important detail in how brms handles the shifted lognormal: brms treats shift as unknown and estimates it, but does not allow the shift parameter to be larger than any observed y. We will mimic this behaviour, but since we also have the contamination process, shift can in principle be larger than some y. This can cause problems for the sampler, as the posterior density is not smooth when shift crosses one of the observed y values (the lognormal component is added/removed for that observation, resulting in a sharp change).
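
If you want to verify this behaviour yourself, brms can show you the Stan code it would generate for the built-in shifted lognormal - a quick check on throwaway data (the exact generated code may differ between brms versions):

make_stancode(y ~ 1, data = data.frame(y = 0.1 + rlnorm(20)),
              family = shifted_lognormal())
# In the parameters block you should see the shift (called ndt) declared
# roughly as real<lower=0,upper=min_Y> ndt; i.e., bounded above by the
# minimum observed response.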

It however turns out that if shift crosses some y rarely enough, the sampling works just fine. To ensure this rarity, we introduce max_shift as the upper bound for shift; in most cases, this will be the same for the whole dataset. Instead of shift, the model then works with shiftprop = shift / max_shift - a value between 0 and 1 that is easier to handle mathematically.

Of the model parameters, we take max_shift and upper as known (though they may differ between observations), while mu, sigma, mix and shiftprop are to be estimated and can depend on predictors. However, shiftprop is a bit delicate here: the model will make most sense if observations with different max_shift are also allowed to have different shiftprop, by putting a suitable predictor on shiftprop. Different shiftprop with the same max_shift is definitely not an issue. So while you need to be careful with varying max_shift, varying shiftprop is OK - just note the implied logit scale. For a review of why varying shift might be important, see e.g. Dully, McGovern & O’Connell 2018.

For some use cases, one could also want to set a lower bound on the contamination distribution. To keep things simple we don’t do that here, but basically the same result can be achieved by subtracting a suitable number from the response (y) and the bounds (max_shift, upper), as sketched below.
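
For illustration, here is a minimal sketch of that trick, assuming a hypothetical lower bound lower below which no contamination can occur (the tiny data frame just mirrors the column names used later in the post):

lower <- 0.05 # hypothetical lower bound of contamination times
d <- data.frame(rt = c(0.9, 1.4, 2.1), max_shift = 0.3, upper = 5)
# Shift the response and both bounds down by `lower`, reducing the problem
# to the (0, upper - lower) case that the model below handles
d_shifted <- within(d, {
  rt <- rt - lower
  max_shift <- max_shift - lower
  upper <- upper - lower
})
# Fit on d_shifted as usual, then add `lower` back to any posterior predictions.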

Some experimental designs also involve a limit on the maximum time the response could have taken. In such contexts, it might make sense to treat the values as right-censored. brms supports censoring for most families, so we want our implementation to be compatible with it.

Our goal is that at the end of the post we will be able to write models like

brm(bf(y | vreal(max_shift, upper) + cens(censoring) ~ 1 + condition + (1 | subject_id),
       sigma ~ condition,
       mix ~ (1 | subject_id),
       family = RTmixture), ...)

And let brms handle all the rest. The final result, packaged in a single file you can just load into your project, is at https://github.com/martinmodrak/blog/blob/master/content/post/RTmixture.R However, be advised that the code was only quite shallowly tested, so look both ways before crossing and test whether you can recover parameters from simulated data before trusting me too much.

You may know that brms has good support for mixtures, so why not just write family = mixture(uniform, shifted_lognormal)? It turns out that one of brms’s core assumptions is that every family has at least one parameter to be estimated - our uniform distribution for the contamination component however has no free parameters and thus cannot be used with brms directly. So instead we’ll have to implement a full-blown custom family.

The necessary background for implementing custom families in brms can be found in the brms vignette on custom distributions. Here, we will explain only the weirder stuff.

Setting up

Let’s set up and get our hands dirty.

library(cmdstanr)
library(brms)
library(tidyverse)
library(knitr)
library(patchwork)
library(bayesplot)

source("RTmixture.R")

ggplot2::theme_set(cowplot::theme_cowplot())
options(mc.cores = parallel::detectCores(), brms.backend = "cmdstanr")

cache_dir <- "_RTmixture_cache"
if(!dir.exists(cache_dir)) {
  dir.create(cache_dir)
}

First, we’ll generate some fake data to test the model against. Below is just a more concise and optimized version of the random generation scheme I showed earlier.

rRTmixture <- function(n, meanlog, sdlog, mix, shift, upper) { 
  ifelse(runif(n) < mix,  
         runif(n, 0, upper),  
         shift + rlnorm(n, meanlog = meanlog, sdlog = sdlog)) 
} 
 

Then let us generate some data:

set.seed(31546522)
# Bounds of the data
max_shift <- 0.3
shift <- runif(1) * max_shift
upper <- 10
mix <- 0.1

N <- 100
Intercept <- 0.3
beta <- 0.5
X <- rnorm(N)
mu <- rep(Intercept, N) + beta * X
sigma <- 0.5

rt <- rRTmixture(N, meanlog = mu, sdlog = sigma, mix = mix, shift = shift, upper = upper)

dd <- data.frame(rt = rt, x = X, max_shift = max_shift, upper = upper)

Looking nice!

ggplot(dd, aes(x = rt)) + geom_density()

Core of the family

Now we need the Stan implementation of the family. That is probably the most technical part. The Stan user’s guide has some background on mixture models in Stan. We note that times before shift can only come from the uniform component and times after upper can only come from the lognormal component. For times in between, we mix the lognormal and the uniform via log_mix.

With the Stan code ready, we then define the parameters of the distribution in a way that brms understands.

stan_funs_base <- stanvar(block = "functions", scode = " 
  real RTmixture_lpdf(real y, real mu, real sigma, real mix,  
                      real shiftprop, real max_shift, real upper) { 
    real shift = shiftprop * max_shift; 
    if(y <= shift) { 
      // Could only be created by the contamination 
      return log(mix) + uniform_lpdf(y | 0, upper); 
    } else if(y >= upper) { 
      // Could only come from the lognormal 
      return log1m(mix) + lognormal_lpdf(y - shift | mu, sigma); 
    } else { 
      // Actually mixing 
      real lognormal_llh = lognormal_lpdf(y - shift | mu, sigma); 
      real uniform_llh = uniform_lpdf(y | 0, upper); 
      return log_mix(mix, uniform_llh, lognormal_llh); 
    } 
  } 
 
") 
 
 
 
RTmixture <- custom_family( 
  "RTmixture",  
  dpars = c("mu", "sigma", "mix", "shiftprop"), # Those will be estimated 
  links = c("identity", "log", "logit", "logit"), 
  type = "real", 
  lb = c(NA, 0, 0, 0), # bounds for the parameters  
  ub = c(NA, NA, 1, 1), 
  vars = c("vreal1[n]", "vreal2[n]") # Data for max_shift and upper (known) 
) 

And we are ready to fit! We will put a weakly informative beta(1,5) prior on the proportion of contamination - this means we a priori believe there is a 95% chance that the contamination is lower than qbeta(0.95, 1, 5) = 0.4507197. One could definitely be justified in tightening this prior even further toward zero for many tasks. vreal is just brms’s way of annotating arbitrary additional data for the distribution. We need to pass both the family and the associated stanvars.
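
If you want to double-check what that prior implies, a quick computation in plain R (matching the number quoted above):

qbeta(0.95, shape1 = 1, shape2 = 5) # ~0.45: 95% prior mass below 45% contamination
pbeta(0.10, shape1 = 1, shape2 = 5) # ~0.41: prior probability of less than 10% contamination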

fit_mix <- brm(rt | vreal(max_shift, upper) ~ x, data = dd, family = RTmixture, 
               stanvars = stan_funs_base, 
               refresh = 0,
               file = paste0(cache_dir, "/mix"), file_refit = "on_change",
               prior = c(prior(beta(1, 5), class = "mix")))
fit_mix
##  Family: RTmixture 
##   Links: mu = identity; sigma = identity; mix = identity; shiftprop = identity 
## Formula: rt | vreal(max_shift, upper) ~ x 
##    Data: dd (Number of observations: 100) 
##   Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.33      0.07     0.20     0.48 1.00     2097     2242
## x             0.44      0.05     0.33     0.54 1.00     2368     2460
## 
## Family Specific Parameters: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma         0.45      0.05     0.36     0.56 1.00     2379     1976
## mix           0.13      0.05     0.05     0.24 1.00     3159     2589
## shiftprop     0.82      0.18     0.32     1.00 1.00     1640     1623
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We note that we have quite good recovery of the effect of x (simulated as 0.5) and of sigma (which was 0.5), but 100 observations are not enough to constrain the mix parameter really well (simulated as 0.1).

For comparison, we also fit the default shifted lognormal as implemented in brms.

fit_base <- brm(rt ~ x, data = dd, family = shifted_lognormal, refresh = 0,
                file = paste0(cache_dir, "/base"), file_refit = "on_change")

fit_base
##  Family: shifted_lognormal 
##   Links: mu = identity; sigma = identity; ndt = identity 
## Formula: rt ~ x 
##    Data: dd (Number of observations: 100) 
##   Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.34      0.09     0.16     0.53 1.00     1958     1949
## x             0.47      0.07     0.33     0.61 1.00     2711     2495
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.69      0.07     0.56     0.82 1.00     1883     1802
## ndt       0.34      0.07     0.17     0.44 1.00     1634     1420
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We see that the inferences for sigma are a bit biased, but this is not necessarily due only to the ignored mixture - another potential source of bias is the different handling of the shift.

Censoring + constant shift

To support censoring in brms the family has to come with log CDF (cumulative distribution function) and log CCDF (complementary CDF) implementations in Stan, which we provide below. Those match the _lpdf pretty closely.

 
stan_funs <- stan_funs_base + stanvar(block = "functions", scode = " 
  real RTmixture_lcdf(real y, real mu, real sigma, real mix,  
                      real shiftprop, real max_shift, real upper) { 
    real shift = shiftprop * max_shift; 
    if(y <= shift) { 
      return log(mix) + uniform_lcdf(y | 0, upper); 
    } else if(y >= upper) { 
      // The whole uniform part is below, so the mixture part is log(1) = 0 
      return log_mix(mix, 0, lognormal_lcdf(y - shift | mu, sigma)); 
    } else { 
      real lognormal_llh = lognormal_lcdf(y - shift | mu, sigma); 
      real uniform_llh = uniform_lcdf(y | 0, upper); 
      return log_mix(mix, uniform_llh, lognormal_llh); 
    } 
  } 
   
  real RTmixture_lccdf(real y, real mu, real sigma, real mix,  
                      real shiftprop, real max_shift, real upper) { 
 
    real shift = shiftprop * max_shift; 
    if(y <= shift) { 
      // The whole lognormal part is above, so the mixture part is log(1) = 0 
      return log_mix(mix, uniform_lccdf(y | 0, upper), 0); 
    } else if(y >= upper) { 
      return log1m(mix) + lognormal_lccdf(y - shift | mu, sigma); 
    } else { 
      real lognormal_llh = lognormal_lccdf(y - shift | mu, sigma); 
      real uniform_llh = uniform_lccdf(y | 0, upper); 
      return log_mix(mix, uniform_llh, lognormal_llh); 
    } 
 
  } 
") 
 

To test if this works, we’ll censor quite aggressively and treat anything larger than 1.5 as censored. In most cases it makes sense to have upper be the same as the censoring bound, so we’ll do that.

set.seed(25462255)
shift <- 0.15
cens_bound <- upper <- 1.5
mix <- 0.08

N <- 110
Intercept <- 0.5
beta <- -0.3
X <- rnorm(N)
mu <- rep(Intercept, N) + beta * X
sigma <- 0.4

rt <- rRTmixture(N, meanlog = mu, sdlog = sigma, 
                 mix = mix, shift = shift, upper = upper)
censored <- rt > cens_bound
rt[censored] <- cens_bound

dd_cens <- data.frame(rt = rt, 
                      censored = if_else(censored, "right", "none"),  
                      x = X, max_shift = shift, upper = upper)

Finally, this model starts to be problematic if we try to estimate shift (well, actually shiftprop) as well. An easy way to make shift always equal to max_shift is to set a constant prior on shiftprop, as we do below.

fit_mix_cens <- brm(rt | vreal(max_shift, upper) + cens(censored) ~ x, 
                    data = dd_cens, 
                    family = RTmixture, 
                    stanvars = stan_funs, 
                    refresh = 0,
                    file = paste0(cache_dir, "/mix_cens"), 
                    file_refit = "on_change",
                    prior = c(prior(beta(1, 5), class = "mix"),
                              prior(constant(1), class = "shiftprop")))
fit_mix_cens
##  Family: RTmixture 
##   Links: mu = identity; sigma = identity; mix = identity; shiftprop = identity 
## Formula: rt | vreal(max_shift, upper) + cens(censored) ~ x 
##    Data: dd_cens (Number of observations: 110) 
##   Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.46      0.08     0.34     0.65 1.00     2022     1674
## x            -0.23      0.07    -0.38    -0.12 1.00     2094     1606
## 
## Family Specific Parameters: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma         0.36      0.07     0.26     0.53 1.00     2019     2565
## mix           0.15      0.05     0.07     0.25 1.00     2943     2595
## shiftprop     1.00      0.00     1.00     1.00   NA       NA       NA
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

It works and the inferences are reasonably close to what we simulated with. A more thorough evaluation would require simulation-based calibration, which would be nice, but would require a bit more energy than I have now. But it seems that at least the models are not completely wrong.
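
Still, a cheap sanity check that falls well short of full simulation-based calibration is to simulate one dataset with known parameters, refit, and eyeball whether the posterior intervals cover the truth. Below is a sketch of such a check - the true values and sizes are just illustrative:

set.seed(68422)
true_intercept <- 0.4; true_beta <- -0.2; true_sigma <- 0.5; true_mix <- 0.1
X_sim <- rnorm(150)
rt_sim <- rRTmixture(150, meanlog = true_intercept + true_beta * X_sim,
                     sdlog = true_sigma, mix = true_mix,
                     shift = 0.2, upper = 8)
dd_sim <- data.frame(rt = rt_sim, x = X_sim, max_shift = 0.2, upper = 8)
fit_sim <- brm(rt | vreal(max_shift, upper) ~ x, data = dd_sim,
               family = RTmixture, stanvars = stan_funs_base, refresh = 0,
               prior = c(prior(beta(1, 5), class = "mix"),
                         prior(constant(1), class = "shiftprop")))
# Do the 95% intervals cover the true values? Repeating this over many
# seeds would get us closer to a real calibration check.
posterior_summary(fit_sim)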

If you want to model varying shift but are having trouble fitting, it might make sense to set max_shift on a per-group basis, so that it exceeds only a small proportion of the observations in each group. As noted above, if you set a different max_shift per, for example, subject_id, you should also have shiftprop ~ subject_id, or the model might not make sense - see the sketch below.
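
A hypothetical sketch of such a per-group adjustment - the data frame and the choice of the 5% quantile are purely illustrative, and the subject_id/rt column names are assumptions:

d <- data.frame(subject_id = rep(1:2, each = 50),
                rt = rRTmixture(100, meanlog = 0.3, sdlog = 0.5,
                                mix = 0.1, shift = 0.2, upper = 5),
                upper = 5)
d <- d %>%
  group_by(subject_id) %>%
  mutate(max_shift = quantile(rt, probs = 0.05)) %>%
  ungroup()
# ... and then also let shiftprop vary between subjects, e.g.
# bf(rt | vreal(max_shift, upper) ~ ..., shiftprop ~ 0 + factor(subject_id))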

Making predictions

We successfully fitted a few models, but there are some tweaks we need to make to take full advantage of the family. We might for example want to make predictions - e.g. for posterior predictive checks - so we also need to implement prediction code. You’ll notice that we just extract the parameters from the prepared predictions and pass those to the generator function we defined earlier.

 
posterior_predict_RTmixture <- function(i, prep, ...) { 
  if((!is.null(prep$data$lb) && prep$data$lb[i] > 0) ||  
     (!is.null(prep$data$ub) && prep$data$ub[i] < Inf)) { 
    stop("Predictions for truncated distributions not supported") 
  }   
   
  mu <- brms:::get_dpar(prep, "mu", i = i) 
  sigma <- brms:::get_dpar(prep, "sigma", i = i) 
  mix <- brms:::get_dpar(prep, "mix", i = i) 
  shiftprop <- brms:::get_dpar(prep, "shiftprop", i = i) 
   
  max_shift <- prep$data$vreal1[i] 
  upper <- prep$data$vreal2[i] 
  shift = shiftprop * max_shift 
   
  rRTmixture(prep$ndraws, meanlog = mu, sdlog = sigma,  
             mix = mix, shift = shift, upper = upper) 
} 
 

Note that the get_dpar helper that simplifies some bookkeeping is currently internal in brms, but will be exposed in an upcoming release.

With that, we can do a posterior predictive check for both models. We use only a single core for predictions, because on Windows multicore is slow and will not be able to access the custom prediction functions.

pp_mix <- pp_check(fit_mix, type = "dens_overlay", ndraws = 100,  cores = 1)  +
  ggtitle("Mixture")
pp_base <- pp_check(fit_base, type = "dens_overlay", ndraws = 100,  cores = 1) +
  ggtitle("Shifted lognormal")
pp_mix / pp_base

For this dataset, the mixture does not do that much to improve the bulk of the predictions, but it manages to avoid the very long tail the lognormal-only model needs to accommodate the larger values.

We might also look at checks of the censored model. brms does not directly support predicting censored variables (because the data passed to the model are not enough to completely determine all censoring), but we can easily do this manually:

set.seed(123566)
pred_cens <- posterior_predict(fit_mix_cens, cores = 1)
pred_cens_cens <- pred_cens
# Do the censoring
pred_cens_cens[pred_cens > cens_bound] <- cens_bound 
samples_dens <- sample(1:(dim(pred_cens)[1]), size = 50)
ppc_cens1 <- ppc_dens_overlay(dd_cens$rt, pred_cens_cens[samples_dens,])  + 
  ggtitle("Censored dataset")
ppc_cens2 <- ppc_stat(1.0 * (dd_cens$censored == "right"), 
                      1.0 * (pred_cens >= cens_bound), 
                      binwidth = 0.02) + 
  ggtitle("Proportion censored")

ppc_cens1 + ppc_cens2

The model seems to do OK.

Using loo

Similarly, we might want to do model comparison or stacking with loo, so we also implement the log_lik function.

## Needed for numerical stability 
## from http://tr.im/hH5A 
logsumexp <- function (x) { 
  y = max(x) 
  y + log(sum(exp(x - y))) 
} 
 
 
RTmixture_lpdf <- function(y, meanlog, sdlog, mix, shift, upper) { 
  unif_llh <- dunif(y, min = 0, max = upper, log = TRUE) 
  # Note: no truncation of the lognormal at upper - this must match the Stan 
  # RTmixture_lpdf above, where the lognormal component is not normalized 
  # to (shift, upper) either. 
  lognormal_llh <- dlnorm(y - shift, meanlog = meanlog, sdlog = sdlog, log = TRUE) 
   
   
  # Computing logsumexp(log(mix) + unif_llh, log1p(-mix) + lognormal_llh)     
  # but vectorized 
  llh_matrix <- array(NA_real_, dim = c(2, max(length(unif_llh), length(lognormal_llh)))) 
  llh_matrix[1,] <- log(mix) + unif_llh 
  llh_matrix[2,] <- log1p(-mix) + lognormal_llh 
  apply(llh_matrix, MARGIN = 2, FUN = logsumexp) 
} 
 
log_lik_RTmixture <- function(i, draws) { 
  mu <- brms:::get_dpar(draws, "mu", i = i) 
  sigma <- brms:::get_dpar(draws, "sigma", i = i) 
  mix <- brms:::get_dpar(draws, "mix", i = i) 
  shiftprop <- brms:::get_dpar(draws, "shiftprop", i = i) 
   
  max_shift <- draws$data$vreal1[i] 
  upper <- draws$data$vreal2[i] 
  shift = shiftprop * max_shift 
   
  y <- draws$data$Y[i] 
  RTmixture_lpdf(y, meanlog = mu, sdlog = sigma,  
                 mix = mix, shift = shift, upper = upper) 
   
} 

And now, we can compare the models:

fit_mix <- add_criterion(fit_mix, "loo", cores = 1)
fit_base <- add_criterion(fit_base, "loo", cores = 1)
loo_compare(fit_mix, fit_base)
##          elpd_diff se_diff
## fit_mix   0.0       0.0   
## fit_base -7.5       5.3

No surprise here - we simulated the data with the mixture model and indeed, it is preferred over the plain shifted lognormal. Also, the shifted-lognormal model has one very influential observation, which turns out to be the smallest observed reaction time.

dd$rt[fit_base$criteria$loo$diagnostics$pareto_k > 0.7]
## [1] 0.4909057
min(dd$rt)
## [1] 0.4909057

This once again shows that the lognormal has problems accommodating both the high and the low contamination (while it is plausible it could accommodate a small amount of just high or just low contamination quite well).

Crazy models

Since brms is great, we can now do all sorts of stuff like put predictors on the mix parameter - e.g. to get a per-subject estimate of the amount of contamination.

To do this, we’ll also put a weakly informative prior on the intercept for the mixture that assumes low contamination, and a prior on its group-level variability, as we don’t expect huge differences in the amount of contamination between subjects (with wider priors the model starts to diverge - we would need much more data to constrain it well).

set.seed(35486622)
dd_subj <- dd_cens
dd_subj$subject_id <- sample(1:12, size = nrow(dd_cens), replace = TRUE)

fit_mix_all <- brm(
  bf(rt | vreal(max_shift, upper) + cens(censored) ~ x, 
     mix ~ 1 + (1 | subject_id),
     family = RTmixture),
  data = dd_subj,
  stanvars = stan_funs, 
               refresh = 0,
               file = paste0(cache_dir, "/mix_all"), file_refit = "on_change",
               prior = c(prior(normal(-3, 1), class = "Intercept", dpar = "mix"),
                         prior(normal(0,0.5), class = "sd", dpar = "mix"),
                         prior(constant(1), class = "shiftprop")))

fit_mix_all
##  Family: RTmixture 
##   Links: mu = identity; sigma = identity; mix = logit; shiftprop = identity 
## Formula: rt | vreal(max_shift, upper) + cens(censored) ~ x 
##          mix ~ 1 + (1 | subject_id)
##    Data: dd_subj (Number of observations: 110) 
##   Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~subject_id (Number of levels: 12) 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(mix_Intercept)     0.38      0.29     0.02     1.07 1.00     2706     2137
## 
## Population-Level Effects: 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept         0.45      0.07     0.33     0.62 1.00     3586     2305
## mix_Intercept    -2.05      0.44    -2.96    -1.25 1.00     4207     2807
## x                -0.22      0.06    -0.37    -0.11 1.00     3715     2666
## 
## Family Specific Parameters: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma         0.37      0.07     0.26     0.54 1.00     3534     3025
## shiftprop     1.00      0.00     1.00     1.00   NA       NA       NA
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Checking that posterior predictions work:

set.seed(1233354)
pred_cens <- posterior_predict(fit_mix_all, cores = 1)
pred_cens_cens <- pred_cens
pred_cens_cens[pred_cens > cens_bound] <- cens_bound 
samples_dens <- sample(1:(dim(pred_cens)[1]), size = 50)
ppc_dens_overlay(dd_cens$rt, pred_cens_cens[samples_dens,])

We can also do multivariate models where some of the predictors are correlated across answers:

set.seed(0245562)
# Build a dataset containing two separate predictions
dd_both <- dd
dd_both$rt2 <- dd_cens$rt[1:nrow(dd_both)]
dd_both$x2 <- dd_cens$x[1:nrow(dd_both)]
dd_both$censored2 <- dd_cens$censored[1:nrow(dd_both)]
dd_both$max_shift2 <- dd_cens$max_shift[1:nrow(dd_both)]
dd_both$upper2 <- dd_cens$upper[1:nrow(dd_both)]
dd_both$subject_id <- sample(1:12, size = nrow(dd_both), replace = TRUE)

fit_mix_multivar <- brm(
  bf(rt | vreal(max_shift, upper)  ~ x, 
     mix ~ 1 + (1 | p | subject_id),
     family = RTmixture) +
  bf(rt2 | vreal(max_shift2, upper2) + cens(censored2) ~ x2, 
   mix ~ 1 + (1 | p | subject_id),
     family = RTmixture),
  data = dd_both,
  stanvars = stan_funs, 
  refresh = 0,
  file = paste0(cache_dir, "/mix_multivar"), file_refit = "on_change",
  prior = c(prior(normal(-3, 1), class = "Intercept", dpar = "mix", resp = "rt"),
           prior(normal(0,0.5), class = "sd", dpar = "mix", resp = "rt"),
           prior(constant(1), class = "shiftprop", resp = "rt"),
           prior(normal(-3, 1), class = "Intercept", dpar = "mix", resp = "rt2"),
           prior(normal(0,0.5), class = "sd", dpar = "mix", resp = "rt2"),
           prior(constant(1), class = "shiftprop", resp = "rt2")
           ),
  adapt_delta = 0.95
  )
## Setting 'rescor' to FALSE by default for this model
fit_mix_multivar
##  Family: MV(RTmixture, RTmixture) 
##   Links: mu = identity; sigma = identity; mix = logit; shiftprop = identity
##          mu = identity; sigma = identity; mix = logit; shiftprop = identity 
## Formula: rt | vreal(max_shift, upper) ~ x 
##          mix ~ 1 + (1 | p | subject_id)
##          rt2 | vreal(max_shift2, upper2) + cens(censored2) ~ x2 
##          mix ~ 1 + (1 | p | subject_id)
##    Data: dd_both (Number of observations: 100) 
##   Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~subject_id (Number of levels: 12) 
##                                         Estimate Est.Error l-95% CI u-95% CI
## sd(mix_rt_Intercept)                        0.33      0.25     0.01     0.91
## sd(mix_rt2_Intercept)                       0.37      0.27     0.02     1.01
## cor(mix_rt_Intercept,mix_rt2_Intercept)     0.00      0.58    -0.95     0.95
##                                         Rhat Bulk_ESS Tail_ESS
## sd(mix_rt_Intercept)                    1.00     2713     1787
## sd(mix_rt2_Intercept)                   1.00     2587     2046
## cor(mix_rt_Intercept,mix_rt2_Intercept) 1.00     4331     2836
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## rt_Intercept          0.30      0.06     0.18     0.43 1.00     4743     3085
## mix_rt_Intercept     -2.28      0.54    -3.50    -1.35 1.00     4035     1843
## rt2_Intercept         0.43      0.08     0.31     0.60 1.00     3258     1826
## mix_rt2_Intercept    -1.91      0.44    -2.80    -1.11 1.00     4456     3016
## rt_x                  0.46      0.05     0.36     0.56 1.00     5440     2978
## rt2_x2               -0.25      0.07    -0.40    -0.13 1.00     3316     2106
## 
## Family Specific Parameters: 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_rt          0.49      0.05     0.40     0.60 1.00     4907     3014
## sigma_rt2         0.36      0.07     0.25     0.53 1.00     3449     2885
## shiftprop_rt      1.00      0.00     1.00     1.00   NA       NA       NA
## shiftprop_rt2     1.00      0.00     1.00     1.00   NA       NA       NA
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Testing that predictions work even for multivariate models. Note that we don’t manually apply the censoring for rt2 here (as we did above), so those predictions look wrong.

pp_check(fit_mix_multivar, resp = "rt", ndraws = 30, cores = 1)

pp_check(fit_mix_multivar, resp = "rt2", ndraws = 30, cores = 1)

But here we also have to face possibly the biggest problem with brms: it becomes very easy to specify a model that is too complex to be well informed by the data we have, or even a completely broken model that no amount of data will save. The data and a few settings for the “crazy” models shown above actually had to be tweaked for them to work well for this post. So enjoy with moderation :-).

Again, if you want the complete code, packaged in a single file you can just load into your project, go to https://github.com/martinmodrak/blog/blob/master/content/post/RTmixture.R

If you encounter problems running the models that you can’t resolve yourself, be sure to ask questions on Stan Discourse and tag me (@martinmodrak) in the question!

Original computing environment

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bayesplot_1.8.1     patchwork_1.1.1     knitr_1.33         
##  [4] forcats_0.5.1       stringr_1.4.0       dplyr_1.0.7        
##  [7] purrr_0.3.4         readr_2.0.1         tidyr_1.1.3        
## [10] tibble_3.1.3        ggplot2_3.3.5       tidyverse_1.3.1    
## [13] brms_2.16.2         Rcpp_1.0.7          cmdstanr_0.4.0.9000
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1         backports_1.2.1      plyr_1.8.6          
##   [4] igraph_1.2.6         splines_4.1.0        crosstalk_1.1.1     
##   [7] rstantools_2.1.1     inline_0.3.19        digest_0.6.27       
##  [10] htmltools_0.5.1.1    rsconnect_0.8.24     fansi_0.5.0         
##  [13] magrittr_2.0.1       checkmate_2.0.0      tzdb_0.1.2          
##  [16] modelr_0.1.8         RcppParallel_5.1.4   matrixStats_0.61.0  
##  [19] xts_0.12.1           prettyunits_1.1.1    colorspace_2.0-2    
##  [22] rvest_1.0.1          haven_2.4.3          xfun_0.25           
##  [25] callr_3.7.0          crayon_1.4.1         jsonlite_1.7.2      
##  [28] lme4_1.1-27.1        zoo_1.8-9            glue_1.4.2          
##  [31] gtable_0.3.0         V8_3.4.2             distributional_0.2.2
##  [34] pkgbuild_1.2.0       rstan_2.26.3         abind_1.4-5         
##  [37] scales_1.1.1         mvtnorm_1.1-2        DBI_1.1.1           
##  [40] miniUI_0.1.1.1       xtable_1.8-4         stats4_4.1.0        
##  [43] StanHeaders_2.26.3   DT_0.18              htmlwidgets_1.5.3   
##  [46] httr_1.4.2           threejs_0.3.3        posterior_1.0.1     
##  [49] ellipsis_0.3.2       pkgconfig_2.0.3      loo_2.4.1           
##  [52] farver_2.1.0         sass_0.4.0           dbplyr_2.1.1        
##  [55] utf8_1.2.2           labeling_0.4.2       tidyselect_1.1.1    
##  [58] rlang_0.4.11         reshape2_1.4.4       later_1.2.0         
##  [61] munsell_0.5.0        cellranger_1.1.0     tools_4.1.0         
##  [64] cli_3.0.1            generics_0.1.0       broom_0.7.9         
##  [67] ggridges_0.5.3       evaluate_0.14        fastmap_1.1.0       
##  [70] yaml_2.2.1           processx_3.5.2       fs_1.5.0            
##  [73] nlme_3.1-152         mime_0.11            projpred_2.0.2      
##  [76] xml2_1.3.2           compiler_4.1.0       shinythemes_1.2.0   
##  [79] rstudioapi_0.13      curl_4.3.2           gamm4_0.2-6         
##  [82] reprex_2.0.1         bslib_0.2.5.1        stringi_1.7.3       
##  [85] highr_0.9            ps_1.6.0             blogdown_1.5        
##  [88] Brobdingnag_1.2-6    lattice_0.20-44      Matrix_1.3-3        
##  [91] nloptr_1.2.2.2       markdown_1.1         shinyjs_2.0.0       
##  [94] tensorA_0.36.2       vctrs_0.3.8          pillar_1.6.2        
##  [97] lifecycle_1.0.0      jquerylib_0.1.4      bridgesampling_1.1-2
## [100] cowplot_1.1.1        httpuv_1.6.1         R6_2.5.0            
## [103] bookdown_0.22        promises_1.2.0.1     gridExtra_2.3       
## [106] codetools_0.2-18     boot_1.3-28          colourpicker_1.1.0  
## [109] MASS_7.3-54          gtools_3.9.2         assertthat_0.2.1    
## [112] withr_2.4.2          shinystan_2.5.0      mgcv_1.8-35         
## [115] parallel_4.1.0       hms_1.1.0            grid_4.1.0          
## [118] coda_0.19-4          minqa_1.2.4          rmarkdown_2.10      
## [121] shiny_1.6.0          lubridate_1.7.10     base64enc_0.1-3     
## [124] dygraphs_1.1.1.6

All content is licensed under the BSD 2-clause license. Source files for the blog are available at https://github.com/martinmodrak/blog.