Checkpoints in stan via chkptstanr

4 minute read

Published:

In this blog, I will introduce a new R package chkptstanr, which allows checkpoints in stan. This feature is useful when you want to cache the model outputs by certain time steps or resume a long run at specific points due to an interruption. In general, this package can be considered as a wrapper for cmdstan, and thus you need to download and install cmdstan on your machine. Here are the step-by-step instruction on how you can deploy the environment and run the stan model via chkptstanr.

Checkpoints in stan via chkptstanr

1. Install cmdstan

The cmdstan is the shell interface to stan and can be integrated well with the command-line interface. It also supports GPU computation via OpenCL and multi-threading (see the latest version of cmdstan user’s guide). You can install the cmdstan by the following commands.

# download cmdstan-2.29.2
wget https://github.com/stan-dev/cmdstan/releases/download/v2.29.2/cmdstan-2.29.2.tar.gz
tar -xf cmdstan-2.29.2.tar.gz

cd cmdstan-2.29.2
make build

If you are interested in setting up cmdstan with GPU support, I am including my configuration for the Snowy cluster.

# load the necessary packages
module use /sw/EasyBuild/snowy/modules/all/
module load intelcuda/2019b
module load gcc/9.3.0
module load R/4.1.1
module load R_packages/4.1.1

# set the local configuration
cd cmdstan-2.29.2
echo "STAN_OPENCL=TRUE" > make/local
make build

Let’s check the configuration by running an example stan model (bernoulli) inside the cmdstan folder.

cd cmdstan-2.29.2
ls example/bernoulli
# bernoulli.data.R bernoulli.data.json  bernoulli.stan

# Note: compile the model without the suffix of .stan
make example/bernoulli/bernoulli
examples/bernoulli/bernoulli sample data \
file=examples/bernoulli/bernoulli.data.json output file=output.csv

2. Run stan model via chkptstanr

Now we have cmdstan installed on your computer and we can try to run a stan model via chkptstanr. Before that we need to load the necessary packages and specify the cmdstan path. Here I am using a version from github.

library(cmdstanr)
# library(devtools)
# install_github("donaldRwilliams/chkptstanr")
library(chkptstanr)
library(bayesplot)
set_cmdstan_path("../cmdstan-2.29.2")

(1) Model

For simplicity, I am using a stan model from the chkptstanr package by adjusting the model with the new array syntax, and you can of course try your own stan model.

stan_code <- "
data {
  int<lower=0> n;
  array[n] real y; 
  array[n] real<lower=0> sigma; 
}
parameters {
  real mu;
  real<lower=0> tau; 
  vector[n] eta; 
}
transformed parameters {
  vector[n] theta; 
  theta = mu + tau * eta; 
}
model {
  target += normal_lpdf(eta | 0, 1); 
  target += normal_lpdf(y | theta, sigma);  
}
"

(2) Data

stan_data <- list(n = 8,
                  y = c(28,  8, -3,  7, -1,  1, 18, 12),
                  sigma = c(15, 10, 16, 11,  9, 11, 10, 18))

(3) Run the analysis

It is worth noting that when you first run the model, you often need to create a folder via create_folder. But if you want to resume a previous run, you just need to specify the path to previously created folder. To achieve this, we include a conditional statement to check the existence of the folder.

# important: check for the existence of folder
name_folder = "stan_chkpt_m1"
if(dir.exists(name_folder)){
  path = paste0("./", name_folder)
}else{
  path = create_folder(folder_name = name_folder)
}
# important: specify the stan_code_path
stan_code_path = paste0(path, "/stan_model/model.stan") 
fit_m1 <- chkpt_stan(model_code = stan_code, data = stan_data, 
                     iter_warmup = 1000, iter_sampling = 2000,
                     iter_per_chkpt = 250, path = path)

You can directly stop the running model or specify a longer sampling iteration. Normally, chkpt_stan will resume the model from the cached checkpoints, no matter whether they occur at the warm-up or sampling stage. Another advantage is that if the checkpointing is completed, the model won’t be executed even though you run the same command once again. If you really want to rerun the model, you can delete the cached folder by using the unlink or system function.

# remove the cached folder
unlink(path, recursive = T)
# system(command = paste0("rm -r ", path))

It is worth noting that the current version of chkptstanr does not allow you to resize the warm-up iterations or change the iteration per checkpoint between two runs, since they may cause some difficulties to combine different checkpoint draws.

(4) Visualization

draws <- combine_chkpt_draws(fit_m1)
bayesplot::mcmc_trace(draws, regex_pars = c("mu", "tau", "eta")) +
geom_vline(xintercept = seq(0, 2000, 250), alpha = 0.25, size = 2)

Traceplot of MCMC draws. The gray bins indicate the checkpoints.

Useful links:

-https://github.com/stan-dev/cmdstan/releases