Cholesky factors of covariance and correlation matrices in stan

4 minute read

Published:

In this blog, I would like to give a quick overview of different types of matrices and their transformations (e.g., Cholesky decomposition) in stan. These matrices include variance-covaiance matrix ($\Sigma$), correlation matrix ($R$), Cholesky factor of covariance matrix ($L$) and Cholesky factor of correlation matrix ($L_{corr}$). Before we move to Cholesky decomposition, it is good to know the relationship between variance-covaiance matrix ($\Sigma$) and correlation matrix ($R$). Simply put, we can rewrite $\Sigma$ as the product of a diagonal matrix ($\sigma$) and a correlation matrix $(R)$ in the following way. To achieve this, you can use the quad_form_diag(R, sigma) function in stan.

\[\Sigma = \underbrace{\left(\begin{array}{c}\sigma_1 ~~ 0 ~~ 0 \\ 0 ~~ \sigma_2 ~~ 0 \\ 0 ~~ 0 ~~ \sigma_3 \end{array}\right)}_{diag(\sigma)} \underbrace{\left(\begin{array}{c}1 ~~~ \rho_{12} ~~~ \rho_{13} \\ \rho_{12} ~~~ 1 ~~~ \rho_{23} \\ \rho_{13} ~~~ \rho_{23} ~~~ 1 \end{array}\right)}_{R} \underbrace{\left(\begin{array}{c}\sigma_1 ~~ 0 ~~ 0 \\ 0 ~~ \sigma_2 ~~ 0 \\ 0 ~~ 0 ~~ \sigma_3 \end{array}\right)}_{diag(\sigma)}\]
cov_matrix[K] Sigma = quad_form_diag(R, sigma); // variance-covariance matrix Sigma
corr_matrix[K] R; // correlation matrix R

It is important to know that we can decompose a symmetric and positive semi-definite matrix into the product of a lower triangular matrix (called Cholesky factor) and its transpose via Cholesky decomposition. As illustrated below, we can apply Cholesky decomposition either to the variance-covaiance matrix ($\Sigma$), or to the correlation matrix ($R$). To distinguish between these two matrices, you just need to remember that the correlation matrix has $1$ whereas the variance-covariance matrix has $\sigma^2$ at the main diagonal.

\[\begin{align*} \Sigma &= \left(\begin{array}{c}\sigma_{1}^2 ~~~~~~ \rho_{12}\sigma_{1}\sigma_{2} ~~~~~~ \rho_{13}\sigma_1\sigma_{3} \\ \rho_{12}\sigma_{1}\sigma_{2} ~~~~~~ \sigma_2^2 ~~~~~~ \rho_{23}\sigma_2\sigma_3 \\ \rho_{13}\sigma_1\sigma_{3} ~~~~~~ \rho_{23}\sigma_2\sigma_{3} ~~~~~~ \sigma_3^2 \end{array}\right) \\ &= \underbrace{\begin{pmatrix}L_{11}&0&0\\L_{21}&L_{22}&0\\L_{31}&L_{32}&L_{33}\\\end{pmatrix}}_{L} \underbrace{\begin{pmatrix}L_{11}&L_{21}&L_{31}\\0&L_{22}&L_{32}\\0&0&L_{33}\end{pmatrix}}_{L^T} \end{align*}\] \[\begin{align*} R &= \left(\begin{array}{c}1 ~~~ \rho_{12} ~~~ \rho_{13} \\ \rho_{12} ~~~ 1 ~~~ \rho_{23} \\ \rho_{13} ~~~ \rho_{23} ~~~ 1 \end{array}\right) \\ &= \underbrace{\begin{pmatrix}L_{11}&0&0\\L_{21}&L_{22}&0\\L_{31}&L_{32}&L_{33}\\\end{pmatrix}}_{L_{corr}} \underbrace{\begin{pmatrix}L_{11}&L_{21}&L_{31}\\0&L_{22}&L_{32}\\0&0&L_{33}\end{pmatrix}}_{L_{corr}^T} \end{align*}\]

This gives us two different lower triangular matrices, $L$ and $L_{corr}$. In stan, these two matrices are defined as cholesky_factor_cov and cholesky_factor_corr, respectively.

cholesky_factor_cov[K] L; // cholesky factor of variance-covariance matrix
cholesky_factor_corr[K] L_corr; // cholesky factor of correlation matrix

Hopefully, it is clear that $L$ and $L_{corr}$ are just the cholesky factors of $\Sigma$ and $R$. Then you may wonder the relationship between the two cholesky factors. If you have a closer look at the equation (1) above, you may find that these two cholesky factors are indeed related.

\[\begin{align*} \Sigma &= \underbrace{\left(\begin{array}{c}\sigma_1 ~~ 0 ~~ 0 \\ 0 ~~ \sigma_2 ~~ 0 \\ 0 ~~ 0 ~~ \sigma_3 \end{array}\right)}_{diag(\sigma)} \underbrace{\left(\begin{array}{c}1 ~~~ \rho_{12} ~~~ \rho_{13} \\ \rho_{12} ~~~ 1 ~~~ \rho_{23} \\ \rho_{13} ~~~ \rho_{23} ~~~ 1 \end{array}\right)}_{R} \underbrace{\left(\begin{array}{c}\sigma_1 ~~ 0 ~~ 0 \\ 0 ~~ \sigma_2 ~~ 0 \\ 0 ~~ 0 ~~ \sigma_3 \end{array}\right)}_{diag(\sigma)} \\ &= \underbrace{\underbrace{\left(\begin{array}{c}\sigma_1 ~~ 0 ~~ 0 \\ 0 ~~ \sigma_2 ~~ 0 \\ 0 ~~ 0 ~~ \sigma_3 \end{array}\right)}_{diag(\sigma)} \underbrace{\begin{pmatrix}L_{11}&0&0\\L_{21}&L_{22}&0\\L_{31}&L_{32}&L_{33}\\\end{pmatrix}}_{L_{corr}}}_{L} \underbrace{\underbrace{\begin{pmatrix}L_{11}&L_{21}&L_{31}\\0&L_{22}&L_{32}\\0&0&L_{33}\end{pmatrix}}_{L_{corr}^T} \underbrace{\left(\begin{array}{c}\sigma_1 ~~ 0 ~~ 0 \\ 0 ~~ \sigma_2 ~~ 0 \\ 0 ~~ 0 ~~ \sigma_3 \end{array}\right)}_{diag(\sigma)}}_{L^T} \end{align*}\]

The cholesky factor of variance-covariance matrix $L$ can be expressed as the product of the diagonal matrix of $\sigma$ and the cholesky factor of correlation matrix $L_{corr}$, or more formally as $L = diag(\sigma)*L_{corr}$. There is a convenient function diag_pre_multiply in stan for this.

cholesky_factor_cov[K] L = diag_pre_multiply(sigma, L_corr) // cholesky factor L = diag(sig) * L_corr
cholesky_factor_cov[K] L = diag_matrix(sigma) * L_corr // alternatively 

More concretely, I will use an example of multivariate normal ($MVN$) distribution to illustrate different ways of parameterizations. Pls refer to this blog to better understand the concepts of $MVN$ distribution and Cholesky decomposition. Here I simply include the important code chunks for simplicity.

  • Method 1: decompose $\Sigma = \text{diag}(\sigma)*L_{corr} * L_{corr}^{T} * \text{diag}(\sigma)$
parameters {
  vector[K] mu;
  cholesky_factor_corr[K] Lcorr; // cholesky factor for R
  vector<lower=0>[K] sigma; 
}
transformed parameters{
  corr_matrix[K] R; // correlation matrix
  cov_matrix[K] Sigma; // VCV matrix
  R = multiply_lower_tri_self_transpose(Lcorr);
  Sigma = quad_form_diag(R, sigma); // quad_form_diag: diag_matrix(sig) * R * diag_matrix(sig)
}
model {
  y ~ multi_normal(mu, Sigma);
}
  • Method 2: decompose $\Sigma = L*L^{T}$ and use multi_normal_cholesky
parameters {
  vector[K] mu;
  cholesky_factor_corr[K] Lcorr; // cholesky factor for R
  vector<lower=0>[K] sigma; 
}
transformed parameters{
	cholesky_factor_cov[K] L = diag_pre_multiply(sigma, Lcorr); // cholesky factor for Sigma
}
model {
  y ~ multi_normal_cholesky(mu, L);
}

Based on my own experience, the second method is much more compact and efficient than the first one. If you have many data points, it is best to choose the second approach to factorize the variance-covariance matrix via Cholesky decomposition.

Useful links: