Detecting Cointegrating Relations in Non-Stationary Matrix-Valued Time Series

CFE-CM Statistics

Ivan Ricardo

Maastricht University

Alain Hecq

Maastricht University

Ines Wilms

Maastricht University

December 15, 2025

Stacked Data Structures

Univariate \((y_t)\)

\[ \begin{bmatrix} \bullet \\ \bullet \\ \bullet \end{bmatrix} \]

Multivariate \((\mathbf{y}_t)\)

\[ \begin{bmatrix} \bullet & \bullet & \bullet \\ \bullet & \bullet & \bullet \\ \bullet & \bullet & \bullet \end{bmatrix} \]

Matrix-valued \((\mathbf{Y}_t)\)

Examples of matrix-valued data:

    • Economic indicators for different countries over time

Our Setting

  • We have \(N_1 = 3\) economic indicators (GDP, Industrial Production, Interest Rates) over \(N_2 = 4\) countries (USA, DEU, FRA, GBR)
  • We are interested in whether there exists cointegrating relations for the economic indicators and the countries separately

Why Matrix-valued data?

  • “Flattening” loses dimension-specific interpretations
  • Flexibility in reduced-rank structures
  • Allows for more parsimonious models

Main Contributions:

  • Distinguish cointegration relations along the dimensions of the matrix-valued time series.
  • Utilize information criteria to determine row and column cointegration ranks
  • Validate rank selection with Monte Carlo simulations and macroeconomic application

From Vector to Matrix Valued Time Series

VECM for multivariate/vector-valued time series

In the presence of cointegration, the stacked VECM is

\[ \boldsymbol{\Delta} \mathbf{y}_t = \mathbf{d} + \boldsymbol{\alpha} \boldsymbol{\beta}'\mathbf{y}_{t-1} + \sum_{j=1}^{p-1} \boldsymbol{\phi}_j \boldsymbol{\Delta} \mathbf{y}_{t-j} + \mathbf{e}_t \]

  • \(\mathbf{d} \in \mathbb{R}^{N}\) is the deterministic term
  • \(\boldsymbol{\beta} \in \mathbb{R}^{N \times r}\) is the cointegrating matrix for the economic indicators and countries
  • \(\boldsymbol{\alpha} \in \mathbb{R}^{N \times r}\) is the corresponding adjustment coefficient matrix
  • \(\boldsymbol{\phi}_j \in \mathbb{R}^{N \times N}\) is the autoregressive coefficient matrix

Matrix Error Correction Model

The Error Correction Model with matrix-valued observations is \[ \boldsymbol{\Delta} \mathbf{Y}_t = \mathbf{D} + \mathbf{U}_1 \mathbf{U}_3'\mathbf{Y}_{t-1} \mathbf{U}_4 \mathbf{U}_2' + \sum_{j=1}^{p-1} \boldsymbol{\phi}_{1,j} \boldsymbol{\Delta} \mathbf{y}_{t-j} \boldsymbol{\phi}_{2,j}' + \mathbf{E}_t \]

\[ \mathbf{E}_t \sim MVN(\mathbf{0}, \boldsymbol{\Sigma}_1, \boldsymbol{\Sigma}_2) \]

  • \(\mathbf{U}_3 \in \mathbb{R}^{N_1 \times r_1}\) and \(\mathbf{U}_4 \in \mathbb{R}^{N_2 \times r_2}\) are the cointegrating matrices for the rows (economic indicators) and columns (countries)
  • \(\mathbf{U}_1 \in \mathbb{R}^{N_1 \times r_1}\) and \(\mathbf{U}_2 \in \mathbb{R}^{N_2 \times r_2}\) are the adjustment coefficients
  • \(\boldsymbol{\phi}_1 \in \mathbb{R}^{N_1 \times N_1}\) and \(\boldsymbol{\phi}_2 \in \mathbb{R}^{N_2 \times N_2}\) are the matrix AR coefficients.

Matrix Error Correction Model restrictions

Vectorizing the MECM1 yields \[ \boldsymbol{\Delta} \mathbf{y}_t = \mathbf{d} + \underbrace{(\mathbf{U}_2 \otimes \mathbf{U}_1)}_{\boldsymbol{\alpha}} \underbrace{(\mathbf{U}_4 \otimes \mathbf{U}_3)'}_{\boldsymbol{\beta}'} \mathbf{y}_{t-1} + \sum_{j=1}^{p-1} \underbrace{(\boldsymbol{\phi}_{2,j} \otimes \boldsymbol{\phi}_{1,j})}_{\boldsymbol{\phi}_j} \boldsymbol{\Delta} \mathbf{y}_{t-j} + \mathbf{e}_t \] \[ \operatorname{vec}(\mathbf{E}_t) \sim N(\operatorname{vec}(\mathbf{0}), \boldsymbol{\Sigma}_2 \otimes \boldsymbol{\Sigma}_1) \]

  • This form has been proposed by Li and Xiao (2024)
  • We complement their study by allowing for different numbers of cointegrating relations and different lag orders

Estimation and Selection of Cointegrating Ranks

Log-Likelihood

The log-likelihood (up to a constant) of the MECM(p) for fixed rank \(r_1\) and \(r_2\) is \[ \mathcal{L} (\boldsymbol{\Theta}) = -\frac{T N_2}{2} \log | \boldsymbol{\Sigma}_1 | - \frac{T N_1}{2} \log | \boldsymbol{\Sigma}_2 | - \frac{1}{2} \sum_{t=1}^T \operatorname{tr} \left(\boldsymbol{\Sigma}_1^{-1} \mathbf{E}_t \boldsymbol{\Sigma}_2^{-1} \mathbf{E}_t'\right) \] where \(\mathbf{E}_t = \boldsymbol{\Delta} \mathbf{Y}_t - \mathbf{U}_1 \mathbf{U}_3' \mathbf{Y}_{t-1} \mathbf{U}_4 \mathbf{U}_2' - \sum_{j=1}^{p-1} \boldsymbol{\phi}_{1,j} \boldsymbol{\Delta} \mathbf{Y}_{t-j} \boldsymbol{\phi}_{2,j}' - \mathbf{D}\) and \(\boldsymbol{\Theta}\) collects all parameters. This can be solved using gradient descent.

Algorithm Details

Selection of Cointegration ranks

In practice, the ranks \(r_1\) and \(r_2\) are unknown and must be selected. We use Akaike Information Criteria (AIC) and Bayesian Information Criteria (BIC). \[ \text{AIC}(r_1, r_2, p) = -2 \mathcal{L}(\widehat{\boldsymbol{\Theta}}) + 2 \psi(r_1, r_2, p) \\ \text{BIC}(r_1, r_2, p) = -2 \mathcal{L}(\widehat{\boldsymbol{\Theta}}) + \ln(T) \psi(r_1, r_2, p) \]

where \[ \psi(r_1, r_2, p) = r_1 (2 N_1 - r_1) + r_2 (2 N_2 - r_2) + (p-1) (N_1^2 + N_2^2) \] is the number of parameters to estimate.

Simulation Study

First Simulation Study

  • Selecting ranks with AIC and BIC
  • DGP is the MECM with 0 lags
Setting \(N_1 \times N_2 = 3 \times 4\)
Fully Reduced \(\mathbf{r} = (1,1)\)
Partially Reduced (1st Dimension) \(\mathbf{r} = (1,4)\)
Partially Reduced (2nd Dimension) \(\mathbf{r} = (3,1)\)
No Rank Reduction \(\mathbf{r} = (3,4)\)

Fully reduced

True rank of \(\mathbf{r} = (1,1)\)

Method1 Average Rank Standard Deviation Freq. Correct
AIC (100) (1.02, 1.00) (0.16, 0.00) (0.98, 1.00)
BIC (100) (1.02, 1.00) (0.13, 0.00) (0.98, 1.00)
AIC (250) (1.01, 1.31) (0.08, 0.00) (0.99, 1.00)
BIC (250) (1.01, 1.00) (0.04, 0.00) (0.99, 1.00)

Reduced Rank in First Dimension

True rank of \(\mathbf{r} = (1,4)\)

Method Average Rank Standard Deviation Freq. Correct
AIC (100) (1.13, 3.99) (0.34, 0.03) (0.87, 0.99)
BIC (100) (1.06, 3.99) (0.24, 0.03) (0.94, 0.99)
AIC (250) (1.03, 4.00) (0.18, 0.00) (0.97, 1.00)
BIC (250) (1.02, 4.00) (0.14, 0.00) (0.98, 1.00)

Reduced Rank in Second Dimension

True rank of \(\mathbf{r} = (3,1)\)

Method Average Rank Standard Deviation Freq. Correct
AIC (100) (3.00, 1.07) (0.00, 0.26) (1.00, 0.93)
BIC (100) (3.00, 1.01) (0.00, 0.04) (1.00, 0.99)
AIC (250) (3.00, 1.03) (0.00, 0.17) (1.00, 0.97)
BIC (250) (3.00, 1.00) (0.00, 0.00) (1.00, 1.00)

No rank reduction

True rank of \(\mathbf{r} = (3,4)\)

Method Average Rank Standard Deviation Freq. Correct
AIC (100) (3.00, 4.00) (0.00, 0.00) (1.00, 1.00)
BIC (100) (3.00, 4.00) (0.00, 0.00) (1.00, 1.00)
AIC (250) (3.00, 4.00) (0.00, 0.00) (1.00, 1.00)
BIC (250) (3.00, 4.00) (0.00, 0.00) (1.00, 1.00)

Second Simulation Study

  • DGP is the MECM with 1 lag
Setting \(N_1 \times N_2 = 3 \times 4\)
Fully Reduced \(\mathbf{r} = (1,1)\)
Partially Reduced (1st Dimension) \(\mathbf{r} = (1,4)\)
Partially Reduced (2nd Dimension) \(\mathbf{r} = (3,1)\)
No Rank Reduction \(\mathbf{r} = (3,4)\)

Fully reduced

True rank of \(\mathbf{r} = (1,1)\)

Method Average Rank Standard Deviation Freq. Correct
AIC (100) (1.06, 1.03) (0.23, 0.21) (0.94, 0.97)
BIC (100) (1.01, 1.00) (0.08, 0.00) (0.99, 1.00)
AIC (250) (1.06, 1.02) (0.23, 0.15) (0.94, 0.98)
BIC (250) (1.01, 1.00) (0.08, 0.00) (0.99, 1.00)

Reduced Rank in First Dimension

True rank of \(\mathbf{r} = (1,4)\)

Method Average Rank Standard Deviation Freq. Correct
AIC (100) (1.01, 4.00) (0.05, 0.00) (0.99, 1.00)
BIC (100) (1.01, 4.00) (0.03, 0.00) (0.99, 1.00)
AIC (250) (1.00, 4.00) (0.00, 0.00) (1.00, 1.00)
BIC (250) (1.00, 4.00) (0.00, 0.00) (1.00, 1.00)

Reduced Rank in Second Dimension

True rank of \(\mathbf{r} = (3,1)\)

Method Average Rank Standard Deviation Freq. Correct
AIC (100) (3.00, 1.23) (0.00, 0.59) (1.00, 0.86)
BIC (100) (3.00, 1.01) (0.00, 0.08) (1.00, 0.99)
AIC (250) (3.00, 1.10) (0.00, 0.40) (1.00, 0.93)
BIC (250) (3.00, 1.01) (0.00, 0.03) (1.00, 0.99)

No rank reduction

True rank of \(\mathbf{r} = (3,4)\)

Method Average Rank Standard Deviation Freq. Correct
AIC (100) (3.00, 4.00) (0.00, 0.00) (1.00, 1.00)
BIC (100) (3.00, 4.00) (0.00, 0.00) (1.00, 1.00)
AIC (250) (3.00, 4.00) (0.00, 0.00) (1.00, 1.00)
BIC (250) (3.00, 4.00) (0.00, 0.00) (1.00, 1.00)

Empirical Application

Macroeconomic Indicators for various countries

  • We take log of real GDP, PROD, levels of long-term interest rates
  • AIC and BIC select a rank of \((1,1)\) with one lag
  • One cointegrating relation between the countries and economic indicators

Estimated Coefficient Values

Row-wise coefficients

Indicator \(\widehat{\mathbf{U}}_1\) \(\widehat{\mathbf{U}}_3\)
GDP -0.084 1.000
PROD -0.201 -0.099
IR -7.843 -0.021

Column-wise coefficients

Country \(\widehat{\mathbf{U}}_2\) \(\widehat{\mathbf{U}}_4\)
USA 0.088 1.000
DEU 0.774 0.055
FRA 1.103 -1.092
GBR 0.674 -0.196
  • Each \(\widehat{\mathbf{U}}_1\) and \(\widehat{\mathbf{U}}_2\) entry shows how strongly a row (indicator) or column (country) responds to deviations from the long-run equilibrium
  • \(\widehat{\mathbf{U}}_3\) (indicators) and \(\widehat{\mathbf{U}}_4\) (countries) capture the linear combination of variables that form the long-run equilibrium

Cointegrating relation

  • The cointegrated series \((\widehat{\mathbf{U}}_4 \otimes \widehat{\mathbf{U}}_3)' \operatorname{vec}(\mathbf{Y}_{t-1})\) shows two spikes at the beginning of the 90s but stays stable around the mean value of -0.99

Conclusion

Conclusion

  • Investigated the matrix error-correction model for cointegration separately across row and column dimensions of matrix time series
  • Proposed AIC/BIC for joint rank and lag selection and showed finite sample performance across simulation settings
  • Applied the approach to international macroeconomic indicators, finding one stable long-run relation across both indicators and countries

Github

References

Li, Zebang, and Han Xiao. 2024. “Cointegrated Matrix Autoregression Models.” arXiv Preprint arXiv:2409.10860.
Samadi, S. Yaser, and Lynne Billard. 2024. “On a Matrix-Valued Autoregressive Model.” Journal of Time Series Analysis. https://onlinelibrary.wiley.com/doi/abs/10.1111/jtsa.12748.
Van Loan, Charles F. 2000. “The Ubiquitous Kronecker Product.” Journal of Computational and Applied Mathematics 123 (1-2): 85–100.
Yuan, Chaofeng, Zhigen Gao, Xuming He, Wei Huang, and Jianhua Guo. 2023. Two-way dynamic factor models for high-dimensional matrix-valued time series.” Journal of the Royal Statistical Society Series B: Statistical Methodology 85 (5): 1517–37. https://doi.org/10.1093/jrsssb/qkad077.

Appendix

Pseudo-Code for the Gradient Descent Algorithm

  • Initialize parameters using Nearest Kronecker Product (Van Loan 2000)
  • Cycle through parameter blocks (block coordinate updates): \(\mathbf{D}, \mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3, \mathbf{U}_4, \boldsymbol{\Sigma}_1, \boldsymbol{\Sigma}_2, \boldsymbol{\phi}_1, \boldsymbol{\phi}_2\)
  • Each block is updated by taking a step in the negative gradient direction with its own step size (calculated using the Lipschitz constant)
  • Certain matrices are renormalized after each update
  • Continue until loss improvement falls below the tolerance threshold

Back to main