CFE-CM Statistics
December 15, 2025
At each time point, a classical VECM observes a vector \(\mathbf{y}_t\), whereas a matrix-valued model observes an entire matrix \(\mathbf{Y}_t\):
\[ \mathbf{y}_t = \begin{bmatrix} \bullet \\ \bullet \\ \bullet \end{bmatrix} \qquad \text{vs.} \qquad \mathbf{Y}_t = \begin{bmatrix} \bullet & \bullet & \bullet \\ \bullet & \bullet & \bullet \\ \bullet & \bullet & \bullet \end{bmatrix} \]
Examples of matrix-valued data include a panel of economic indicators observed across several countries, as in the application below.
In the presence of cointegration, the stacked VECM is
\[ \boldsymbol{\Delta} \mathbf{y}_t = \mathbf{d} + \boldsymbol{\alpha} \boldsymbol{\beta}'\mathbf{y}_{t-1} + \sum_{j=1}^{p-1} \boldsymbol{\phi}_j \boldsymbol{\Delta} \mathbf{y}_{t-j} + \mathbf{e}_t \]
The Error Correction Model with matrix-valued observations (MECM) is \[ \boldsymbol{\Delta} \mathbf{Y}_t = \mathbf{D} + \mathbf{U}_1 \mathbf{U}_3'\mathbf{Y}_{t-1} \mathbf{U}_4 \mathbf{U}_2' + \sum_{j=1}^{p-1} \boldsymbol{\phi}_{1,j} \boldsymbol{\Delta} \mathbf{Y}_{t-j} \boldsymbol{\phi}_{2,j}' + \mathbf{E}_t \]
\[ \mathbf{E}_t \sim MVN(\mathbf{0}, \boldsymbol{\Sigma}_1, \boldsymbol{\Sigma}_2) \]
i.e., \(\mathbf{E}_t\) follows a matrix-variate normal distribution with row covariance \(\boldsymbol{\Sigma}_1\) and column covariance \(\boldsymbol{\Sigma}_2\).
Vectorizing the MECM yields \[ \boldsymbol{\Delta} \mathbf{y}_t = \mathbf{d} + \underbrace{(\mathbf{U}_2 \otimes \mathbf{U}_1)}_{\boldsymbol{\alpha}} \underbrace{(\mathbf{U}_4 \otimes \mathbf{U}_3)'}_{\boldsymbol{\beta}'} \mathbf{y}_{t-1} + \sum_{j=1}^{p-1} \underbrace{(\boldsymbol{\phi}_{2,j} \otimes \boldsymbol{\phi}_{1,j})}_{\boldsymbol{\phi}_j} \boldsymbol{\Delta} \mathbf{y}_{t-j} + \mathbf{e}_t \] \[ \operatorname{vec}(\mathbf{E}_t) \sim N(\mathbf{0}, \boldsymbol{\Sigma}_2 \otimes \boldsymbol{\Sigma}_1) \]
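As a quick numerical check of this vectorization (a minimal sketch with illustrative dimensions; all variable names here are ours, not from the text), the identity \(\operatorname{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}' \otimes \mathbf{A})\operatorname{vec}(\mathbf{X})\) behind the Kronecker factorization can be verified directly:

```python
# Numerical check of vec(A X B) = (B' ⊗ A) vec(X), the identity behind the
# vectorized MECM. Dimensions and names are illustrative only.
import jax.numpy as jnp
from jax import random

N1, N2, r1, r2 = 3, 4, 1, 2
k1, k2, k3, k4, k5 = random.split(random.PRNGKey(0), 5)
U1 = random.normal(k1, (N1, r1))  # row loadings of alpha
U2 = random.normal(k2, (N2, r2))  # column loadings of alpha
U3 = random.normal(k3, (N1, r1))  # row part of the cointegrating term beta
U4 = random.normal(k4, (N2, r2))  # column part of beta
Y = random.normal(k5, (N1, N2))   # one matrix observation, playing Y_{t-1}

vec = lambda M: M.T.reshape(-1)   # column-stacking vectorization

lhs = vec(U1 @ U3.T @ Y @ U4 @ U2.T)
rhs = jnp.kron(U2, U1) @ jnp.kron(U4, U3).T @ vec(Y)
assert jnp.allclose(lhs, rhs, atol=1e-5)
```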
The log-likelihood (up to a constant) of the MECM(p) for fixed ranks \(r_1\) and \(r_2\) is \[ \mathcal{L} (\boldsymbol{\Theta}) = -\frac{T N_2}{2} \log | \boldsymbol{\Sigma}_1 | - \frac{T N_1}{2} \log | \boldsymbol{\Sigma}_2 | - \frac{1}{2} \sum_{t=1}^T \operatorname{tr} \left(\boldsymbol{\Sigma}_1^{-1} \mathbf{E}_t \boldsymbol{\Sigma}_2^{-1} \mathbf{E}_t'\right) \] where \(\mathbf{E}_t = \boldsymbol{\Delta} \mathbf{Y}_t - \mathbf{U}_1 \mathbf{U}_3' \mathbf{Y}_{t-1} \mathbf{U}_4 \mathbf{U}_2' - \sum_{j=1}^{p-1} \boldsymbol{\phi}_{1,j} \boldsymbol{\Delta} \mathbf{Y}_{t-j} \boldsymbol{\phi}_{2,j}' - \mathbf{D}\) and \(\boldsymbol{\Theta}\) collects all parameters. The likelihood can be maximized numerically by gradient descent; pseudo-code is sketched at the end of this section.
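A minimal sketch of how the residuals and this log-likelihood could be evaluated for a given parameter set (function names, array layout, and the list-of-matrices convention for the short-run coefficients are our illustrative choices, not a reference implementation):

```python
# Sketch: MECM(p) residuals and log-likelihood (up to the additive constant).
import jax.numpy as jnp

def mecm_residuals(Y, D, U1, U2, U3, U4, Phi1, Phi2):
    """Y: levels, shape (T, N1, N2). Phi1/Phi2: lists of the p-1 short-run matrices."""
    dY = Y[1:] - Y[:-1]                         # dY[i] = Delta Y_{i+1}
    E = []
    for i in range(len(Phi1), dY.shape[0]):
        pred = D + U1 @ U3.T @ Y[i] @ U4 @ U2.T  # error-correction term uses level Y_i
        for j in range(len(Phi1)):               # lagged differences Delta Y_{i-j}
            pred = pred + Phi1[j] @ dY[i - 1 - j] @ Phi2[j].T
        E.append(dY[i] - pred)
    return jnp.stack(E)

def mecm_loglik(E, Sigma1, Sigma2):
    """Matrix-normal log-likelihood of residuals E, shape (T, N1, N2)."""
    T, N1, N2 = E.shape
    S1_inv, S2_inv = jnp.linalg.inv(Sigma1), jnp.linalg.inv(Sigma2)
    _, logdet1 = jnp.linalg.slogdet(Sigma1)
    _, logdet2 = jnp.linalg.slogdet(Sigma2)
    # sum_t tr(Sigma1^{-1} E_t Sigma2^{-1} E_t')
    quad = jnp.einsum('ij,tjk,kl,til->', S1_inv, E, S2_inv, E)
    return -0.5 * (T * N2 * logdet1 + T * N1 * logdet2 + quad)
```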
In practice, the ranks \(r_1\) and \(r_2\) are unknown and must be selected. We use the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC): \[ \text{AIC}(r_1, r_2, p) = -2 \mathcal{L}(\widehat{\boldsymbol{\Theta}}) + 2 \psi(r_1, r_2, p) \] \[ \text{BIC}(r_1, r_2, p) = -2 \mathcal{L}(\widehat{\boldsymbol{\Theta}}) + \ln(T) \psi(r_1, r_2, p) \]
where \[ \psi(r_1, r_2, p) = r_1 (2 N_1 - r_1) + r_2 (2 N_2 - r_2) + (p-1) (N_1^2 + N_2^2) \] is the number of parameters to estimate.
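A sketch of the resulting selection loop (here `fit_mecm` stands in for a hypothetical estimation routine that returns the maximized log-likelihood for given ranks; it is not defined in the text):

```python
# Sketch: selecting (r1, r2) by AIC/BIC over all candidate rank pairs.
import math

def psi(r1, r2, p, N1, N2):
    # number of free parameters, as defined in the text
    return r1 * (2 * N1 - r1) + r2 * (2 * N2 - r2) + (p - 1) * (N1 ** 2 + N2 ** 2)

def select_ranks(Y, p, N1, N2, T, criterion="BIC"):
    penalty = 2.0 if criterion == "AIC" else math.log(T)
    best_ic, best_r = float("inf"), None
    for r1 in range(1, N1 + 1):
        for r2 in range(1, N2 + 1):
            loglik = fit_mecm(Y, r1, r2, p)  # hypothetical MLE routine
            ic = -2.0 * loglik + penalty * psi(r1, r2, p, N1, N2)
            if ic < best_ic:
                best_ic, best_r = ic, (r1, r2)
    return best_r
```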
Simulation settings for rank selection, with \(N_1 \times N_2 = 3 \times 4\):

| Setting | True Rank \(\mathbf{r} = (r_1, r_2)\) |
|---|---|
| Fully Reduced | \((1, 1)\) |
| Partially Reduced (1st Dimension) | \((1, 4)\) |
| Partially Reduced (2nd Dimension) | \((3, 1)\) |
| No Rank Reduction | \((3, 4)\) |
Fully Reduced, \(\mathbf{r} = (1, 1)\). Entries are \((\widehat{r}_1, \widehat{r}_2)\) pairs; the number in parentheses after each criterion is the sample size \(T\).

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (1.02, 1.00) | (0.16, 0.00) | (0.98, 1.00) |
| BIC (100) | (1.02, 1.00) | (0.13, 0.00) | (0.98, 1.00) |
| AIC (250) | (1.01, 1.00) | (0.08, 0.00) | (0.99, 1.00) |
| BIC (250) | (1.01, 1.00) | (0.04, 0.00) | (0.99, 1.00) |
Partially Reduced (1st Dimension), \(\mathbf{r} = (1, 4)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (1.13, 3.99) | (0.34, 0.03) | (0.87, 0.99) |
| BIC (100) | (1.06, 3.99) | (0.24, 0.03) | (0.94, 0.99) |
| AIC (250) | (1.03, 4.00) | (0.18, 0.00) | (0.97, 1.00) |
| BIC (250) | (1.02, 4.00) | (0.14, 0.00) | (0.98, 1.00) |
Partially Reduced (2nd Dimension), \(\mathbf{r} = (3, 1)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (3.00, 1.07) | (0.00, 0.26) | (1.00, 0.93) |
| BIC (100) | (3.00, 1.01) | (0.00, 0.04) | (1.00, 0.99) |
| AIC (250) | (3.00, 1.03) | (0.00, 0.17) | (1.00, 0.97) |
| BIC (250) | (3.00, 1.00) | (0.00, 0.00) | (1.00, 1.00) |
No Rank Reduction, \(\mathbf{r} = (3, 4)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (100) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| AIC (250) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (250) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
A second set of simulations uses the same settings, again with \(N_1 \times N_2 = 3 \times 4\):

| Setting | True Rank \(\mathbf{r} = (r_1, r_2)\) |
|---|---|
| Fully Reduced | \((1, 1)\) |
| Partially Reduced (1st Dimension) | \((1, 4)\) |
| Partially Reduced (2nd Dimension) | \((3, 1)\) |
| No Rank Reduction | \((3, 4)\) |
Fully Reduced, \(\mathbf{r} = (1, 1)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (1.06, 1.03) | (0.23, 0.21) | (0.94, 0.97) |
| BIC (100) | (1.01, 1.00) | (0.08, 0.00) | (0.99, 1.00) |
| AIC (250) | (1.06, 1.02) | (0.23, 0.15) | (0.94, 0.98) |
| BIC (250) | (1.01, 1.00) | (0.08, 0.00) | (0.99, 1.00) |
Partially Reduced (1st Dimension), \(\mathbf{r} = (1, 4)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (1.01, 4.00) | (0.05, 0.00) | (0.99, 1.00) |
| BIC (100) | (1.01, 4.00) | (0.03, 0.00) | (0.99, 1.00) |
| AIC (250) | (1.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (250) | (1.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
Partially Reduced (2nd Dimension), \(\mathbf{r} = (3, 1)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (3.00, 1.23) | (0.00, 0.59) | (1.00, 0.86) |
| BIC (100) | (3.00, 1.01) | (0.00, 0.08) | (1.00, 0.99) |
| AIC (250) | (3.00, 1.10) | (0.00, 0.40) | (1.00, 0.93) |
| BIC (250) | (3.00, 1.01) | (0.00, 0.03) | (1.00, 0.99) |
No Rank Reduction, \(\mathbf{r} = (3, 4)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (100) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| AIC (250) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (250) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
Estimated components for the application (indicators \(\times\) countries). Per the vectorized form above, \(\widehat{\mathbf{U}}_1\) and \(\widehat{\mathbf{U}}_2\) build the adjustment term \(\widehat{\boldsymbol{\alpha}} = \widehat{\mathbf{U}}_2 \otimes \widehat{\mathbf{U}}_1\), while \(\widehat{\mathbf{U}}_3\) and \(\widehat{\mathbf{U}}_4\) build the cointegrating term \(\widehat{\boldsymbol{\beta}} = \widehat{\mathbf{U}}_4 \otimes \widehat{\mathbf{U}}_3\).

| Indicator | \(\widehat{\mathbf{U}}_1\) | \(\widehat{\mathbf{U}}_3\) |
|---|---|---|
| GDP | -0.084 | 1.000 |
| PROD | -0.201 | -0.099 |
| IR | -7.843 | -0.021 |
| Country | \(\widehat{\mathbf{U}}_2\) | \(\widehat{\mathbf{U}}_4\) |
|---|---|---|
| USA | 0.088 | 1.000 |
| DEU | 0.774 | 0.055 |
| FRA | 1.103 | -1.092 |
| GBR | 0.674 | -0.196 |
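Since each estimated factor has a single column (\(r_1 = r_2 = 1\)), the implied adjustment and cointegrating vectors can be assembled directly from the tabulated values via the Kronecker structure of the vectorized MECM (a sketch; variable names are ours):

```python
# Sketch: assembling alpha-hat and beta-hat from the tabulated estimates,
# following alpha = U2 ⊗ U1 and beta = U4 ⊗ U3 from the vectorized MECM.
import jax.numpy as jnp

U1 = jnp.array([[-0.084], [-0.201], [-7.843]])        # GDP, PROD, IR
U3 = jnp.array([[1.000], [-0.099], [-0.021]])
U2 = jnp.array([[0.088], [0.774], [1.103], [0.674]])  # USA, DEU, FRA, GBR
U4 = jnp.array([[1.000], [0.055], [-1.092], [-0.196]])

alpha_hat = jnp.kron(U2, U1)  # (12, 1) adjustment loadings
beta_hat = jnp.kron(U4, U3)   # (12, 1) cointegrating vector
# The single error-correction term for the vectorized data is beta_hat.T @ y_{t-1}.
```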
Pseudo-Code for the Gradient Descent Algorithm
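A minimal sketch of one way such a gradient step could be implemented, using automatic differentiation on the negative log-likelihood; it reuses `mecm_residuals` and `mecm_loglik` from the sketch above, and the step size, iteration count, and Cholesky-style covariance parameterization are our illustrative choices, not the authors' algorithm:

```python
# Sketch: gradient descent on the negative MECM log-likelihood for fixed
# ranks, via jax.grad. Reuses mecm_residuals / mecm_loglik defined earlier.
import jax
import jax.numpy as jnp

def neg_loglik(params, Y):
    # Parameterize Sigma_1, Sigma_2 through factors L1, L2 so the
    # covariance matrices stay positive definite during descent.
    Sigma1 = params["L1"] @ params["L1"].T + 1e-6 * jnp.eye(params["L1"].shape[0])
    Sigma2 = params["L2"] @ params["L2"].T + 1e-6 * jnp.eye(params["L2"].shape[0])
    E = mecm_residuals(Y, params["D"], params["U1"], params["U2"],
                       params["U3"], params["U4"], params["Phi1"], params["Phi2"])
    return -mecm_loglik(E, Sigma1, Sigma2)

@jax.jit
def step(params, Y, lr=1e-3):
    grads = jax.grad(neg_loglik)(params, Y)
    return jax.tree_util.tree_map(lambda w, g: w - lr * g, params, grads)

def fit(params, Y, n_iter=5000):
    for _ in range(n_iter):
        params = step(params, Y)  # iterate until the likelihood stabilizes
    return params
```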