CFE-CM Statistics
December 15, 2025
At each time point, a classical VECM observes a vector \(\mathbf{y}_t\), whereas a matrix-valued model observes an entire matrix \(\mathbf{Y}_t\):
\[ \mathbf{y}_t = \begin{bmatrix} \bullet \\ \bullet \\ \bullet \end{bmatrix} \qquad \text{vs.} \qquad \mathbf{Y}_t = \begin{bmatrix} \bullet & \bullet & \bullet \\ \bullet & \bullet & \bullet \\ \bullet & \bullet & \bullet \end{bmatrix} \]
Examples of matrix-valued data include a panel of economic indicators observed across several countries, as in the application below.
In the presence of cointegration, the stacked VECM is
\[ \boldsymbol{\Delta} \mathbf{y}_t = \mathbf{d} + \boldsymbol{\alpha} \boldsymbol{\beta}'\mathbf{y}_{t-1} + \sum_{j=1}^{p-1} \boldsymbol{\phi}_j \boldsymbol{\Delta} \mathbf{y}_{t-j} + \mathbf{e}_t \]
The Error Correction Model with matrix-valued observations (MECM) is \[ \boldsymbol{\Delta} \mathbf{Y}_t = \mathbf{D} + \mathbf{U}_1 \mathbf{U}_3'\mathbf{Y}_{t-1} \mathbf{U}_4 \mathbf{U}_2' + \sum_{j=1}^{p-1} \boldsymbol{\phi}_{1,j} \boldsymbol{\Delta} \mathbf{Y}_{t-j} \boldsymbol{\phi}_{2,j}' + \mathbf{E}_t \]
\[ \mathbf{E}_t \sim MVN(\mathbf{0}, \boldsymbol{\Sigma}_1, \boldsymbol{\Sigma}_2) \]
i.e., \(\mathbf{E}_t\) follows a matrix-variate normal distribution with row covariance \(\boldsymbol{\Sigma}_1\) and column covariance \(\boldsymbol{\Sigma}_2\).
Vectorizing the MECM yields \[ \boldsymbol{\Delta} \mathbf{y}_t = \mathbf{d} + \underbrace{(\mathbf{U}_2 \otimes \mathbf{U}_1)}_{\boldsymbol{\alpha}} \underbrace{(\mathbf{U}_4 \otimes \mathbf{U}_3)'}_{\boldsymbol{\beta}'} \mathbf{y}_{t-1} + \sum_{j=1}^{p-1} \underbrace{(\boldsymbol{\phi}_{2,j} \otimes \boldsymbol{\phi}_{1,j})}_{\boldsymbol{\phi}_j} \boldsymbol{\Delta} \mathbf{y}_{t-j} + \mathbf{e}_t \] \[ \operatorname{vec}(\mathbf{E}_t) \sim N(\mathbf{0}, \boldsymbol{\Sigma}_2 \otimes \boldsymbol{\Sigma}_1) \]
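As a quick numerical check of this vectorization (a minimal sketch with illustrative dimensions; all variable names here are ours, not from the text), the identity \(\operatorname{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}' \otimes \mathbf{A})\operatorname{vec}(\mathbf{X})\) behind the Kronecker factorization can be verified directly:

```python
# Numerical check of vec(A X B) = (B' ⊗ A) vec(X), the identity behind the
# vectorized MECM. Dimensions and names are illustrative only.
import jax.numpy as jnp
from jax import random

N1, N2, r1, r2 = 3, 4, 1, 2
k1, k2, k3, k4, k5 = random.split(random.PRNGKey(0), 5)
U1 = random.normal(k1, (N1, r1))  # row loadings of alpha
U2 = random.normal(k2, (N2, r2))  # column loadings of alpha
U3 = random.normal(k3, (N1, r1))  # row part of the cointegrating term beta
U4 = random.normal(k4, (N2, r2))  # column part of beta
Y = random.normal(k5, (N1, N2))   # one matrix observation, playing Y_{t-1}

vec = lambda M: M.T.reshape(-1)   # column-stacking vectorization

lhs = vec(U1 @ U3.T @ Y @ U4 @ U2.T)
rhs = jnp.kron(U2, U1) @ jnp.kron(U4, U3).T @ vec(Y)
assert jnp.allclose(lhs, rhs, atol=1e-5)
```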
The log-likelihood (up to a constant) of the MECM(p) for fixed ranks \(r_1\) and \(r_2\) is \[ \mathcal{L} (\boldsymbol{\Theta}) = -\frac{T N_2}{2} \log | \boldsymbol{\Sigma}_1 | - \frac{T N_1}{2} \log | \boldsymbol{\Sigma}_2 | - \frac{1}{2} \sum_{t=1}^T \operatorname{tr} \left(\boldsymbol{\Sigma}_1^{-1} \mathbf{E}_t \boldsymbol{\Sigma}_2^{-1} \mathbf{E}_t'\right) \] where \(\mathbf{E}_t = \boldsymbol{\Delta} \mathbf{Y}_t - \mathbf{U}_1 \mathbf{U}_3' \mathbf{Y}_{t-1} \mathbf{U}_4 \mathbf{U}_2' - \sum_{j=1}^{p-1} \boldsymbol{\phi}_{1,j} \boldsymbol{\Delta} \mathbf{Y}_{t-j} \boldsymbol{\phi}_{2,j}' - \mathbf{D}\) and \(\boldsymbol{\Theta}\) collects all parameters. The likelihood can be maximized numerically by gradient descent; pseudo-code is sketched at the end of this section.
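A minimal sketch of how the residuals and this log-likelihood could be evaluated for a given parameter set (function names, array layout, and the list-of-matrices convention for the short-run coefficients are our illustrative choices, not a reference implementation):

```python
# Sketch: MECM(p) residuals and log-likelihood (up to the additive constant).
import jax.numpy as jnp

def mecm_residuals(Y, D, U1, U2, U3, U4, Phi1, Phi2):
    """Y: levels, shape (T, N1, N2). Phi1/Phi2: lists of the p-1 short-run matrices."""
    dY = Y[1:] - Y[:-1]                         # dY[i] = Delta Y_{i+1}
    E = []
    for i in range(len(Phi1), dY.shape[0]):
        pred = D + U1 @ U3.T @ Y[i] @ U4 @ U2.T  # error-correction term uses level Y_i
        for j in range(len(Phi1)):               # lagged differences Delta Y_{i-j}
            pred = pred + Phi1[j] @ dY[i - 1 - j] @ Phi2[j].T
        E.append(dY[i] - pred)
    return jnp.stack(E)

def mecm_loglik(E, Sigma1, Sigma2):
    """Matrix-normal log-likelihood of residuals E, shape (T, N1, N2)."""
    T, N1, N2 = E.shape
    S1_inv, S2_inv = jnp.linalg.inv(Sigma1), jnp.linalg.inv(Sigma2)
    _, logdet1 = jnp.linalg.slogdet(Sigma1)
    _, logdet2 = jnp.linalg.slogdet(Sigma2)
    # sum_t tr(Sigma1^{-1} E_t Sigma2^{-1} E_t')
    quad = jnp.einsum('ij,tjk,kl,til->', S1_inv, E, S2_inv, E)
    return -0.5 * (T * N2 * logdet1 + T * N1 * logdet2 + quad)
```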
In practice, the ranks \(r_1\) and \(r_2\) are unknown and must be selected. We use the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC): \[ \text{AIC}(r_1, r_2, p) = -2 \mathcal{L}(\widehat{\boldsymbol{\Theta}}) + 2 \psi(r_1, r_2, p) \] \[ \text{BIC}(r_1, r_2, p) = -2 \mathcal{L}(\widehat{\boldsymbol{\Theta}}) + \ln(T) \psi(r_1, r_2, p) \]
where \[ \psi(r_1, r_2, p) = r_1 (2 N_1 - r_1) + r_2 (2 N_2 - r_2) + (p-1) (N_1^2 + N_2^2) \] is the number of parameters to estimate.
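A sketch of the resulting selection loop (here `fit_mecm` stands in for a hypothetical estimation routine that returns the maximized log-likelihood for given ranks; it is not defined in the text):

```python
# Sketch: selecting (r1, r2) by AIC/BIC over all candidate rank pairs.
import math

def psi(r1, r2, p, N1, N2):
    # number of free parameters, as defined in the text
    return r1 * (2 * N1 - r1) + r2 * (2 * N2 - r2) + (p - 1) * (N1 ** 2 + N2 ** 2)

def select_ranks(Y, p, N1, N2, T, criterion="BIC"):
    penalty = 2.0 if criterion == "AIC" else math.log(T)
    best_ic, best_r = float("inf"), None
    for r1 in range(1, N1 + 1):
        for r2 in range(1, N2 + 1):
            loglik = fit_mecm(Y, r1, r2, p)  # hypothetical MLE routine
            ic = -2.0 * loglik + penalty * psi(r1, r2, p, N1, N2)
            if ic < best_ic:
                best_ic, best_r = ic, (r1, r2)
    return best_r
```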
Simulation settings for rank selection, with \(N_1 \times N_2 = 3 \times 4\):

| Setting | True Rank \(\mathbf{r} = (r_1, r_2)\) |
|---|---|
| Fully Reduced | \((1, 1)\) |
| Partially Reduced (1st Dimension) | \((1, 4)\) |
| Partially Reduced (2nd Dimension) | \((3, 1)\) |
| No Rank Reduction | \((3, 4)\) |
Fully Reduced, \(\mathbf{r} = (1, 1)\). Entries are \((\widehat{r}_1, \widehat{r}_2)\) pairs; the number in parentheses after each criterion is the sample size \(T\).

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (1.02, 1.00) | (0.16, 0.00) | (0.98, 1.00) |
| BIC (100) | (1.02, 1.00) | (0.13, 0.00) | (0.98, 1.00) |
| AIC (250) | (1.01, 1.00) | (0.08, 0.00) | (0.99, 1.00) |
| BIC (250) | (1.01, 1.00) | (0.04, 0.00) | (0.99, 1.00) |
Partially Reduced (1st Dimension), \(\mathbf{r} = (1, 4)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (1.13, 3.99) | (0.34, 0.03) | (0.87, 0.99) |
| BIC (100) | (1.06, 3.99) | (0.24, 0.03) | (0.94, 0.99) |
| AIC (250) | (1.03, 4.00) | (0.18, 0.00) | (0.97, 1.00) |
| BIC (250) | (1.02, 4.00) | (0.14, 0.00) | (0.98, 1.00) |
Partially Reduced (2nd Dimension), \(\mathbf{r} = (3, 1)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (3.00, 1.07) | (0.00, 0.26) | (1.00, 0.93) |
| BIC (100) | (3.00, 1.01) | (0.00, 0.04) | (1.00, 0.99) |
| AIC (250) | (3.00, 1.03) | (0.00, 0.17) | (1.00, 0.97) |
| BIC (250) | (3.00, 1.00) | (0.00, 0.00) | (1.00, 1.00) |
No Rank Reduction, \(\mathbf{r} = (3, 4)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (100) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| AIC (250) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (250) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
A second set of simulations uses the same settings, again with \(N_1 \times N_2 = 3 \times 4\):

| Setting | True Rank \(\mathbf{r} = (r_1, r_2)\) |
|---|---|
| Fully Reduced | \((1, 1)\) |
| Partially Reduced (1st Dimension) | \((1, 4)\) |
| Partially Reduced (2nd Dimension) | \((3, 1)\) |
| No Rank Reduction | \((3, 4)\) |
Fully Reduced, \(\mathbf{r} = (1, 1)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (1.06, 1.03) | (0.23, 0.21) | (0.94, 0.97) |
| BIC (100) | (1.01, 1.00) | (0.08, 0.00) | (0.99, 1.00) |
| AIC (250) | (1.06, 1.02) | (0.23, 0.15) | (0.94, 0.98) |
| BIC (250) | (1.01, 1.00) | (0.08, 0.00) | (0.99, 1.00) |
Partially Reduced (1st Dimension), \(\mathbf{r} = (1, 4)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (1.01, 4.00) | (0.05, 0.00) | (0.99, 1.00) |
| BIC (100) | (1.01, 4.00) | (0.03, 0.00) | (0.99, 1.00) |
| AIC (250) | (1.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (250) | (1.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
Partially Reduced (2nd Dimension), \(\mathbf{r} = (3, 1)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (3.00, 1.23) | (0.00, 0.59) | (1.00, 0.86) |
| BIC (100) | (3.00, 1.01) | (0.00, 0.08) | (1.00, 0.99) |
| AIC (250) | (3.00, 1.10) | (0.00, 0.40) | (1.00, 0.93) |
| BIC (250) | (3.00, 1.01) | (0.00, 0.03) | (1.00, 0.99) |
No Rank Reduction, \(\mathbf{r} = (3, 4)\):

| Method (\(T\)) | Average Rank | Standard Deviation | Freq. Correct |
|---|---|---|---|
| AIC (100) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (100) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| AIC (250) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
| BIC (250) | (3.00, 4.00) | (0.00, 0.00) | (1.00, 1.00) |
Estimated components for the application (indicators \(\times\) countries). Per the vectorized form above, \(\widehat{\mathbf{U}}_1\) and \(\widehat{\mathbf{U}}_2\) build the adjustment term \(\widehat{\boldsymbol{\alpha}} = \widehat{\mathbf{U}}_2 \otimes \widehat{\mathbf{U}}_1\), while \(\widehat{\mathbf{U}}_3\) and \(\widehat{\mathbf{U}}_4\) build the cointegrating term \(\widehat{\boldsymbol{\beta}} = \widehat{\mathbf{U}}_4 \otimes \widehat{\mathbf{U}}_3\).

| Indicator | \(\widehat{\mathbf{U}}_1\) | \(\widehat{\mathbf{U}}_3\) |
|---|---|---|
| GDP | -0.084 | 1.000 |
| PROD | -0.201 | -0.099 |
| IR | -7.843 | -0.021 |
| Country | \(\widehat{\mathbf{U}}_2\) | \(\widehat{\mathbf{U}}_4\) |
|---|---|---|
| USA | 0.088 | 1.000 |
| DEU | 0.774 | 0.055 |
| FRA | 1.103 | -1.092 |
| GBR | 0.674 | -0.196 |
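Since each estimated factor has a single column (\(r_1 = r_2 = 1\)), the implied adjustment and cointegrating vectors can be assembled directly from the tabulated values via the Kronecker structure of the vectorized MECM (a sketch; variable names are ours):

```python
# Sketch: assembling alpha-hat and beta-hat from the tabulated estimates,
# following alpha = U2 ⊗ U1 and beta = U4 ⊗ U3 from the vectorized MECM.
import jax.numpy as jnp

U1 = jnp.array([[-0.084], [-0.201], [-7.843]])        # GDP, PROD, IR
U3 = jnp.array([[1.000], [-0.099], [-0.021]])
U2 = jnp.array([[0.088], [0.774], [1.103], [0.674]])  # USA, DEU, FRA, GBR
U4 = jnp.array([[1.000], [0.055], [-1.092], [-0.196]])

alpha_hat = jnp.kron(U2, U1)  # (12, 1) adjustment loadings
beta_hat = jnp.kron(U4, U3)   # (12, 1) cointegrating vector
# The single error-correction term for the vectorized data is beta_hat.T @ y_{t-1}.
```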
Pseudo-Code for the Gradient Descent Algorithm
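A minimal sketch of one way such a gradient step could be implemented, using automatic differentiation on the negative log-likelihood; it reuses `mecm_residuals` and `mecm_loglik` from the sketch above, and the step size, iteration count, and Cholesky-style covariance parameterization are our illustrative choices, not the authors' algorithm:

```python
# Sketch: gradient descent on the negative MECM log-likelihood for fixed
# ranks, via jax.grad. Reuses mecm_residuals / mecm_loglik defined earlier.
import jax
import jax.numpy as jnp

def neg_loglik(params, Y):
    # Parameterize Sigma_1, Sigma_2 through factors L1, L2 so the
    # covariance matrices stay positive definite during descent.
    Sigma1 = params["L1"] @ params["L1"].T + 1e-6 * jnp.eye(params["L1"].shape[0])
    Sigma2 = params["L2"] @ params["L2"].T + 1e-6 * jnp.eye(params["L2"].shape[0])
    E = mecm_residuals(Y, params["D"], params["U1"], params["U2"],
                       params["U3"], params["U4"], params["Phi1"], params["Phi2"])
    return -mecm_loglik(E, Sigma1, Sigma2)

@jax.jit
def step(params, Y, lr=1e-3):
    grads = jax.grad(neg_loglik)(params, Y)
    return jax.tree_util.tree_map(lambda w, g: w - lr * g, params, grads)

def fit(params, Y, n_iter=5000):
    for _ in range(n_iter):
        params = step(params, Y)  # iterate until the likelihood stabilizes
    return params
```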