In this section we present the data on the Mexican economy used in this study, describe the methodology for evaluating how accurately the common trends predict Mexican economic activity, determine the number of common trends, and analyze their dynamics by evaluating the performance of the FAVAR model proposed in Sect. 2 against benchmark models.
Data
Initially, we consider 511 macroeconomic and financial variables obtained from the Banco de Información Económica (BIE) of the Instituto Nacional de Estadística y Geografía (INEGI), Mexico’s national statistical agency. The sample runs from March 2005 to April 2016; hence, \(T = 133\). The variables are grouped into blocks following the INEGI classification; additionally, we compare this classification with that of the National Institute’s Global Econometric Model, which considers nine blocks with a total of 67 variables.Footnote 7
Our approach requires all variables to be integrated of order one, so that any factors found are the common trends of the observations. We therefore keep only the variables that are I(1) according to the Augmented Dickey–Fuller (ADF) test.Footnote 8 When needed, the time series have been seasonally adjusted and corrected for outliers using X-13ARIMA-SEATS, developed by the US Census Bureau.Footnote 9 Following Stock and Watson (2005), outliers are replaced by the median of the five previous observations.
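As an illustration of the outlier rule, the following minimal sketch replaces flagged observations by the median of the five preceding values. The function name `replace_outliers`, the `flagged` argument, and the example series are hypothetical; how an observation is flagged as an outlier is not specified here, so the flags are taken as given.

```python
from statistics import median

def replace_outliers(series, flagged, window=5):
    """Replace each flagged observation by the median of the `window`
    observations that precede it (fewer if the series starts earlier)."""
    cleaned = list(series)
    for i in sorted(flagged):
        previous = cleaned[max(0, i - window):i]
        if previous:  # with no history available, leave the value untouched
            cleaned[i] = median(previous)
    return cleaned

# Hypothetical series whose sixth observation has been flagged as an outlier.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0, 7.0]
print(replace_outliers(values, flagged=[5]))
```

Because the replacement uses the already cleaned series, consecutive outliers do not contaminate each other's medians; this is a design choice of the sketch, not something stated in the text.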
Finally, after applying these non-stationarity conditions, we work with the following database of \(N = 211\) variables (the number of variables in each block is given in parentheses)Footnote 10:
- Balance of trade (19)
- Consumer confidence (18)
- Consumption (9)
- Economic activity (13)
- Employment (5)
- Financial (35)
- Industrial (58)
- International (17)
- Investment (8)
- Miscellaneous (18)
- Prices (11).Footnote 11
Furthermore, we define \(x_t\) as the Global Index of Economic Activity (IGAE, Indicador Global de la Actividad Económica).Footnote 12
Estimating the common trends
We apply the criteria described in this article to detect the number of factors, setting \(r_{\max } = 11\). The results indicate that \({\hat{r}}_{\mathrm{ER}} = {\hat{r}}_{\mathrm{GR}} = 2\) and \({\hat{r}}_{\mathrm{ED}} = 5\).Footnote 13 Given that the Onatski (2010) criterion is more robust in the presence of non-stationarity, see Corona et al. (2017a), we work with \({\hat{r}}_{\mathrm{ED}} = 5\) factors.Footnote 14
Figure 1 plots the ratios and differences of eigenvalues with their respective thresholds. Intuitively, it is reasonable that the eigenvalue ratios of Ahn and Horenstein (2013) point to one or two common factors, given that the first and second differences of eigenvalues are large while the others are practically zero. However, the estimation of the “sharp” threshold is one of the main contributions of Onatski (2010), which, as we have mentioned, consistently separates convergent and divergent eigenvalues. Furthermore, the five common factors explain 79.19\(\%\) of the total variability. Specifically, the first common factor explains 46.14\(\%\), the second 19.59\(\%\), the third 6.20\(\%\), the fourth 4.75\(\%,\) and the fifth 2.51\(\%\).
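To make the eigenvalue-ratio logic concrete, the sketch below computes the ER and GR estimators of Ahn and Horenstein (2013) and the variance share of each component. It runs on simulated stationary data with two factors, not on the Mexican dataset; the function name, the simulated panel, and `r_max = 8` are assumptions made for illustration only, and the Onatski (2010) ED estimator is not implemented.

```python
import numpy as np

def eigenvalue_ratio_estimators(X, r_max):
    """Return (r_ER, r_GR, variance_shares) for a T x N panel X."""
    T, N = X.shape
    Xc = X - X.mean(axis=0)
    # Eigenvalues of X'X / (N T), in decreasing order, via singular values.
    mu = np.linalg.svd(Xc, compute_uv=False) ** 2 / (N * T)
    # V(k) = sum of the eigenvalues after the k-th one, for k = 0, ..., r_max + 1.
    V = np.array([mu[k:].sum() for k in range(r_max + 2)])
    # ER(k) = mu_k / mu_{k+1}
    er = mu[:r_max] / mu[1:r_max + 1]
    # GR(k) = ln(V(k-1) / V(k)) / ln(V(k) / V(k+1))
    gr = np.log(V[:r_max] / V[1:r_max + 1]) / np.log(V[1:r_max + 1] / V[2:r_max + 2])
    shares = mu / mu.sum()
    return int(np.argmax(er)) + 1, int(np.argmax(gr)) + 1, shares

# Simulated panel with two strong factors (hypothetical data).
rng = np.random.default_rng(0)
T, N = 150, 60
F = rng.standard_normal((T, 2))
L = rng.standard_normal((N, 2))
X = F @ L.T + 0.5 * rng.standard_normal((T, N))

r_er, r_gr, shares = eigenvalue_ratio_estimators(X, r_max=8)
print(r_er, r_gr, np.round(100 * shares[:3], 2))
```

The variance shares play the same role as the percentages reported above: summing the first two reported shares, 46.14 + 19.59, gives the 65.73% attributed to the first two common factors.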
Figure 2 plots \(\ln x_t\) (deseasonalized) against each common factor extracted by PC and PLS. For each procedure, the first common factor closely tracks \(\ln x_t\), with contemporaneous linear correlations of 0.95 for PC and 0.96 for PLS. The second common factors are only weakly correlated with \(\ln x_t\), with contemporaneous correlations of 0.13 and 0.08 for PC and PLS, respectively. On the other hand, \({\tilde{F}}_{3t}\) and \({\tilde{F}}_{5t}\) move inversely to \(\ln x_t\), although their correlations with it are around \(-0.01.\) In the other cases, the estimated common factors are positively related to \(\ln x_t\), but the linear correlations are very small. These facts indicate that, contemporaneously, only the first two common factors are associated with economic activity. It is interesting to mention that these two factors explain 65.73% of the total variability. Note that the predictive capacity of the common factors is difficult to establish from these correlations alone, so it must be evaluated in the FAVAR model. Furthermore, the sample period includes the economic crisis of 2008–2009; however, Stock and Watson (2011) show that the PC estimator of the factors is consistent even with certain types of breaks or time variation in the factor loadings.
Figure 3 shows the weighted average contribution of each variable block in the common factors estimated by PC. Each colored bar represents the share that each group of variables contributes to each common factor, computed as follows:
$$\hat{\mu }_{jg} = \sum_{i_g = 1}^{N_g}|\hat{p}_{i_gjg}|/N_g \quad \text{for } j = 1, \ldots , r, \text{ and } g = 1, \ldots , G,$$
where \(\hat{p}_{i_gjg}\) is the loading of variable \(i_g\) of group g on common factor j; G is the number of blocks of variables; and \(N_g\) is the number of variables in each group. The groups of variables are ranked by \(r^{-1}\sum_{j=1}^{r}{\hat{\lambda }}_{j}\hat{\mu }_{jg}\), where \({\hat{\lambda }}_j\) is the variance contribution of each common factor. The blocks that contribute most to the common factors are the miscellaneous, economic activity, and balance of trade blocks. On the other hand, prices, employment, and consumer confidence are the least relevant groups of variables. Note that this importance is in terms of the loading contribution and should not be interpreted as predictive power. Specifically, the first common factor is most correlated with the IGAE of tertiary activities (0.99), the second common factor with the economic situation with respect to last year (0.94), the third common factor with oil exports (0.78), the fourth common factor with edification (0.70), and the fifth common factor with the food industry (0.53).
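The block contributions can be sketched as follows: \(\hat{\mu }_{jg}\) is the mean absolute loading of block g on factor j, and the ranking score is the variance-weighted average over factors. The function name, the small loading matrix, the block labels, and the variance shares are hypothetical values chosen so that the arithmetic can be checked by hand.

```python
import numpy as np

def block_contributions(P, blocks, lam):
    """P: N x r loading matrix; blocks: length-N block labels;
    lam: length-r variance contributions of the factors.
    Returns (labels, mu, score) with mu[g, j] the mean absolute loading of
    block g on factor j and score[g] = (1 / r) * sum_j lam[j] * mu[g, j]."""
    P = np.abs(np.asarray(P, dtype=float))
    blocks = np.asarray(blocks)
    labels = sorted(set(blocks.tolist()))
    mu = np.vstack([P[blocks == g].mean(axis=0) for g in labels])
    score = mu @ np.asarray(lam, dtype=float) / P.shape[1]
    return labels, mu, score

# Hypothetical example: three variables, two blocks, two factors.
P = [[0.5, -0.1], [0.3, 0.3], [-0.2, 0.4]]
blocks = ["prices", "prices", "financial"]
lam = [0.6, 0.4]

labels, mu, score = block_contributions(P, blocks, lam)
for g, s in zip(labels, score):
    print(g, round(float(s), 4))
```

In this toy example the prices block has mean absolute loadings (0.4, 0.2) and the financial block (0.2, 0.4), so the scores are 0.16 and 0.14, respectively.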
In order to evaluate whether the common factors are the common trends of \(x_t\), we carry out a cointegration exercise. The candidate cointegration relationship is given by:
$$\begin{aligned} {\hat{v}}_t &= \ln x_t - \underset{(0.0000)}{4.6369} - \underset{(0.0000)}{0.0719}\hat{F}_{1t} - \underset{(0.0000)}{0.0098}\hat{F}_{2t} \\ &\quad - \underset{(0.0000)}{0.0003}\hat{F}_{3t} + \underset{(0.0000)}{0.0008}\hat{F}_{4t} + \underset{(0.0000)}{0.0002}\hat{F}_{5t}. \end{aligned} \tag{13}$$
We apply the ADF test to \({\hat{v}}_t\), reporting the p value in parentheses, and obtain the following result:
$$\text{ADF test}{:} -3.4382\ (0.01)$$
Thus, we can verify that the common trends of a large dataset of economic variables are cointegrated with economic activity. Furthermore, following Bai and Ng (2004), we first carry out a Panel Analysis of Non-stationarity in Idiosyncratic and Common Components (PANIC) on the idiosyncratic errors obtained using the “differencing and recumulating” method in order to disentangle the non-stationarity in this component. Second, we apply PANIC to the idiosyncratic components estimated using data in levels, as we have proposed in this study. In the first analysis, we obtain a p value of 0.1171, while in the second, we obtain a p value of 0.0000. Although in the first case non-stationarity of \({\hat{\varepsilon }}_t\) is not rejected, the p value is close to the uncertainty zone. Furthermore, we apply the variant of the ADF test proposed by Bai (2004) with the aim of detecting how many of the five common factors are non-stationary. As expected, the tests show that the five common factors are non-stationary. Therefore, we conclude that the idiosyncratic terms are stationary and the common factors are non-stationary; hence, we can also argue that the elements of \(Y_t\) are cointegrated and the common factors are the common trends of \(Y_t\) and \(x_t\).
Note that Eq. (13) is the static version of Eq. (7). The goal of this exercise is to determine whether \(F_t\) are the common trends of \(x_t\) and \(Y_t\). We use this information to forecast the target variable with the FAVAR model presented in Sect. 2.
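The cointegration exercise can be illustrated with a two-step sketch: regress \(\ln x_t\) on a constant and the factors by OLS, then compute a unit-root statistic on the residuals. The data below are simulated, the coefficients are hypothetical, and `df_tstat` is a simplified Dickey–Fuller statistic without a constant or lagged differences; it is not the ADF implementation used for the result above, and no p value or critical value is computed.

```python
import numpy as np

def ols_residuals(y, X):
    """OLS of y on X (X already contains the constant); returns (beta, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def df_tstat(v):
    """t-statistic of rho in the regression diff(v)_t = rho * v_{t-1} + error."""
    dv, lag = np.diff(v), v[:-1]
    rho = (lag @ dv) / (lag @ lag)
    resid = dv - rho * lag
    sigma2 = (resid @ resid) / (len(dv) - 1)
    return rho / np.sqrt(sigma2 / (lag @ lag))

# Simulated example: five I(1) factors and a target cointegrated with them.
rng = np.random.default_rng(1)
T = 133
F = np.cumsum(rng.standard_normal((T, 5)), axis=0)
coefs = np.array([0.07, 0.01, 0.0003, -0.0008, -0.0002])  # hypothetical values
ln_x = 4.6 + F @ coefs + 0.01 * rng.standard_normal(T)

beta, v_hat = ols_residuals(ln_x, np.column_stack([np.ones(T), F]))
t_stat = df_tstat(v_hat)
print(round(float(t_stat), 2))
```

A strongly negative statistic points to stationary residuals. Residual-based tests have their own critical values, which depend on the number of regressors, so the statistic should not be compared with standard Dickey–Fuller tables.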
Evaluating the use of common trends to predict Mexican economic activity
It is important to have an empirical strategy to adequately predict the target variable. Consequently, to select the forecast model, we consider all possible FAVAR models, \(\sum_{i=1}^r {}^rC_i\times 396\), where \(^rC_i\) is the binomial coefficient \(\left( {\begin{array}{c}r\\ i\end{array}}\right)\) and 396 is the product of 11 seasonal dummy specifications (3–12 and none), 3 deterministic specifications in the FAVAR model (none, constant and trend), and 12 lag orders (1–12). Therefore, the lag order, the seasonal dummies, the deterministic component, and the factors are determined directly by the minimum out-of-sample forecast error. The training sample covers March 2015 to April 2016, such that we forecast 12 periods (1 year) for \(h = 2\). This forecast period is one of relative economic and political stability in Mexico. For example, during this time frame, the annual growth rate of Mexico’s GDP in any quarter was never lower than 2.3% and never higher than 2.8% (INEGI 2016). Moreover, the country’s economic performance tends to be more volatile in times close to presidential elections in Mexico, and to a lesser degree in the US, which do not coincide with our forecast period. Statistically, this period represents around 10% of the number of observations and, discounting the degrees of freedom, around 25% of \(T-K\), where K is the number of parameters in the FAVAR model.
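The size of the model space can be checked by enumeration. The sketch below builds every specification for \(r = 5\); the way the seasonal, deterministic, and lag options are encoded is an assumption made for illustration, and only the counts are taken from the text.

```python
from itertools import combinations, product
from math import comb

r = 5  # number of common factors

# Every non-empty subset of the r factors: sum_i C(r, i) = 2**r - 1 subsets.
factor_sets = [c for i in range(1, r + 1) for c in combinations(range(1, r + 1), i)]

seasonal = [None] + list(range(3, 13))          # 11 options: none and 3-12
deterministic = ["none", "constant", "trend"]   # 3 options
lags = list(range(1, 13))                       # 12 options: 1-12

specs = list(product(factor_sets, seasonal, deterministic, lags))
n_models = len(specs)

# Cross-check against the closed-form count given in the text.
assert n_models == sum(comb(r, i) for i in range(1, r + 1)) * 396
print(len(factor_sets), n_models)
```

With five factors there are 31 factor subsets, so 31 × 396 = 12,276 candidate specifications are evaluated under this encoding.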
For each model, we compute the forecast error. The forecasts are dynamic, so that we update \(T+1\) in each month. We select the model that minimizes the Root Mean Square Error (RMSE).Footnote 15 Furthermore, we focus on the models that give a forecast error lower than a threshold. This threshold is determined using Eickmeier et al. (2014) as a reference. In their work they predict several macroeconomic variables using FAVAR models, FAVAR-tv, FAVAR-tv with stochastic volatility errors, and univariate models. For one- and two-step-ahead forecasts of US GDP, the in-sample RMSE values, considering all periods, are 0.76 and 0.80 for \(h = 1\) and \(h = 2,\) respectively. However, we select a threshold of 0.5 to obtain more accurate forecasts. Note that, if we predict \(\varDelta \ln x_t\) for the first step ahead, the forecast error is \(e_{T + 1} = (\ln x_{T + 1} - \ln x_{T}) - (\ln x_{T+1}^{f} - \ln x_T) = \ln x_{T + 1} - \ln x_{T+1}^{f}\), i.e., it is equivalent to forecasting the level of the first step ahead. Therefore, we focus on the levels of the IGAE.
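A minimal sketch of the two ingredients used above: the RMSE over a set of forecasts, and a numerical check that the one-step error of a forecast in log differences equals the error in log levels. The numbers are hypothetical and serve only to verify the identity.

```python
from math import isclose, sqrt

def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical values of ln x_T, ln x_{T+1}, and the forecast of ln x_{T+1}.
ln_x_T, ln_x_next, ln_x_forecast = 4.700, 4.712, 4.709

error_in_differences = (ln_x_next - ln_x_T) - (ln_x_forecast - ln_x_T)
error_in_levels = ln_x_next - ln_x_forecast

# ln x_T cancels, so both errors coincide up to floating-point rounding.
print(isclose(error_in_differences, error_in_levels, abs_tol=1e-12))
```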
Using common trends to predict Mexican economic activity
First, to evaluate descriptively the predictive capacity of each variable, we calculate the cross-correlation between \(y_{it-h}\) and \(x_t\) and define:
$$\rho ^{*}_h = \text{corr}(y_{it-h},x_t)|\lbrace \max (h \le 12:\text{Prob}(\text{corr}(y_{it-h}, x_t))<\alpha )\rbrace,$$
where \(\alpha = 0.05\). Figure 4 plots the results of the previous equation. The top panel plots \(\rho ^{*}_h\) with its confidence interval. The middle panel shows the corresponding maximum significant lag, \(\max (h \le 12:\text{Prob}(\text{corr}(y_{it-h}, x_t))<\alpha )\), and the bottom panel presents the mean absolute correlations for each block of variables. Note that the block of variables most highly correlated with the future of \(x_t\) is the miscellaneous one. It is interesting to note that all significant variables of this block are positively correlated with the IGAE. Other interesting variable blocks are the economic activity and financial groups. On the other hand, the blocks least correlated with the future of \(x_t\) are consumer confidence, employment, and prices.
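The selection of \(\rho ^{*}_h\) can be sketched as a loop over lags that keeps the largest lag whose cross-correlation is significant. The p value below uses the Fisher z normal approximation, which is an assumption of this sketch since the test behind \(\text{Prob}(\cdot )\) is not specified; the function names and the simulated series are hypothetical.

```python
import random
from math import atanh, sqrt
from statistics import NormalDist

def pearson(a, b):
    """Sample linear correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)

def p_value(r, n):
    """Two-sided p value for H0: corr = 0 (Fisher z approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def rho_star(y, x, max_lag=12, alpha=0.05):
    """Return (h, corr(y_{t-h}, x_t)) for the largest h <= max_lag that is
    significant at level alpha, or None if no lag is significant."""
    best = None
    for h in range(1, max_lag + 1):
        r = pearson(y[:-h], x[h:])
        if p_value(r, len(x) - h) < alpha:
            best = (h, r)
    return best

# Simulated example: x_t follows y_{t-3} plus a small amount of noise.
gen = random.Random(0)
y = [gen.gauss(0.0, 1.0) for _ in range(120)]
x = [0.0, 0.0, 0.0] + [y[t - 3] + 0.1 * gen.gauss(0.0, 1.0) for t in range(3, 120)]
print(rho_star(y, x))
```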
It is important to state that the PANIC carried out on the idiosyncratic errors and common factors shows that the elements of \(Y_t\) are cointegrated; hence, it is reasonable to expect that the correlations between \(y_{it}\) and \(x_t\) are not spurious.
Once the models are estimated, we review the forecast errors lower than the threshold in the training sample. Figure 5 plots the historical behavior of the RMSE for the selected predictive models. The top panel plots the RMSE for \(h = 1\). Note that PC gives slightly more accurate results than PLS. Furthermore, the dispersion of the RMSE for PLS is larger than for PC, and neither approach presents outliers. The bottom panel shows the results for \(h = 2\). We can see that, for both procedures, the forecast errors increase slightly with respect to \(h = 1\). It is interesting to mention that, for both h, the tails of the two procedures overlap. This graph is important because the same behavior is expected for the following two predicted months.
Using the models from Fig. 5, we predict two steps ahead: May and June 2016. Since we have n models, their forecasts must be combined. We propose a weighted average, obtaining loadings similar to the PLS procedure by solving the following optimization problem:
$$\rho_1 = \underset{w,b}{\max \text{cor}}(X^{f}w, xb) \quad \text{subject to} \quad \text{Var}(X^{f}w) = \text{Var}(xb) = 1,$$
where \(w = (w_1,\ldots , w_n)\). In order to normalize the loading weights, we carry out the following scaling: \(w^{*} = n(\sum_{i = 1}^{n}w_i)^{-1}w\), such that the loading weights are between 0 and 1. Figure 6 shows the forecast density, the predictions, and the observed data, together with the 95\(\%\) confidence interval. Note that for \(h = 1\), PLS is more accurate than PC, while for \(h = 2\), PLS is less accurate. The models are centered on the mean of the distribution. Moreover, the forecast density covers the observed data. Furthermore, focusing on \(h = 2\), the distribution of PC has two modes and the predicted data tend towards the center of the distribution. On the other hand, for PLS, it tends towards the median of the distribution.
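The scaling step can be illustrated as follows. The sketch takes raw weights as given (the optimization problem itself is not solved here) and assumes they are nonnegative; the function names and numbers are hypothetical. With the scaling as written, the scaled weights sum to n, so dividing the weighted sum by n yields a weighted average whose implied shares \(w^{*}_i/n\) lie between 0 and 1.

```python
def normalise_weights(w):
    """Scaled weights w* = n * w / sum(w); they sum to n."""
    n, total = len(w), sum(w)
    return [n * wi / total for wi in w]

def combine(forecasts, w):
    """Weighted average of n model forecasts using the scaled weights."""
    w_star = normalise_weights(w)
    return sum(ws * f for ws, f in zip(w_star, forecasts)) / len(w_star)

# Hypothetical example with two models and raw weights 1 and 3.
print(normalise_weights([1.0, 3.0]))
print(combine([10.0, 20.0], [1.0, 3.0]))
```

In this example the implied shares are 0.25 and 0.75, so the combined forecast is 0.25 × 10 + 0.75 × 20 = 17.5.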
An interesting question is: which common trends are helpful to reduce forecast error? To this end, we compute the following coefficients through OLS:
$$\partial e_t | D_{ti} = 1< 0 \quad \text{and} \quad \text{Prob}(\partial e_t | D_{ti} = 1) < 0.10 \qquad \text{for } i = 1,\ldots , N_{\mathrm{m}},$$
where \(N_{\mathrm{m}}\) is the number of models for each procedure. In other words, for each procedure we regress the forecast errors on dummy variables that specify the combination of common trends \(F_{1t}\), \(F_{2t}\), \(F_{3t}\), \(F_{4t},\) and \(F_{5t}\) included in each model. Figure 7 plots the results for each procedure and each h. We can see that, in PC, \(F_{1t}\) is a very important common factor for reducing the forecast error for \(h = 1\), whereas \(F_{1t}\), \(F_{2t}\), \(F_{4t},\) and \(F_{5t}\) are important for \(h = 2\). Furthermore, in the PLS approach, \(F_{1t}\), \(F_{2t}\), \(F_{4t},\) and \(F_{5t}\) are relevant common factors for \(h = 1\), whereas \(F_{1t}\) and \(F_{2t}\) are relevant for \(h = 2\). In fact, for both procedures the interaction of all common factors is important to reduce the forecast errors. This result complements the conclusions drawn from Figs. 1 and 4: all common factors are helpful to reduce the forecast errors.
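The dummy-variable regression can be sketched under a simplifying assumption: one dummy per factor, equal to 1 when that factor enters a model, rather than one dummy per combination. The forecast errors below are constructed, not estimated, so that the coefficients are known in advance; the significance filter (p value below 0.10) is omitted.

```python
import numpy as np
from itertools import combinations

r = 5
# One row per model: every non-empty combination of the five factors.
combos = [c for i in range(1, r + 1) for c in combinations(range(r), i)]
D = np.array([[1.0 if j in c else 0.0 for j in range(r)] for c in combos])

# Hypothetical forecast errors: including F1 lowers the error by 0.3,
# including F3 raises it by 0.05, and the other factors have no effect.
errors = 1.0 - 0.3 * D[:, 0] + 0.05 * D[:, 2]

# OLS of the errors on a constant and the inclusion dummies.
X = np.column_stack([np.ones(len(combos)), D])
beta, *_ = np.linalg.lstsq(X, errors, rcond=None)

# Factors whose inclusion is associated with a lower forecast error.
helpful = [j + 1 for j in range(r) if beta[j + 1] < -1e-8]
print(np.round(beta, 4), helpful)
```

A negative coefficient on a dummy indicates that models including that factor have, on average, lower forecast errors, which is the sign condition in the expression above.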
In order to evaluate the forecast accuracy in “real time,” we predict two steps ahead (May and June 2016). Figure 8 shows the forecast accuracy of the following models: (i) PC, (ii) PC using only factors that contribute to reducing the forecast error, denoted as PC (2), (iii) PLS, (iv) PLS (2), (v) the average between the first and third models, and (vi) the average between the second and fourth models. We can see that for \(h = 1\), PLS, PLS (2), and the forecast average of PC and PLS are the most accurate models, with 0 forecast error for May 2016. For \(h = 2\), PC gives a forecast error of 0.1 in June 2016. In conclusion, and with reference to Fig. 5, the results are as expected, given that PC and PLS give forecast errors lower than the selected threshold.
It is also of interest to examine the historical behavior of the models. We first analyze the RMSE of the benchmark models: the Autoregressive Integrated Moving Average (ARIMA) model and the macroeconomic diffusion index (Stock and Watson 2002b). Figure 9 plots the results for the out-of-sample training period. For \(h = 1\), we observe that the RMSE interval is between 0.82 and 1.07 for the ARIMA model, while for the macroeconomic diffusion index it is between 0.62 and 0.83. For \(h = 2\) the errors are slightly reduced in both cases; however, the macroeconomic diffusion index has better results. Hence, including the factor in linear models reduces the forecast error, and we would expect the FAVAR models to show a small RMSE.Footnote 16
Figure 10 plots, step by step, the forecast errors of the FAVAR models considering the approach presented in this work. Note that the RMSE interval of PC for \(h = 1\) is between 0.27 and 0.67 and for \(h = 2\) between 0.29 and 0.59. The RMSE mean values are 0.47 and 0.44 for \(h = 1\) and \(h = 2\), respectively. Note that for both h, February 2016 is the outlier forecast error. On the other hand, for PLS and \(h = 1\), the RMSE interval is between 0.34 and 0.60, and for \(h = 2\) it is between 0.30 and 0.64. In this case, the mean of the RMSE is 0.47 for both h. Note that the improvement with respect to the ARIMA model and the macroeconomic diffusion index is relevant, above all when the factors are estimated using PC. Note also that the forecast errors are very similar between PC and PLS.
A question of interest is: are the models consistent through the training sample? We can obtain different models in each step ahead, in which case the consistency of the predictors may be questionable. Taking into account only the selected models, we can observe in Fig. 11 that all models are robust for \(h = 1\). In fact, 76\(\%\) of the models are within the threshold, and for \(h = 2\), the behavior of the predictions is similar to May 2016. Note that the predictions for May and June 2016 are carried out with information up to April 2016. Hence, we present the forecast for \(h = 1\) for May 2016 and for \(h = 2\) for June 2016. The model is not updated as in the previous forecasts. It is interesting to note that February 2016 was the most difficult month to predict; however, the robustness of the selected models is reasonable. A possible explanation is that February 2016 had 29 days (2016 was a leap year), and the seasonal variables and the dynamics between the variables do not account for this effect.