Introduction

Principal covariates regression (PCovR) was proposed by de Jong and Kiers (1992) to deal with the interpretational and technical problems that are encountered when applying ordinary least squares (OLS) regression using a relatively high number of predictor variables that are correlated to some extent. Interpreting the regression weights becomes challenging in such cases, as the weights only indicate the unique contributions of the predictor variables to the prediction problem and, thus, do not reflect possible shared effects (Cohen, Cohen, West, & Aiken, 2003). Moreover, the bouncing beta problem might occur, implying that removing or adding a single observation can already alter the regression weights drastically (Kiers & Smilde, 2007).

In PCovR, the predictor variables are reduced to a limited number of components that are linear combinations of the original predictors. Simultaneously, the criterion variables are regressed on these components. The components are extracted in such a way that they are good summarizers of the predictor data (i.e., explain a lot of the predictor variance), but also predict the criterion scores well (i.e., explain a lot of the criterion variance). PCovR has at least three attractive features: First, a closed-form solution is available for extracting the components. Second, the user may choose the extent to which either reduction or prediction is emphasized through a weighting parameter that ranges between 0 and 1. When the weighting parameter equals 0, PCovR corresponds to reduced rank regression (RRR; Aldrin, 2002), whereas a value of 1 implies that PCovR boils down to principal components regression (PCR; Jolliffe, 1982). Third, PCovR has been found to outperform popular methods such as partial least squares (PLS; Wold, Ruhe, Wold, & Dunn III, 1984) and PCR (Jolliffe, 1982) in retrieving relevant predictor components (i.e., the subset of predictor components that explain variance in the criterion block; see Vervloet, Van Deun, Van den Noortgate, & Ceulemans, 2016, and Van Deun, Crompvoets, & Ceulemans, 2018). A user-friendly R package is available for performing all the steps of a PCovR analysis (including preprocessing, model selection, rotation, etc.); the use of this package is described in Vervloet, Kiers, Van den Noortgate, and Ceulemans (2015). Note that PCovR differs from so-called coupled matrix factorization methods (Acar, Kolda, & Dunlavy, 2011; Acar et al., 2014; De Lathauwer & Kofidis, 2017; Wilderjans, Ceulemans, & Van Mechelen, 2009; Wilderjans, Ceulemans, Van Mechelen, & van den Berg, 2011), in that PCovR treats predictors and criteria in an asymmetric way (i.e., reversing the roles of the predictors and criteria would yield a different solution), whereas both sets of variables are treated symmetrically and have a similar status in coupled matrix factorization.

So far, however, the performance of PCovR has mostly been studied for datasets that include only one criterion variable. When datasets contain a high number of criterion variables, PCovR users face new interpretational problems, because the number of regression weights will be high. Indeed, each criterion variable will correspond to a separate set of regression weights. Moreover, the criterion variables can be correlated with each other and partly include the same information, which may not be evident from these separate sets of regression weights. Finally, it might happen that some of the criteria are not related to the predictors, and thus cannot be predicted well. In line with what has been proposed in the PLS framework (i.e., PLS2; Manne, 1987), this interpretational burden can be lifted by also reducing the criterion variables to a few components and predicting these criterion components rather than the individual criteria. Such an approach facilitates interpretation, as it sheds light on the interrelations among the predictors and among the criteria, and reduces the number of regression weights that have to be interpreted. In this regard, it is important to allow the number of predictor components to differ from the number of criterion components if one wants to fully grasp the predictor structure. Indeed, imposing an equality restriction on the numbers of predictor and criterion components, as is the case in PLS2, is more defensible if one is solely focused on predicting the criteria as well as possible.

In this paper, we therefore propose PCovR2, which extends PCovR by also reducing the criteria to a few components. These criterion components are predicted based on the predictor components, allowing both sets of components to differ in number. In line with PCovR, users can specify, by means of a weighting parameter, how much they want to emphasize reduction and prediction. We will demonstrate that this weighting parameter makes PCovR2 a very flexible tool that can accommodate different research goals. Specifically, using a low weighting value allows one to prioritize retrieving the relevant predictor components and predictable criterion components and thus to focus on the regression problem, whereas with a high weighting value, the predictor and criterion components are formed based on how much variance they can explain in the corresponding data blocks, disregarding the link between the two blocks.

Moreover, we give suggestions on how to decide on the numbers of predictor and criterion components once the weighting value has been chosen. In particular, finding the number of relevant predictor components and predictable criterion components is not trivial. Therefore, we put forward a model selection approach to assist users in taking this decision.

We will compare the performance of PCovR2 with two alternative approaches: PLS2 and PCR2. As mentioned above, PLS2 is more restrictive than PCovR2 in that it requires the number of criterion components to equal the number of predictor components (Manne, 1987). We will demonstrate that even though PLS2 has a high predictive power, it does not necessarily reconstruct the underlying predictor and criterion components well. PCR2 is a natural extension of PCR that boils down to performing two separate principal component analyses (PCA; Pearson, 1901), one on the predictors and one on the criterion variables. Next, it predicts the criterion components on the basis of the predictor components in an additional ordinary least squares (OLS) step. We expect that this approach will perform worse than PCovR2, if one aims to focus on extracting the relevant predictor components and predictable criterion components. Indeed, with PCR2, the predictor information is completely ignored when extracting the criterion components and vice versa. As a result, the obtained predictor components may be irrelevant for predicting the criterion components.

The remainder of the paper is structured as follows: First, we discuss the original PCovR method. We then present the new PCovR2 technique and discuss the model selection problem, and show that PCR2 and PLS2 fail when the research focus lies on reconstructing the underlying relevant predictor and predictable criterion components. Next, we present a simulation study in which we evaluate the performance of PCovR2 as well as the proposed model selection heuristic for deciding on the numbers of relevant predictor and predictable criterion components. In the subsequent section, we analyze data from the 500 Family Study (Schneider & Waite, 2008). Discussion points and concluding remarks are given in the final section.

From PCovR to PCovR2

Data

The data consist of a block of predictors X (N × J) and a block of criteria Y (N × K), which are measured for the same N observations. Preprocessing the data is important, as it will influence the results that are obtained. Specifically, we will assume all variables to be centered (i.e., the mean of each variable is set to zero). As holds for component analysis methods in general, centering the data is necessary in order to model the covariance structure of the data. Scaling is important if differences in variance between variables are considered arbitrary, since variables with larger variances would otherwise influence the obtained components more. In this paper, we assume that the sum of squares of each variable is set to one, implying that each variable has an equal weight in the analysis.
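As an illustration, this preprocessing (column-wise centering and scaling to a unit sum of squares) can be coded in a few lines. The sketch below uses Python with numpy; the function name and its use are ours and serve purely as an illustration.

```python
import numpy as np

def preprocess(M):
    """Center each column and scale it to a sum of squares of one."""
    M = M - M.mean(axis=0)                    # centering: column means become zero
    M = M / np.sqrt(np.sum(M**2, axis=0))     # scaling: each column's sum of squares becomes one
    return M

# X = preprocess(X); Y = preprocess(Y)
```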

PCovR

Model

The PCovR model (De Jong & Kiers, 1992) decomposes the predictor matrix X as follows:

$$ \mathbf{X}=\mathbf{X}{\mathbf{W}}_{\mathrm{X}}{\mathbf{P}}_{\mathrm{X}}^{\prime }+{\mathbf{E}}_{\mathrm{X}}={\mathbf{T}}_{\mathrm{X}}{\mathbf{P}}_{\mathbf{X}}^{\prime }+{\mathbf{E}}_{\mathrm{X}}. $$
(1)

TX = XWX is the N × RX component score matrix that holds the scores of the N observations on the RX components. The sum of squares per column of TX is constrained to be equal to 1. PX is the J × RX loading matrix that contains the loadings of the predictor variables on the components. If the columns of TX are mutually orthogonal, these loadings boil down to the correlations between the predictors and the components, since we assume the data to be centered around zero and to be scaled to a sum of squares of one per variable (see Section “Data”). EX holds the residuals for the predictors, and WX is a J × RX weight matrix. The criteria in Y are regressed on the RX components:

$$ \mathbf{Y}={\mathbf{T}}_{\mathbf{X}}{\mathbf{P}}_{\mathbf{Y}}^{\prime }+{\mathbf{E}}_{\mathbf{Y}}, $$
(2)

where the matrix PY (K × RX) contains the regression weights for each of the K criteria and EY the residuals for Y. As extensively discussed in Vervloet et al. (2015), PCovR solutions have rotational freedom as multiplying PY and PX by a rotation matrix Z and counter-rotating TX by multiplying it with (Z’)-1 does not alter the reconstructed X or Y scores.

Estimation

In a PCovR analysis, the reduction of predictor variables to components, and the prediction of the criterion variables by those components, is conducted simultaneously. This can clearly be seen in the PCovR loss function LPCovR:

$$ {L}_{\boldsymbol{PCovR}}=\alpha \frac{{\left\Vert \mathbf{X}-{\mathbf{T}}_{\mathbf{X}}{\mathbf{P}}_{\mathbf{X}}\prime \right\Vert}^2}{{\left\Vert \mathbf{X}\right\Vert}^2}+\left(1-\alpha \right)\frac{{\left\Vert \mathbf{Y}-{\mathbf{T}}_{\mathbf{X}}{\mathbf{P}}_{\mathbf{Y}}\prime \right\Vert}^2}{{\left\Vert \mathbf{Y}\right\Vert}^2}, $$
(3)

with α being a weighting parameter, ranging between 0 and 1, that specifies to what degree reduction and prediction are emphasized. As already mentioned in the introduction, when α equals 0, PCovR boils down to RRR (Aldrin, 2002) and, when α equals 1, to PCR (Jolliffe, 1982). High α values will lead to a higher focus on strong components (i.e., components that explain a lot of variance in X), while low α values most likely lead to more relevant components (i.e., components that explain a lot of variance in Y). In practice, balancing both aspects is often desirable, as using a too high α can lead to irrelevant components and using a too low α to uninterpretable ones. How this balance can be achieved is discussed in Section “Model selection”.

Given a specific α value and number of components RX, a closed-form solution exists for the PCovR parameters. Specifically, TX is set equal to the first RX eigenvectors of the matrix

$$ \mathbf{D}=\alpha \frac{\mathbf{X}\mathbf{X}^{\prime }}{{\left\Vert \mathbf{X}\right\Vert}^2}+\left(1-\alpha \right)\frac{{\mathbf{H}}_{\mathbf{X}}\mathbf{YY}^{\prime }{\mathbf{H}}_{\mathbf{X}}}{{\left\Vert \mathbf{Y}\right\Vert}^2}, $$
(4)

in which HX = X(X’X)-1X’ (see Vervloet et al., 2016, for more information). The loading matrix and regression weights can then be calculated, respectively, as PX = X’TX and PY = Y’TX.
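To make the closed-form estimation concrete, the sketch below implements Eq. 4 and the subsequent regression steps in numpy. It assumes that X and Y have been preprocessed as described in Section “Data”; the function name pcovr and all variable names are ours, and the sketch is an illustration rather than a reference implementation.

```python
import numpy as np

def pcovr(X, Y, R_x, alpha):
    """Closed-form PCovR: T_X from the first R_x eigenvectors of D (Eq. 4),
    followed by the regression steps P_X = X'T_X and P_Y = Y'T_X.
    Assumes centered X and Y with unit sum of squares per column."""
    Hx = X @ np.linalg.pinv(X.T @ X) @ X.T               # projector onto the column space of X
    D = (alpha * (X @ X.T) / np.sum(X**2)
         + (1 - alpha) * (Hx @ Y @ Y.T @ Hx) / np.sum(Y**2))
    evals, evecs = np.linalg.eigh(D)                     # D is symmetric
    T_x = evecs[:, np.argsort(evals)[::-1][:R_x]]        # columns have a unit sum of squares
    P_x = X.T @ T_x                                      # predictor loadings (J x R_x)
    P_y = Y.T @ T_x                                      # regression weights (K x R_x)
    return T_x, P_x, P_y
```

Note that the eigenvectors returned by eigh have unit length, which matches the constraint that each column of TX has a sum of squares of one.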

Model selection

Performing model selection for PCovR is complicated, as one should determine both the number of components and the weighting parameter value in Eq. 3. Moreover, researchers may have different opinions on which components they want to retrieve. Some may aim to retrieve all important predictor components, irrespective of their relevance. Others may take the same stance as Vervloet et al. (2016) and consider a PCovR solution to be appropriate if all components that are relevant for predicting the criterion data, i.e., have a non-negligible regression weight, are recovered, irrespective of the amount of predictor variance they explain. Different selection criteria can be used, such as cross-validation, scree tests, and combinations thereof. One could, for instance, first select the number of components by performing a scree test on the predictor data and afterwards use cross-validation to tune the weighting parameter. This approach allows one to focus on the number of strong components, i.e., the subset of components that explain a lot of variance (Ceulemans & Kiers, 2009). Yet, these strong components are not necessarily relevant for prediction purposes. Alternatively, one could simultaneously optimize the number of components and the weighting parameter using K-fold cross-validation (Breiman, Friedman, Olshen, & Stone, 1984). This might be cumbersome, however, because of the large number of possible combinations. Moreover, the study by Vervloet et al. (2016) has shown that cross-validation tends to extract too many components. Importantly, the results reported by Vervloet et al. (2013, 2016) all indicate that using a low α is a good idea, if one is mainly interested in finding the relevant components. Indeed, a low α will penalize extracting components that do not help in reconstructing the criterion data.

PCovR2

Model

In PCovR2, we still assume the columns of the predictor matrix X and the criterion matrix Y to be centered and scaled to a unit sum of squares. The predictor matrix X is decomposed according to Eq. 1. The criterion matrix Y is modeled as follows:

$$ \mathbf{Y}={\mathbf{Y}\mathbf{W}}_{\mathbf{Y}}{\mathbf{P}}_{\mathbf{Y}}^{\prime }+{\mathbf{E}}_{\mathbf{Y}}={\mathbf{T}}_{\mathbf{Y}}{\mathbf{P}}_{\mathbf{Y}}^{\prime }+{\mathbf{E}}_{\mathbf{Y}}. $$
(5)

Here, TY is the N × RY matrix holding the scores of the N observations on the RY criterion components, and PY is the K × RY matrix holding the loadings of the criterion variables on these components. Imposing TY’TY = I, these loadings reflect the correlations between the criteria and the criterion components. EY holds the residuals for the criteria, and WY is a K × RY weight matrix. Finally, TY is regressed upon TX:

$$ {\mathbf{T}}_{\mathbf{Y}}={\mathbf{T}}_{\mathbf{X}}\mathbf{B}^{\prime }+{\mathbf{E}}_{{\mathbf{T}}_{\mathbf{Y}}}, $$
(6)

where B is an RY × RX matrix that holds the associated regression weights, and \( {\mathbf{E}}_{{\mathbf{T}}_{\mathbf{Y}}} \) contains the corresponding residuals. Because the columns of TX and TY have a sum of squares of one, the regression weights can be directly compared in size.

PCovR2 solutions have rotational freedom, in that without losing fit, the predictor components and criterion components can be rotated independently from each other to facilitate interpretation. Indeed, multiplying PX and PY by the rotation matrices ZX and ZY and counter-rotating TX and TY, by multiplying them with (ZX’)-1 and (ZY’)-1, does not alter the reconstructed X or Y scores. For instance, both sets of components can be rotated using a simple structure rotation criterion such as varimax. Obviously, such rotations would imply that the component scores TX and TY as well as the regression weights B need to be recalculated to take the rotation into account. When applying orthogonal rotations, the following formula can be used:

$$ {\mathbf{B}}_{rotated}={\mathbf{Z}}_{\mathbf{Y}}^{\prime}\mathbf{B}{\mathbf{Z}}_{\mathbf{X}} $$
(7)
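As an illustration of Eq. 7, the sketch below varimax-rotates the predictor and criterion loadings separately and counter-rotates the component scores and regression weights. The varimax routine is a generic textbook implementation, not necessarily the one used in the PCovR software, and all names are ours.

```python
import numpy as np

def varimax(P, max_iter=100, tol=1e-8):
    """Plain varimax rotation (Kaiser); returns the rotated loadings and the rotation matrix Z."""
    J, R = P.shape
    Z = np.eye(R)
    d = 0.0
    for _ in range(max_iter):
        L = P @ Z
        u, s, vt = np.linalg.svd(P.T @ (L**3 - L * (np.sum(L**2, axis=0) / J)))
        Z = u @ vt
        if np.sum(s) < d * (1 + tol):        # stop when the criterion no longer improves
            break
        d = np.sum(s)
    return P @ Z, Z

# Example use, given a fitted PCovR2 solution (T_x, P_x, T_y, P_y, B):
#   P_x_rot, Z_x = varimax(P_x); P_y_rot, Z_y = varimax(P_y)
#   T_x_rot, T_y_rot = T_x @ Z_x, T_y @ Z_y     # (Z')^{-1} = Z for orthogonal rotations
#   B_rot = Z_y.T @ B @ Z_x                     # Eq. 7
```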

Estimation

The key feature of PCovR2 is that the three sub-models—reduction of the predictor block, reduction of the criterion block, and prediction of the criterion components based on the predictor components—are fitted simultaneously. This leads to the following loss function to be minimized:

$$ {\displaystyle \begin{array}{c}{L}_{PCovR2}=\alpha \frac{{\left\Vert \mathbf{X}-\mathbf{X}{\mathbf{W}}_{\mathbf{X}}{\mathbf{P}}_{\mathbf{X}}\prime \right\Vert}^2}{{\left\Vert \mathbf{X}\right\Vert}^2}+\alpha \frac{{\left\Vert \mathbf{Y}-\mathbf{Y}{\mathbf{W}}_{\mathbf{Y}}{\mathbf{P}}_{\mathbf{Y}}\prime \right\Vert}^2}{{\left\Vert \mathbf{Y}\right\Vert}^2}+\left(1-\alpha \right)\frac{{\left\Vert \mathbf{Y}{\mathbf{W}}_{\mathbf{Y}}-\mathbf{X}{\mathbf{W}}_{\mathbf{X}}\mathbf{B}\prime \right\Vert}^2}{{\left\Vert \mathbf{Y}{\mathbf{W}}_{\mathbf{Y}}\right\Vert}^2}\\ {}=\alpha \left(\frac{{\left\Vert \mathbf{X}-{\mathbf{T}}_{\mathbf{X}}{\mathbf{P}}_{\mathbf{X}}\prime \right\Vert}^2}{{\left\Vert \mathbf{X}\right\Vert}^2}+\frac{{\left\Vert \mathbf{Y}-{\mathbf{T}}_{\mathbf{Y}}{\mathbf{P}}_{\mathbf{Y}}\prime \right\Vert}^2}{{\left\Vert \mathbf{Y}\right\Vert}^2}\right)+\left(1-\alpha \right)\frac{{\left\Vert {\mathbf{T}}_{\mathbf{Y}}-{\mathbf{T}}_{\mathbf{X}}\mathbf{B}\prime \right\Vert}^2}{R_Y},\end{array}} $$
(8)

where RY refers to the number of criterion components and where the weighting parameter α satisfies 0 < α ≤ 1. Again, α is a tuning parameter that allows one to put more emphasis on the prediction aspect (α close to zero) or the reduction aspect (α close to one). Note that setting α to 1 corresponds to a PCR2 approach. Moreover, setting α to 0 would lead to a method that focuses entirely on finding strongly related predictor and criterion components, but totally ignores how much variance of the predictor and criterion matrices is explained by these components. The latter is not desirable for our purposes, as we want to recover the components that underlie both the predictor and criterion matrices. Therefore, we do not recommend setting α to 0.

Given specific α, RX, and RY values, we minimize (8) under the constraint that TX’TX and TY’TY equal an identity matrix. To this end, we rely on an alternating least squares algorithm, since no closed-form solution is readily available. The PCovR2 algorithm alternates between two steps that we discuss in detail below: 1) conditional estimation of TX, PX, and B given fixed values for TY and PY, and 2) conditional estimation of TY and PY given fixed values for TX, PX, and B. Note that step 2 consists of running an iterative procedure. Both steps are repeated until convergence of the loss function (i.e., Lprevious − Lcurrent < ε, where ε is a pre-specified small number; we used 10−6). Because of the alternating nature of the algorithm, we advise the use of multiple starts, to avoid ending up in a local minimum.

Step 1

Given current estimates of TY and PY, we can update TX, PX, and B by optimizing

$$ {L}_{\boldsymbol{step}1}=\alpha \frac{{\left\Vert \mathbf{X}-{\mathbf{T}}_{\mathbf{X}}{\mathbf{P}}_{\mathbf{X}}\prime \right\Vert}^2}{{\left\Vert \mathbf{X}\right\Vert}^2}+\left(1-\alpha \right)\frac{{\left\Vert {\mathbf{T}}_{\mathbf{Y}}-{\mathbf{T}}_{\mathbf{X}}\mathbf{B}\prime \right\Vert}^2}{{\left\Vert {\mathbf{T}}_{\mathbf{Y}}\right\Vert}^2}. $$
(9)

In this optimization problem, a standard PCovR problem can be recognized, with X as the predictor matrix and TY as the criterion matrix. Hence, this step has a closed-form solution, which boils down to setting TX equal to the first RX eigenvectors of matrix D:

$$ \mathbf{D}=\alpha \frac{\mathbf{X}\mathbf{X}^{\prime }}{{\left\Vert \mathbf{X}\right\Vert}^2}+\left(1-\alpha \right)\frac{{\mathbf{H}}_{\mathbf{X}}{\mathbf{T}}_{\mathbf{Y}}{\mathbf{T}}_{\mathbf{Y}}^{\prime }{\mathbf{H}}_{\mathbf{X}}}{{\left\Vert {\mathbf{T}}_{\mathbf{Y}}\right\Vert}^2}, $$
(10)

with HX = X(X’X)-1X’. PX and B can then easily be computed by means of regression steps (see Section “Estimation”), as PX = X’TX and B = TY’TX.

Step 2

Given current estimates of TX, PX, and B, the non-constant part of the PCovR2 loss function can be re-expressed as a function of TY and PY (and later of TY alone, when the optimal PY = Y’TY is substituted) as follows:

$$ {\displaystyle \begin{array}{c}{L}_{Step2}=\alpha \frac{{\left\Vert \mathbf{Y}-{\mathbf{T}}_{\mathbf{Y}}{\mathbf{P}}_{\mathbf{Y}}\prime \right\Vert}^2}{{\left\Vert \mathbf{Y}\right\Vert}^2}+\frac{\left(1-\alpha \right)}{R_Y}{\left\Vert {\mathbf{T}}_Y-{\mathbf{T}}_{\mathbf{X}}\mathbf{B}\prime \right\Vert}^2\\ {}\kern0.36em =\alpha \frac{{\left\Vert \mathbf{Y}-{\mathbf{T}}_{\mathbf{Y}}{\mathbf{P}}_{\mathbf{Y}}\prime \right\Vert}^2}{{\left\Vert \mathbf{Y}\right\Vert}^2}+\frac{\left(1-\alpha \right)}{R_Y}{\left\Vert {\mathbf{T}}_{\mathbf{Y}}-\mathbf{U}\right\Vert}^2\\ {}\kern0.36em =-\alpha \frac{tr\left[\mathbf{Y}^{\prime }{\mathbf{T}}_{\mathbf{Y}}{\mathbf{T}}_{\mathbf{Y}}^{\prime}\mathbf{Y}\right]}{{\left\Vert \mathbf{Y}\right\Vert}^2}-\frac{2\left(1-\alpha \right)}{R_Y} tr\left(\mathbf{U}\hbox{'}{\mathbf{T}}_{\mathbf{Y}}\right)+c\\ {}=-\alpha \frac{tr\left[\mathbf{Y}^{\prime}\mathbf{Y}{\mathbf{W}}_{\mathbf{Y}}\;{\mathbf{W}}_{\mathbf{Y}}^{\prime}\mathbf{Y}^{\prime}\mathbf{Y}\right]}{{\left\Vert \mathbf{Y}\right\Vert}^2}-\frac{2\left(1-\alpha \right)}{R_Y} tr\left(\mathbf{U}\hbox{'}\mathbf{Y}{\mathbf{W}}_{\mathbf{Y}}\;\right)+c\\ {}\kern0.36em =-\alpha \frac{tr\left[\mathbf{V}^{\prime}\mathbf{Y}^{\prime}\mathbf{Y}\mathbf{V}\right]}{{\left\Vert \mathbf{Y}\right\Vert}^2}-\frac{2\left(1-\alpha \right)}{R_Y} tr\left(\mathbf{U}\hbox{'}\mathbf{Y}{\left(\mathbf{Y}\hbox{'}\mathbf{Y}\right)}^{-\mathbf{1}/\mathbf{2}}\mathbf{V}\right)+c\\ {}\kern0.36em = tr\left[\mathbf{V}\hbox{'}\mathbf{CV}\right]+ tr\left[\mathbf{AV}\right]+c.\end{array}} $$
(11)

with U = TXB′, V = (Y’Y)1/2WY, \( \mathbf{A}=-\frac{2\left(1-\alpha \right)}{R_Y}\mathbf{U}^{\prime}\mathbf{Y}{\left(\mathbf{Y}\prime \mathbf{Y}\right)}^{-\mathbf{1}/\mathbf{2}} \), and c being a constant (see the appendix for the detailed derivation of Eq. 11). This is a function that can be minimized through iterative majorization (Kiers, 2002). In each step of this iterative majorization procedure, the parameters are updated by solving a simple problem for which a closed-form solution is available. Here, the problem boils down to updating V subject to TY’TY = WY’Y’YWY = V’V = I. Specifically, for updating V (i.e., Vupdate), the current estimates of V (i.e., Vcurrent) are used to construct the matrix G:

$$ \mathbf{G}=\mathbf{A}+2{\mathbf{V}}^{\boldsymbol{current}}\hbox{'}\mathbf{C}-2\lambda {\mathbf{V}}^{\boldsymbol{current}}\hbox{'}, $$
(12)

with λ being a value higher than or equal to the highest eigenvalue of C. Next, we compute a singular value decomposition of the matrix −G as:

$$ svd\left(-\mathbf{G}\right)=\mathbf{PDQ}^{\prime }. $$
(13)

Vupdate can then be calculated as

$$ {\mathbf{V}}^{\boldsymbol{update}}=\mathbf{QP}^{\prime }. $$
(14)

This majorization procedure within step 2 is repeated until LStep2, previous − LStep2, current < ε. Next, TY and PY are computed based on the final estimate of V. Specifically, TY = Y(Y’Y)−1/2V and PY = Y’TY.
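Putting the two steps together, the following sketch illustrates one possible implementation of the PCovR2 alternating least squares algorithm in numpy. It is a simplified, single-start illustration under the preprocessing assumptions stated above (in practice, multiple starts are advisable, as noted earlier); the rational start from a PCA of Y, the use of the largest eigenvalue of C for λ, and all variable names are our own choices, not the authors' reference implementation.

```python
import numpy as np

def pcovr2(X, Y, R_x, R_y, alpha, eps=1e-6, max_iter=500):
    """Single-start sketch of the PCovR2 ALS algorithm minimizing Eq. 8.
    Assumes X and Y are centered, column-wise scaled to a unit sum of squares,
    and that Y has full column rank."""
    ssx, ssy = np.sum(X**2), np.sum(Y**2)
    Hx = X @ np.linalg.pinv(X.T @ X) @ X.T                    # projector onto col(X)
    w, Q = np.linalg.eigh(Y.T @ Y)
    S_inv_half = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T          # (Y'Y)^(-1/2)
    T_y = np.linalg.svd(Y, full_matrices=False)[0][:, :R_y]   # rational start: PCA of Y
    loss_prev = np.inf
    for _ in range(max_iter):
        # Step 1: closed-form PCovR with T_Y as the criterion block (Eq. 10)
        D = alpha * (X @ X.T) / ssx + (1 - alpha) * (Hx @ T_y @ T_y.T @ Hx) / R_y
        evals, evecs = np.linalg.eigh(D)
        T_x = evecs[:, np.argsort(evals)[::-1][:R_x]]
        P_x, B = X.T @ T_x, T_y.T @ T_x
        # Step 2: majorization updates of V, and hence of T_Y and P_Y (Eqs. 11-14)
        U = T_x @ B.T
        C = -alpha * (Y.T @ Y) / ssy
        A = -2 * (1 - alpha) / R_y * (U.T @ Y @ S_inv_half)
        lam = np.max(np.linalg.eigvalsh(C))                   # largest eigenvalue of C
        V = S_inv_half @ Y.T @ T_y                            # current V, with V'V = I
        inner_prev = np.inf
        for _ in range(max_iter):
            G = A + 2 * V.T @ C - 2 * lam * V.T               # Eq. 12
            P, _, Qt = np.linalg.svd(-G, full_matrices=False) # Eq. 13
            V = Qt.T @ P.T                                    # Eq. 14
            inner = np.trace(V.T @ C @ V) + np.trace(A @ V)   # non-constant part of Eq. 11
            if inner_prev - inner < eps:
                break
            inner_prev = inner
        T_y = Y @ S_inv_half @ V
        P_y = Y.T @ T_y
        # Loss function of Eq. 8
        loss = (alpha * (np.sum((X - T_x @ P_x.T)**2) / ssx
                         + np.sum((Y - T_y @ P_y.T)**2) / ssy)
                + (1 - alpha) * np.sum((T_y - T_x @ B.T)**2) / R_y)
        if loss_prev - loss < eps:
            break
        loss_prev = loss
    return T_x, P_x, T_y, P_y, B
```

In a multi-start procedure, one would rerun such a routine from several starting values for TY (e.g., random orthonormal matrices in addition to the PCA start) and retain the solution with the lowest loss.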

Model selection

Defining the problem

If model selection was already complicated for PCovR (see Section “Model selection” above), it obviously is even more challenging for PCovR2, because one now needs to decide on RX and RY, as well as on α. The choice regarding the RX, RY, and α values will again depend on the research goal. Indeed, as indicated in the introduction, some researchers aim to retrieve all the predictor and criterion components, regardless of whether the predictor components are relevant (i.e., explain variance of the criterion variables) and regardless of whether the criterion components are predictable (i.e., capture variance that is related to the predictors). Others take a narrower regression perspective, implying that they are only interested in retrieving the relevant predictor and predictable criterion components. On top of that, they often want to interpret the structure of the predictors and criteria, and thus want the retrieved relevant predictor and predictable criterion components to be an accurate representation of the predictor and criterion scores.

To illustrate these two different scenarios further, let us consider a simulated toy example. The dataset was simulated with three predictor components and with two criterion components. The associated loading matrices and regression weight matrix are shown in Table 1. The true number of relevant predictor and predictable criterion components can be derived from the simulated regression weights matrix. In particular, we see that the first criterion component is predictable (i.e., has at least one non-negligible regression weight), and the first two predictor components are relevant (i.e., have at least one non-negligible regression weight), whereas the second criterion component is not predictable, and the third predictor component is irrelevant, as the associated regression weights are all equal to zero.

Table 1 The loadings and regression weights used to simulate a toy example, next to the PCovR2 loadings and regression weights for the solutions with RX=2, RY=1 and RX=3, RY=1 components

To demonstrate how the reconstruction of the respective X and Y blocks and the prediction of the criterion components, TY, are affected by the weighting value across different RX and RY values, we analyzed the data multiple times, varying RX from 1 to 4 and RY from 1 to 3, and letting α increase from .001 to 1 in steps of .001. Afterwards, we computed how well X and Y were reconstructed [i.e., the first part of the loss function (8)] and how well the criterion components could be predicted (i.e., the second part of the loss function). We plotted the obtained values in Fig. 1, where each panel represents a different RX and RY combination. The x-axis of each plot pertains to the α value and the y-axis to the loss function parts.

Fig. 1

Values of the reconstruction and prediction parts of the PCovR2 loss function (shown on the Y-axis) for the toy data as a function of the value of the weighting parameter α (shown on the X-axis). The panels pertain to 1 to 4 predictor components (rows) and 1 to 3 criterion components (columns). The dashed line ‘--’, dash-dot line ‘-.’ and solid line ‘-’ denote the reconstruction error of X and Y, and the prediction error, respectively

We first turn to the goal of retrieving all predictor and criterion components, disregarding their relevance or predictability. From Fig. 1, it can be concluded that increasing RX and RY clearly improves the reconstruction of the X and Y blocks, respectively, with this improvement being more pronounced for high α values (i.e., larger than .5; this is confirmed by the simulation results in Section “Recovery of the number of relevant predictor and the number of predictable criterion components as well as recovery of all the simulated components”). As could be expected, this improvement levels off once RX = 3 and RY = 2, in that going beyond the RX = 3, RY = 2 panel (i.e., further increasing RX or RY) hardly changes the picture any further; this suggests that one should not take RX and RY higher than these values. The latter could be further verified by running a scree test (Cattell, 1966; Ceulemans & Kiers, 2006; Wilderjans, Ceulemans, & Meers, 2013). Thus, Fig. 1 might be useful to decide on the total numbers of predictor and criterion components.

When it comes to the regression problem and deciding on the number of relevant predictor components and predictable criterion components only, Fig. 1 shows that using a low α value (i.e., lower than .2) yields clearly better prediction results and is therefore clearly indicated (this will again be confirmed by the simulation study). Moreover, we see a specific prediction trend for α values that are very close to 0: it makes no sense to take more predictor components than there are criterion components, as it would not further reduce the prediction loss. Note that the models with RY = 1 and with one to four predictor components, for example, all show quite similar prediction error values for low α values. Analogously, if RY = 2, taking more than two predictor components does not lead to a decreased prediction loss, and generally, if Rx is larger than RY, this does not lead to reduced prediction loss compared to what is found when Rx = RY.

To understand how picking low α values affects the reconstruction part of the model given these prediction findings, we inspect the associated loading and regression matrices from the (RX = 2, RY = 1, α = .05) analysis in Table 1. Given that the simulated data contain two relevant predictor components and one predictable criterion component, we might a priori have expected good recovery of these components. Yet, the prediction findings in the previous paragraphs already suggest that this might not be the case. Indeed, Table 1 shows that the relevant predictor components are not well recovered but are rather merged into one component. This merged component has a very high regression weight (.95), while the second predictor component has a low regression weight (.02) and corresponds to the irrelevant component. Further scrutinizing the results, we see that in order to properly recover both relevant predictor components, we actually need to extract more predictor components. Indeed, the (RX = 3, RY = 1, α = .05) PCovR2 solution adequately recovers the structure of the two relevant predictor components. Note that the (RX = 3, RY = 2, α = .05) analysis again yields merged predictor components; this makes sense, as Fig. 1 shows that for this (RX, RY) combination, we need a higher α value to properly reconstruct X. Overall, these findings indicate that model selection in terms of retrieving the relevant predictor and predictable criterion components is not an easy matter, and it cannot be fully solved by scrutinizing a plot like Fig. 1. To further support the decision process, and also to make automatic selection possible, we propose in the next section a two-step model selection heuristic that users can apply to get a better grip on this problem for a particular data set.

A two-step model selection heuristic to retrieve all relevant and predictable components

To retrieve all relevant and predictable components, we developed a two-step approach that can be automated. To run this approach, one estimates multiple PCovR2 models with RX and RY varying from 1 up to a pre-defined number. In the first step, we decide on the number of predictable criterion components, RY. In the second step, we assess the number of relevant predictor components, RX. We fixed the α-value to .05, since Fig. 1 points towards using a small value in case one wants to focus on the relevant and predictable components. Note that using other small α values would lead to the same conclusions.

Step 1: Finding the number of predictable criterion components

To decide on the number of predictable criterion components, we assess for each criterion component how well it is reconstructed by the predictor components. To this end, we calculate for each criterion component the sum of the corresponding squared entries of \( {\mathbf{E}}_{{\mathbf{T}}_{\mathbf{Y}}} \) (see Eq. 6), yielding a criterion component-specific prediction error. Panel a of Fig. 2 shows the obtained prediction errors for the criterion components of the toy example, revealing a lot of variation in predictability within and across solutions. At this point, we need a rule to decide when we consider a criterion component to be predictable. We propose to compare the prediction errors of each criterion component to a threshold value TRy that is derived from the data. To obtain this threshold, we make a histogram of all the prediction errors; panel b of Fig. 2 shows the histogram for the toy example. We observe a lot of variation in the component-specific prediction errors, which seem to consist of five groups of values (one of which is small). The gaps in prediction errors between the five groups yield four possible thresholds corresponding to the highest values of the first four groups. To restrict oneself to the most predictable components, we suggest being strict and setting the threshold TRy at the highest value of the first group, which in this case equals .10. We see that none of the rows of the table in panel a of Fig. 2, and thus none of the 25 considered PCovR2 solutions, contains more than one value smaller than .10; hence we can conclude that, according to our rule, none of the solutions contains more than one predictable criterion component. Therefore, we decide that these data contain only one predictable criterion component. Note that if we had set the threshold in a less strict way, by using, for example, the highest prediction error in the second group of the histogram, we would have some rows that contain two, three, or even four predictable criterion components, which would obviously be too high for this dataset.

Fig. 2

(a) The sum of squared prediction errors for each criterion component across 25 PCovR2 models, estimated with RX and RY varying from one up to five. cc1 up to cc5 indicates the first up to the fifth criterion component within the considered solution. Predictable criterion components are marked in bold. (b) Histogram presenting the distribution of all the sum of squared prediction errors reported in Fig. 2a. (c) The regression weights that are obtained when RY is fixed to 1 and RX varies from 1 to 5

Step 2: Finding the number of relevant predictor components

Once we have chosen the number of predictable criterion components, we assess the number of relevant predictor components. To do so, we examine the regression weights of the different solutions with the selected number of criterion components. These solutions vary with respect to the value of RX. For instance, panel c of Fig. 2 shows these regression weights for the toy example, with the number of predictor components RX varying from one to five. To decide which regression weights indicate a relevant component, we again make use of a threshold, TRx. This threshold is computed based on the threshold TRy from the first step. In case only one predictor component is needed to explain the predictable criterion component, the corresponding regression weight would amount to \( \sqrt{1-T{}_{Ry}} \). However, as multiple relevant predictor components might be needed to reconstruct the predictable criterion component, we set the threshold lower, and we somewhat arbitrarily choose to set it at \( {T}_{Rx}=\sqrt{1-T{}_{Ry}}/2 \). Note that computing TRx in this way allows one to automate and evaluate the model selection method in a simulation study (see later). When applying the method to real data, one could again make a histogram of the regression weights and evaluate different options. For the toy example, TRx equals .47, which implies that some solutions in the regression weights table (panel c of Fig. 2) include only one relevant predictor component and some two. We propose selecting the most parsimonious solution with the highest number of relevant predictor components, as in such a solution, relevant components will probably not have been merged and the number of irrelevant predictor components remains low. This means that for the toy data, we would select the solution with three predictor components, but would communicate that only two of these three components are relevant. The fact that an irrelevant component shows up in the solution is a consequence of the fact that our aim is not only to obtain good predictions, but also a good reconstruction of X.
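The two steps can be automated along the following lines. This is a rough sketch under our own naming conventions: solutions is assumed to map each considered (RX, RY) combination to the fitted score matrices and regression weights of a PCovR2 run with α = .05, and the threshold TRy is the value read off from the histogram of prediction errors by the user.

```python
import numpy as np

def component_prediction_errors(T_x, T_y, B):
    """Column-wise sums of squared entries of E_TY = T_Y - T_X B' (Eq. 6):
    one prediction error per criterion component."""
    E = T_y - T_x @ B.T
    return np.sum(E**2, axis=0)

def two_step_heuristic(solutions, T_Ry):
    """solutions: dict mapping (R_x, R_y) to (T_x, T_y, B); T_Ry: threshold from the histogram.
    Returns the number of predictable criterion components, the number of relevant
    predictor components, and the most parsimonious R_x attaining that number."""
    # Step 1: the largest number of criterion components whose prediction error
    # falls below T_Ry in any of the considered solutions
    n_predictable = max(int(np.sum(component_prediction_errors(T_x, T_y, B) < T_Ry))
                        for T_x, T_y, B in solutions.values())
    # Step 2: among solutions with R_y = n_predictable, count the predictor components
    # with at least one regression weight above T_Rx = sqrt(1 - T_Ry) / 2
    T_Rx = np.sqrt(1 - T_Ry) / 2
    counts = {R_x: int(np.sum(np.any(np.abs(B) > T_Rx, axis=0)))
              for (R_x, R_y), (T_x, T_y, B) in solutions.items() if R_y == n_predictable}
    n_relevant = max(counts.values())
    R_x_selected = min(R for R, c in counts.items() if c == n_relevant)
    return n_predictable, n_relevant, R_x_selected
```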

PCR2 and PLS2 fail in parsimoniously recovering the relevant predictor and predictable criterion components

Using the toy example in Table 1, the previous section demonstrated how relevant predictor and criterion components can be found in a parsimonious way with PCovR2. We now investigate how PCR2 and PLS2 handle the problem, using the same data.

PCR2

PCR2 performs the reduction and regression steps in a sequential way, starting with the reduction step. Therefore, we expect that when the used RX and RY are lower than the underlying ones and the relevant/predictable components are weaker than the irrelevant/unpredictable ones, PCR2 will extract the irrelevant/unpredictable components rather than the relevant/predictable ones. Remember that PCR2 corresponds to setting α = 1; this expectation is indeed confirmed by Fig. 1, which shows a bad prediction score for the RX = 2, RY = 1 solution at that α value. It becomes clear that we need to set RX to three and RY to two to extract the relevant predictor and predictable criterion components (see Table 2).

Table 2 The loadings and the (criterion) component-specific sum of squared prediction errors for the PCR2 solution with RX=3 and RY=2 and the PLS2 solutions with 1, 2 and 3 components

PLS2

Like PCovR2, PLS2 performs the prediction and reduction steps simultaneously. However, PLS2 restricts RX to be equal to RY and implies a one-to-one correspondence between the respective predictor and criterion components. Therefore, when the number of underlying relevant predictor components differs from the number of predictable criterion components, we expect PLS2 to miss out on some of these components or to merge them. The PLS2 loadings in Table 2, obtained using one, two, or three components, support this expectation. The solution with one component recovers the predictable criterion component, but merges the two relevant predictor components. When we extract one more component, the weakest relevant predictor component is merged with the irrelevant one. In the solution with three components, all predictor components are recovered properly, but the criterion component that corresponds to the weakest relevant component is not meaningful, as can also be derived from the (criterion) component-specific sums of squared prediction errors (see Table 2).

Simulation study

In this section, we report the results of a simulation study with two goals. First, we evaluate which criterion and predictor components are recovered by the PCovR2 algorithm and how this is affected by the α value. Second, we evaluate the performance of the proposed model selection heuristic.

In line with the PCovR simulation studies of Vervloet et al. (2013, 2016, 2018), we manipulated the relevance and the strength of the predictor components. Additionally, we also varied the predictability and strength of the criterion components. The relevance of the predictor components is defined as follows: predictor components that are relevant are useful in modeling part of the criterion variance (i.e., they have at least one large regression weight), whereas irrelevant predictor components are not, and have regression weights of zero. The predictability of the criterion components is also determined by the regression weights. Whereas unpredictable criterion components cannot be predicted based on the predictor components and thus correspond with regression weights of zero, predictable criterion components are associated with at least one larger regression weight. The strength of a predictor or criterion component is defined by the percentage of variance that it explains in the predictor or criterion data, respectively. We evaluate both the recovery of the relevant predictor and predictable criterion components and the recovery of all the simulated predictor and criterion components. We expect that when it comes to recovery of all the components, higher α values will do better, whereas if we are interested in the relevant predictor and predictable criterion components, lower α values will yield better results. Moreover, we expect the merging issue that was described in Section “Defining the problem” to pop up in some conditions. We expect that, in those conditions, recovery of the relevant predictor and predictable criterion components will benefit from applying the proposed heuristic.

Data generation

All constructed datasets contained 100 observations of 24 predictor variables and 24 criterion variables. These data were generated according to X = TXPX′ + EX and Y = TYPY′ + EY, while taking into account the predictive relation between TX and TY, \( {\mathbf{T}}_{\mathbf{Y}}={\mathbf{T}}_{\mathbf{X}}\mathbf{B}^{\prime }+{\mathbf{E}}_{{\mathbf{T}}_{\mathbf{Y}}} \). The number of criterion components RY was set to two throughout this study, while the number of predictor components RX equaled either two or three. We will now describe how all underlying matrices were simulated.

The elements of the component matrices TX and TY were randomly drawn from a standard normal distribution, taking into account the relevance and predictability of the predictor and criterion components. This was done as follows: the component score matrices TX and TY were sampled together from a multivariate normal distribution. The diagonal entries of the (RX + RY) by (RX + RY) covariance matrix of this multivariate distribution were set to one, implying that this matrix is a correlation matrix, and the off-diagonal entries equal zero if they pertain to pairs of predictor components or to pairs of criterion components. The off-diagonal entries that refer to a combination of a predictor component and a criterion component are set to the values of the regression weight matrix B, which imposes the predictive relation \( {\mathbf{T}}_{\mathbf{Y}}={\mathbf{T}}_{\mathbf{X}}\mathbf{B}^{\prime }+{\mathbf{E}}_{{\mathbf{T}}_{\mathbf{Y}}} \) without having to generate \( {\mathbf{E}}_{{\mathbf{T}}_{\mathbf{Y}}} \). Afterwards, in order to discard deviations from orthogonality and unit variance restrictions that are due to sampling, TX and TY were orthogonalized and standardized.
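For illustration, the joint sampling of TX and TY could be implemented as follows. The B matrix shown is just one example condition, and the QR-based clean-up is only one possible way to remove the sampling deviations mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, R_x, R_y = 100, 3, 2
# Example regression weight matrix (R_y x R_x): the first criterion component is
# predicted by the first predictor component; the second is unpredictable.
B = np.array([[.995, 0., 0.],
              [0.,   0., 0.]])

# Joint correlation matrix of (T_X, T_Y): identity within blocks, B across blocks
Sigma = np.eye(R_x + R_y)
Sigma[R_x:, :R_x] = B
Sigma[:R_x, R_x:] = B.T
scores = rng.multivariate_normal(np.zeros(R_x + R_y), Sigma, size=N)
T_x, T_y = scores[:, :R_x], scores[:, R_x:]

def clean(T):
    """Remove sampling deviations from orthogonality and unit variance within a block
    (one possible implementation, via a QR decomposition; note that any such
    re-orthogonalization slightly perturbs the cross-block correlations implied by B)."""
    Q, _ = np.linalg.qr(T - T.mean(axis=0))
    return Q * np.sqrt(T.shape[0])          # orthogonal columns with variance one

T_x, T_y = clean(T_x), clean(T_y)
```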

The loading matrices PX and PY were created using only ones and zeros, with each variable (i.e., predictor or criterion) having only one loading of 1 and with a different number of 1s per component (see Table 1 for an example). The loading matrices thus have simple structure. Loading matrices with two components had, respectively, three and 21 highly loading variables; loading matrices with three components had, respectively, three, six, and 15 highly loading variables. Hence, the first component was always weak and the last component always strong, referring to the amount of variance in the variable scores accounted for by those components; the component that consists of six variables can also be considered weak.

The regression weight matrix B (RY × RX) was constructed according to Table 3 with the size of the non-zero regression weight(s) determined according to factor 2. This manipulation implies that in some conditions, the non-zero regression weights amount to .995, indicating that the predictable criterion components can be almost perfectly predicted based on the relevant predictor components. In other conditions, the regression weights amount to .86, implying that only part of the scores on the predictable criterion components can be captured by means of the relevant predictor components. We also manipulated the location of the non-zero regression weights (factor 3), yielding four types of regression weight matrices. In types 1 and 3, the relevant predictor components are weak, and the irrelevant ones are strong. In types 2 and 4, the situation is reversed for RX = 2; for RX = 3, one weak predictor component is relevant, as well as the strong one. Types 1 and 3 differ with respect to the strength of the predictable criterion component: In type 1 the predictable criterion component is weak, while in type 3 it is strong. The same holds for type 2 and type 4.

Table 3 Four types of regression weight matrices B (factor 3), with either 2 or 3 predictor components (factor 1), and with “x” depicting a regression weight of .995 or .86 (factor 2)

The error matrices EX and EY were drawn from a standard normal distribution as well, and were also standardized and orthogonalized. The error matrices were rescaled to obtain 25% of error variance (i.e., the predictor and criterion components explain 75% of the predictor and criterion variance, respectively).

Next, X and Y were created by multiplying TX and TY with PX and PY, respectively, and by adding EX and EY. Finally, X and Y were normalized so that each predictor and criterion had a sum of squares of one.

Summarizing the above setup, the following data characteristics were manipulated in a full factorial design:

  1. The number of predictor components: 2, 3

  2. The size of the non-zero regression weights: .995, .86

  3. The strength of the relevant predictor and predictable criterion components: four types of regression weight matrices for each level of factor 1 (see Table 3)

For each of the 2 × 2 × 4 = 16 cells of the factorial design, 25 datasets were constructed, yielding 400 datasets in total.

Data analysis

To investigate the recovery of the components and how it depends on α, we first analyzed each data set six times. Specifically, the RX and RY values were set to the numbers of relevant predictor and predictable criterion components, on the one hand, and to the numbers of all simulated predictor and criterion components, on the other hand. Moreover, in each of both settings, we used three α values: .1, .5, and .9. Note that when we talk about the numbers of relevant predictor components and predictable criterion components, we refer to the number of components that are associated with at least one non-zero regression weight according to the simulation settings in Table 3.

Next, we applied the model selection heuristic to all datasets. Here, we let RX and RY vary from one to five, yielding 25 solutions. We set the α value to .05 (see Section “Model selection”), as we wanted to retrieve the relevant predictor and predictable criterion components regardless of their strength.

To avoid obtaining a local minimum, we always applied a multi-start procedure with 25 starts, and set ε = 10−6 when specifying the convergence criterion. For each solution, we varimax-rotated both the predictor loadings and the criterion loadings, and compensated for these rotations in the component scores and the regression weights (see Section “Model”). Afterwards, we applied the heuristic described in Section “A two-step model selection heuristic to retrieve all relevant and predictable components” to the obtained 25 solutions, to select the relevant predictor and predictable criterion components.

Results

Recovery of the number of relevant predictor and the number of predictable criterion components as well as recovery of all the simulated components

PCovR2

In order to investigate the recovery of the components, we computed Tucker congruence values (Korth & Tucker, 1975) between the estimated component loadings and the true ones. This was done both for the PCovR2 models in which RX and RY were fixed to the numbers of relevant predictor and predictable criterion components (i.e., RX = 1 or 2 and RY = 1) and for the models in which RX and RY were fixed to the numbers of all simulated components (i.e., RX = 2 or 3 and RY = 2). When the number of components exceeded one, we calculated the congruence per component and averaged the obtained values.
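As a reference, the congruence computation can be sketched as follows; matching each true component to the estimated component with which it is most congruent is one simple way to pair the components before averaging, and the function name is ours.

```python
import numpy as np

def mean_tucker_congruence(P_true, P_est):
    """Mean Tucker congruence between the columns of a true and an estimated loading
    matrix; each true component is matched to the estimated component with which it
    is most congruent (in absolute value)."""
    num = P_true.T @ P_est
    den = np.outer(np.sqrt(np.sum(P_true**2, axis=0)),
                   np.sqrt(np.sum(P_est**2, axis=0)))
    phi = np.abs(num / den)                     # congruence of every true/estimated pair
    return float(np.mean(np.max(phi, axis=1)))  # best match per true component, averaged
```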

Table 4 shows the mean recovery when we extract the numbers of relevant predictor and predictable criterion components (i.e., taking RX = 1 or 2, based on the level of factor 1, and RY = 1). Table 5 shows the recovery results when extracting all the simulated components (i.e., setting RX = 2 or 3, based on the level of factor 1, and RY = 2) across multiple α values. As expected, when it comes to recovering all the simulated predictor and criterion components (hence not just the relevant ones), the higher the α value, the better. In fact, using an α value of .9 always yields almost perfect congruence values of .99, whereas these values range between .67 and .92 for an α value of .1.

Table 4 Mean recovery of the relevant predictor components and predictable criterion components with PLS2, PCR2, and PCovR2. For PCovR2, we used three different α values. In all these analyses, the number of extracted predictor and criterion components is set to the number of relevant predictor components and predictable criterion components. The “X” denotes the cases for which the number of relevant predictor components differs from the number of predictable criterion components, implying that PLS2 cannot be applied
Table 5 Mean recovery of all the simulated components with PLS2, PCR2, and PCovR2. For PCovR2, we used three different α values. In all these analyses, the numbers of extracted predictor and criterion components are set to the numbers of simulated ones (see factor 1)

This is not the case when only extracting the numbers of relevant predictor and predictable criterion components, however. For example, in the first row of Table 4, the recovery of the criterion components goes down from .97 to .01 when we increase α from .1 to .9. In this simulation cell, the data sets were simulated with weak but predictable and strong but unpredictable criterion components. When we model these data sets with one criterion component, the algorithm extracts the predictable criterion component if α = .1, while it selects the strong but unpredictable criterion component with higher α values. Moreover, there are some simulation cells for which, regardless of α, the recovery is never good enough. For example, in the second row of the table, the recovery of the predictor loadings is never higher than .5. This is due to the merging issue, demonstrated in Section “Defining the problem”, which occurs when the number of relevant predictor components differs from the number of predictable criterion components. For such scenarios, the model selection heuristic proposed in Section “A two-step model selection heuristic to retrieve all relevant and predictable components” might yield better results.

PLS2 and PCR2

We also examined the performance of PLS2 and PCR2, by investigating how well these methods recovered the relevant predictor and predictable criterion components on the one hand and all the simulated predictor and criterion components on the other hand. As discussed earlier in this paper, PLS2 imposes an equality restriction on the numbers of predictor and criterion components. Therefore, we only applied PLS2 in the conditions where the number of simulated predictor components equaled two. Like PCovR2, PCR2 does not restrict the numbers of predictor and criterion components to be the same. Hence, PCR2 could be applied in all conditions.

The results are presented in Tables 4 and 5. Like PCovR2, PLS2 seems to be affected by the strength of the relevant predictor and predictable criterion components, when focusing on these components only. Indeed, as can be seen in Table 4, PLS2 performs worst when the relevant predictor and predictable criterion components are both weak (i.e., with type 1 regression weights, see Table 3). Moreover, the recovery of all simulated components (Table 5) is worse than that of the relevant and predictable ones (Table 4). The latter is not surprising, as PLS2 is designed to find predictor and criterion components that are related to one another. If we compare the PLS2 results to the PCovR2 ones, we see in Table 4 that with α = .1, PCovR2 performs slightly better than PLS2. Table 5 shows that when targeting all the simulated components, PCovR2 analyses with higher α values recover the components better than PLS2. The PCR2 results in both tables are very similar to the ones of PCovR2 with α = .9. This makes sense, since PCR2 equals PCovR2 with α = 1. Overall, this comparison of PLS2, PCR2, and PCovR2 under two different scenarios (focusing on the relevant predictor and predictable criterion components only or on all the components) demonstrates the benefits of the PCovR2 weighting parameter.

Model selection

First, we checked whether the model selection heuristic yielded the correct number of relevant predictor components, which amounts to one when RX = 2 and to two when RX = 3, and the correct number of predictable criterion components, which always equaled one. Table 6 shows the number of datasets for which the model selection heuristic found the correct numbers of relevant predictor and predictable criterion components. Specifically, the heuristic yielded a correct result for 365 out of the 400 datasets. Performance is mostly affected by the number of predictor components and the size of the non-zero regression weights, as 32 and 26 of the 35 mistakes, respectively, occurred when the number of predictor components equaled two and when the non-zero regression weight amounted to .86. The effect of the strength of the components is less outspoken. Further analyzing the 35 mistakes, the first step of the model selection rule yielded an incorrect result (i.e., too many predictable criterion components) for 24 of them. This implies that for the remaining 16 cases, model selection went wrong when assessing the number of relevant predictor components. Again, all mistakes pertained to selecting too many components.

Table 6 The number of datasets for which the correct number of relevant predictor and predictable criterion components were found, as a function of the number of predictor components, the size of the non-zero regression weight (RegWght), and type of regression weights matrix

Next, we again investigated the recovery of the relevant and predictable components by computing the Tucker congruence value between the estimated component loadings and the true relevant predictor and predictable criterion ones. When too many components were extracted, we computed the congruence value for the components that resembled the true ones the most. The last two columns of Table 5 show the recovery of the predictor and criterion loadings. Per cell of the design, the number before the vertical line indicates the mean recovery for the datasets for which the model selection heuristic returned the correct numbers of relevant predictor and predictable criterion components. The number after the line corresponds to the datasets for which the heuristic returned an incorrect result. The high Tucker congruence values for the datasets for which the model selection heuristic returned an incorrect result show that even for these datasets, the relevant predictor and predictable criterion components were adequately recovered. The average Tucker congruence values in Table 5 are higher than or equal to .95, which, according to Lorenzo-Seva and ten Berge (2006), indicates that the compared components can be considered equal. The average Tucker congruence seems to be affected by the strength of the relevant and predictable components: In type 1, both the relevant predictor and the predictable criterion components are weak, yielding the worst recovery. In contrast, in type 4, the predictable criterion component and one of the relevant predictor components are strong, yielding nearly perfect recovery.

Conclusion

The simulation study showed that if one is interested in recovering all the predictor and criterion components, regardless of whether they are relevant or predictable, one best uses PCovR2 with a high α value or PCR2. In contrast, lower α values lead to a better recovery of the subset of relevant predictor and predictable criterion components. In case the number of relevant predictor components equals the number of predictable criterion components, PLS2 yielded similar results. However, when the latter does not hold, PLS2 cannot be applied, and naively applying PCovR2 with the true numbers of relevant predictor and predictable criterion components did not yield good enough results, due to the merging issue described in Section “Defining the problem”. For such cases, the model selection heuristic proposed in Section “A two-step model selection heuristic to retrieve all relevant and predictable components” was useful. Indeed, using this heuristic, the mean recovery of the relevant predictor and predictable criterion loadings outperformed that of the analyses in which the numbers of extracted components were simply set to the numbers of relevant predictor and predictable criterion components.

Illustrative application

In this section, we analyze survey data from 192 fathers and their children, taken from “The 500 Family Study” (Schneider & Waite, 2008). For each family, the father filled in eight questionnaires, regarding their relationships, feelings, and so on, and the child filled in seven questionnaires (see Table 7). Like Gu and Van Deun (2018), we calculated a single score for each questionnaire by summing the scores on the questions within that questionnaire. We thus obtained an X matrix with dimensions 192 × 8 pertaining to the father data, and a Y matrix with dimensions 192 × 7 concerning the child data. This means that, in our analyses, we examine the extent to which the child questionnaires can be predicted by the father questionnaires. After preprocessing (centering the data and setting the sum of squares per questionnaire to 1), the data were analyzed with PCovR2.

Table 7 Varimax rotated loading matrices and regression weights of three interesting PCovR2 solutions; loadings and regression weights above .40 are shown in bold

To decide on the numbers of relevant predictor and predictable criterion components, we applied the model selection rule described in Section "Model selection". First, letting RX and RY vary from 1 to 5, we fitted 25 PCovR2 models. Again, we fixed the α value to .05, because the simulation study confirmed that low α values are preferable when one wants to focus on the regression problem. To decide on the number of predictable criterion components, we considered the criterion component-specific prediction errors in panel a of Fig. 3 and the associated histogram (panel b). Based on the grouping structure in the histogram, we set the threshold TRy for distinguishing predictable from unpredictable components to .79. Given that each considered solution contains at most one predictable criterion component, we conclude that this dataset contains one predictable criterion component and set RY to one.

Fig. 3

(a) The sum of squared prediction errors for each criterion component across the 25 PCovR2 models, estimated with RX and RY varying from one to five, for the illustrative application. cc1 to cc5 denote the first to the fifth criterion component within the considered solution. Predictable criterion components are marked in bold. (b) Histogram presenting the distribution of all sums of squared prediction errors reported in Fig. 3a. (c) The regression weights obtained when RY is fixed to 1 and RX varies from 1 to 5

Panel c of Fig. 3 shows the regression weights of the PCovR2 solutions with one criterion component and the number of predictor components varying from one to five. Based on the formula \( T_{Rx}=\sqrt{1-T_{Ry}}/2 \), we compute the threshold for the regression weights TRx as \( \sqrt{1-.79}/2=.23 \). Comparing the regression weights to this threshold, we see that each considered solution contains only one relevant predictor component. Thus, we conclude that this dataset has one relevant predictor and one predictable criterion component and retain the RX = 1 and RY = 1 solution.
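To make the two threshold steps of this heuristic concrete, the sketch below shows one way they could be coded. It is an illustration under our own assumptions: the function names are ours, the prediction errors and regression weights would be taken from the fitted PCovR2 solutions, and we assume that a lower sum of squared prediction errors indicates a more predictable criterion component.

```python
import numpy as np

def count_predictable(prediction_errors, t_ry=0.79):
    """Step 1: criterion components whose sum of squared prediction errors
    falls below the threshold T_Ry are counted as predictable."""
    return int(np.sum(np.asarray(prediction_errors) < t_ry))

def count_relevant(regression_weights, t_ry=0.79):
    """Step 2: predictor components with at least one regression weight whose
    absolute value exceeds T_Rx = sqrt(1 - T_Ry) / 2 are counted as relevant.
    regression_weights: 2-D array (rows: predictor components,
    columns: criterion components)."""
    t_rx = np.sqrt(1 - t_ry) / 2
    weights = np.asarray(regression_weights)
    return int(np.sum(np.any(np.abs(weights) > t_rx, axis=1)))

# Hypothetical usage for one fitted (RX, RY) solution:
# r_y = count_predictable(errors_per_criterion_component)
# r_x = count_relevant(B)   # B: RX x RY matrix of regression weights
```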

However, attentive readers might note that the heuristic involves some arbitrary decisions. For example, panel b of Fig. 3 is not very clear-cut, and if we had used a different number of bins, we might have ended up with a less strict threshold and possibly a higher number of predictable criterion components. Therefore, we supplement Fig. 3 with Fig. 4, which plots how well X and Y are reconstructed and how well TY is predicted, as a function of the α value used and the numbers of extracted predictor and criterion components. We see that, next to the (RX = 1, RY = 1, α = .05) solution, the (RX = 2, RY = 1, α = .05) and (RX = 2, RY = 2, α = .05) solutions might also be interesting. Regarding (RX = 2, RY = 1), the reconstruction of X clearly benefits from the added predictor component. Similarly, for (RX = 2, RY = 2), adding one more criterion component also improves the reconstruction of Y, while the prediction part does not worsen much. Note that increasing the number of criterion components will generally lower the average predictability of those components, because the first criterion component extracted will always be the most predictable one.

Fig. 4

Values of the reconstruction (dashed and dash-dot) and prediction (solid) parts of the PCovR2 loss function (Y-axis) for the illustrative application, as a function of the value of the weighting parameter α (X-axis). The panels pertain to one to three predictor components (rows) and one to three criterion components (columns). The dashed line '--' denotes the reconstruction error of X, the dash-dot line '-.' the reconstruction error of Y, and the solid line '-' the prediction error
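As a rough sketch of how the three curves in each panel could be computed for a fitted solution, the snippet below evaluates the reconstruction error of X, the reconstruction error of Y, and the error in predicting the criterion component scores TY from the predictor component scores TX. The variable names (TX, TY, PX, PY, B) and the scaling by the total sum of squares are our assumptions, not a specification of the estimation algorithm.

```python
import numpy as np

def loss_parts(X, Y, TX, TY, PX, PY, B):
    """Return the three quantities plotted in each panel of Fig. 4:
    reconstruction error of X, reconstruction error of Y, and the error in
    predicting TY from TX, each expressed relative to the total sum of squares."""
    recon_x = np.sum((X - TX @ PX.T) ** 2) / np.sum(X ** 2)
    recon_y = np.sum((Y - TY @ PY.T) ** 2) / np.sum(Y ** 2)
    pred = np.sum((TY - TX @ B) ** 2) / np.sum(TY ** 2)
    return recon_x, recon_y, pred
```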

Table 7 shows these three solutions. The father's first component is highly similar across the solutions and can be labeled optimism about their child's future. The second father component differs across the solutions, however: it is a strong but irrelevant component in the (RX = 2, RY = 1, α = .05) solution, and a weaker but relevant one in the (RX = 2, RY = 2, α = .05) solution, where it seems to be related to relationship and communication quality. The first child component is also very stable across the solutions and pertains to self-esteem, academic performance, and confidence about their future. The regression weights show that, regardless of which solution we examine, the father's optimism about their child's future is moderately related to the child's self-esteem, academic performance, and confidence about their future. The second criterion component in the (RX = 2, RY = 2, α = .05) solution encompasses most non-performance-related aspects and is moderately related to the father's relationship and communication quality component.

Discussion

In this paper, we proposed PCovR2, an extension of PCovR in which both the predictor variables and the criterion variables are reduced to components, while the criterion components are simultaneously regressed on the predictor components. The PCovR2 weighting parameter can be used flexibly to focus either on the reconstruction of the predictors and the criteria, or on filtering out the "relevant" predictor components and "predictable" criterion components. Model selection is challenging, however, especially in the latter case: on the one hand, we want predictor components that predict the criterion components well and criterion components that can be predicted by those predictor components; on the other hand, we want these components to represent the underlying components accurately. To aid researchers in this decision, we proposed inspecting a matrix plot of the different loss parts for different model choices, as well as a model selection heuristic. The matrix plot also yields information on the range of suitable α values: if one is only interested in the relevant predictor and predictable criterion components, one should select α values for which the prediction part of the loss is low; if one wants to extract all components, the reconstruction parts should be inspected instead. Such plots show that the range of α values that is low enough in the first case, or high enough in the second, may vary from dataset to dataset. Yet, as a rule of thumb, .05 and .9 will often work well as low and high α values, respectively.

We performed a simulation study to investigate the performance of PCovR2 and of the proposed model selection heuristic for filtering out the relevant predictor components and the predictable criterion components. The results indeed show that high α values yield good performance when one wants to extract all predictor and criterion components, whereas low α values are to be preferred when one aims to find the relevant predictor and predictable criterion components. Applying the model selection heuristic further improved the results in the conditions where the number of relevant predictor components exceeded the number of predictable criterion components. It should be noted that the simulated conditions were not easy, as in some of them the relevant predictor and predictable criterion components were very weak. Even though these conditions might be considered extreme, we included them to evaluate and showcase the benefits of PCovR2.

Comparing PCovR2 with competitor methods, we showed that, when focusing on the subset of relevant predictor and predictable criterion components, PLS2 also predicts the criterion variables well, but does not adequately recover the underlying predictor components when more than one predictor component is needed to reconstruct a criterion component. When the number of relevant predictor components equals the number of predictable criterion components, however, PLS2 and PCovR2 with a low α value showed very similar and mostly good recovery performance. PCR2 performed worse in these analyses because it focuses on the strength of the components rather than on their relevance or predictability. Yet, when we focused on retrieving all simulated components, PCR2 and PCovR2 with a high α value yielded equally excellent results, whereas PLS2 did worse.

PCovR2 has some disadvantages, an important one of which pertains to model selection. Using the matrix plot involves many computations, and it may not be easy to compare the subplots and find a useful compromise across the loss parts. The proposed model selection heuristic might still seem quite complicated, as it involves two steps to find the correct numbers of relevant predictor and predictable criterion components, and it relies on rather arbitrary thresholds. Another disadvantage is that one cannot simply fix RX and RY to the numbers of relevant predictor and predictable criterion components. Indeed, as we showed for our toy dataset, one often needs to extract additional components to model the relevant and predictable components correctly.

Future research might further investigate model selection. In pilot simulations, for instance, we tried to tune the α value through cross-validation. The tricky part is that it is not obvious which criterion should be optimized and evaluated in the cross-validation. In the pilot simulations, we optimized the predictability of the criterion scores based on the predictor scores; yet, this criterion does not explicitly take the recovery of the predictor and criterion structure into account. An interesting alternative might be stability selection (Meinshausen & Bühlmann, 2010), although, like cross-validation, this procedure is computationally intensive.

PCovR2 holds promise for dyadic studies, as illustrated in the previous section. This research area has grown considerably in the last decade, and many researchers are involved in the development of dedicated methods (e.g., Bodner et al., 2018; Butler, 2011; Gates & Liu, 2016). Often, the major questions steering the analysis of such data concern the interrelations between the two dyadic partners, which is precisely what PCovR2 aims to unravel. Since many dyadic studies repeatedly measure the same dyads across time, it would be interesting to study how PCovR2 can be extended to take serial dependencies between observations into account; the recent surge of psychometric research on vector autoregressive modeling can be inspiring in this regard (e.g., Bulteel, Tuerlinckx, Brose, & Ceulemans, 2018; Epskamp et al., 2018; Hamaker et al., 2018). Moreover, PCovR2 is so far an asymmetric method, implying that the analysis focuses on the effect that one partner has on the other and not vice versa. Further developments are needed that extend the regression weights and the PCovR2 loss function and algorithm so as to take bidirectional influences into account.

To conclude, we presented PCovR2, a new method for multivariate regression problems with relatively high numbers of predictors and criteria, and compared it to PLS2 and PCR2. PCovR2 yields results comparable to those of PCR2 when one aims to retrieve all components in the data, and results comparable to those of PLS2 when one focuses on recovering the relevant predictor and predictable criterion components and the numbers of those components are equal. Yet, unlike PLS2, PCovR2 can also handle settings in which these numbers differ, although model selection then remains challenging.