Abstract

The methods of two-parameter ridge and ordinary ridge regression are very sensitive to the joint presence of multicollinearity and outliers in the y-direction. To overcome this problem, modified robust ridge M-estimators are proposed. The new estimators are then compared with the existing ones by means of extensive Monte Carlo simulations. According to the mean squared error (MSE) criterion, the new estimators outperform the least squares estimator, the ridge regression estimator, and the two-parameter ridge estimator in many of the considered scenarios. Two numerical examples are also presented to illustrate the simulation results.

1. Introduction

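The matrix form of the multiple linear regression model is

$$y = X\beta + \varepsilon, \tag{1}$$

where $y$ is the $n \times 1$ vector of the response variable, $X$ is the $n \times p$ matrix of predictor variables, $\beta$ is the $p \times 1$ vector of unknown regression coefficients, and $\varepsilon$ is the $n \times 1$ vector of disturbance terms, such that $\varepsilon \sim N(0, \sigma^2 I_n)$. The ordinary least squares (OLS) estimator of $\beta$ is defined as

$$\hat{\beta} = (X'X)^{-1}X'y.$$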

The OLS estimator is unbiased and has minimum variance among all linear unbiased estimators. However, it performs poorly in the presence of multicollinearity: its variance becomes large and the estimated coefficients are often statistically insignificant [1]. To cope with this issue, several alternatives have been developed. The first method was proposed by Ref. [2] and is defined as

$$\hat{\beta}(k) = (X'X + kI)^{-1}X'y,$$

where $I$ is the identity matrix and $k > 0$. To handle the problem of outliers, Ref. [3] derived a new estimator known as the M-estimator (ME). The M-estimator $\hat{\beta}_M$ is defined as the solution of the equations $\sum_{i=1}^{n}\psi(e_i/s) = 0$ and $\sum_{i=1}^{n}\psi(e_i/s)\,x_i = 0$, where $e_i$ is the $i$th residual, $s$ is a scale estimator for the errors, and $\psi$ is a suitably chosen function.
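The estimating equation behind the M-estimator can be illustrated in the simplest setting, a location model with no predictors. The sketch below is only an illustration under stated assumptions: Huber's $\psi$ with tuning constant 1.345, a normalized median absolute deviation as the scale estimator $s$, and a made-up sample with one gross outlier are all choices made here, not taken from the article. The function names `huber_psi` and `huber_location` are hypothetical.

```python
import statistics


def huber_psi(u, c=1.345):
    """Huber's psi: linear for |u| <= c, constant (clipped) beyond that."""
    return max(-c, min(c, u))


def huber_location(y, c=1.345, tol=1e-10, max_iter=200):
    """Solve sum(psi((y_i - mu) / s)) = 0 for mu by iterative reweighting."""
    mu = statistics.median(y)
    # Assumed scale estimator: normalized median absolute deviation, held fixed.
    s = statistics.median([abs(v - mu) for v in y]) / 0.6745
    for _ in range(max_iter):
        weights = []
        for v in y:
            u = (v - mu) / s
            # w(u) = psi(u) / u, with the limiting value 1 at u = 0.
            weights.append(1.0 if u == 0 else huber_psi(u, c) / u)
        new_mu = sum(w * v for w, v in zip(weights, y)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, s


# Hypothetical sample with one gross outlier in the response.
y = [1.0, 1.2, 0.9, 1.1, 1.05, 0.95, 15.0]
mu, s = huber_location(y)
print("mean:", statistics.mean(y), "Huber M-estimate:", mu)
```

The sample mean is pulled toward the outlier, while the M-estimate stays close to the bulk of the data because the contribution of the outlying residual is clipped by $\psi$.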

Ref. [4] illustrated that ridge regression (RR) is sensitive to outliers in the y-direction and hence developed a new robust ridge M-estimator (MRE), defined as

$$\hat{\beta}_M(k) = (X'X + kI)^{-1}X'X\hat{\beta}_M,$$

where $\hat{\beta}_M$ is the M-estimator.

According to Ref. [5], the quality of fit of RR is not as good as that of OLS. To overcome this deficiency, they developed a two-parameter ridge estimator (TPR) that always performs better than the ordinary RR. TPR also preserves the orthogonality between the residuals and the predicted values of the dependent variable. They defined TPR as

$$\hat{\beta}(q, k) = q(X'X + kI)^{-1}X'y,$$

where

$$q = \frac{(X'y)'(X'X + kI)^{-1}X'y}{(X'y)'(X'X + kI)^{-1}X'X(X'X + kI)^{-1}X'y}.$$
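The orthogonality property of TPR can be checked numerically. The following sketch assumes the usual construction in which $q$ rescales the ordinary ridge fit $\hat{y}_k = X(X'X + kI)^{-1}X'y$ by $q = y'\hat{y}_k / \hat{y}_k'\hat{y}_k$; it is restricted to two predictors so that the $2 \times 2$ inverse can be written out by hand. The data and the names `dot` and `tpr_fit` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def tpr_fit(X, y, k):
    """Two-parameter ridge fit for exactly two predictors (2x2 inverse written out)."""
    x1 = [row[0] for row in X]
    x2 = [row[1] for row in X]
    # Elements of X'X + kI and of X'y.
    a, b, d = dot(x1, x1) + k, dot(x1, x2), dot(x2, x2) + k
    c1, c2 = dot(x1, y), dot(x2, y)
    det = a * d - b * b
    # Ordinary ridge coefficients (X'X + kI)^{-1} X'y.
    b1 = (d * c1 - b * c2) / det
    b2 = (a * c2 - b * c1) / det
    ridge_pred = [b1 * u + b2 * v for u, v in zip(x1, x2)]
    # q rescales the ridge fit so that residuals are orthogonal to predictions.
    q = dot(y, ridge_pred) / dot(ridge_pred, ridge_pred)
    return q, (q * b1, q * b2)


# Hypothetical, nearly collinear predictors.
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
y = [2.0, 4.1, 6.3, 7.8, 10.2]
q, (b1, b2) = tpr_fit(X, y, k=0.5)
pred = [b1 * r[0] + b2 * r[1] for r in X]
resid = [yi - pi for yi, pi in zip(y, pred)]
print("q:", q, "pred'resid:", dot(pred, resid))
```

With $k = 0$ the ridge fit is the OLS fit and $q$ reduces to 1; for $k > 0$ the factor $q \ge 1$ partly undoes the shrinkage of the fitted values.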

Later on, many researchers worked on TPR; see, e.g., Refs. [6–13]. The selection of the ridge M-estimator plays an important role in reducing the MSE of TPR in the presence of multicollinearity and outliers. Different ridge M-estimators have been proposed by various researchers, among them Refs. [4, 8, 14–17] and, recently, Ref. [18]. In the case of near singularity and a large number of outliers, the existing estimators do not perform well in terms of MSE. Therefore, the aim of this article is to continue the series of work on the selection of the ridge M-estimator in TPR. Motivated by the work of Ref. [8] and following the idea of Ref. [1], we propose modified ridge M-estimators in TPR. The developed M-estimators provide a smaller MSE than OLS, RR, and existing TPR estimators for different levels of correlation, sample size, error variance, and outliers.

The organization of this article is as follows: Section 2 reviews the estimators included in this study, presents the newly developed estimators for the selection of k, and gives their comparison criterion. Section 3 describes the simulation design adopted in this article, together with a discussion of the simulation results and numerical examples. Concluding remarks are given in Section 4.

2. Methodology

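The canonical form of the model given in equation (1) can be written as

$$y = Z\alpha + \varepsilon,$$

where $Z = XT$ and $\alpha = T'\beta$, $T$ is the orthogonal matrix whose columns are the eigenvectors of $X'X$, so that $T'T = I_p$, with $I_p$ the identity matrix, and $Z'Z = T'X'XT = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ are the ordered eigenvalues of $X'X$. The estimators in canonical form are

$$\hat{\alpha} = \Lambda^{-1}Z'y, \quad \hat{\alpha}(k) = (\Lambda + kI_p)^{-1}Z'y, \quad \hat{\alpha}_M(k) = (\Lambda + kI_p)^{-1}\Lambda\hat{\alpha}_M, \quad \hat{\alpha}(q, k) = q(\Lambda + kI_p)^{-1}Z'y,$$

where $\hat{\alpha}_M = T'\hat{\beta}_M$ and $k > 0$.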

2.1. Existing Estimators

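(i) The biasing parameter of Ref. [2].
(ii) RME: the biasing parameter of Ref. [4], based on the M-estimator of Ref. [3].
(iii) TRME1: the parameters $\hat{k}$ and $\hat{q}$ of Ref. [8].
(iv) TRME2: the iterative method of Ref. [8], defined in Algorithm 1.
(v) M1 [14].

Algorithm 1:
(i) Calculate $\hat{q}$.
(ii) Estimate $\hat{k}$ using $\hat{q}$ from (i).
(iii) Obtain $\hat{q}$ using $\hat{k}$ from (ii).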

In general, ridge M-estimators available in the literature may not fully address the simultaneous occurrence of high multicollinearity and outliers in data. To resolve this issue, we propose some new ridge M-estimators in TPR that perform generally better than other existing estimators in most of the considered situations.

2.2. Performance Criterion

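To compare the performance of our developed estimators with that of the existing estimators, we used the MSE criterion, defined as

$$\mathrm{MSE}(\hat{\alpha}) = E\left[(\hat{\alpha} - \alpha)'(\hat{\alpha} - \alpha)\right] = \mathrm{tr}\left[\mathrm{Cov}(\hat{\alpha})\right] + \mathrm{Bias}(\hat{\alpha})'\,\mathrm{Bias}(\hat{\alpha}),$$

where $\mathrm{Bias}(\hat{\alpha}) = E(\hat{\alpha}) - \alpha$.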

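The MSEs of the above defined two-parameter estimators are

$$\mathrm{MSE}(\hat{\alpha}(q, k)) = q^2\sigma^2\sum_{i=1}^{p}\frac{\lambda_i}{(\lambda_i + k)^2} + \sum_{i=1}^{p}\left(\frac{q\lambda_i}{\lambda_i + k} - 1\right)^2\alpha_i^2,$$

$$\mathrm{MSE}(\hat{\alpha}_M(q, k)) = q^2\sum_{i=1}^{p}\frac{\lambda_i^2\,\Omega_{ii}}{(\lambda_i + k)^2} + \sum_{i=1}^{p}\left(\frac{q\lambda_i}{\lambda_i + k} - 1\right)^2\alpha_i^2,$$

where $\sigma^2$ is the error variance and $\Omega_{ii}$ are the diagonal elements of $\Omega = \mathrm{Cov}(\hat{\alpha}_M)$; the ordinary ridge versions correspond to $q = 1$. Ref. [8] proved that if , then for and .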
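As a rough numerical illustration of the MSE criterion, the sketch below evaluates the scalar MSE of the (non-robust) two-parameter ridge estimator in canonical form. It assumes the standard variance-plus-squared-bias expression that follows from component $i$ of the estimator being $q\lambda_i/(\lambda_i + k)$ times the OLS component; the eigenvalues, coefficients, and error variance are made-up values, and the function name `mse_tpr` is hypothetical.

```python
def mse_tpr(q, k, lam, alpha, sigma2):
    """Scalar MSE of the canonical two-parameter ridge estimator.

    Component i of the estimator is q * lam_i / (lam_i + k) times the OLS
    component, so its variance is q^2 * sigma2 * lam_i / (lam_i + k)^2 and
    its bias is (q * lam_i / (lam_i + k) - 1) * alpha_i.
    """
    variance = sum(q * q * sigma2 * l / (l + k) ** 2 for l in lam)
    bias_sq = sum(((q * l / (l + k) - 1.0) * a) ** 2 for l, a in zip(lam, alpha))
    return variance + bias_sq


# Hypothetical near-singular design: one eigenvalue is close to zero.
lam = [2.0, 0.01]
alpha = [1.0, 1.0]
sigma2 = 1.0

print("OLS   (q=1, k=0):  ", mse_tpr(1.0, 0.0, lam, alpha, sigma2))
print("ridge (q=1, k=0.1):", mse_tpr(1.0, 0.1, lam, alpha, sigma2))
```

Setting $q = 1$ and $k = 0$ recovers the OLS value $\sigma^2\sum 1/\lambda_i$, which is dominated by the smallest eigenvalue; a small positive $k$ trades a modest bias for a large reduction in variance.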

2.3. New Estimators

According to Ref. [8], TPR, like RR, is sensitive to outliers in the y-direction. We therefore suggest modified ridge M-estimators (MTPM) in TPR. As in TPR, the primary focus in MTPM is to find a suitable value of the biasing parameter that minimizes the MSE. By adopting the idea of Ref. [1], we multiply a quantity with as suggested by Ref. [8]. Hence, the modified biasing parameter is where and defined in TRME1.

As is based on correlation, an increase in the degree of correlation causes an increase in the value of . This increase in will lead to the larger value of . Since many existing estimators did not provide a large enough value of , this increase is required to obtain the suitable value of to solve the problem of near singularity. The term is used to deal with the outliers. Here, we have used Huber’s M-estimator.

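We propose three new methods by taking the arithmetic mean (AM), geometric mean (GM), and harmonic mean (HM) of the modified biasing parameters, written here as $\hat{k}_i$, $i = 1, \ldots, p$. They are denoted by MTPM1, MTPM2, and MTPM3, respectively, and defined as

$$\hat{k}_{AM} = \frac{1}{p}\sum_{i=1}^{p}\hat{k}_i, \qquad \hat{k}_{GM} = \left(\prod_{i=1}^{p}\hat{k}_i\right)^{1/p}, \qquad \hat{k}_{HM} = \frac{p}{\sum_{i=1}^{p}1/\hat{k}_i}.$$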
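The three ways of combining component-wise biasing parameters into a single value can be sketched with the Python standard library. The input values below are hypothetical placeholders, not estimates from the article, and the helper name `combine_k` is an assumption made for illustration.

```python
import statistics


def combine_k(k_values):
    """Arithmetic, geometric, and harmonic means of positive biasing parameters."""
    return {
        "AM": statistics.mean(k_values),
        "GM": statistics.geometric_mean(k_values),
        "HM": statistics.harmonic_mean(k_values),
    }


# Hypothetical component-wise biasing parameters.
k_values = [0.5, 2.0, 8.0]
for name, value in combine_k(k_values).items():
    print(name, value)
```

For positive inputs the three means are always ordered HM ≤ GM ≤ AM, so the arithmetic-mean version applies the most shrinkage and the harmonic-mean version the least.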

Hence, the new modified two-parameter ridge M-estimator is defined in the canonical form as where and .

Furthermore, through Algorithm 1, we propose modified iterative two-parameter ridge estimators. The new modified iterative TPR is defined as where is from the algorithm of TRME2. Taking the AM, GM, and HM of , three new estimators, denoted by MTPM4, MTPM5, and MTPM6, are obtained and defined as

The new modified iterative two-parameter ridge M-estimator is defined in the canonical form as where and .

3. Simulation Study

In this section, a simulation study is conducted to assess the performance of the new and existing estimators.

3.1. Simulation Design

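Following the simulation design of Refs. [8, 15], the predictors are generated as

$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,$$

where $\rho^2$ is the correlation between any two predictor variables and $z_{ij}$ are pseudo-random numbers generated from the standard normal distribution. The response variable is generated as

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \ldots, n,$$

where $\beta_0$ is set to zero and $\varepsilon_i \sim N(0, \sigma^2)$. The simulation experiment is carried out over different values of the factors considered in this study: the degree of correlation, the sample size, the number of predictors, the error variance, and the percentage of outliers.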

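To check the robustness of the newly proposed estimators against outliers, different percentages of outliers (10%, 20%, and 30%) in the y-direction are introduced through the error term; see Refs. [19, 20]. The simulation results are based on 5000 replications, and the estimated MSE is calculated as

$$\widehat{\mathrm{MSE}}(\hat{\alpha}) = \frac{1}{5000}\sum_{r=1}^{5000}(\hat{\alpha}_r - \alpha)'(\hat{\alpha}_r - \alpha),$$

where $\hat{\alpha}_r$ is the estimate obtained in the $r$th replication.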
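The data-generating steps of such a simulation can be sketched as follows. This is an illustration only: it assumes the common scheme $x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}$ for the predictors, a zero intercept, and normal errors, and it does not reproduce the outlier-contamination step, whose exact form is not recoverable from the text. All function names are hypothetical.

```python
import math
import random


def generate_predictors(n, p, rho, rng):
    """x_ij = sqrt(1 - rho^2) * z_ij + rho * z_i,p+1 with standard normal z."""
    X = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p + 1)]
        X.append([math.sqrt(1.0 - rho ** 2) * z[j] + rho * z[p] for j in range(p)])
    return X


def generate_response(X, beta, sigma, rng):
    """y_i = x_i' beta + e_i with zero intercept and N(0, sigma^2) errors."""
    return [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, sigma) for row in X]


def estimated_mse(estimates, truth):
    """Average squared distance between replicated estimates and the true vector."""
    total = sum(sum((e - t) ** 2 for e, t in zip(est, truth)) for est in estimates)
    return total / len(estimates)


def correlation(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


rng = random.Random(1)
X = generate_predictors(n=20000, p=3, rho=0.9, rng=rng)
y = generate_response(X, beta=[1.0, 1.0, 1.0], sigma=1.0, rng=rng)
col0 = [row[0] for row in X]
col1 = [row[1] for row in X]
print("sample correlation:", correlation(col0, col1), "target rho^2:", 0.9 ** 2)
```

In a full experiment, `estimated_mse` would be applied to the list of coefficient estimates collected over the replications for each estimator and each combination of factors.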

3.2. Performance of New Proposed Estimators

From the results in Tables 1–18, we can draw the following conclusions:
(i) The estimated MSE of all considered estimators increases as the error variance $\sigma^2$ increases. In general, MTPM1 performs well compared to the other estimators.
(ii) The MSE of all estimators decreases as the sample size increases from 20 to 100. For n = 20, MTPM1 performs better, but as n increases MTPM4 also performs better than the existing estimators.
(iii) The estimated MSE of all estimators increases with the degree of correlation. The newly developed estimators MTPM1 and MTPM6 perform better in terms of smaller MSEs. There are a few cases where M1 and TRME2 perform better than the rest of the estimators.
(iv) The estimated MSE of all estimators increases with the number of predictors (p).
(v) As the percentage of outliers in the data increases, the estimated MSE of the newly developed estimators decreases. MTPM1 outperforms the other estimators.
(vi) When both multicollinearity and outliers are present in the data, MTPM1 performs better than the other considered estimators. There are some cases in which M1 and TRME2 are better alternatives.
(vii) From these simulation results, we can conclude that in the presence of multicollinearity and outliers, MTPM1 and MTPM4 are the best alternatives to the existing estimators.

3.3. Real-Life Applications

Example 1. We consider the Tobacco data of Ref. [21] to show the performance of the newly modified estimators. The data contain four predictor variables with 30 observations, and a linear model with these four predictors is considered. The condition number is 1892.33, which indicates severe multicollinearity. The eigenvalues are , , , and . The calculated value of the error variance is 0.223. The correlation among the predictor variables is shown in Table 19. The data contain two outliers in the y-direction. The estimated MSE and regression coefficients for the tobacco data are presented in Table 20. The results show that MTPM3 has the smallest MSE among all the considered estimators.

Example 2. The second example uses water quality data taken from the Pakistan Council of Research in Water Resources (PCRWR) for the year 2014–2015. We consider four predictors, each with 31 observations. The predictor variables are HCO3, SO4, Na, and EC, while the response variable is TDS. The estimated error variance is 0.111 and the eigenvalues are , , , and . The condition number is 157.257, which indicates strong multicollinearity. Table 21 shows the correlation among the predictors. Outliers are present in the y-direction. The estimated MSE and regression coefficients are shown in Table 22. The results indicate that MTPM3 is a good choice among the considered estimators.

4. Concluding Remarks

In this article, modified robust ridge M-estimators for the two-parameter ridge regression model are proposed to overcome the joint problem of multicollinearity and outliers in the y-direction. We proposed six new estimators as alternatives to TRME. A simulation study was conducted to investigate the performance of the new estimators on the basis of MSE. The simulation results indicate that the new modified robust ridge M-estimators perform better than the other considered estimators. In particular, the proposed estimators MTPM1 and MTPM4, and in some cases MTPM3, perform well in the presence of multicollinearity and outliers. The benefits of the new estimators are also shown through two numerical examples. Therefore, on the basis of these results, we recommend the use of the proposed estimators in the considered scenarios.

Data Availability

Data used in this research were taken from the website available at [21]. All results reported in this research were obtained in the R environment, a user-friendly statistical analysis tool. Furthermore, the research code will be available on request from the corresponding author upon acceptance of this research.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflicts of Interest

The authors declare that they have no conflicts of interest or personal relationships that could have appeared to influence the research work presented in this article.

Authors’ Contributions

S. Yasin conceptualized the study and wrote the manuscript. S. Sultan and S. Kamal did the critical review. M. Suhail developed the methodology and reviewed and edited the article. Y. A. Khan provided the software and performed validation and formal analysis. S. Sultan and Y. A. Khan wrote the original draft. H. Ayed, S. Sultan, and M. Suhail reviewed and edited the article and performed visualization.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through research groups under grant number G.R.P./185/42.