New robust-ridge estimators for the partially linear model

This paper considers the partially linear model when the explanatory variables are highly correlated and the dataset contains outliers. We propose new robust biased estimators for this model under these conditions. The proposed estimators combine least trimmed squares and ridge estimation, based on the spline partial-residuals technique. The performance of the proposed estimators and the Speckman-spline estimator is examined by a Monte Carlo simulation study. The results indicate that the proposed estimators are more efficient and reliable than the Speckman-spline estimator.


Introduction
The partially linear model (PLM) is one of the most commonly used semi-parametric regression models, since it allows both parametric and nonparametric specifications in the regression function. This model has gained great popularity since it was first introduced by Engle et al (1986) and has been widely applied in economics, social, and biological sciences. A PLM is defined by
$$y_i = x_i^\top \beta + g(z_i) + \varepsilon_i, \quad i = 1, \dots, n, \qquad (1)$$
where $y_i$ denotes the response variable, $\varepsilon_i$ is the random error term, and $x_i$, $z_i$ are $p \times 1$ and $q \times 1$ vectors of regressors, respectively. The finite-dimensional parameter vector $\beta$ is the parametric part of the model, and the unknown smooth function $g(\cdot)$ is the nonparametric part of it. The classical assumptions of this model are that the errors $\varepsilon_i$ are independent with $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$. Speckman (1988) proposed a general estimation method for the PLM based on the partial residuals technique (PRT), and he considered the kernel approach to estimate the nonparametric component. Other estimators have been proposed for this model, such as Chen and Shiau (1991), Ahn and Powell (1993), Hamilton and Truong (1997), Yatchew (1997, 2000, 2003), Fadili and Bullmore (2005), Aydın (2014), Abonazel and Gad (2018), and El-Sayed et al (2019). Härdle et al (2000) and Abonazel (2018a) reviewed some of these estimators.
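As a concrete illustration, data following a PLM of the form (1) can be simulated as below; the particular smooth function $g$, the parameter values, and the error scale are illustrative choices, not ones taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 3
z = np.sort(rng.uniform(0, 1, n))       # covariate of the nonparametric part
X = rng.normal(size=(n, p))             # covariates of the parametric part
beta = np.array([1.0, -0.5, 2.0])       # finite-dimensional parameter (illustrative)
g = np.sin(2 * np.pi * z)               # unknown smooth function (illustrative)
eps = rng.normal(scale=0.5, size=n)     # i.i.d. errors, mean 0, constant variance
y = X @ beta + g + eps                  # PLM: parametric + nonparametric + error
```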
Recently, Abonazel et al (2019) modified the Speckman estimator by using the spline smoothing approach, and they showed that the PRT based on the spline smoothing approach is more efficient than the traditional PRT based on the kernel smoothing approach.
In this paper, we propose efficient estimators of the PLM when the dataset suffers from both multicollinearity and outliers. The proposed estimators are modified versions of the Abonazel et al (2019) estimator, based on a method that combines least trimmed squares (LTS) and ridge estimation. Similar estimators were provided by Amini and Roozbeh (2016) and Roozbeh (2016) based on the kernel smoothing approach. The rest of the paper is organized as follows. In the next section, we introduce the Speckman-spline estimator proposed by Abonazel et al (2019). Our proposed estimators are presented in Section 3. In Section 4, a Monte Carlo simulation study is conducted to compare the performance of the different estimators. Concluding remarks are given in Section 5.

Speckman-spline estimator
The PLM in (1) can be written in matrix form as $y = X\beta + g(z) + \varepsilon$. Taking the conditional expectation with respect to $z$ and differencing the two equations leads to (Härdle et al, 2004):
$$\tilde{y} = \tilde{X}\beta + \tilde{\varepsilon}, \qquad (2)$$
where $\tilde{y} = y - E(y|z)$, $\tilde{X} = X - E(X|z)$, and $\tilde{\varepsilon} = \varepsilon - E(\varepsilon|z)$. Using the modified regression in (2), the vector of parametric parameters $\beta$ can be estimated separately. The modified variables $\tilde{y}$ and $\tilde{X}$ are calculated using the fact that the conditional expectation $E(\cdot|z)$ can be estimated through a non-parametric regression on the explanatory variable $z$. The Nadaraya-Watson kernel approach was used by Speckman (1988) to estimate the nonparametric part of the PLM, whereas in the Abonazel et al (2019) estimator the nonparametric part is estimated by the spline smoothing approach. Abonazel et al (2019) showed that their estimation is more efficient than the traditional Speckman estimation. We will call the Abonazel et al (2019) estimator the Speckman-spline (SS) estimator, and we can summarize the SS estimator in the following algorithm:
Step 1: Constructing a spline smoother matrix $S_\lambda$, depending on the smoothing parameter $\lambda$ and the knot points.
Step 2: Computing the modified (partial-residual) variables $\tilde{y} = (I - S_\lambda)y$ and $\tilde{X} = (I - S_\lambda)X$.
Step 3: Estimating the parametric component: $\hat{\beta}_{SS} = (\tilde{X}^\top \tilde{X})^{-1}\tilde{X}^\top \tilde{y}$.
Step 4: Estimating the nonparametric component: $\hat{g} = S_\lambda (y - X\hat{\beta}_{SS})$, where $\hat{g}$ is a natural cubic spline with knots at the distinct values of $z$ for a fixed $\lambda$, and $\Omega$ is a well-known positive-definite penalty matrix. To gain a better perspective on the smoothing spline, the estimation of the parameters of the PLM can be performed by minimizing the following penalized sum of squares:
$$\min_{g} \; (y - X\hat{\beta} - g)^\top (y - X\hat{\beta} - g) + \lambda\, g^\top \Omega\, g, \qquad (3)$$
where $\hat{\beta}$ is the estimated parametric component given by Step 3 in the algorithm above. To solve Equation (3), an iterative algorithm is required.
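The SS algorithm can be sketched as follows. The smoother matrix here is built from a penalized truncated-power cubic spline basis, a simple stand-in for the natural cubic smoothing spline of Abonazel et al (2019); the knots, penalty, and smoothing parameter are illustrative assumptions.

```python
import numpy as np

def spline_smoother_matrix(z, knots, lam):
    """Penalized truncated-power cubic spline smoother S (illustrative
    stand-in for a natural cubic smoothing spline matrix)."""
    B = np.column_stack([np.ones_like(z), z, z**2, z**3] +
                        [np.clip(z - k, 0, None)**3 for k in knots])
    D = np.eye(B.shape[1])
    D[:4, :4] = 0.0                                  # penalize only the knot terms
    return B @ np.linalg.solve(B.T @ B + lam * D, B.T)

def speckman_spline(y, X, z, knots, lam):
    S = spline_smoother_matrix(z, knots, lam)        # Step 1: smoother matrix
    y_t, X_t = y - S @ y, X - S @ X                  # Step 2: partial residuals
    beta = np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t) # Step 3: OLS on the modified regression
    g_hat = S @ (y - X @ beta)                       # Step 4: smooth the parametric residuals
    return beta, g_hat
```

On simulated PLM data the parametric estimate recovers $\beta$ closely when the errors are small and the smoother is reasonably tuned.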

The proposed estimators
In this paper, we consider the PLM in the case where the columns of the design matrix $\tilde{X}$ have a near-linear dependence, so that $\tilde{X}^\top \tilde{X}$ is nearly singular. In this case, the OLS estimate becomes highly sensitive to random errors in the observed response variable and has large variances. This situation is known in the econometric literature as the multicollinearity problem. There are many methods to handle this problem in regression models, such as ridge, Liu, and principal components estimation. Besides the multicollinearity problem, we consider datasets that contain outliers. Outliers are another common problem in regression analysis, and many robust regression methods are used to overcome their effects. In this paper, we propose a new robust biased estimator of the PLM based on the LTS method combined with ridge regression. We use the LTS-ridge estimator in Step 3 (of the above algorithm) instead of the OLS estimator. The formula of the proposed (LTS-ridge) estimator is (Kan et al, 2013):
$$\hat{\beta}_{RR} = (\tilde{X}^\top \tilde{X} + k_R I)^{-1} \tilde{X}^\top \tilde{X}\, \hat{\beta}_{LTS},$$
where $k_R$ is a robust choice of the $k$ parameter in ridge regression, and the total MSE (TMSE) of $\hat{\beta}_{RR}$ is
$$\mathrm{TMSE}(\hat{\beta}_{RR}) = \sigma^2 \sum_{j=1}^{p} \frac{\lambda_j}{(\lambda_j + k_R)^2} + k_R^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(\lambda_j + k_R)^2},$$
where the $\lambda_j$ values are the eigenvalues of the $\tilde{X}^\top \tilde{X}$ matrix. This estimator is resistant to the combined problem of multicollinearity and outliers.
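A minimal sketch of the LTS-ridge idea, assuming the Silvapulle (1991)-type robust-ridge form $\hat{\beta}_{RR} = (X^\top X + kI)^{-1} X^\top X \hat{\beta}_{LTS}$ and approximating LTS with a basic random-start concentration algorithm (exact LTS is a combinatorial problem; the number of starts and C-steps below are illustrative):

```python
import numpy as np

def lts_fit(X, y, h, n_starts=50, n_csteps=20, rng=None):
    """Approximate least trimmed squares via random elemental starts and
    concentration (C-) steps; h = number of observations kept."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    best, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, p, replace=False)               # elemental start
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        for _ in range(n_csteps):                           # concentration steps
            keep = np.argsort((y - X @ beta) ** 2)[:h]      # h smallest squared residuals
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()        # trimmed objective
        if obj < best_obj:
            best, best_obj = beta, obj
    return best

def lts_ridge(X, y, k, h):
    """Robust-ridge estimate: ridge-type shrinkage applied to the LTS fit."""
    beta_lts = lts_fit(X, y, h)
    G = X.T @ X
    return np.linalg.solve(G + k * np.eye(X.shape[1]), G @ beta_lts)
```

With gross outliers in the response, this estimate stays close to the true coefficients where plain OLS is pulled away.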
Here, $\sigma^2$ and $\alpha$ are replaced by their LTS estimates $\hat{\sigma}^2_{LTS}$ and $\hat{\alpha}_{LTS}$ to obtain the minimum TMSE estimate; see Arslan and Billor (1996). In this paper, we suggest using three ridge parameters in $\hat{\beta}_{RR}$: 1) an updated (robustified) version of the ridge parameter proposed by Kibria (2003), obtained by plugging in the LTS estimates; and 2) two newly suggested ridge parameters, where $\hat{\alpha} = Q^\top (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{y}$, with $Q$ the matrix of eigenvectors of $\tilde{X}^\top \tilde{X}$. Using the three ridge parameters above in $\hat{\beta}_{RR}$, we get three robust-ridge (RR) estimators: RR1, RR2, and RR3, respectively.
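The paper's specific new parameter formulas are not reproduced here. As a hedged illustration only, the classical Kibria (2003) ridge parameters (arithmetic-mean, geometric-mean, and median versions of $\hat{\sigma}^2/\hat{\alpha}_j^2$) can be computed from robust plug-in estimates as follows:

```python
import numpy as np

def kibria_ridge_params(X, beta_rob, sigma2_rob):
    """Kibria (2003)-style ridge parameters with robust plug-ins.
    beta_rob, sigma2_rob: robust (e.g. LTS) estimates of beta and sigma^2."""
    _, Q = np.linalg.eigh(X.T @ X)          # eigenvectors of X'X
    alpha = Q.T @ beta_rob                  # canonical coefficients
    ratios = sigma2_rob / alpha**2
    k_am = ratios.mean()                                          # arithmetic mean
    k_gm = sigma2_rob / np.prod(alpha**2) ** (1 / len(alpha))     # geometric mean
    k_med = np.median(ratios)                                     # median
    return k_am, k_gm, k_med
```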

The simulation study
In this section, we investigate the performance of the estimators presented above through a Monte Carlo simulation study. In fact, we make a comparison between the SS estimator and the proposed robust-ridge (RR1, RR2, and RR3) estimators. R software is used to perform our simulation study; for information about how to conduct Monte Carlo simulation studies using R, see Abonazel (2018b). The simulated datasets are generated based on Equation (1). The MSE of $\hat{g}$ and $\hat{\beta}$ are calculated as
$$\mathrm{MSE}(\hat{g}) = \frac{1}{n}\sum_{i=1}^{n}\bigl(\hat{g}(z_i) - g(z_i)\bigr)^2, \qquad \mathrm{MSE}(\hat{\beta}) = \frac{1}{p}\sum_{j=1}^{p}\bigl(\hat{\beta}_j - \beta_j\bigr)^2,$$
where $\hat{g}(z_i)$ and $\hat{\beta}$ are the estimated values of $g(z_i)$ and $\beta$, respectively, and the average MSE (AMSE) is the mean of these values over the Monte Carlo replications. To simplify the tables of the simulation results, we present the total AMSE (TAMSE): TAMSE = AMSE of parametric part + AMSE of nonparametric part.
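The accuracy measures above can be computed as in the following sketch (array shapes and function names are illustrative; `L` replications are stacked row-wise):

```python
import numpy as np

def amse_parametric(beta_hats, beta):
    """AMSE of beta-hat: beta_hats has shape (L, p), one row per replication."""
    return np.mean(np.sum((beta_hats - beta) ** 2, axis=1) / beta.size)

def amse_nonparametric(g_hats, g):
    """AMSE of g-hat: g_hats has shape (L, n), evaluated at the design points."""
    return np.mean(np.mean((g_hats - g) ** 2, axis=1))

def tamse(beta_hats, beta, g_hats, g):
    """TAMSE = AMSE of the parametric part + AMSE of the nonparametric part."""
    return amse_parametric(beta_hats, beta) + amse_nonparametric(g_hats, g)
```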
The simulation results are recorded in Tables 1-8. These tables present the TAMSE of the estimators under the different simulation factors and nonparametric functions. Specifically, Tables 1-4 present the TAMSE values of the estimators under the first setting of the simulation factors, while the second setting (with the same nonparametric functions) is presented in Tables 5-8. From Tables 1-8, we can summarize some common effects for all estimators in the following points:
- As the sample size increases, the TAMSE values decrease.
- As each of the remaining simulation factors increases, the TAMSE values increase.
In general, in all simulation situations, we can conclude that the TAMSE values of all RR estimators are smaller than those of the SS estimator, and the most efficient RR estimators are RR1 and RR3.
Graphically, we illustrate the goodness of fit of the four estimators for the several nonparametric functions $g(z)$ via simulated datasets with different factor settings. Figures 1-4 show the fitted curves of the estimators based on the four nonparametric functions, respectively. From Figure 1, we find that the fitted curves based on the RR estimators are closer to the true curve than that of the SS estimator, although the model contains many outliers and the explanatory variables are correlated. The same results can be concluded from Figures 2-4. This means that the RR estimators perform better regardless of the form of the nonparametric function and are not sensitive to outliers in the model.

Conclusion
In this paper, we developed new LTS-ridge estimators for the PLM when there are high inter-correlations between the explanatory variables and the dataset contains outliers. Moreover, new biasing parameters are suggested. A Monte Carlo simulation study was conducted to evaluate the performance of the SS estimator (proposed by Abonazel et al, 2019) and our LTS-ridge estimators. The simulation results indicate that our LTS-ridge estimators are more efficient than the SS estimator in all situations.