Diagnosing and correcting violations of normality and constant variance assumptions in multiple linear regression analysis
-
Received date: February 19, 2025
Accepted date: March 6, 2025
Published date: March 20, 2025
https://doi.org/10.14419/8bnsc148
-
Keywords: Linear Regression Analysis; Multiple Linear Regression Model; Residuals; Non-Normality Assumption; Nonconstant Variance Assumption; Correcting.
Abstract
This study demonstrated the importance of correcting non-normality and nonconstant variance of the residuals in linear regression modelling. The concept was illustrated using two hypothetical datasets, Data M (the Initial Dataset) and Data N (the Initial Dataset). Non-normality and nonconstant variance were diagnosed with the Anderson-Darling test (or the D’Agostino omnibus test) and the White test, respectively. Both violations were present in the models for the initial datasets, while the assumptions of no multicollinearity and no autocorrelation were met. The model estimated for Data M (the Initial Dataset) was statistically significant, with an R-square value of 0.538, an AIC value of 1071.424, an SBC value of 1083.787, and an RMSE value of 9774.849. Similarly, the model estimated for Data N (the Initial Dataset) was statistically significant, with an R-square value of 0.865, an AIC value of 768.443, an SBC value of 776.427, and an RMSE value of 584.946. Data M (the Initial Dataset) and Data N (the Initial Dataset) were then transformed using the semi-logarithmic transformation, generating two new datasets, Data M (Non-normality and Nonconstant Variance Corrected) and Data N (Non-normality and Nonconstant Variance Corrected). After the correction, both datasets satisfied all the assumptions required for linear regression analysis. The multiple linear regression model estimated for Data M (Non-normality and Nonconstant Variance Corrected) was statistically significant, with an R-square value of 0.748, an AIC value of -81.061, an SBC value of -61.699, and an RMSE value of 0.473; the model estimated for Data N (Non-normality and Nonconstant Variance Corrected) was statistically significant, with an R-square value of 0.871, an AIC value of -145.907, an SBC value of -137.529, and an RMSE value of 0.287.
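The diagnostic workflow described in the abstract can be sketched in plain Python. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' code: the helper names (`ols_fit`, `white_test`, `anderson_darling`, `diagnose`) are hypothetical, the data are simulated with multiplicative lognormal errors rather than taken from Data M or Data N, and the semi-logarithmic transformation is assumed here to mean regressing ln(y) on the untransformed regressors. The D’Agostino omnibus test and the multicollinearity and autocorrelation checks are not reproduced.

```python
import math
import random


def ols_fit(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y.

    Each row of X must already start with 1.0 for the intercept.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(X, y, b):
    return [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]


def r_squared(y, e):
    ybar = sum(y) / len(y)
    return 1 - sum(v * v for v in e) / sum((v - ybar) ** 2 for v in y)


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the lower incomplete gamma."""
    a, h = df / 2, x / 2
    if h <= 0:
        return 1.0
    term = total = 1 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= h / (a + n)
        total += term
    return max(0.0, 1 - total * math.exp(a * math.log(h) - h - math.lgamma(a)))


def white_test(Z, e):
    """White (1980) LM test: n * R^2 from regressing e^2 on the regressors,
    their squares and cross products. Z holds regressor rows WITHOUT the intercept."""
    aux = [[1.0] + list(z) + [z[i] * z[j] for i in range(len(z)) for j in range(i, len(z))]
           for z in Z]
    e2 = [v * v for v in e]
    lm = len(e) * r_squared(e2, residuals(aux, e2, ols_fit(aux, e2)))
    df = len(aux[0]) - 1
    return lm, df, chi2_sf(lm, df)


def anderson_darling(e):
    """Anderson-Darling A*^2 for normality, with mean and SD estimated from e."""
    n = len(e)
    m = sum(e) / n
    s = math.sqrt(sum((v - m) ** 2 for v in e) / (n - 1))
    z = sorted((v - m) / s for v in e)
    # erfc keeps ln(Phi) and ln(1 - Phi) finite for large |z|.
    log_cdf = [math.log(0.5 * math.erfc(-t / math.sqrt(2))) for t in z]
    log_sf = [math.log(0.5 * math.erfc(t / math.sqrt(2))) for t in z]
    a2 = -n - sum((2 * i + 1) * (log_cdf[i] + log_sf[n - 1 - i]) for i in range(n)) / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment


def diagnose(Z, y):
    """Fit y on Z plus an intercept; return (A*^2, White-test p-value)."""
    X = [[1.0] + list(z) for z in Z]
    e = residuals(X, y, ols_fit(X, y))
    return anderson_darling(e), white_test(Z, e)[2]


# Hypothetical simulated data with multiplicative errors (not Data M or Data N).
rng = random.Random(42)
Z = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(300)]
y = [math.exp(1 + 0.3 * x1 + 0.2 * x2 + rng.gauss(0, 0.3)) for x1, x2 in Z]

raw_ad, raw_p = diagnose(Z, y)                         # response on its original scale
# Semi-log, ASSUMED here to mean ln(y) regressed on the untransformed regressors.
log_ad, log_p = diagnose(Z, [math.log(v) for v in y])
print(f"original: A*^2 = {raw_ad:.3f}, White p = {raw_p:.4f}")
print(f"semi-log: A*^2 = {log_ad:.3f}, White p = {log_p:.4f}")
```

An adjusted A*² above the commonly tabulated 5% critical value of about 0.752 rejects normality, and a small White p-value indicates nonconstant variance. The chi-square tail is computed from a series so that the sketch needs no third-party packages; in practice a statistics library would normally supply these tests.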
Based on the R-square, AIC, SBC, and RMSE values of both the initial and the transformed models, it was concluded that the regression models estimated for Data M (Non-normality and Nonconstant Variance Corrected) and Data N (Non-normality and Nonconstant Variance Corrected) demonstrated superior performance compared with the regression models for Data M (the Initial Dataset) and Data N (the Initial Dataset).
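The comparison above rests on four goodness-of-fit measures. The sketch below computes them from observed and fitted values. It assumes the Kutner et al. (2005) forms of AIC and SBC and takes RMSE as sqrt(SSE / (n - p)); the abstract does not state which definitions were used, so these conventions, the helper name `fit_criteria`, and the toy numbers are all assumptions for illustration.

```python
import math


def fit_criteria(y, yhat, p):
    """Goodness-of-fit measures for a model with p parameters (intercept included).

    Assumed conventions (Kutner et al., 2005):
        AIC  = n ln(SSE) - n ln(n) + 2p
        SBC  = n ln(SSE) - n ln(n) + p ln(n)
        RMSE = sqrt(SSE / (n - p))   (square root of the mean squared error)
    """
    n = len(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, yhat))
    ybar = sum(y) / n
    sst = sum((obs - ybar) ** 2 for obs in y)
    base = n * math.log(sse) - n * math.log(n)
    return {
        "R2": 1 - sse / sst,
        "AIC": base + 2 * p,
        "SBC": base + p * math.log(n),
        "RMSE": math.sqrt(sse / (n - p)),
    }


# Toy illustration with made-up numbers: n = 4 observations, p = 2 parameters.
m = fit_criteria([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], p=2)
print({k: round(v, 4) for k, v in m.items()})
# → {'R2': 0.98, 'AIC': -10.7555, 'SBC': -11.9829, 'RMSE': 0.2236}
```

Because SSE carries the units of the response, the RMSE, AIC and SBC returned here for a model of ln(y) are on a different scale from those for a model of y.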
-
References
- Barker, L. E. and Shaw, K. M. (2015). Best (but Oft-Forgotten) Practices: Checking Assumptions Concerning Regression Residuals. The American Journal of Clinical Nutrition, 102: 533-539. https://doi.org/10.3945/ajcn.115.113498.
- Bartlett, M. S. (1947). The Use of Transformations. Biometrics, 3(1): 39-52. https://doi.org/10.2307/3001536.
- D’Agostino, R. B., Belanger, A., and D’Agostino, R. B., Jr. (1990). A Suggestion for Using Powerful and Informative Tests of Normality. The American Statistician, 44(4): 316-321. https://doi.org/10.1080/00031305.1990.10475751.
- Das, K. R. and Imon, A. H. M. R. (2016). A Brief Review of Tests for Normality. American Journal of Theoretical and Applied Statistics, 5(1): 5-12. https://doi.org/10.11648/j.ajtas.20160501.12.
- Flachaire, E. (2005). Bootstrapping Heteroskedastic Regression Models: Wild Bootstrap vs Pairs Bootstrap. Computational Statistics and Data Analysis, 49(2): 361-376. https://doi.org/10.1016/j.csda.2004.05.018.
- Gujarati, D. (2004). Basic Econometrics (4th ed.). McGraw-Hill, New York, U.S.A.
- Hawkins, D. L. (1989). Using U Statistics to Derive the Asymptotic Distribution of Fisher’s Z Statistic. The American Statistician, 43: 235-237. https://doi.org/10.1080/00031305.1989.10475666.
- Hogg, R. V. (1979). An Introduction to Robust Estimation. In Launer, R. L. and Wilkinson, G. N. (Eds.), Robustness in Statistics (pp. 1-17). Academic Press, New York, U.S.A. https://doi.org/10.1016/B978-0-12-438150-6.50007-8.
- Jude, O. and Isobeye, G. (2021). Effect of Non-Normal Error Distribution on Simple Linear/Non-Parametric Regression Models. International Journal of Statistics and Applied Mathematics, 6(4): 131-136.
- Judge, G. G., Griffiths, W. E., Hill, R. C., Lütkepohl, H., and Lee, T. (1985). The Theory and Practice of Econometrics (2nd ed.). John Wiley and Sons, New York, U.S.A.
- Kim, T. K. and Park, J. H. (2019). More about the Basic Assumptions of t-Test: Normality and Sample Size. Korean Journal of Anesthesiology, 72(4): 331-335. https://doi.org/10.4097/kja.d.18.00292.
- Koenker, R. W. (1982). Robust Methods in Econometrics. Econometric Reviews 1: 213-290. https://doi.org/10.1080/07311768208800017.
- Koutsoyiannis, A. (1977). Theory of Econometrics (7th ed.). Macmillan, London, United Kingdom. https://doi.org/10.1007/978-1-349-09546-9.
- Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2005). Applied Linear Statistical Models (5th ed.). McGraw-Hill/Irwin, New York, U.S.A.
- Neave, H. R. (1978). Statistics Tables for Mathematicians, Engineers, Economists and the Behavioural and Management Sciences. George Allen and Unwin, London, United Kingdom.
- Nwankwo, S. C. (2011). Econometrics: a practical approach. El’demak, Enugu, Nigeria.
- Ohaegbulem, E. U. and Iheaka, V. C. (2024). On Remedying the Presence of Heteroscedasticity in a Multiple Linear Regression Modelling. African Journal of Mathematics and Statistics Studies, 7(2): 225-261.
- Osaro, A. D. (2023). Application of Transformation of Variables in Remedying Heteroscedasticity in Nigeria GDP, Conditioning and Some Fiscal Variables. NIPES Journal of Science and Technology Research, 5(1): 84-91.
- Osemeke, R. F., Igabari, J. N. and Nwabenu, D. C. (2024). Detection and Correction of Violations of Linear Model Assumptions by Means of Residuals. Journal of Science Innovation & Technology Research, 3(9): 1-15.
- Pedace, R. (2013). Econometrics for Dummies. John Wiley & Sons, Hoboken, New Jersey, U.S.A.
- Thinh, R., Samart, K. and Jansakul, N. (2020). Linear Regression Models for Heteroscedastic and Non-Normal Data. ScienceAsia, 46: 353-360. https://doi.org/10.2306/scienceasia1513-1874.2020.047.
- White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48(4): 817-838. https://doi.org/10.2307/1912934.