Diagnosing and correcting violations of normality and constant variance assumptions in multiple linear regression analysis

  • Authors

    • Victor Chijindu Iheaka, Imo State University, Owerri

    Received date: February 19, 2025

    Accepted date: March 6, 2025

    Published date: March 20, 2025

    https://doi.org/10.14419/8bnsc148
  • Keywords: Linear Regression Analysis; Multiple Linear Regression Model; Residuals; Non-Normality Assumption; Nonconstant Variance Assumption; Correcting
  • Abstract

    This study showcased the significance of correcting for non-normality and nonconstant variance of residuals in linear regression modelling. The concept was demonstrated using two different hypothetical datasets, Data M (the Initial Dataset) and Data N (the Initial Dataset). The diagnosis of non-normality and nonconstant variance was performed using the Anderson-Darling test (or D'Agostino Omnibus test) and the White test, respectively, revealing their presence in the models for the initial datasets, while the assumptions of no multicollinearity and no autocorrelation were met. The model established for Data M (the Initial Dataset) was statistically significant with an R-square value of 0.538, an AIC value of 1071.424, an SBC value of 1083.787, and an RMSE value of 9774.849. Similarly, the model established for Data N (the Initial Dataset) was statistically significant with an R-square value of 0.865, an AIC value of 768.443, an SBC value of 776.427, and an RMSE value of 584.946. Data M (the Initial Dataset) and Data N (the Initial Dataset) were transformed using the semi-logarithm transformation method, generating new datasets, Data M (Non-normality and Nonconstant Variance Corrected) and Data N (Non-normality and Nonconstant Variance Corrected). After the correction, both datasets satisfied all the assumptions necessary for linear regression analysis. The multiple linear regression model estimated for Data M (Non-normality and Nonconstant Variance Corrected) was statistically significant, achieving an R-square value of 0.748, an AIC value of -81.061, an SBC value of -61.699, and an RMSE value of 0.473; the model established for Data N (Non-normality and Nonconstant Variance Corrected) was statistically significant with an R-square value of 0.871, an AIC value of -145.907, an SBC value of -137.529, and an RMSE value of 0.287. Based on the R-square, AIC, SBC, and RMSE values for both the initial and transformed models, it was concluded that the estimated regression models for Data M (Non-normality and Nonconstant Variance Corrected) and Data N (Non-normality and Nonconstant Variance Corrected) demonstrated superior performance compared to the regression models for Data M (the Initial Dataset) and Data N (the Initial Dataset).
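    A minimal illustrative sketch of the diagnose-then-transform workflow described above is given below. It uses simulated data; the variable names, sample size, and the statsmodels/scipy toolchain are assumptions for illustration and are not the article's Data M or Data N.

      # Sketch: diagnose non-normal, heteroscedastic residuals, then refit after a
      # semi-log (log of the response) transformation. Hypothetical data only.
      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      from statsmodels.stats.diagnostic import het_white
      from scipy import stats

      rng = np.random.default_rng(0)
      n = 80
      X = pd.DataFrame({"x1": rng.uniform(1, 10, n), "x2": rng.uniform(1, 10, n)})
      # A multiplicative error term makes the raw-scale residuals skewed and heteroscedastic.
      y = np.exp(0.5 + 0.30 * X["x1"] + 0.20 * X["x2"] + rng.normal(0, 0.3, n))

      def fit_and_diagnose(response, predictors, label):
          model = sm.OLS(response, sm.add_constant(predictors)).fit()
          ad = stats.anderson(model.resid, dist="norm")                 # Anderson-Darling test
          k2, p_norm = stats.normaltest(model.resid)                    # D'Agostino omnibus test
          _, p_white, _, _ = het_white(model.resid, model.model.exog)   # White test
          # model.bic is the Schwarz criterion (the SBC reported in the article).
          print(f"{label}: R2={model.rsquared:.3f}  AIC={model.aic:.1f}  SBC={model.bic:.1f}")
          print(f"  AD stat={ad.statistic:.3f} (5% critical {ad.critical_values[2]:.3f}), "
                f"D'Agostino p={p_norm:.3f}, White p={p_white:.3f}")
          return model

      fit_and_diagnose(y, X, "Initial model")             # normality and constant-variance tests typically reject
      fit_and_diagnose(np.log(y), X, "Semi-log model")    # after the semi-log transformation they typically pass

    On the raw response the normality tests and the White test would usually reject; after taking the logarithm of the response the residual diagnostics and the information criteria generally improve, mirroring the comparison reported above for the initial and corrected datasets.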

  • References

    1. Barker, L. E. and Shaw, K. M. (2015). Best (but oft-forgotten) Practices: Checking Assumptions Concerning Regression Residuals. American Journal of Clinical Nutrition, 102: 533-539. https://doi.org/10.3945/ajcn.115.113498.
    2. Bartlett, M. S. (1947). The Use of Transformations. Biometrics, 3: 39-52. https://doi.org/10.2307/3001536.
    3. D'Agostino, R. B., Belanger, A. and D'Agostino, R. B., Jr. (1990). A Suggestion for Using Powerful and Informative Tests of Normality. The American Statistician, 44(4): 316-321. https://doi.org/10.1080/00031305.1990.10475751.
    4. Das, K. R. and Imon, A. H. M. R. (2016). A Brief Review of Tests for Normality. American Journal of Theoretical and Applied Statistics, 5(1): 5-12. https://doi.org/10.11648/j.ajtas.20160501.12.
    5. Flachaire, E. (2005). Bootstrapping Heteroskedastic Regression Models: Wild Bootstrap vs Pairs Bootstrap. Computational Statistics and Data Analysis, 49(2): 361-376. https://doi.org/10.1016/j.csda.2004.05.018.
    6. Gujarati, D. (2004). Basic Econometrics (4th ed.). McGraw-Hill, New York, USA.
    7. Hawkins, D. L. (1989). Using U Statistics to Derive the Asymptotic Distribution of Fisher's Z Statistic. The American Statistician, 43: 235-237. https://doi.org/10.1080/00031305.1989.10475666.
    8. Hogg, R. V. (1979). An Introduction to Robust Estimation. In Launer, R. L. and Wilkinson, G. N. (Eds.), Robustness in Statistics. Academic Press, New York, 1-17. https://doi.org/10.1016/B978-0-12-438150-6.50007-8.
    9. Jude, O. and Isobeye, G. (2021). Effect of Non-Normal Error Distribution on Simple Linear/Non-Parametric Regression Models. International Journal of Statistics and Applied Mathematics, 6(4): 131-136.
    10. Judge, G. G., Griffiths, W. E., Hill, R. C., Lutkepohl, H. and Lee, T. (1985). The Theory and Practice of Econometrics (2nd ed.). John Wiley and Sons, New York, USA.
    11. Kim, T. K. and Park, J. H. (2019). More about the Basic Assumptions of t-test: Normality and Sample Size. Korean Journal of Anesthesiology, 72(4): 331-335. https://doi.org/10.4097/kja.d.18.00292.
    12. Koenker, R. W. (1982). Robust Methods in Econometrics. Econometric Reviews, 1: 213-290. https://doi.org/10.1080/07311768208800017.
    13. Koutsoyiannis, A. (1977). Theory of Econometrics (7th ed.). Macmillan, London, United Kingdom. https://doi.org/10.1007/978-1-349-09546-9.
    14. Kutner, M. H., Nachtsheim, C. J., Neter, J. and Li, W. (2005). Applied Linear Statistical Models (5th ed.). McGraw-Hill/Irwin, New York, USA.
    15. Neave, H. R. (1978). Statistics Tables for Mathematicians, Engineers, Economists and the Behavioural and Management Sciences. George Allen and Unwin, London, United Kingdom.
    16. Nwankwo, S. C. (2011). Econometrics: A Practical Approach. El'demak, Enugu, Nigeria.
    17. Osemeke, R. F., Igabari, J. N. and Nwabenu, D. C. (2024). Detection and Correction of Violations of Linear Model Assumptions by Means of Residuals. Journal of Science Innovation & Technology Research, 3(9): 1-15.
    18. Ohaegbulem, E. U. and Iheaka, V. C. (2024). On Remedying the Presence of Heteroscedasticity in a Multiple Linear Regression Modelling. African Journal of Mathematics and Statistics Studies, 7(2): 225-261.
    19. Osaro, A. D. (2023). Application of Transformation of Variables in Remedying Heteroscedasticity in Nigeria GDP, Conditioning and Some Fiscal Variables. NIPES Journal of Science and Technology Research, 5(1): 84-91.
    20. Pedace, R. (2013). Econometrics for Dummies. John Wiley & Sons, Hoboken, New Jersey, USA.
    21. Thinh, R., Samart, K. and Jansakul, N. (2020). Linear Regression Models for Heteroscedastic and Non-Normal Data. ScienceAsia, 46: 353-360. https://doi.org/10.2306/scienceasia1513-1874.2020.047.
    22. White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48(4): 817-838. https://doi.org/10.2307/1912934.