Performance analysis on least absolute shrinkage selection operator, elastic net and correlation adjusted elastic net regression methods

  • Authors

    • Pascalis Kadaro Matthew, Department of Mathematics, Faculty of Science, Ahmadu Bello University, Zaria, Nigeria.
    • Abubakar Yahaya, Department of Mathematics, Faculty of Science, Ahmadu Bello University, Zaria, Nigeria.
    2015-05-16
    https://doi.org/10.14419/ijasp.v3i1.4364
  • Keywords: Convex Optimization, Cross Validation, Multicollinearity, Penalized Regression.
  • Abstract

    Over the past few decades, penalized regression techniques for linear regression have been developed specifically to address shortcomings in the prediction accuracy of classical ordinary least squares (OLS) regression. In this paper, we use a diabetes data set obtained from previous literature to compare three of these well-known techniques: the Least Absolute Shrinkage and Selection Operator (LASSO), the Elastic Net, and the Correlation Adjusted Elastic Net (CAEN). The analysis showed that CAEN generated a less complex model.

  • Received date: 2015-02-16

    Accepted date: 2015-03-17

    Published date: 2015-05-16