Clustering and multiple imputation of missing data

Elsiddig Koko; Amin Ibrahim Adam Mohamed

doi:10.14419/ijbas.v5i1.5470


Authors
- Elsiddig Koko, Sudan University of Science & Technology, Faculty of Science, Department of Statistics
- Amin Ibrahim Adam Mohamed
2015-12-10

Keywords: Cluster Analysis, Missing Data, Multiple Imputation, Two-Step Cluster Analysis.
Abstract

The present work focuses on data analysis, specifically on handling missing values in cluster analysis. Two-Step Cluster Analysis is applied, in which each participant is classified into one of the identified patterns and the optimal number of classes is determined using IBM SPSS Statistics. Like other multivariate statistical techniques, cluster analysis excludes any observation with missing data. Therefore, before the cluster analysis is performed, missing values are imputed using multiple imputation (IBM SPSS Statistics). The clustering results are displayed in tables. Furthermore, the goal of the analysis is to reduce biases arising from the possibility that non-respondents differ from those who participate, and to bring the sample data up to the dimensions of the target population totals.
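The abstract describes a two-stage pipeline: complete the data by multiple imputation, then cluster every observation instead of discarding incomplete ones. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' SPSS procedure. It swaps in plain k-means for SPSS TwoStep clustering, fixes the number of clusters k rather than selecting it automatically, replaces SPSS's imputation models with a deliberately simple one (random normal draws from each variable's observed mean and standard deviation), and pools the m clusterings with a co-assignment matrix, which is one common way to combine partitions across imputed datasets. All function names and the toy data are hypothetical.

```python
import math
import random
import statistics

def listwise_delete(data):
    """Listwise deletion: keep only rows with no missing value (None)."""
    return [row for row in data if all(v is not None for v in row)]

def impute_once(data, rng):
    """Fill each missing cell with a random draw from a normal distribution
    using the mean and standard deviation of that column's observed values.
    A deliberately simplified stochastic imputation, not SPSS's method."""
    params = []
    for col in zip(*data):
        observed = [v for v in col if v is not None]
        params.append((statistics.mean(observed), statistics.stdev(observed)))
    return [
        [rng.gauss(*params[j]) if v is None else v for j, v in enumerate(row)]
        for row in data
    ]

def kmeans(points, k, rng, iters=50):
    """Plain k-means (Lloyd's algorithm); a stand-in for TwoStep clustering."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [statistics.mean(vals) for vals in zip(*members)]
    return labels

def co_assignment(label_sets):
    """Share of imputed datasets in which rows i and j land in the same cluster."""
    m, n = len(label_sets), len(label_sets[0])
    return [
        [sum(labels[i] == labels[j] for labels in label_sets) / m for j in range(n)]
        for i in range(n)
    ]

def cluster_after_imputation(data, k, m=5, seed=0):
    """Impute m times, cluster each completed dataset, then pool the partitions."""
    rng = random.Random(seed)
    label_sets = [kmeans(impute_once(data, rng), k, rng) for _ in range(m)]
    return co_assignment(label_sets)

# Hypothetical toy data: two groups of four rows; None marks a missing value.
data = [
    [1.0, 1.1], [0.9, None], [1.2, 0.8], [None, 1.0],
    [8.0, 8.2], [7.9, None], [8.1, 7.8], [8.3, 8.0],
]
print(len(listwise_delete(data)))  # 5: only 5 of 8 rows survive listwise deletion
matrix = cluster_after_imputation(data, k=2)
for row in matrix:
    print(" ".join(f"{v:.1f}" for v in row))
```

Entries near 1.0 mark pairs of rows that fall in the same cluster in nearly every imputed dataset, while intermediate values flag observations whose membership depends on the imputed values. Listwise deletion, by contrast, would leave three of the eight rows unclustered.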
How to Cite
Koko, E., & Mohamed, A. I. A. (2015). Clustering and multiple imputation of missing data. International Journal of Basic and Applied Sciences, 5(1), 15-29. https://doi.org/10.14419/ijbas.v5i1.5470
Received date: 2015-10-23

Accepted date: 2015-12-05

Published date: 2015-12-10
