Parallel framework based gene signature-hierarchical random forest cluster for predicting human diseases
2018-05-08 https://doi.org/10.14419/ijet.v7i2.27.12103
Gene Association, Genetic Algorithm, Hierarchical Clustering, Human Genome Prediction, Parallel Framework, Random
Abstract
Many human diseases are not caused by a single gene; instead, they arise when different genes, or groups of genes, interact with one another. It is therefore necessary to analyse and associate complete genome sequences in order to understand or predict possible human diseases. This research work focused on three classifiers: (i) Hierarchical-Random Forest based Clustering (HRF-Cluster), (ii) Genetic Algorithm-Gene Association Classifier (GA-GA) and (iii) Weighted Common Neighbor Classifier (wCN). These classifiers were implemented and studied thoroughly in terms of Prediction Accuracy, Memory Utilization, Memory Usage and Processing Time. To improve the performance of these gene classifiers/predictors further, this work proposed and implemented a Gene Signature based HRF Cluster, G-HR. Results show that the proposed classifier G-HR outperforms the three identified classifiers in terms of Disease Pattern Prediction, Processing Time, Memory Usage and Classification Accuracy. To reduce Processing Time further, the proposed model G-HR was implemented under a Parallel Framework and tested with two, four, eight and sixteen parallel processors. The results establish that Processing Time decreases considerably as processors are added, which improves the performance of the proposed model.
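The parallel evaluation described in the abstract (the same workload run with two, four, eight and sixteen processors, with Processing Time compared across runs) can be illustrated with a minimal sketch. The abstract does not name the parallel framework or its internals, so everything below is an assumption made for illustration: the helper names (`split_evenly`, `run_with_workers`, `speedup`, `efficiency`), the use of a Python thread pool, and the GC-fraction task that stands in for the G-HR classifier.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def split_evenly(items, n_parts):
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base_size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base_size + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def gc_fraction(seq):
    """Placeholder per-sequence task (hypothetical; NOT the paper's G-HR classifier)."""
    if not seq:
        return 0.0
    return sum(ch in "GC" for ch in seq) / len(seq)


def score_chunk(chunk):
    """Apply the placeholder task to every sequence in one chunk."""
    return [gc_fraction(s) for s in chunk]


def run_with_workers(sequences, n_workers):
    """Process sequences with n_workers; return (results in input order, elapsed seconds)."""
    chunks = split_evenly(sequences, n_workers)
    start = time.perf_counter()
    # Assumption: a thread pool keeps the sketch self-contained. For CPU-bound
    # work in CPython, a process pool would be needed to see a real speedup.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(score_chunk, chunks))
    elapsed = time.perf_counter() - start
    return [value for part in parts for value in part], elapsed


def speedup(t_serial, t_parallel):
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_workers):
    """Parallel efficiency E(p) = S(p) / p."""
    return speedup(t_serial, t_parallel) / n_workers
```

A run over the processor counts named in the abstract would loop `for p in (2, 4, 8, 16)`, call `run_with_workers(sequences, p)`, and pass the measured times to `speedup` and `efficiency`. The classification results should be identical for every `p`; only the elapsed time is expected to change.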
How to Cite
Sakthivel, N. K., Gopalan, N. P., & Subasree, S. (2018). Parallel framework based gene signature-hierarchical random forest cluster for predicting human diseases. International Journal of Engineering & Technology, 7(2.27), 12-16. https://doi.org/10.14419/ijet.v7i2.27.12103
Received date: 2018-04-25
Accepted date: 2018-05-03
Published date: 2018-05-08