Gene Selection Approaches for Classifying Disease Relevant Data Sample


  • J Briso Becky Bell
  • S Maria Celestin Vigila





Microarrays, gene-expression, genomics, wrapper, dimensionality reduction.


In gene expression profiling, identifying the genes most highly expressed with respect to a disease has recently been a focus, both to study disease types and to distinguish normal samples from diseased ones. This paper presents four gene selection approaches, namely Pearson correlation, Signal-to-Noise Correlation, Feature Assessment by Sliding Threshold and Feature Assessment by Information Retrieval, for retrieving the genes most relevant to a specific disease. The experiments apply each gene selection method to several disease datasets to select the ten most relevant genes. Classifiers such as Support Vector Machine, K-Nearest Neighbour and Naïve Bayes are then trained on the selected genes to separate the disease-specific classes. We also compare the performance of our classifiers with the techniques of previous papers in terms of classification accuracy.
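As a rough illustration of the filter-style ranking described above, the following sketch scores each gene by Pearson correlation with the class label and by a signal-to-noise ratio, then keeps the ten highest-scoring genes. It is a minimal sketch under stated assumptions, not the paper's implementation: the data are synthetic, the function names (`pearson_score`, `snr_score`, `top_k`) are hypothetical, the signal-to-noise formula is assumed to be the common |μ1 − μ0| / (σ1 + σ0) form, and the two Feature Assessment methods and the classifiers are not covered.

```python
import math
import random
from statistics import mean, stdev


def pearson_score(gene, labels):
    """Absolute Pearson correlation between one gene's expression and the class label."""
    mg, ml = mean(gene), mean(labels)
    cov = sum((g - mg) * (l - ml) for g, l in zip(gene, labels))
    var_g = sum((g - mg) ** 2 for g in gene)
    var_l = sum((l - ml) ** 2 for l in labels)
    if var_g == 0 or var_l == 0:
        return 0.0  # a constant gene carries no class information
    return abs(cov / math.sqrt(var_g * var_l))


def snr_score(gene, labels):
    """Absolute signal-to-noise ratio |mu1 - mu0| / (sd1 + sd0) (assumed form)."""
    pos = [g for g, l in zip(gene, labels) if l == 1]
    neg = [g for g, l in zip(gene, labels) if l == 0]
    spread = stdev(pos) + stdev(neg)
    if spread == 0:
        return 0.0
    return abs(mean(pos) - mean(neg)) / spread


def top_k(genes, labels, score_fn, k=10):
    """Indices of the k genes with the highest score, best first."""
    scores = [score_fn(gene, labels) for gene in genes]
    return sorted(range(len(genes)), key=lambda i: scores[i], reverse=True)[:k]


# Synthetic data (an assumption for illustration): 40 samples, 200 genes,
# where only genes 0..4 are shifted in the disease class (label 1).
random.seed(0)
labels = [0] * 20 + [1] * 20
genes = []
for j in range(200):
    shift = 3.0 if j < 5 else 0.0
    genes.append([random.gauss(0.0, 1.0) + shift * l for l in labels])

top_pearson = top_k(genes, labels, pearson_score)
top_snr = top_k(genes, labels, snr_score)
print("Top 10 by Pearson:", top_pearson)
print("Top 10 by SNR:    ", top_snr)
```

In a full pipeline, the expression values of the selected genes would then be passed to a classifier. To avoid an optimistic accuracy estimate, the ranking would normally be computed on the training split only.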




