Combination between DE and SVM to enhance Protein Structure Prediction based on Secondary Structural information

  • Authors

    • Thair A. Kadhim Ministry of Education, Babylon–Iraq
    • Mohammed Hasan Aldulaimi Ministry of Education, Babylon–Iraq
    • Suhaila Zainudin Ministry of Education, Babylon–Iraq
    • Azuraliza Abu Bakar Faculty of Information Science and Technology, UKM University, Malaysia
    2019-11-05
    https://doi.org/10.14419/ijet.v8i4.19619
  • Feature Selection, Differential Evolution, Hydropathical Information, Secondary Structure, Computational Time.
  • Abstract

    Effective selection of protein features and an accurate method for predicting protein structural class (PSP) are important in protein folding studies, especially for low-similarity sequences. Many promising approaches have been proposed to solve this problem, mostly via computational intelligence methods. One of the main aspects of the prediction is extracting a good representation of a protein sequence. In this study, an integrated 71-dimensional feature vector was extracted from secondary structure and hydropathy information, using newly developed strategies for categorizing proteins into their four main structural classes: all-α, all-β, α/β, and α+β. Support Vector Machine (SVM) and Differential Evolution (DE) were combined using the wrapper method to select the top N features according to their importance. Classification accuracy can be improved by tuning the SVM kernel parameters in the training phase. The mean classification rate of the SVM classifier was used to evaluate each selected feature subset. The approach was tested on two low-similarity data sets (D640 and ASTRAL). The core of this work is a comparison between the proposed (SVM+DE) model based on DE feature selection and an (SVM+DE) model based on grid search (a traditional method of searching for parameters). The proposed SVM+DE model is competitive and highly reliable in terms of time and accuracy compared with other methods reported in the literature.
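
    The wrapper idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the encoding (each DE individual is a vector of importance weights, and the N largest weights define the candidate subset), the control values F, CR, population size and generation count, and the names top_n, de_select, make_toy and centroid_accuracy are all assumptions made for illustration. To stay dependency-free, the fitness function here is the leave-one-out accuracy of a simple nearest-centroid rule on toy data; in the setting the abstract describes, that callable would instead return the mean classification rate of an SVM on the selected features.

```python
import random

def top_n(weights, n):
    """Indices of the n largest weights (the candidate feature subset)."""
    ranked = sorted(range(len(weights)), key=lambda i: -weights[i])
    return sorted(ranked[:n])

def de_select(fitness, dim, n_select, pop_size=12, gens=30, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin over importance weights; `fitness` scores a feature subset."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(top_n(ind, n_select)) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = [
                pop[a][k] + F * (pop[b][k] - pop[c][k])
                if (rng.random() < CR or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            s = fitness(top_n(trial, n_select))
            if s >= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=lambda i: scores[i])
    return top_n(pop[best], n_select), scores[best]

def make_toy(rng, n=40, dim=6, informative=(0, 3)):
    """Two-class toy data (hypothetical): only `informative` columns carry signal."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(dim)]
        for k in informative:
            row[k] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, subset):
    """Stand-in for the SVM: leave-one-out nearest-centroid accuracy."""
    correct = 0
    for t in range(len(X)):
        cents = {}
        for label in set(y):
            rows = [X[i] for i in range(len(X)) if i != t and y[i] == label]
            cents[label] = [sum(r[k] for r in rows) / len(rows) for k in subset]

        def dist(label):
            return sum((X[t][k] - c) ** 2 for k, c in zip(subset, cents[label]))

        pred = min(cents, key=dist)
        correct += pred == y[t]
    return correct / len(X)

X, y = make_toy(random.Random(1))
subset, acc = de_select(lambda s: centroid_accuracy(X, y, s), dim=6, n_select=2)
print("selected features:", subset, "accuracy:", acc)
```

    Because every candidate subset is scored by training and testing a classifier, the cost of a wrapper grows with population size times generations, which is why the abstract treats computational time as a criterion alongside accuracy.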

  • How to Cite

    Kadhim, T. A., Hasan Aldulaimi, M., Zainudin, S., & Abu Bakar, A. (2019). Combination between DE and SVM to enhance Protein Structure Prediction based on Secondary Structural information. International Journal of Engineering & Technology, 8(4), 478-489. https://doi.org/10.14419/ijet.v8i4.19619