Optimal Pair DNA Sequence Alignment based on Matching Regions and Multi-Zone Genetic Algorithm

  • Abstract
  • Keywords
  • References
  • PDF
  • Abstract

    DNA sequence alignment is an important and challenging task in Bioinformatics, which is used for finding the optimal arrangement between two sequences. In this paper, two methods are proposed in two stages to solve the pairwise sequence alignment problem. The first method is Matching Regions(MR) concerns on splitting the DNA into regions with adaptive interleaving windows to isolate the DNA tape into matched and non-matched regions. Additionally, a Multi-Zone Genetic Algorithm (MZGA) is proposed as an improved method in the second stage. It consists of segmenting a large non-matched region into smaller search space. Then, the MZGA is implemented in parallel to save time. Genetic Algorithm can be applied as an optimization toolto produce multiple solutions. Furthermore, the improvement focuses on the enhancement of Simple GA operators. In the selection, the population is divided into three Zones according to the fitness score. A new crossover approach is proposed depending on cut-points and location of gaps. The proposed method guarantees that the value of fitness tends to improvement or convergence in each successive generation. Thus, the offspring of populations will have better fitness value. The system has been applied to the real-world dataset of DNA with variable lengths which are ranged from 66 bases up to 26037 bases. As a result, the proposed technique satisfied the best alignment score of the DNA sequences. Finally, it is worth mentioning that the proposed system proved to be generalizable.



  • Keywords

    Bioinformatics, DNA Sequence Alignment, Matching regions, Multi-Zone Genetic Algorithm, Parallel processing.

  • References

      [1] J. Xiong, Essential bioinformatics. Cambridge University Press, 2006.

      [2] M. Gollery, “Bioinformatics: Sequence and Genome Analysis, David W. Mount. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 2004, 692 pp., $75.00, paperback. ISBN 0-87969-712-1.,” Clin. Chem., vol. 51, no. 11, p. 2219, 2005.

      [3] M. L. Metzker, “Sequencing technologies - the next generation.,” Nat. Rev. Genet., vol. 11, no. 1, pp. 31–46, Jan. 2010.

      [4] T. P. Quinn, I. Erb, M. F. Richardson, and T. M. Crowley, “Understanding sequencing data as compositions: an outlook and review,” Bioinformatics, no. April, 2018.

      [5] J. Sun, K. Chen, and Z. Hao, “Pairwise alignment for very long nucleic acid sequences,” Biochem. Biophys. Res. Commun., vol. 502, no. 3, pp. 313–317, 2018.

      [6] N. H. Kaghed, E. S. Al, F. Emad, and K. Al-Khuzaie, “Comparative study of Genetic Algorithm and Dynamic Programming of DNA Multiple Sequence Alignment,” J. Babylon Univ. Appl. Sci. No, vol. 25, no. 2, 2017.

      [7] P. Bonizzoni and G. Della Vedova, “The complexity of multiple sequence alignment with SP-score that is a metric,” Theor. Comput. Sci., vol. 259, no. 1–2, pp. 63–79, 2001.

      [8] W. Just, “Computational Complexity of Multiple Sequence Alignment with SP-Score,” J. Comput. Biol., vol. 8, no. 6, pp. 615–623, Nov. 2001.

      [9] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” J. Mol. Biol., vol. 215, no. 3, pp. 403–410, 1990.

      [10] W. R. Pearson, “Rapid and sensitive sequence comparison with FASTP and FASTA,” Methods Enzymol., vol. 183, pp. 63–98, Jan. 1990.

      [11] D. W. Mount, “Using the Basic Local Alignment Search Tool (BLAST).,” CSH Protoc., vol. 2007, p. pdb.top17, Jul. 2007.

      [12] C. Kemena and C. Notredame, “Upcoming challenges for multiple sequence alignment methods in the high-throughput era,” Bioinformatics, vol. 25, no. 19, pp. 2455–2465, Oct. 2009.

      [13] B. Chowdhury and G. Garai, “A review on multiple sequence alignment from the perspective of genetic algorithm,” Genomics, vol. 109, no. 5–6, pp. 419–431, Oct. 2017.

      [14] M. A. Larkin et al., “Clustal W and Clustal X version 2.0,” Bioinformatics, vol. 23, no. 21, pp. 2947–2948, Nov. 2007.

      [15] T. L. Bailey and C. Elkan, “Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Bipolymers,” Proc. Second Int. Conf. Intell. Syst. Mol. Biol., pp. 28–36, 1994.

      [16] P. Di Tommaso et al., “T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension,” Nucleic Acids Res., vol. 39, no. suppl_2, pp. W13–W17, Jul. 2011.

      [17] I.-G. Mircea, I. Bocicor, and G. Czibula, “A Reinforcement Learning Based Approach to Multiple Sequence Alignment,” in Soft Computing Applications, 2018, pp. 54–70.

      [18] H. Nabeel Kaghed, S. Eman Al–Shamery, and F. E. K. Al-Khuzaie, “Multiple Sequence Alignment based on Developed Genetic Algorithm,” Indian J. Sci. Technol., vol. 9, no. 2, 2016.

      [19] R. R. Rani and D. Ramyachitra, “Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm,” BioSystems, vol. 150, pp. 177–189, 2016.

      [20] J. Lee, Y. Yeu, H. Roh, Y. Yoon, and S. Park, “BulkAligner: A novel sequence alignment algorithm based on graph theory and Trinity,” Inf. Sci. (Ny)., vol. 303, pp. 120–133, 2015.

      [21] J. Li, S. Ranka, and S. Sahni, “Pairwise sequence alignment for very long sequences on GPUs,” 2012 IEEE 2nd Int. Conf. Comput. Adv. Bio Med. Sci. ICCABS 2012, 2012.

      [22] S. K. Pal and P. P. Wang, Genetic algorithms for pattern recognition. CRC press, 2017.




Article ID: 27993
DOI: 10.14419/ijet.v7i4.19.27993

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.