Combination of levenshtein distance and rabin-karp to improve the accuracy of document equivalence level

  • Authors

    • Andysah Putera Utama Siahaan Universitas Pembangunan Panca Budi
    • Solly Aryza Universitas Pembangunan Panca Budi
    • Eko Hariyanto STMIK Pringsewu
    • Rusiadi Universitas Medan Area
    • Andre Hasudungan Lubis Universitas Islam Negeri Sumatera Utara
    • Ali Ikhwan Universiti Malaysia Perlis
    • Phak Len Eh Kan
    2018-06-01
    https://doi.org/10.14419/ijet.v7i2.27.12084
  • Keywords: Rabin-Karp, Levenshtein, Hash, Information Retrieval, Plagiarism.
  • Abstract

    The Rabin-Karp algorithm, invented by Michael O. Rabin and Richard M. Karp, searches a text for a substring pattern by using a hash function. It is not well suited to single-pattern search but works well for multiple-pattern search, and one of its practical applications is plagiarism detection. In that setting, the hash values produced from two documents are compared to determine the documents' level of similarity. The usual Rabin-Karp comparison only counts the number of hash values that appear in both documents. The Levenshtein algorithm can replace this comparison step: calculating the Levenshtein distance between the hashes of the two documents results in better accuracy.
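
    The idea in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the k-gram length `k`, the hash parameters `base` and `mod`, the use of the Dice coefficient for the count-based baseline, and the normalization of the Levenshtein distance by the longer hash sequence are all choices made here for the example, and the function names are hypothetical.

```python
from typing import List, Sequence


def kgram_hashes(text: str, k: int = 5, base: int = 256,
                 mod: int = 1_000_003) -> List[int]:
    """Rolling hash of every k-gram in text, Rabin-Karp style.

    k, base and mod are illustrative assumptions, not values from the paper.
    """
    if k <= 0 or len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        # Drop the oldest character, shift, then add the new character.
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (strings or lists of hashes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_count_similarity(h1: Sequence[int], h2: Sequence[int]) -> float:
    """Baseline: share of hash values found in both documents.

    Assumption: the Dice coefficient over distinct hash values.
    """
    s1, s2 = set(h1), set(h2)
    if not s1 and not s2:
        return 1.0
    return 2 * len(s1 & s2) / (len(s1) + len(s2))


def levenshtein_similarity(h1: Sequence[int], h2: Sequence[int]) -> float:
    """Similarity from the edit distance between the two hash sequences.

    Assumption: the distance is normalized by the longer sequence length.
    """
    longest = max(len(h1), len(h2))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(h1, h2) / longest


if __name__ == "__main__":
    doc_a = kgram_hashes("rabin karp compares hash values")
    doc_b = kgram_hashes("hash values compares rabin karp")
    print("count-based similarity:", match_count_similarity(doc_a, doc_b))
    print("Levenshtein similarity:", levenshtein_similarity(doc_a, doc_b))
```

    Unlike the count-based baseline, the distance-based score depends on the order in which the hashes occur, so two documents that share many k-grams in a different arrangement receive different scores under the two measures.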

  • How to Cite

    Putera Utama Siahaan, A., Aryza, S., Hariyanto, E., Rusiadi, Hasudungan Lubis, A., Ikhwan, A., & Len Eh Kan, P. (2018). Combination of levenshtein distance and rabin-karp to improve the accuracy of document equivalence level. International Journal of Engineering & Technology, 7(2.27), 17-21. https://doi.org/10.14419/ijet.v7i2.27.12084

    Received date: 2018-04-24

    Accepted date: 2018-05-03

    Published date: 2018-06-01