Survey of duplicate detection using progressive detection techniques

  • Abstract

    Data plays an important role in the real world; common data is represented and used in every field, and duplicate records appear in many scenarios. This work surveys two progressive duplicate detection techniques: the Progressive Sort Neighbourhood Method (PSNM) and Progressive Blocking (PB). PSNM aims to deliver output that closely matches the input: it separates the input into keywords and checks the similarity of the resulting records. PB filters out irrelevant information through keyword-based indexing and entry-level filtering of the input, applied according to user requirements.
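
    The abstract does not spell out the PSNM procedure, so the following is only a minimal sketch of the idea as it is commonly described: sort the records by a key, then compare neighbours at rank distance 1, then 2, and so on, so that the most likely duplicates are reported first. The function name `psnm_sketch`, its parameters, the sample records, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, not details taken from the surveyed work.

```python
from difflib import SequenceMatcher


def psnm_sketch(records, sort_key, max_window, threshold):
    """Yield likely duplicate pairs, closest sorted neighbours first.

    Hypothetical illustration: names, parameters, and the similarity
    measure are assumptions, not the surveyed authors' implementation.
    """
    # Sort record positions by the chosen sorting key.
    order = sorted(range(len(records)), key=lambda i: sort_key(records[i]))
    # Progressively widen the rank distance: 1, 2, ..., max_window - 1.
    for distance in range(1, max_window):
        for pos in range(len(order) - distance):
            a = records[order[pos]]
            b = records[order[pos + distance]]
            # Character-based similarity in [0, 1].
            similarity = SequenceMatcher(None, a, b).ratio()
            if similarity >= threshold:
                yield a, b


records = ["john smith", "jon smith", "mary jones", "maria jones", "zed"]
pairs = list(psnm_sketch(records, sort_key=str.lower, max_window=3, threshold=0.8))
for pair in pairs:
    print(pair)
```

    Because the function is a generator, a caller can stop consuming pairs at any time and still have seen the comparisons between the nearest sorted neighbours, which is the point of a progressive approach. A real system would likely use a domain-specific sorting key and similarity function instead of the ones assumed here.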

  • Keywords

    Progressive Sort Neighborhood Method; Progressive Blocking; Duplicate Detection.





Article ID: 9757
DOI: 10.14419/ijet.v7i1.9.9757

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.