Successive Duplicate Detection in Scalable Datasets in Cloud Database

  • Abstract

    Duplicate detection is the process of identifying multiple representations of the same real-world entity. Today, duplicate detection methods must process ever larger datasets in ever shorter time, which makes maintaining the quality of a dataset increasingly difficult. We present progressive duplicate detection algorithms based on the Progressive Sorted Neighbourhood Method and Progressive Blocking that significantly increase the efficiency of finding duplicates. When execution time is limited, the proposed approach maximizes the gain of the overall process within the available time by reporting most results much earlier than traditional systems. Comprehensive experiments show that our progressive algorithms can double the efficiency over time of standard duplicate detection and substantially improve on related work.
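The core idea of the Progressive Sorted Neighbourhood Method described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the record format, the `similarity` function, and the sort key are all illustrative assumptions. Records are sorted once, and the rank distance between compared pairs then grows progressively from 1 upward, so the most promising (closest-keyed) pairs are compared and reported first.

```python
# Hedged sketch of a Progressive Sorted Neighbourhood Method (PSNM).
# Assumptions: records are dicts with a "name" field; the similarity
# measure and sort key are illustrative, not from the original paper.
from difflib import SequenceMatcher

def similarity(a, b):
    """Illustrative string similarity on the 'name' field."""
    return SequenceMatcher(None, a["name"], b["name"]).ratio()

def progressive_snm(records, sort_key, max_window, threshold=0.9):
    """Yield likely-duplicate pairs, most promising first.

    Instead of comparing all pairs inside one fixed window, the rank
    distance grows from 1 to max_window - 1, so neighbouring pairs
    (most likely duplicates after sorting) are reported early.
    """
    ordered = sorted(records, key=sort_key)
    for distance in range(1, max_window):          # progressive widening
        for i in range(len(ordered) - distance):
            a, b = ordered[i], ordered[i + distance]
            if similarity(a, b) >= threshold:
                yield a, b                         # early result reporting

people = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "Jon Smith"},
    {"id": 3, "name": "Alice Jones"},
]
dups = list(progressive_snm(people, sort_key=lambda r: r["name"], max_window=3))
```

If execution is cut off after only a few iterations of the outer loop, the pairs already emitted are exactly the most likely duplicates, which is the early-reporting behaviour the abstract claims for progressive methods.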



  • Keywords

    Duplicate Detection, Dataset, Progressive Blocking, Progressive Sorted Neighbourhood Method, Data Cleaning



Article ID: 11167
DOI: 10.14419/ijet.v7i2.4.11167

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.