An Enhancement of Progressive Duplicate Detection with Performance Evaluation

Ravikanth. M; D Vasumathi

doi:10.14419/ijet.v7i3.27.18510


Keywords: Duplicate detection, entity resolution, pay-as-you-go, progressiveness, data cleaning.
Abstract

Duplicate detection is the process of identifying multiple representations of the same real-world entities. Today, these methods must process ever larger datasets in ever shorter time, and maintaining the quality of a dataset becomes increasingly difficult. Progressive duplicate detection algorithms significantly increase the efficiency of finding duplicates when execution time is limited: they maximize the gain of the overall process within the time available by reporting most results much earlier than traditional approaches. Comprehensive experiments show that progressive algorithms can double the efficiency over time of traditional duplicate detection and significantly improve upon related work.
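To make the idea of reporting results early under a limited time budget more concrete, the following is a minimal sketch of one common progressive strategy: sort the records, then compare neighbours at rank distance 1 first, then 2, and so on, stopping when the budget runs out. It is an illustration only and not the algorithm evaluated in the paper. The names `progressive_pairs` and `find_duplicates`, the comparison `budget`, the `threshold` of 0.85, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def progressive_pairs(records, key=lambda r: r):
    """Yield index pairs, nearest neighbours in sort order first (hypothetical helper)."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    # Rank distance 1 (adjacent in sort order) comes first, then 2, 3, ...
    for distance in range(1, len(order)):
        for pos in range(len(order) - distance):
            yield order[pos], order[pos + distance]


def find_duplicates(records, budget, threshold=0.85):
    """Return pairs judged similar, stopping after `budget` comparisons.

    `budget` stands in for limited execution time; `threshold` is an
    assumed cut-off, not a value taken from the paper.
    """
    found = []
    for count, (i, j) in enumerate(progressive_pairs(records)):
        if count >= budget:
            break
        ratio = SequenceMatcher(None, records[i], records[j]).ratio()
        if ratio >= threshold:
            found.append((min(i, j), max(i, j)))
    return found


records = ["john smith", "mary jones", "jon smith", "mary jonas", "peter pan"]
# Only 4 of the 10 possible comparisons are allowed.
early = find_duplicates(records, budget=4)
print(early)
```

Because similar strings tend to sit next to each other after sorting, the most promising comparisons are made first, so a small budget can already surface most duplicates. A non-progressive approach that compares pairs in arbitrary order gives no such guarantee about when results appear.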
How to Cite
M, R., & Vasumathi, D. (2018). An Enhancement of Progressive Duplicate Detection with Performance Evaluation. International Journal of Engineering & Technology, 7(3.27), 631-635. https://doi.org/10.14419/ijet.v7i3.27.18510
Received date: 2018-08-28

Accepted date: 2018-08-28
