An XML based Web Crawler with Page Revisit Policy and Updation in Local Repository of Search Engine

Jyoti Mor; Dr Dinesh Rai; Dr Naresh Kumar

doi:10.14419/ijet.v7i3.12924


Authors
- Jyoti Mor, Ansal University, Gurugram
- Dr Dinesh Rai, Ansal University, Gurugram
- Dr Naresh Kumar, MSIT, New Delhi
How to Cite

Mor, J., Rai, D., & Kumar, N. (2018). An XML based Web Crawler with Page Revisit Policy and Updation in Local Repository of Search Engine. International Journal of Engineering and Technology, 7(3), 1119-1123. https://doi.org/10.14419/ijet.v7i3.12924

Received date: May 17, 2018

Accepted date: June 1, 2018

Published date: June 23, 2018
Keywords: WWW, Search Engine, Web Crawler, Network Resources, Page Revisit.
Abstract

In a large collection of web pages, it is difficult for search engines to keep their online repository up to date. Major search engines run hundreds of web crawlers that crawl the WWW day and night and send the downloaded web pages over the network to be stored in the search engine's database. This results in overutilization of network resources such as bandwidth and CPU cycles. This paper proposes an architecture that tries to reduce the utilization of shared network resources with the help of an advanced XML-based approach. The focused-crawling architecture is trained to download only high-quality data from the internet, leaving behind web pages that are not relevant to the desired domain. A detailed layout of the proposed system is described; the system is capable of reducing the load on the network and mitigating the problems that arise from the residency of mobile agents at the remote server.
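As an illustration only, the page revisit idea named in the title could be sketched as follows. The abstract does not specify the record format or the change test, so the XML schema (`page`, `digest`, `length`), the function names, and the use of a SHA-256 digest are all assumptions made for this sketch, not the authors' design. The point it shows is that a small XML summary can be compared first, so that the local repository is updated only when a page is new or has changed.

```python
import hashlib
import xml.etree.ElementTree as ET


def page_record(url: str, html: str) -> ET.Element:
    """Build a small XML record (hypothetical schema) summarizing one page."""
    record = ET.Element("page", attrib={"url": url})
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    ET.SubElement(record, "digest").text = digest
    ET.SubElement(record, "length").text = str(len(html))
    return record


def needs_update(stored: ET.Element, fresh: ET.Element) -> bool:
    """Return True when the fresh record's digest differs from the stored one."""
    return stored.findtext("digest") != fresh.findtext("digest")


def revisit(repository: dict, url: str, html: str) -> bool:
    """Update the local repository only if the page is new or has changed.

    Returns True when the repository was updated, False when the page
    was unchanged and nothing needed to be stored.
    """
    fresh = page_record(url, html)
    stored = repository.get(url)
    if stored is not None and not needs_update(stored, fresh):
        return False  # unchanged page: skip the update
    repository[url] = fresh
    return True
```

In this sketch the repository is a plain dictionary keyed by URL; a real system would persist the XML records and would likely run the comparison close to the remote server so that unchanged pages are never transferred.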