UDDWE: Universal domain deep web exposer

 
 
 
  • Abstract


    Traditional search engines extract only a small portion of web data compared with the larger body of relevant, high-quality data (also called hidden web data) that lies behind search interfaces. Considerable research has addressed extracting this hidden data and retrieving its relevant, high-quality content. However, most existing methods are domain specific, i.e., a separate tool is designed for each domain. This paper proposes a novel method that provides one universal tool for all domains. The key idea is to customize a traditional search engine so that it receives the user query and processes it to identify the entry points (search interfaces) to the hidden web. After this filtering step, the entry points are presented for opening in a controlled, programmed environment to ease data extraction.
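
    The filtering step described above, identifying which pages expose a search interface, can be illustrated with a minimal sketch. The abstract does not state how the tool detects entry points, so the heuristic here is an assumption made only for illustration: a form counts as a search interface if it has at least one free-text input and no password field. The names SearchFormFinder and find_search_interfaces are hypothetical and do not come from the paper.

```python
from html.parser import HTMLParser


class SearchFormFinder(HTMLParser):
    """Collects the action of each <form> that looks like a search interface.

    Assumed heuristic (not from the paper): a form qualifies when it has at
    least one text/search input and no password input.
    """

    def __init__(self):
        super().__init__()
        self.forms = []       # actions of the forms that qualified
        self._current = None  # form being parsed, or None outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "text_inputs": 0,
                "has_password": False,
            }
        elif tag == "input" and self._current is not None:
            # An <input> without a type attribute defaults to a text field.
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search"):
                self._current["text_inputs"] += 1
            elif input_type == "password":
                self._current["has_password"] = True

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            form, self._current = self._current, None
            if form["text_inputs"] > 0 and not form["has_password"]:
                self.forms.append(form["action"])


def find_search_interfaces(html):
    """Return the form actions in `html` that pass the assumed heuristic."""
    finder = SearchFormFinder()
    finder.feed(html)
    finder.close()
    return finder.forms


# Usage: a login form is filtered out, a search form is kept.
page = """
<form action="/login"><input type="text" name="user"><input type="password" name="pw"></form>
<form action="/search"><input type="search" name="q"><input type="submit"></form>
"""
print(find_search_interfaces(page))  # ['/search']
```

    A domain-independent tool would likely need richer signals than this (labels, select boxes, result-page probing); the sketch only shows the shape of a filter that is not tied to any single domain.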

     

     

     

     



  • Keywords


    Deep Web; Hidden Web; Information Retrieval; Search Interfaces; Universal; Domains.



 


Article ID: 15751
 
DOI: 10.14419/ijet.v7i4.15751




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.