Design and development of text extraction and retrieval using style of documents in web searching

  • Authors

    • S. Balan
    • P. Ponmuthuramalingam
    2017-12-28
    https://doi.org/10.14419/ijet.v7i1.2.9038
  • Web Search, Text Extraction, Data Alignment, Data Retrieval.
  • This research studies the extraction of text from web pages and documents returned by the Google search engine. A central task on the web is to match a query with accurate information. That information falls into several categories, such as manual, structured, and semi-structured text, and images. Query Result Records (QRRs) are used to extract the text information from the different types of documents. The data region is used to identify the segmentation step, and the domain of a document contains a suffix and a prefix. In terms of time, the approach is more efficient than existing pruning and other techniques. We analyze the different types of alignment in this paper and propose a new technique for alignment retrieval, using precision and recall to evaluate the retrieval performance.
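
The abstract names precision and recall as the measures of retrieval performance. As a minimal sketch of how these two measures are commonly computed, assuming the extracted records and the hand-labelled relevant records can be compared as sets of identifiers: the function name `precision_recall` and the record IDs are hypothetical and are not taken from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved items against a relevant set.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    An empty denominator yields 0.0 by convention (an assumption of this sketch).
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record IDs: what an extractor returned vs. a hand-labelled set.
extracted = ["r1", "r2", "r3", "r4"]
ground_truth = ["r2", "r3", "r5"]

p, r = precision_recall(extracted, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Two of the four extracted records are relevant, so precision is 0.50; two of the three relevant records were found, so recall is about 0.67.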

  • How to Cite

    Balan, S., & Ponmuthuramalingam, P. (2017). Design and development of text extraction and retrieval using style of documents in web searching. International Journal of Engineering & Technology, 7(1.2), 130-134. https://doi.org/10.14419/ijet.v7i1.2.9038