Information extraction in current Indian web documents

  • Authors

    • Kolla Bhanu Prakash
    2018-03-19
    https://doi.org/10.14419/ijet.v7i2.8.10332
  • Attribute, Bilingual, Classification, Content Extraction, Mining, Pixel-Based Approach, Voxel.
  • Communication and the Internet are two major resources in today's technical, social and scientific disciplines, offering a wide range of possibilities for introducing new approaches and varying current ones. Web documents are growing in size and volume over time, creating the need to access and process them offline and online over the Internet with a PC or a smartphone. In the Indian context, web documents pose different kinds of challenges, and the present study addresses some of them, taking into account the vagaries of Indian languages. This has become very relevant in the Indian education scenario, where bilingual and multilingual communication and web documents are being generated through online courses. When a regional native dialect comes into the picture, another dimension of complexity is added. After presenting the different kinds of web pages from the Indian perspective, the case for developing a generic approach is highlighted, so that it can blend with current data mining tools and at the same time cater to the vagaries of Indian texts. The approach is based on pixel-level addressing of the data, which is large in size; the data is then modified and reduced to numerical equivalents using matrix manipulations, so that they form inputs to classification approaches such as statistical, pattern-matching and neural models. Some typical case studies on text letters and words are presented to highlight the generality of the approach and its flexibility to fit into different tools.
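
The abstract describes reducing pixel-level data to numerical equivalents by matrix manipulation and feeding the result to a classifier. The following is a minimal sketch of that general idea only, not the author's method: the 5x5 bitmaps, the row-and-column-sum reduction, and the names `TEMPLATES`, `profile`, `distance` and `classify` are all assumptions made for illustration, with nearest-template matching standing in for the pattern-matching option.

```python
# Hypothetical 5x5 binary bitmaps standing in for rendered glyphs (1 = ink pixel).
# Real glyphs from Indian scripts would be far larger; these are illustrative only.
TEMPLATES = {
    "I": [
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
    ],
    "L": [
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
    ],
    "T": [
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
    ],
}


def profile(bitmap):
    """Reduce a pixel matrix to a numeric vector: row sums followed by column sums."""
    row_sums = [sum(row) for row in bitmap]
    col_sums = [sum(col) for col in zip(*bitmap)]  # zip(*m) iterates over columns
    return row_sums + col_sums


def distance(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify(bitmap, templates=TEMPLATES):
    """Return the label of the template whose profile is closest to the input's."""
    features = profile(bitmap)
    return min(templates, key=lambda label: distance(features, profile(templates[label])))


# Usage: an "L" with one missing pixel is still matched to the "L" template.
noisy_l = [row[:] for row in TEMPLATES["L"]]
noisy_l[0][0] = 0
print(classify(noisy_l))
```

The reduction turns 25 pixel values into 10 numbers per glyph; a statistical or neural model could consume the same vectors in place of the nearest-template step.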

  • How to Cite

    Bhanu Prakash, K. (2018). Information extraction in current Indian web documents. International Journal of Engineering & Technology, 7(2.8), 68-71. https://doi.org/10.14419/ijet.v7i2.8.10332