Authorship Identification of Punjabi Poetry

  • Abstract

    Authorship Identification is the problem of identifying the author of an anonymous text. From a machine-learning point of view, it is a single-label text-categorization task. The underlying assumption is that the author of an unknown text can be identified by comparing lexical features extracted from that text with the same features extracted from texts by known authors. In this paper, Authorship Identification is performed on a Punjabi poetry dataset consisting of poems written by 5 different poets. Features broadly categorised as statistical (word count, character count, etc.), syntactic (i.e. lexical) and semantic (language-dependent) are first selected using the J48 Decision Tree algorithm. The selected features are then used as input to multiple classifiers (SVM, SMO, Bayes Net and Naive Bayes), and the proposed system is evaluated on the basis of Precision, Recall, F-score and Accuracy.
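
    As a rough illustration of two steps named in the abstract, the sketch below computes a few statistical features for a poem and then derives Accuracy and per-author Precision, Recall and F-score from predicted labels. It is a minimal sketch and not the paper's implementation: the function names `statistical_features` and `per_class_scores`, the exact feature set, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter


def statistical_features(poem: str) -> dict:
    """Compute simple length-based features for one poem (illustrative only).

    Assumption: words are separated by whitespace, and characters are counted
    as Unicode code points, so Gurmukhi vowel signs count as separate characters.
    """
    lines = [ln for ln in poem.splitlines() if ln.strip()]
    words = poem.split()
    chars = sum(len(w) for w in words)
    return {
        "line_count": len(lines),
        "word_count": len(words),
        "char_count": chars,
        "avg_word_length": chars / len(words) if words else 0.0,
    }


def per_class_scores(y_true, y_pred):
    """Return (accuracy, {author: precision/recall/f_score}) for predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted author was credited wrongly
            fn[truth] += 1  # true author was missed
    scores = {}
    for lab in labels:
        predicted = tp[lab] + fp[lab]
        actual = tp[lab] + fn[lab]
        precision = tp[lab] / predicted if predicted else 0.0
        recall = tp[lab] / actual if actual else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        scores[lab] = {"precision": precision, "recall": recall, "f_score": f_score}
    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return accuracy, scores
```

    In a full pipeline, the feature dictionaries would be passed to a feature-selection step and then to the classifiers; the poet labels given to `per_class_scores` would come from a held-out test set.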



  • Keywords

    Authorship Identification, Punjabi poetry corpus, Feature extraction, J48 Decision Tree, Bayes Net Classifier, Naive Bayes Classifier





Article ID: 21987
DOI: 10.14419/ijet.v7i4.19.21987

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.