Open Problems in Indonesian Automatic Essay Scoring System


  • Faisal Rahutomo
  • Trisna Ari Roshinta
  • Erfan Rohadi
  • Indrazno Siradjuddin
  • Rudy Ariyanto
  • Awan Setiawan
  • Supriatna Adhisuwignjo



Indonesian, Natural language processing, Automatic essay scoring system, Open problems.


This paper presents open problems in Indonesian automatic essay scoring systems. Our previous study compared several similarity metrics for automated essay scoring in Indonesian: Cosine Similarity, Euclidean Distance, and Jaccard. The data used in that research comprise about 2,000 texts, obtained from 50 students who answered 40 questions on politics, sports, lifestyle, and technology. The study also evaluated the effect of stemming on system performance. Across all methods, the difference between running with and without stemming is around 4-9%. The results show that Jaccard is the best metric both with and without stemming, and that Jaccard with stemming has the lowest percentage error of all the configurations. The politics category has a higher average similarity score than lifestyle, sports, and technology. With stemming, the percentage error is 52.31% for Jaccard, 59.49% for Cosine Similarity, and 332.90% for Euclidean Distance. Without stemming, Jaccard is again the best: the percentage error is 56.05% for Jaccard, 57.99% for Cosine Similarity, and 339.41% for Euclidean Distance. However, these percentage errors, all above 50%, are too high for a functional essay grading system. This paper therefore explores several open problems in this area. The openly available dataset can be used to develop better approaches than the standard similarity metrics. The approaches explored range from feature extraction, similarity metrics, and learning algorithms to the implementation environment and performance evaluation.
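To make the three metrics named above concrete, the following is a minimal sketch of how each could be computed between a reference answer and a student answer. It is not the authors' implementation: the whitespace tokenizer, the raw term-count vectors, the example sentences, and the `percentage_error` formula (relative deviation of a system score from a human score) are all assumptions made for illustration, and no stemming is applied.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; no stemming, no stopword removal.
    return text.lower().split()


def cosine_similarity(a, b):
    # Cosine of the angle between the raw term-count vectors of the two token lists.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    # Straight-line distance between the term-count vectors.
    # This is a distance, not a similarity, and it has no upper bound.
    ca, cb = Counter(a), Counter(b)
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in ca.keys() | cb.keys()))


def jaccard_similarity(a, b):
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def percentage_error(system_score, human_score):
    # Hypothetical error measure: relative deviation from the human score, in percent.
    return abs(system_score - human_score) / human_score * 100.0


# Illustrative sentences, not taken from the dataset.
reference = tokenize("presiden dipilih langsung oleh rakyat")
answer = tokenize("presiden dipilih oleh rakyat")

print("cosine   :", cosine_similarity(reference, answer))
print("euclidean:", euclidean_distance(reference, answer))
print("jaccard  :", jaccard_similarity(reference, answer))
```

Because Euclidean Distance is unbounded while the other two metrics lie in [0, 1], it would need some rescaling before being compared with a human score, which is one plausible reason its reported error is so much larger.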




