Sentiment Analysis of Indonesian Movie Review using K-Nearest Neighbors and Information Gain

  • Authors

    • Ria Ine Pristiyanti
    • M. Ali Fauzi
    • Lailil Muflikhah
    2018-12-03
    https://doi.org/10.14419/ijet.v7i4.38.27911
  • Keywords: Movie Review, Sentiment Analysis, Information Gain, K-Nearest Neighbors
  • Abstract

    Movie reviews help movie lovers learn what other people think of a movie before deciding to watch it. However, reading every review manually is costly and time consuming, so an automatic way to analyze them is needed. In this study, we use a bag-of-words (BOW) model and information gain (IG) to select the best features before K-Nearest Neighbors (KNN) is employed to classify each review as positive or negative. Classification using all terms reaches 92% accuracy and performs better than classification with feature selection, because feature selection eliminates terms with low information gain values.
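
    The pipeline named in the abstract (BOW, IG-based term selection, then KNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy reviews, the whitespace tokenizer, the binary present/absent treatment of terms in the IG computation, the raw term-frequency vectors, the cosine similarity, and the values of k are all assumptions made only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stopword removal)."""
    return text.lower().split()


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(token_lists, labels):
    """IG of each term, treating the term as a present/absent feature."""
    n = len(token_lists)
    base = entropy(list(Counter(labels).values()))
    doc_sets = [set(tokens) for tokens in token_lists]
    gains = {}
    for term in set().union(*doc_sets):
        present = Counter(l for s, l in zip(doc_sets, labels) if term in s)
        absent = Counter(l for s, l in zip(doc_sets, labels) if term not in s)
        conditional = 0.0
        for group in (present, absent):
            size = sum(group.values())
            if size:
                conditional += size / n * entropy(list(group.values()))
        gains[term] = base - conditional
    return gains


def select_terms(gains, k):
    """Keep the k terms with the highest information gain."""
    return sorted(gains, key=gains.get, reverse=True)[:k]


def vectorize(tokens, vocabulary):
    """Bag-of-words term counts restricted to the selected vocabulary."""
    return Counter(t for t in tokens if t in vocabulary)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Majority vote among the k most similar training documents."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(pair[0], query_vec),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy reviews, not taken from the paper's dataset.
TRAIN = [
    ("film ini bagus sekali", "positive"),
    ("akting bagus cerita menarik", "positive"),
    ("film menarik dan bagus", "positive"),
    ("film ini buruk sekali", "negative"),
    ("cerita buruk akting membosankan", "negative"),
    ("film membosankan dan buruk", "negative"),
]

token_lists = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
gains = information_gain(token_lists, labels)
selected = set(select_terms(gains, 2))
train_vecs = [vectorize(tokens, selected) for tokens in token_lists]


def classify(text, k=3):
    """Classify a new review using only the IG-selected terms."""
    return knn_predict(train_vecs, labels, vectorize(tokenize(text), selected), k)


print(classify("cerita film ini bagus"))
```

    On this toy data the two selected terms are "bagus" and "buruk", and the sample review is labeled positive. Passing a larger k to select_terms, up to the full vocabulary size, corresponds to the all-terms setting that the abstract reports as more accurate.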

     

  • How to Cite

    Ine Pristiyanti, R., Ali Fauzi, M., & Muflikhah, L. (2018). Sentiment Analysis of Indonesian Movie Review using K-Nearest Neighbors and Information Gain. International Journal of Engineering & Technology, 7(4.38), 1499-1501. https://doi.org/10.14419/ijet.v7i4.38.27911

    Received date: 2019-02-24

    Accepted date: 2019-02-24

    Published date: 2018-12-03