Preprint has been published in a journal as an article
DOI of the published article https://doi.org/10.36040/jati.v10i2.17829
Preprint / Version 1

Prediction Model of Duolingo User Perception Based on Digital Comments

Model Prediksi Presepsi Pengguna Duolingo Berdasarkan Komentar Digital

DOI:

https://doi.org/10.21070/ups.10597

Keywords:

Sentiment Analysis, Duolingo, Support Vector Machine (SVM), Decision Tree, TF-IDF, Random Oversampling, Machine Learning

Abstract

Duolingo is a language-learning application whose users post reviews on the Google Play Store. These reviews reflect user perceptions of the application’s quality and effectiveness. The challenge lies in handling large amounts of unstructured text and an imbalance between sentiment classes, which calls for a method that yields accurate and balanced predictions. This study develops a prediction model of Duolingo user perceptions and compares the performance of the Support Vector Machine (SVM) and Decision Tree algorithms in classifying positive and negative sentiment. The dataset consists of 10,000 reviews collected between 2020 and 2025. The research stages are preprocessing, labeling, feature extraction using TF-IDF, train-test splitting at ratios of 80:20 and 70:30, and handling class imbalance with Random Oversampling (ROS). Evaluation uses a confusion matrix together with accuracy, precision, recall, and F1-score. The results show that SVM with the 80:20 split achieves 93.57% accuracy, and that ROS raises recall on the negative class to 56.19%, making SVM with ROS the best-performing model.
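The balancing and evaluation stages named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names `random_oversample` and `binary_metrics`, the label strings, and the toy data are all assumptions made for illustration. The sketch also assumes that oversampling is applied to the training split only, which the abstract does not state.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=42):
    """Duplicate randomly chosen minority-class samples (with replacement)
    until every class has as many samples as the largest class.
    Hypothetical helper; intended for the training split only."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


def binary_metrics(y_true, y_pred, positive):
    """Derive accuracy, precision, recall, and F1 from the confusion-matrix
    counts, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy training split (made-up reviews): 8 positive, 2 negative.
train_texts = [f"review {i}" for i in range(10)]
train_labels = ["positive"] * 8 + ["negative"] * 2
balanced_texts, balanced_labels = random_oversample(train_texts, train_labels)
balanced_counts = Counter(balanced_labels)

# Toy predictions, scored with the negative class as the class of interest.
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "positive"]
negative_scores = binary_metrics(y_true, y_pred, positive="negative")
```

Scoring the negative class separately matters here because overall accuracy can stay high while the minority class is poorly recalled, which is the situation the negative-recall figure in the abstract describes.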

Posted

2026-04-08