This preprint has been submitted for publication in a journal
Preprint / Version 1

The Impact of Text Data Preprocessing for Review Analysis E-Wallet Application on Google Play Store


Pengaruh Preprocessing Data Teks untuk Analisis Ulasan Aplikasi E-Wallet di Google Play Store

DOI:

https://doi.org/10.21070/ups.6279

Keywords:

Google Play Store, Sentiment Analysis, Dana, Preprocessing

Abstract

This research aims to optimize preprocessing techniques for sentiment analysis of reviews of the Dana e-wallet application on the Google Play Store. Text preprocessing is a crucial step in Natural Language Processing (NLP) that affects the accuracy and efficiency of sentiment analysis. This study applies several preprocessing methods, including stopword removal, stemming, and lemmatization, to clean and prepare the review data before analysis. The results show that lemmatization significantly improves accuracy compared with basic preprocessing techniques such as stopword removal and stemming. With well-tuned preprocessing, sentiment analysis can provide more accurate and informative results, which can be used to improve the application's quality and user experience. The study evaluates SVM classifiers with four kernels; the highest accuracy was achieved with the combination of cleaning, case folding, tokenization, and lemmatization: 100% for Linear, 100% for RBF, 99% for Polynomial, and 99.50% for Sigmoid, for an average accuracy of 99.63%.
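
The preprocessing steps named in the abstract (cleaning, case folding, tokenization, optional stopword removal, and lemmatization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tiny English stopword set, and the dictionary-based lemma lookup are all hypothetical stand-ins chosen so the example runs with the standard library alone. The abstract does not state the language of the reviews or which lemmatizer was used, so a real pipeline would likely swap in language-appropriate resources.

```python
import re

# Hypothetical resources for illustration only; the abstract does not say
# which stopword list or lemmatizer the study used.
STOPWORDS = {"the", "is", "a", "and", "it", "this"}
LEMMAS = {
    "crashed": "crash",
    "crashes": "crash",
    "payments": "payment",
    "failed": "fail",
}


def clean(text):
    """Cleaning: drop URLs, then replace every non-letter with a space."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def case_fold(text):
    """Case folding: lowercase the whole review."""
    return text.lower()


def tokenize(text):
    """Tokenization: split on whitespace (the text is already cleaned)."""
    return text.split()


def remove_stopwords(tokens):
    """Stopword removal: drop tokens found in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def lemmatize(tokens):
    """Lemmatization via dictionary lookup; unknown tokens pass through."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text, drop_stopwords=False):
    """Run cleaning, case folding, tokenization, and lemmatization in order."""
    tokens = tokenize(case_fold(clean(text)))
    if drop_stopwords:
        tokens = remove_stopwords(tokens)
    return lemmatize(tokens)


review = "The app CRASHED twice!!! Payments failed :( http://example.com"
print(preprocess(review))
print(preprocess(review, drop_stopwords=True))
```

Keeping each step as its own function makes it easy to switch steps on and off and compare combinations, which is the kind of comparison the abstract reports. The resulting token lists would then be vectorized and passed to a classifier such as an SVM; that stage is omitted here.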

Posted

2024-08-16