This preprint has been published as a journal article.
Preprint / Version 1

Topic Modelling in COVID-19 Vaccination Refusal Cases Using Latent Dirichlet Allocation and Latent Semantic Analysis

DOI:

https://doi.org/10.21070/ups.774

Keywords:

Text Mining, LDA, LSA, Topic Modeling, COVID-19, Vaccine, Anti-Vaccine, Twitter, Twitter Data, Scraping, Crawling, Big Data, Word Cloud

Abstract

The COVID-19 vaccination program in Indonesia has been accompanied by circulating issues that have caused controversy and vaccine refusal on social media, especially Twitter. Many factors influence vaccine refusal on Twitter. To summarize frequently discussed topics and uncover hidden ones, this study uses Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA). The resulting topics are evaluated by coherence score, testing models with up to 20 topics to find the best value. In further modeling experiments with the LDA and LSA models, this study selects the 6 topics with the highest coherence values, including the right of individuals to choose whether or not to be vaccinated (0.484607), the Ribka Tjiptaning controversy (0.473368), rejection of the COVID-19 vaccine by groups represented by public figures (0.463631), punishment for non-compliance in the form of fines (0.324924), and halal certification (0.312521).
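
The abstract does not describe how the number of topics was chosen in code, so the following is only a minimal, dependency-free sketch of the general idea: train LDA for every topic count from 2 to 20 and keep the count whose topics have the highest mean coherence. Several assumptions are made here and are not taken from the study: LDA is fitted with a small hand-written collapsed Gibbs sampler rather than any particular library, coherence is measured with UMass (the study's measure is not named above, and UMass scores are on a different scale from the reported values, so they are not comparable), and TOY_DOCS is a made-up corpus. An LSA model could be scored with the same loop by substituting its top terms per topic.

```python
import math
import random

# Hypothetical toy corpus of pre-tokenized tweets (illustration only, not study data).
TOY_DOCS = [
    "vaccine halal certificate mui".split(),
    "halal vaccine mui fatwa".split(),
    "fine sanction refuse vaccine".split(),
    "sanction fine penalty law".split(),
    "right choose vaccine freedom".split(),
    "freedom right choose body".split(),
]


def train_lda(docs, k, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA; returns vocab and topic-word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * k for _ in docs]            # tokens per (document, topic)
    nkw = [[0] * n_vocab for _ in range(k)]  # tokens per (topic, word)
    nk = [0] * k                             # tokens per topic
    z = []                                   # current topic of every token
    for d, doc in enumerate(docs):           # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][index[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, t = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + n_vocab * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    return vocab, nkw


def top_words(vocab, nkw, n=5):
    """The n most frequent words of each topic, highest first."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in nkw]


def umass_coherence(topic, doc_sets):
    """Mean of log((D(w_i, w_j) + 1) / D(w_j)) over top-word pairs where w_j ranks above w_i."""
    scores = []
    for i in range(1, len(topic)):
        for j in range(i):
            d_j = sum(1 for s in doc_sets if topic[j] in s)
            d_ij = sum(1 for s in doc_sets if topic[i] in s and topic[j] in s)
            scores.append(math.log((d_ij + 1) / d_j))
    return sum(scores) / len(scores)


def best_num_topics(docs, k_max=20, n_top=5):
    """Train LDA for k = 2..k_max and return the k with the highest mean coherence."""
    doc_sets = [set(doc) for doc in docs]
    scores = {}
    for k in range(2, k_max + 1):
        vocab, nkw = train_lda(docs, k)
        topics = top_words(vocab, nkw, n_top)
        scores[k] = sum(umass_coherence(t, doc_sets) for t in topics) / k
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best_k, all_scores = best_num_topics(TOY_DOCS)
    print("best number of topics:", best_k, "coherence:", round(all_scores[best_k], 3))
```

Running the module prints the selected topic count and its score. On a corpus this small the choice is dominated by sampling noise, so the selection loop is the point rather than the numbers; the hand-written sampler only keeps the sketch self-contained, and an established topic-modeling library would normally be used in practice.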

Posted

2023-04-11