Applying Machine Learning to Classify Depression-Related Text in Mental Health Using SVM, TF-IDF, and Chi-Square
Abstract
Mental health has become a pressing global issue, and a growing number of people describe their psychological condition openly on social media. This study classifies tweets related to mental health, specifically depression, using a Support Vector Machine (SVM) with Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction and Chi-Square feature selection. Although this combination has been widely applied in domains such as product and movie reviews, its use in the mental health context remains limited. The main challenge is capturing the implicit psychological nuances and indirect expressions that are common on platforms like Twitter, in contrast to the more explicit text found in other domains. Moreover, most prior studies of mental health data from social media have not integrated a comprehensive preprocessing stage that includes lemmatization, stopword removal, and duplicate elimination. This research uses a dataset of 26,448 tweets drawn from Kaggle and from self-crawled data. The best result, an accuracy of 74.93%, was achieved by an SVM with an RBF kernel without Chi-Square feature selection. The study indicates that a comprehensive preprocessing pipeline can improve classification performance. However, the model still struggles with sarcastic or ironic content. Future research is recommended to adopt deep learning approaches such as BERT or LSTM to capture more complex textual context.
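The classification setup described in the abstract (TF-IDF features, optional Chi-Square selection, and an RBF-kernel SVM) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the helper `build_pipeline`, the toy tweets, the labels, and the values of `k`, `C`, and `gamma` are all assumptions chosen for illustration, and the preprocessing steps named in the abstract (lemmatization, stopword removal, duplicate elimination) are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_pipeline(use_chi2: bool, k: int = 5) -> Pipeline:
    """TF-IDF features, optional Chi-Square selection, then an RBF-kernel SVM.

    Hypothetical helper; k, C, and gamma are illustrative, not the study's settings.
    """
    steps = [("tfidf", TfidfVectorizer())]
    if use_chi2:
        # Keep the k terms with the highest Chi-Square score against the labels.
        steps.append(("chi2", SelectKBest(score_func=chi2, k=k)))
    steps.append(("svm", SVC(kernel="rbf", C=1.0, gamma="scale")))
    return Pipeline(steps)


# Made-up examples (1 = depression-related, 0 = not); not the study's dataset.
texts = [
    "i feel so hopeless and empty every single day",
    "cannot sleep again, everything feels pointless",
    "feeling worthless and tired of everything",
    "i have been crying all night and feel numb",
    "great match today, the team played well",
    "just finished a delicious lunch with friends",
    "the new phone camera is amazing",
    "excited for the weekend trip to the beach",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Fit both variants compared in the abstract: without and with Chi-Square selection.
for use_chi2 in (False, True):
    model = build_pipeline(use_chi2).fit(texts, labels)
    print("chi2" if use_chi2 else "no chi2", model.predict(texts).tolist())
```

On real data the two variants would be compared on a held-out test split rather than on the training texts. The sketch only shows how the Chi-Square step can be switched on or off without changing the rest of the pipeline.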

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
