Comparative Evaluation of Preprocessing Techniques in Twitter Sentiment Analysis for Indonesia’s 2024 Regional Elections
DOI: https://doi.org/10.35314/tt65bb54
Keywords: Sentiment Analysis, Twitter, Regional Elections 2024, Naïve Bayes, Logistic Regression
Abstract
The rapid expansion of social media has made Twitter a critical platform for capturing public opinion during political events, including Indonesia’s 2024 Regional Elections. This study investigates how preprocessing strategies and class balancing affect the performance of sentiment analysis models applied to election-related tweets. An initial dataset of 9,096 tweets was collected and refined into 6,202 relevant entries from 2024–2025 through text cleaning, normalization, tokenization, and duplicate removal. Sentiment distribution analysis shows that positive sentiment dominates (58.4%), followed by negative (33.6%) and neutral (8.0%) expressions. Two classical machine learning classifiers, Naïve Bayes and Logistic Regression, were implemented using a TF–IDF feature representation. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied exclusively to the training data, and hyperparameter optimization was conducted using GridSearchCV. Models were evaluated on an 80/20 train–test split using accuracy, precision, recall, F1-score, and confusion matrices. Experimental results indicate that Logistic Regression combined with SMOTE and hyperparameter tuning achieved the highest accuracy, 93.08%, outperforming Naïve Bayes. The findings indicate that carefully designed preprocessing pipelines and class balancing improve the reliability of sentiment classification in political social media analysis.
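To illustrate why SMOTE is applied to the training data only, the following is a minimal, standard-library sketch of the basic SMOTE idea: each synthetic sample is interpolated between a minority-class sample and one of its nearest minority-class neighbours. It is not the study's implementation, which the abstract does not specify. The function names `smote_oversample` and `balance_training_set`, the two-dimensional toy vectors (standing in for TF–IDF rows), and the parameter values are all assumptions made for illustration.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance_training_set(X_train, y_train, k=5, seed=0):
    """Oversample every smaller class up to the majority-class count.
    Only training rows are passed in, so test data is never resampled."""
    by_class = {}
    for x, y in zip(X_train, y_train):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    X_bal, y_bal = list(X_train), list(y_train)
    for label, rows in by_class.items():
        if len(rows) < target:
            extra = smote_oversample(rows, target - len(rows), k, seed)
            X_bal.extend(extra)
            y_bal.extend([label] * len(extra))
    return X_bal, y_bal


# Hypothetical toy training split: 4 "positive" rows, 2 "negative" rows.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.85, 0.15],
           [0.1, 0.9], [0.2, 0.8]]
y_train = ["positive"] * 4 + ["negative"] * 2

X_bal, y_bal = balance_training_set(X_train, y_train, k=1)
```

Because synthetic points are derived from existing samples, oversampling before the 80/20 split would let near-copies of test tweets leak into training and inflate the reported metrics. In practice a maintained implementation such as the one in the imbalanced-learn package would normally be used instead of a hand-written version like this.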
License
Copyright (c) 2026 INOVTEK Polbeng - Seri Informatika

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

