Comparative Evaluation of Preprocessing Techniques in Twitter Sentiment Analysis for Indonesia’s 2024 Regional Elections

Asro; Solihin

doi:10.35314/tt65bb54

Authors

Asro, PGRI Banten Polytechnic
Solihin, PGRI Banten Polytechnic

Keywords:

Sentiment Analysis, Twitter, Regional Elections 2024, Naïve Bayes, Logistic Regression

Abstract

The rapid expansion of social media has made Twitter a key platform for capturing public opinion during political events, including Indonesia’s 2024 Regional Elections. This study investigates how preprocessing strategies and class balancing affect the performance of sentiment analysis models applied to election-related tweets. An initial dataset of 9,096 tweets was collected and refined into 6,202 relevant entries from 2024–2025 through text cleaning, normalization, tokenization, and duplicate removal. Sentiment distribution analysis shows that positive sentiment dominates (58.4%), followed by negative (33.6%) and neutral (8.0%) expressions. Two classical machine learning classifiers, Naïve Bayes and Logistic Regression, were implemented using TF–IDF feature representation. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied exclusively to the training data, and hyperparameter optimization was conducted using GridSearchCV. Models were evaluated on an 80/20 train–test split using accuracy, precision, recall, F1-score, and confusion matrices as performance metrics. Experimental results indicate that Logistic Regression combined with SMOTE and hyperparameter tuning achieved the highest accuracy, 93.08%, outperforming Naïve Bayes. The findings suggest that carefully designed preprocessing pipelines and class balancing substantially improve the reliability of sentiment classification in political social media analysis.
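
The requirement that SMOTE touch only the training data can be illustrated with a small, dependency-free Python sketch. Everything in it is an assumption made for illustration and is not the authors' code: the 2-D toy points stand in for TF–IDF vectors, the class sizes (29/17/4) only mimic the reported 58.4/33.6/8.0% distribution, the split is stratified (the abstract states only that it is 80/20), and `make_toy_rows`, `stratified_split`, and `smote` are hypothetical helpers. A real pipeline would more likely rely on a library implementation such as imbalanced-learn's `SMOTE`.

```python
import math
import random
from collections import Counter


def make_toy_rows(seed=0):
    """Build (features, label) rows; 2-D points stand in for TF-IDF vectors."""
    rng = random.Random(seed)
    centres = {"positive": (1.0, 1.0), "negative": (-1.0, 0.0), "neutral": (0.0, -1.0)}
    sizes = {"positive": 29, "negative": 17, "neutral": 4}  # assumed toy sizes
    rows = []
    for label, n in sizes.items():
        cx, cy = centres[label]
        rows += [((rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)), label) for _ in range(n)]
    return rows


def stratified_split(rows, test_ratio=0.2, seed=0):
    """Hold out test_ratio of each class (stratification is an assumption)."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test += group[:n_test]
        train += group[n_test:]
    return train, test


def smote(train_rows, k=3, seed=0):
    """Oversample each minority class up to the majority class size.

    Every synthetic point lies on the segment between a real minority
    sample and one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    by_label = {}
    for features, label in train_rows:
        by_label.setdefault(label, []).append(features)
    target = max(len(points) for points in by_label.values())

    balanced = list(train_rows)
    for label, points in by_label.items():
        if len(points) < 2:
            continue  # no neighbour to interpolate towards
        for _ in range(target - len(points)):
            base = rng.choice(points)
            neighbours = sorted(
                (p for p in points if p is not base),
                key=lambda p: math.dist(base, p),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic = tuple(b + gap * (o - b) for b, o in zip(base, other))
            balanced.append((synthetic, label))
    return balanced


rows = make_toy_rows()
train, test = stratified_split(rows)  # split first ...
balanced_train = smote(train)         # ... then oversample the training part only

print("train:   ", Counter(label for _, label in train))
print("balanced:", Counter(label for _, label in balanced_train))
print("test:    ", Counter(label for _, label in test))
```

Because oversampling happens after the split, the held-out 20% contains only real samples, so the reported metrics are not influenced by synthetic points interpolated from test data.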