Optimization of Sentiment Analysis on Tokopedia User Reviews Using GridSearchCV and SMOTE with Machine Learning Algorithms
DOI:
https://doi.org/10.35314/5ax8km80
Keywords:
Sentiment Analysis, Machine Learning, SMOTE, E-commerce, TF-IDF
Abstract
Understanding user sentiment from e-commerce reviews is essential for platform improvement and business strategy. This study compares three machine learning algorithms (Logistic Regression, Random Forest, and XGBoost) for sentiment classification of Indonesian-language Tokopedia reviews. A dataset of 6,822 user reviews was preprocessed through tokenization, stopword removal, and TF-IDF vectorization. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training set. Models were evaluated using accuracy, precision, recall, and F1-score. Random Forest achieved the highest accuracy at 86.86%, followed by Logistic Regression at 84.86% and XGBoost at 82.60%. Applying SMOTE significantly improved classification performance across all models, particularly for minority sentiment classes. These findings indicate that tree-based ensemble methods, especially Random Forest, are effective for sentiment analysis on imbalanced e-commerce datasets. The research offers practical guidance for e-commerce platforms implementing automated sentiment monitoring, enabling faster responses to customer feedback and targeted service improvements. However, the study is limited to Tokopedia reviews and may not generalize to other platforms or languages. Future work should explore deep learning approaches and cross-platform validation to enhance model robustness.
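The oversampling step mentioned in the abstract can be illustrated with a minimal, dependency-free sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. This is not the authors' code. The function name `smote_oversample`, the toy feature vectors, and the choice of `k` are assumptions made purely for illustration, and in practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used. As in the abstract, only the training split is resampled, so the test data stays untouched.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper, written for illustration only.)"""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy TF-IDF-like vectors for the TRAINING split only (made-up values).
train_majority = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.7, 0.3], [0.95, 0.05]]
train_minority = [[0.1, 0.9], [0.2, 0.8]]

# Generate just enough synthetic samples to match the majority class size.
n_needed = len(train_majority) - len(train_minority)
balanced_minority = train_minority + smote_oversample(train_minority, n_needed, k=1)
print(len(train_majority), len(balanced_minority))
```

Because every synthetic point lies on a line segment between two real minority samples, the minority class is filled in with new examples instead of exact duplicates.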
License
Copyright (c) 2025 INOVTEK Polbeng - Seri Informatika

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
