Comparative Analysis of Random Forest and XGBoost Performance for Network Flow-Based Malware Classification

Fajar Adji Wicaksana; Chaerul Umam

doi:10.35314/8f891c76

Authors

Fajar Adji Wicaksana, Dian Nuswantoro University
Chaerul Umam, Dian Nuswantoro University


Keywords:

Malware Detection, Network Flow, Random Forest, XGBoost, Computational Efficiency

Abstract

The growing complexity of cyber threats, particularly malware that propagates through network infrastructure, calls for intrusion detection mechanisms that are both accurate and computationally efficient. This study compares two ensemble learning algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), for classifying network traffic anomalies from network flow features. Both models were evaluated on the CSE-CIC-IDS2018 dataset, which covers a broad range of modern attacks. The methodology comprises data preprocessing, class-imbalance handling via class weighting, and evaluation by accuracy, F1-score, and inference time. Both models performed comparably well, each reaching a perfect Area Under the Curve (AUC) score. XGBoost was slightly more accurate, at 99.8% against 99.4% for Random Forest. The most notable difference is in computational efficiency: on a large-scale test set, XGBoost completed prediction in 6.36 seconds against 7.42 seconds for Random Forest, about 14% less time. These results indicate that the boosting architecture in XGBoost offers a favorable balance between detection sensitivity and system latency. On this evidence, XGBoost is recommended for real-time intrusion detection system implementations that prioritize rapid threat response.
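The two quantitative steps mentioned in the abstract, class weighting and the timing comparison, can be illustrated with a minimal Python sketch. The abstract does not say which weighting scheme was used, so the inverse-frequency ("balanced") formula below is an assumption, and the function names and the toy label list are hypothetical. Only the two prediction times (7.42 s and 6.36 s) come from the abstract.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: the abstract does not state the weighting scheme used;
    this common "balanced" heuristic is shown for illustration only.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def relative_time_reduction(baseline_s, candidate_s):
    """Fraction of the baseline prediction time saved by the candidate."""
    return (baseline_s - candidate_s) / baseline_s


# Hypothetical toy labels (not from the dataset): 8 benign flows, 2 malware flows.
toy_labels = ["benign"] * 8 + ["malware"] * 2
weights = balanced_class_weights(toy_labels)

# Prediction times reported in the abstract, in seconds.
rf_seconds = 7.42
xgb_seconds = 6.36
reduction = relative_time_reduction(rf_seconds, xgb_seconds)
speedup = rf_seconds / xgb_seconds

print(weights)
print(f"time reduction: {reduction:.1%}, speedup factor: {speedup:.2f}x")
```

With the reported times, the reduction is (7.42 - 6.36) / 7.42, or roughly 14%, which matches the figure in the abstract when "faster" is read as "less prediction time". The same numbers correspond to a speedup factor of about 1.17.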