Application of Machine Learning for Classifying and Identifying Security Threats Using a Supervised Learning Algorithm Approach
DOI:
https://doi.org/10.35314/aqjdbj22
Keywords:
Supervised Learning Algorithms, Random Forest, Malware, Imbalance
Abstract
The rapid growth of harmful web content has intensified the demand for intelligent systems that can accurately classify cyber threats from URL patterns. This study evaluates two widely used supervised learning algorithms, Random Forest and Naïve Bayes, for probabilistic multi-class classification of URLs. A synthetic dataset of 547,775 URLs was designed to reflect a realistic threat distribution: benign (65.74%), phishing (14.46%), defacement (14.81%), and malware (4.99%). Each sample included simple structural features such as URL length, number of dots, HTTPS usage, and keyword indicators. Both models were tested on identical stratified train-test splits at varying sample sizes, including focused experiments on 15,000 and 100,000 entries. Both models achieved high precision and recall only for the benign class and failed to detect the minority classes. For Random Forest, precision and recall for benign URLs reached 1.00 but dropped to 0.00 for phishing, defacement, and malware in all test scenarios. Naïve Bayes exhibited similar shortcomings, highlighting the impact of class imbalance and limited feature expressiveness. The study concludes that although Random Forest and Naïve Bayes are computationally efficient, they are inadequate for detecting cyber threats without corrective measures such as SMOTE, cost-sensitive learning, or feature enrichment. Future work will explore adaptive hybrid models with contextual features and deep learning frameworks to enhance multi-class detection in real-world cybersecurity scenarios.
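The structural features and the cost-sensitive weighting mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the keyword list, and the example URL are hypothetical. The weighting formula (total divided by number of classes times class share) is one common "balanced" heuristic, assumed here for illustration only; the class shares are the ones reported in the abstract.

```python
from urllib.parse import urlparse

# Hypothetical keyword list: the abstract does not say which keywords were used.
SUSPICIOUS_KEYWORDS = ("login", "verify", "update", "secure")


def extract_features(url: str) -> dict:
    """Compute the four simple structural features named in the abstract."""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "uses_https": int(urlparse(url).scheme == "https"),
        "has_keyword": int(any(k in lowered for k in SUSPICIOUS_KEYWORDS)),
    }


# Class shares (percent) as reported in the abstract.
CLASS_SHARE = {
    "benign": 65.74,
    "phishing": 14.46,
    "defacement": 14.81,
    "malware": 4.99,
}


def balanced_class_weights(shares: dict) -> dict:
    """Weight each class inversely to its share: total / (n_classes * share)."""
    total = sum(shares.values())
    n_classes = len(shares)
    return {label: total / (n_classes * share) for label, share in shares.items()}


if __name__ == "__main__":
    # Hypothetical example URL, used only to show the feature dictionary.
    print(extract_features("https://secure-login.example.com/verify"))
    for label, weight in balanced_class_weights(CLASS_SHARE).items():
        print(f"{label}: {weight:.2f}")
```

Under this heuristic the malware class receives the largest weight and the benign class the smallest, so a misclassified malware URL would cost a cost-sensitive learner far more than a misclassified benign one.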
License
Copyright (c) 2025 INOVTEK Polbeng - Seri Informatika

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.