From Data Imbalance to Precision: SMOTE-Driven Machine Learning for Early Detection of Kidney Disease

Aldani Adi Bhirawa; Ucta Pradema Sanjaya

doi:10.35314/7jgjmg64

Authors

Aldani Adi Bhirawa, Universitas Ngudi Waluyo
Ucta Pradema Sanjaya, Universitas Ngudi Waluyo

DOI:

https://doi.org/10.35314/7jgjmg64

Keywords:

Gradient Boosting, Chronic Kidney, SMOTE, Random Forest

Abstract

Chronic Kidney Disease (CKD) has become a significant global health issue, with its prevalence rising sharply, particularly in developing countries like Indonesia, as reported by the Indonesian Ministry of Health (Kementerian Kesehatan, KEMENKES). Class imbalance in medical datasets makes it harder for a model to recognise the under-represented class, and the Synthetic Minority Over-sampling Technique (SMOTE) has been widely adopted to address this. SMOTE generates synthetic samples for the minority class, enhancing the model’s ability to identify high-risk patients. Studies demonstrate SMOTE’s effectiveness, particularly when combined with ensemble learning algorithms such as Random Forest and Gradient Boosting. Data collection focused on the medical parameters relevant to kidney function: laboratory test results, diagnostic reports, and clinical observations. The resulting dataset of 400 samples, obtained from the Ungaran Regional Hospital and several clinics that can detect kidney disease, is used to predict whether or not a patient has chronic kidney disease. Recent research highlights that SMOTE significantly improves model accuracy, with Random Forest reaching 99.30%. These findings emphasise the importance of data balancing in enhancing diagnostic precision, offering promising avenues for early CKD detection and improved patient outcomes.
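
The oversampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `smote`, its parameters, the use of plain Euclidean distance, and the toy rows are all assumptions made for illustration. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Illustrative sketch only: numeric features, Euclidean distance.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up toy data (not from the study): two numeric features per row.
majority = [[1.0, 14.0], [0.9, 15.1], [1.1, 13.8],
            [1.2, 14.5], [0.8, 15.0], [1.0, 14.2]]
minority = [[3.5, 9.0], [4.2, 8.1], [5.0, 9.5]]

# Generate enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

In a sketch like this, the synthetic rows would normally be added to the training split only, before fitting a classifier such as Random Forest or Gradient Boosting, so that the evaluation data still contains only real patient records.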