Comparison of K-Means++ and Agglomerative Hierarchical Methods in Clustering Healthcare Workers

Citra Tjipta Nur Handayani; Melkior N. N. Sitokdana

doi:10.35314/pcbrs043

Authors

Citra Tjipta Nur Handayani, Universitas Kristen Satya Wacana
Melkior N. N. Sitokdana, Universitas Kristen Satya Wacana

Keywords:

clustering, k-means++, agglomerative hierarchical, python, healthcare workforce

Abstract

As an archipelagic country, Indonesia faces disparities in the distribution of healthcare workers, influenced by its diverse geographical conditions. These disparities affect equitable access to healthcare services across the country. This study compares the effectiveness of two clustering methods, K-Means++ and Agglomerative Hierarchical Clustering, using secondary data from Statistics Indonesia (BPS) on the Number of Healthcare Workers by Province in 2023, covering 38 provinces and 13 categories of healthcare professions. The evaluation used three metrics: the Silhouette Score, which measures how cohesive each cluster is relative to its separation from neighboring clusters; the Davies-Bouldin Index, which assesses inter-cluster separation relative to within-cluster scatter (lower is better); and the Calinski-Harabasz Index, which compares between-cluster variance to within-cluster variance (higher is better). The results show that Agglomerative Hierarchical Clustering outperformed K-Means++ on the Silhouette Score (0.550) and the Davies-Bouldin Index (0.457), while K-Means++ performed better on the Calinski-Harabasz Index (63.630). A 2D PCA visualization further illustrates the structural differences between the clusters formed by each method. These findings provide insights into selecting the most appropriate clustering method for analyzing the distribution of healthcare workers and can support data-driven decision-making by policymakers.
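
The comparison described in the abstract can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the data is a synthetic stand-in shaped like the BPS table (38 rows, 13 columns), and the number of clusters (3), the Ward linkage, and the standardization step are all assumptions, since the abstract does not specify them. The scores it prints will therefore not match the reported values.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic placeholder data, NOT the BPS dataset.
# 38 rows stand in for provinces, 13 columns for profession categories.
rng = np.random.default_rng(0)
X_raw = rng.lognormal(mean=7.0, sigma=1.0, size=(38, 13))

# ASSUMPTION: features are standardized before clustering.
X = StandardScaler().fit_transform(X_raw)

# ASSUMPTION: the cluster count is illustrative only.
N_CLUSTERS = 3

models = {
    "K-Means++": KMeans(
        n_clusters=N_CLUSTERS, init="k-means++", n_init=10, random_state=0
    ),
    # ASSUMPTION: Ward linkage; the abstract does not name a linkage.
    "Agglomerative": AgglomerativeClustering(
        n_clusters=N_CLUSTERS, linkage="ward"
    ),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    results[name] = {
        "labels": labels,
        "silhouette": silhouette_score(X, labels),                # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
    }

# Project the standardized features onto two principal components,
# giving coordinates that could be plotted and colored by cluster label.
coords_2d = PCA(n_components=2).fit_transform(X)

for name, r in results.items():
    print(
        f"{name}: silhouette={r['silhouette']:.3f}, "
        f"davies_bouldin={r['davies_bouldin']:.3f}, "
        f"calinski_harabasz={r['calinski_harabasz']:.3f}"
    )
```

Because the three metrics reward different properties, one method can lead on two of them while the other leads on the third, which is the pattern the abstract reports.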