Modelling, Simulation, and Analysis of Sequence-Based Models for Smart Lighting Voice Command Classifiers with MFCC-Based Data Augmentation
DOI:
https://doi.org/10.35314/r4p60871
Keywords:
Voice Command Classification, Smart Lighting, Data Augmentation, Bi-LSTM, DNN, GRU, MFCC Features, Temporal Dependencies
Abstract
Voice command classification is essential for smart lighting systems in IoT environments. However, existing approaches often struggle in real-world scenarios with background noise and speaker variability because their training data are limited and imbalanced. This indicates a need for models that maintain high accuracy under such conditions. To address this, the study evaluates three deep learning architectures, a Deep Neural Network (DNN), a Gated Recurrent Unit (GRU) network, and a bidirectional Long Short-Term Memory (Bi-LSTM) network, each trained and evaluated on the Google Speech Commands dataset. The classification targets six voice commands (“right”, “off”, “left”, “on”, “down”, “up”) using Mel-Frequency Cepstral Coefficients (MFCCs) as features. Data augmentation techniques, including pitch shifting, time stretching, mix-up, and noise injection, are used to expand the dataset, balance class distributions, and simulate acoustic conditions such as background noise and speaker differences. Model performance is assessed through confusion matrices and receiver operating characteristic (ROC) curves with area under the curve (AUC) across training, validation, and test sets. The Bi-LSTM achieves the highest test accuracy (94%), followed by the GRU (92%) and the DNN (79%). The Bi-LSTM model also generalizes well, showing no signs of overfitting and maintaining stable performance in the presence of acoustic variation. These results suggest that combining a Bi-LSTM with MFCC-based augmentation provides a more robust approach to voice command recognition, particularly in IoT-based smart lighting contexts, where environmental variability is common.
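As a rough illustration of two of the augmentation techniques named in the abstract, the sketch below shows noise injection at a target signal-to-noise ratio and mix-up of two labelled examples. It is not the authors' implementation: the function names (`add_noise`, `mixup`, `one_hot`), the 10 dB SNR, the mixing weight of 0.7, and the synthetic 440 Hz tone standing in for a recorded command are all assumptions chosen for the example. It uses only the Python standard library; a real pipeline would more likely operate on arrays of audio samples or MFCC frames.

```python
import math
import random

# The six target commands listed in the abstract.
COMMANDS = ["right", "off", "left", "on", "down", "up"]


def one_hot(command):
    """Return a one-hot label vector for a command (hypothetical helper)."""
    return [1.0 if c == command else 0.0 for c in COMMANDS]


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR in decibels."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def mixup(x1, y1, x2, y2, lam):
    """Blend two examples and their labels with weight lam in [0, 1]."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed stand-in for a one-second command recording sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# Noise injection: simulate background noise at an assumed 10 dB SNR.
noisy = add_noise(tone, snr_db=10, rng=rng)

# Mix-up: blend an "on" example with an "off" example (assumed weight 0.7).
mixed_x, mixed_y = mixup(tone, one_hot("on"), noisy, one_hot("off"), lam=0.7)
```

Mix-up produces soft labels (here 0.7 for "on" and 0.3 for "off"), so a classifier trained on such examples would need a loss that accepts label distributions, for instance cross-entropy against the blended vector.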
License
Copyright (c) 2025 INOVTEK Polbeng - Seri Informatika

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.