Optimization of BioBERT Model for Medical Entity Recognition Through BiLSTM and CNN-Char Integration
DOI:
https://doi.org/10.35314/bypwas91
Keywords:
Named Entity Recognition (NER), BioBERT, BiLSTM, CNN-Char, BC5CDR Dataset
Abstract
Biomedical Named Entity Recognition (NER) is essential for extracting structured information from medical texts. However, existing models such as BioBERT struggle with complex biomedical entities, particularly those with intricate morphological structures. This research enhances the BioBERT model by integrating a BiLSTM layer and a character-level CNN (CNN-Char), with the aim of improving the recognition of Chemical and Disease entities. The proposed models were trained and evaluated on the BC5CDR dataset, sourced from the official BioCreative V CDR Corpus. The modified model achieved an F1-score of 0.8678, compared with 0.8597 for the standard BioBERT model, an absolute gain of 0.0081. The gain is observed primarily in the recognition of complex entity structures, particularly those that benefit from character-level representation. Despite this improvement, the model is limited to Chemical and Disease entities and may not generalise to other biomedical categories. Future work should focus on expanding the entity types and exploring other model architectures, such as SciBERT or BioALBERT, to further enhance performance.
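As an illustration of how F1-scores like those reported above are commonly computed for NER, the following is a minimal sketch of entity-level F1 over BIO-tagged sequences. The abstract does not state the exact evaluation protocol, so the strict span-and-type matching, the BIO tagging scheme, and the helper names `extract_entities` and `entity_f1` are assumptions made for illustration only, not the authors' implementation.

```python
def extract_entities(tags):
    """Return a set of (start, end, type) spans from a BIO tag sequence.

    Assumption: strict BIO decoding, so an I- tag that does not continue
    an open span of the same type is ignored.
    """
    entities = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def entity_f1(gold_tags, pred_tags):
    """Entity-level F1: a prediction counts only if span and type both match."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical tags, not taken from BC5CDR):
gold = ["B-Chemical", "I-Chemical", "O", "B-Disease", "O", "B-Disease", "I-Disease"]
pred = ["B-Chemical", "I-Chemical", "O", "B-Disease", "O", "B-Disease", "O"]
print(f"{entity_f1(gold, pred):.4f}")  # prints 0.6667
```

In this toy case two of the three predicted entities match the gold spans exactly, while the truncated final Disease span counts as both a false positive and a false negative. This shows why boundary errors on long or morphologically complex entities weigh heavily on entity-level F1.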
License
Copyright (c) 2025 INOVTEK Polbeng - Seri Informatika

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.