A Multilabel Approach to Portuguese Clinical Named Entity Recognition

João Vitor Andrioli de Souza, Elisa Terumi Rubel Schneider, Josilaine Oliveira Cezar, Lucas Emanuel Silva e Oliveira, Yohan Bonescki Gumiel, Emerson Cabrera Paraiso, Douglas Teodoro, Claudia Maria Cabral Moro Barra


Objectives: Clinical Named Entity Recognition is a critical Natural Language Processing task, as it could support biomedical research and healthcare systems. While most extracted clinical entities are based on single-label concepts, it is very common in the clinical domain entities with more than one semantic category simultaneously. This work proposes BERT-based models to support multilabel clinical named entity recognition in the Portuguese language. Methods: For the experiment, we used the Label Powerset method applied to the multilabel corpus SemClinBr. Results: We compare our results with a Conditional Random Fields baseline, reaching +2.1 in precision, +11.2 in recall, and +7.4 in F1 with a clinical-biomedical BERT model (BioBERTpt). Conclusion: We achieved higher results for both exact and partial metrics, contributing to the multilabel semantic processing of clinical narratives in Portuguese.


Clinical Named Entity Recognition; Label Powerset; BERT

Texto completo: PDF

Journal of Health Informatics - ISSN 2175-4411
Rua Tenente Gomes Ribeiro, 57 - sala 33 CEP 04038-040 São Paulo - SP - Brasil
Tel./Fax: + 55 11 3791 3343 - E-mail: jhi@sbis.org.br