Overview of SympTEMIST at BioCreative VIII: corpus, guidelines and evaluation of systems for the detection and normalization of symptoms, signs and findings from text

Salvador Lima-López, Eulàlia Farré-Maduell, Luis Gasco, Jan Rodríguez-Miret, Martin Krallinger

November, 2023

Abstract

Recent advances in NLP techniques, together with the use of large language models and Transformers, are showing promising results for processing clinical content. The development of tools for automatic recognition of medical concepts, variables, and clinical expressions is key for the semantic analysis of clinical records, semantic search engines and the generation of structured data representations. Despite the importance of medical procedures for management, diagnosis, prevention and prognosis, there are few comprehensive resources for medical procedure extraction and normalization. To foster the development of procedure mention detection and entity linking systems, we have released the MedProcNER (Medical Procedures Named Entity Recognition) corpus, a high-quality, manually annotated collection of 1000 clinical case reports written in Spanish. The corpus has been exhaustively labeled by physicians following detailed annotation guidelines and quality control measures. Additionally, a multilingual Silver Standard corpus has been generated for English, Italian, French, Portuguese, Romanian, Dutch, Swedish and Czech, to provide a clinical NLP resource for research in these languages. A total of 9 teams from 8 different countries participated in the MedProcNER track of BioASQ 2023 (part of CLEF 2023), using mostly Transformer architectures and models such as RoBERTa, BioMBERT, ALBERT, Longformer or SapBERT. MedProcNER was structured into three sub-tracks: a) Clinical Procedure Entity Recognition task, b) Clinical Procedure Normalization task to SNOMED CT and c) Clinical Procedure-based Document Indexing task. The MedProcNER corpus, guidelines, and resources (including cross-mappings to MeSH and ICD-10) are freely available at: https://zenodo.org/record/7929830

Type

Conference paper

Publication

Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models. At: AMIA 2023 Annual Symposium, New Orleans, USA, November 2023

Luis Gasco

ML Engineer | NLP Researcher