Overview of SympTEMIST at BioCreative VIII: corpus, guidelines and evaluation of systems for the detection and normalization of symptoms, signs and findings from text

Salvador Lima-López, Eulàlia Farré-Maduell, Luis Gasco, Jan Rodríguez-Miret, Martin Krallinger

January, 2023

Abstract

Systems capable of detecting and normalizing symptom mentions from clinical texts are crucial for healthcare data mining, AI applied to clinical systems, medical analytics, and predictive applications. Unlike other clinical information types, such as diagnoses/diseases, procedures, lab test results, or medications, clinical symptoms are often only recoverable in detail from written clinical narratives. Due to the high complexity, variability, and difficulty in generating annotated corpora for clinical symptoms, few manually annotated data collections have been created so far. Previous efforts typically showed limitations, such as the absence of entity normalization to controlled vocabularies, reliance on dictionaries for pre-annotations, lack of multilingual solutions, or underspecified annotation guidelines. To address these issues, we proposed the SympTEMIST track as part of the BioCreative VIII initiative. The SympTEMIST task is structured into three subtasks: automatic detection of exact symptom mentions (SymptomNER), normalization of symptoms to their SNOMED CT concept identifiers (SymptomNorm), and an experimental subtask aimed at promoting entity linking and concept normalization in several languages, including English, Portuguese, French, Italian, and Dutch. Out of 25 teams, 11 submitted results for at least one of the three subtasks. Top-scoring teams achieved an F1-score of 0.7477 for the SymptomNER subtask (with a precision of 0.8039 and a recall of 0.6988). The top-performing team for the SymptomNorm subtask obtained an accuracy of 0.6070. Considering the complexity of symptom mentions, which often include long descriptive or nested entities and abbreviations, the results and datasets are a significant contribution to future approaches for mining symptoms from clinical texts. The SympTEMIST Gold Standard is freely available at: https://zenodo.org/doi/10.5281/zenodo.8223653.
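As an illustration of how the scores reported in the abstract relate to each other, the following is a minimal sketch of exact-match span scoring in Python. It is not the official SympTEMIST evaluation script: the function names `prf1` and `f1_from`, the `(doc_id, start, end)` span representation, and the toy spans are all assumptions made for this example. It assumes that a predicted mention counts as correct only when its document and character offsets match a gold mention exactly.

```python
def prf1(gold, pred):
    """Exact-match precision, recall and F1 over sets of (doc_id, start, end) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # spans whose document and offsets match exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy, made-up spans (hypothetical; not taken from the corpus).
gold = {("doc1", 0, 5), ("doc1", 10, 18), ("doc2", 3, 9)}
pred = {("doc1", 0, 5), ("doc1", 10, 17), ("doc2", 3, 9), ("doc2", 20, 25)}
p, r, f = prf1(gold, pred)

# Consistency check on the figures quoted in the abstract.
reported_f1 = f1_from(0.8039, 0.6988)
print(round(reported_f1, 4))
```

In the toy data, the prediction `("doc1", 10, 17)` is off by one character from the gold span and therefore earns no credit under exact matching. Such strictness is one plausible reason why long descriptive mentions are hard to score well on.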

Type

Journal article

Publication

Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models

Luis Gasco

ML Engineer | NLP Researcher