
Medical Information Extraction with NLP-Powered QABots: a Real-World Scenario
  • Claudio Crema
  • Federico Verde
  • Pietro Tiraboschi
  • Camillo Marra
  • Andrea Arighi
  • Silvia Fostinelli
  • Guido Maria Giuffré
  • Vera Pacoova Dal Maschio
  • Federica L'abbate
  • Federica Solca
  • Barbara Poletti
  • Vincenzo Silani
  • Emanuela Rotondo
  • Vittoria Borracci
  • Roberto Vimercati
  • Valeria Crepaldi
  • Emanuela Inguscio
  • Massimo Filippi
  • Francesca Caso
  • Alessandra Maria Rosati
  • Davide Quaranta
  • Giuliano Binetti
  • Ilaria Pagnoni
  • Manuela Morreale
  • Francesca Burgio
  • Michelangelo Stanzani Maserati
  • Sabina Capellari
  • Matteo Pardini
  • Nicola Girtler
  • Federica Piras
  • Fabrizio Piras
  • Stefania Lalli
  • Elena Perdixi
  • Gemma Lombardi
  • Sonia Di Tella
  • Alfredo Costa
  • Marco Capelli
  • Cira Fundarò
  • Marina Manera
  • Cristina Muscio
  • Elisa Pellencin
  • Raffaele Lodi
  • Fabrizio Tagliavini
  • Alberto Redolfi
Corresponding Author: Claudio Crema


The advent of computerized medical recording systems in healthcare facilities has made data retrieval easier than it was with manual recording. Nevertheless, the potential of the information contained in medical records remains largely untapped, mostly because of the time and effort required to extract data from unstructured documents. Natural Language Processing (NLP) is a promising solution to this challenge, as it enables clinical practitioners to use automated text-mining tools. In this work, we present the architecture adopted by the Virtual Dementia Institute (IVD), a consortium of sixteen Italian hospitals, which is built on the NLP Extraction and Management Tool (NEMT): a (semi-)automated end-to-end pipeline that extracts relevant information from clinical documents and stores it in a centralized database. The core of NEMT is a Question Answering Bot (QABot) based on a modern NLP model and fine-tuned on thousands of examples produced by the IVD centers. We provide detailed descriptions of the process used to define a common minimum dataset, of the Inter-Annotator Agreement calculated on clinical documents, and of the NEMT results. The best QABot achieves an Exact Match (EM) of 78.1% and an F1-score of 84.7%, outperforming ChatGPT v3.5 (68.9% and 52.5%, respectively). NEMT is an efficient tool that paves the way for extracting medical information and exploiting it in new research studies.
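The QABot results above are reported as Exact Match (EM) and F1-score. The abstract does not say which variant of these metrics NEMT uses, so the following is only a minimal sketch, assuming the token-level definitions popularized by the SQuAD v1.1 evaluation script. EM is 1 when the normalized predicted answer equals the normalized gold answer. F1 is the harmonic mean of token precision and recall between the two answers. Both are averaged over all questions and expressed as percentages. Several simplifications are assumptions. The sketch uses one gold answer per question, whereas SQuAD takes the maximum over several gold answers. The default article list is English; an Italian list would likely suit IVD documents better. The example answers are hypothetical and are not IVD data.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str, articles: tuple[str, ...] = ("a", "an", "the")) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    if articles:  # article list is an assumption; swap in e.g. Italian articles for Italian text
        pattern = r"\b(" + "|".join(map(re.escape, articles)) + r")\b"
        text = re.sub(pattern, " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer span."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Mean EM and mean F1 over (prediction, gold) pairs, as percentages."""
    em = sum(exact_match(p, g) for p, g in pairs) / len(pairs)
    f1 = sum(f1_score(p, g) for p, g in pairs) / len(pairs)
    return 100 * em, 100 * f1


if __name__ == "__main__":
    # Hypothetical (prediction, gold) answer pairs, for illustration only.
    pairs = [
        ("mild cognitive impairment", "mild cognitive impairment"),  # EM 1, F1 1
        ("cognitive impairment", "mild cognitive impairment"),       # EM 0, F1 0.8
        ("2019", "March 2019"),                                       # EM 0, F1 ~0.667
    ]
    print([round(x, 2) for x in evaluate(pairs)])  # prints [33.33, 82.22]
```

The example shows why both metrics are reported together. A partially correct span such as "cognitive impairment" scores 0 on EM but still earns most of the F1 credit.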
21 Dec 2023: Submitted to TechRxiv
22 Dec 2023: Published in TechRxiv