MediCaption: Integrating YOLO-Driven Computer Vision and NLP for Advanced Pharmaceutical Package Recognition and Annotation
  • Aarthi Lakshmipathy
  • Madhurima Vardhineedi
  • Venkata Ramana Patnaik Sekharamahanthi
  • Devanshi Dineshbhai Patel
  • Saurav Saini
  • Dr. Sabah Mohammed
Corresponding authors: Aarthi Lakshmipathy, Madhurima Vardhineedi, Venkata Ramana Patnaik Sekharamahanthi, and Dr. Sabah Mohammed

Abstract

To ensure patient safety and reduce the incidence of prescription errors, the healthcare industry places a high priority on the availability and accuracy of pharmaceutical information. MediCaption offers a unique solution to this problem: an integrated system that combines computer vision, driven by the state-of-the-art YOLOv8 object detection model from Ultralytics [1], with robust natural language processing (NLP), optical character recognition (OCR), and text-to-speech (TTS). The project uses advanced AI and image processing to quickly and accurately annotate pharmaceutical packaging with key information such as drug names, uses, and side effects, significantly reducing medication management errors and improving the precision and usability of the information. Using a dataset of 372 pharmaceutical packages from Kaggle (Shah, 2021) [2], we annotated the images with Roboflow and trained the YOLOv8 model, achieving precise medicine-name detection through accurate bounding boxes. These detections enabled effective text extraction via OCR; after NLP preprocessing, the extracted text was matched against a medicinal database to generate informative captions. To improve accessibility, the captions were then converted to audio using TTS. The system is designed with computational efficiency and user accessibility in mind, making it beneficial for a wide array of users, including those with visual impairments.
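As a rough illustration of the detection-to-audio pipeline the abstract describes, the Python sketch below chains the four stages: YOLOv8 detection, OCR, database matching, and TTS. The abstract does not name the OCR, matching, or TTS components, so this sketch assumes pytesseract for OCR, difflib fuzzy matching as a stand-in for the NLP database lookup, and gTTS for speech synthesis; the weights file "best.pt", the input image, and the drug database entries are hypothetical placeholders, not the authors' implementation.

```python
# Hedged sketch of the MediCaption pipeline (detection -> OCR -> match -> TTS).
# Assumptions: pytesseract (OCR), difflib (fuzzy matching), gTTS (speech);
# "best.pt" stands in for fine-tuned YOLOv8 weights from the Roboflow dataset.
import difflib

import pytesseract
from PIL import Image
from gtts import gTTS
from ultralytics import YOLO

# Hypothetical medicinal database: name -> (uses, side effects).
DRUG_DB = {
    "paracetamol": ("pain and fever relief", "nausea, rash"),
    "ibuprofen": ("anti-inflammatory pain relief", "stomach upset, dizziness"),
}


def caption_package(image_path: str, weights: str = "best.pt") -> str:
    """Detect medicine-name regions, OCR them, match against the database,
    and return an informative caption string."""
    model = YOLO(weights)              # fine-tuned YOLOv8 detector
    result = model(image_path)[0]      # results for the single input image

    image = Image.open(image_path)
    captions = []
    for box in result.boxes.xyxy.tolist():        # [x1, y1, x2, y2] per detection
        crop = image.crop(tuple(int(v) for v in box))  # isolate the text region
        raw_text = pytesseract.image_to_string(crop).strip().lower()

        # NLP-preprocessing stand-in: fuzzy-match OCR output to known names.
        match = difflib.get_close_matches(raw_text, list(DRUG_DB), n=1, cutoff=0.6)
        if match:
            name = match[0]
            uses, side_effects = DRUG_DB[name]
            captions.append(
                f"{name.title()}: used for {uses}. Possible side effects: {side_effects}."
            )

    return " ".join(captions) or "No medicine name recognized."


if __name__ == "__main__":
    caption = caption_package("package.jpg")
    print(caption)
    gTTS(text=caption).save("caption.mp3")  # audio output for accessibility
```

In a production system the in-memory dictionary would be replaced by a proper medicinal database query, but the control flow, cropping each detected bounding box before OCR so that only the medicine-name region is read, follows the staged design the abstract outlines.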
Submitted to TechRxiv: 27 Mar 2024
Published in TechRxiv: 30 Mar 2024