IIST BCI Dataset-2 for Selected Common Marathi Words

Shubham Tayade; Parvathy S S; Nancy Sunil; Charu Chauhan; S Sumitra; B S Manoj

doi:10.36227/techrxiv.171043118.80448751/v1

Abstract

Brain-Computer Interface (BCI) solutions for patients with neurodegenerative disorders require datasets in the languages those patients speak, and BCI research is often constrained by the lack of such datasets. Marathi, for example, is a prominent language spoken by over 83 million people in India, yet it lacks language-specific BCI datasets for research. To address this gap, we created a dataset of electroencephalography (EEG) signal samples for selected common Marathi words. The EEG samples were captured with the OpenBCI Cyton device while volunteers pronounced commonly used Marathi words. The dataset covers three main categories: (i) vocal utterances of the Marathi words, (ii) vocal utterances of the English translations of these words, and (iii) silent pronunciation (sub-vocalization) of the Marathi words. We compiled data for 100 distinct words, each with recordings in all three categories, and conducted ten trials for each word. This dataset is valuable for developing BCI solutions to assist Marathi-speaking patients with neurodegenerative diseases: Machine Learning (ML) classifiers and Deep Learning methods can be trained on it to translate EEG signals into Marathi words, both vocal and sub-vocal.
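The layout described in the abstract can be sketched as a simple recording index. This is an illustration only, not the dataset's actual file format: the category labels, the field names, and the `build_index` helper are hypothetical, and the sketch assumes that the ten trials apply to each word in each of the three categories. The number of volunteers is not stated in the abstract, so the index covers a single volunteer.

```python
from itertools import product

# Hypothetical labels for the three categories named in the abstract.
CATEGORIES = ("marathi_vocal", "english_vocal", "marathi_subvocal")
NUM_WORDS = 100        # distinct words, per the abstract
TRIALS_PER_WORD = 10   # assumed to apply per word and per category


def build_index():
    """Enumerate one (category, word, trial) entry per expected recording."""
    return [
        {"category": category, "word_id": word_id, "trial": trial}
        for category, word_id, trial in product(
            CATEGORIES,
            range(1, NUM_WORDS + 1),
            range(1, TRIALS_PER_WORD + 1),
        )
    ]


if __name__ == "__main__":
    index = build_index()
    # 3 categories x 100 words x 10 trials under the assumption above.
    print(len(index))
```

Under these assumptions a single volunteer contributes 3 × 100 × 10 = 3000 recordings. An index of this kind is a convenient starting point for splitting trials into training and test sets per category.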

07 Mar 2024: Submitted to TechRxiv

14 Mar 2024: Published in TechRxiv
