Aparna M

and 1 more

Named Entity Recognition (NER) aims to automatically recognise and classify named entities in natural language input. The majority of NER studies focus on English. NER in Indian languages is challenging because of their complex grammar and morphology and the scarcity of good-quality labelled data. The dissimilarity between the literary and spoken versions of these languages further limits the usability of NER models. Kannada is one such Indian language, for which NER is still an active area of research. Deep learning, especially transfer learning with Transformer models, has drastically improved performance on many NLP tasks at scale. However, transfer learning still requires a considerable amount of labelled data for the target task. For a low-resource language like Kannada, very few labelled datasets are publicly available, and creating one from scratch is expensive in time and labour. Active Learning (AL) tackles the labelled-data acquisition problem by having the learning model and an oracle cooperate: the model selects the examples it would benefit from most, the oracle labels them, and the process iteratively builds a well-labelled and sufficiently large dataset. This study focuses on NER in Kannada. We explore the application of Active Learning to NER in this low-resource setting. Results show that AL can be used to boost a multilingual model's performance when fine-tuning for NER. We also try to mitigate the gap between the formal and colloquial dialects of Kannada in NER datasets.
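
The cooperation between model and oracle described above can be illustrated with a minimal pool-based sketch. The abstract does not state which query strategy is used, so the least-confidence scoring below is an assumption chosen for illustration, and the names `sentence_uncertainty`, `select_queries`, `active_learning_loop`, `train`, and `oracle` are all hypothetical. Here `train` stands in for fine-tuning a multilingual model and returns a function that maps a sentence to per-token tag probabilities, while `oracle` stands in for a human annotator.

```python
from typing import Callable, List, Tuple

Sentence = List[str]
TokenProbs = List[List[float]]  # one tag distribution per token
Predictor = Callable[[Sentence], TokenProbs]
Labelled = List[Tuple[Sentence, List[str]]]


def sentence_uncertainty(token_probs: TokenProbs) -> float:
    """Mean least-confidence score over tokens: 1 - max tag probability."""
    if not token_probs:
        return 0.0
    return sum(1.0 - max(p) for p in token_probs) / len(token_probs)


def select_queries(pool: List[Sentence], predict: Predictor, k: int) -> List[int]:
    """Return indices of the k pool sentences the model is least sure about."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: sentence_uncertainty(predict(pool[i])),
        reverse=True,
    )
    return ranked[:k]


def active_learning_loop(
    labelled: Labelled,
    pool: List[Sentence],
    train: Callable[[Labelled], Predictor],  # assumption: wraps fine-tuning
    oracle: Callable[[Sentence], List[str]],  # assumption: human annotator
    rounds: int,
    k: int,
) -> Tuple[Predictor, Labelled]:
    """Alternate between querying the oracle and retraining the model."""
    labelled, pool = list(labelled), list(pool)
    model = train(labelled)
    for _ in range(rounds):
        if not pool:
            break
        picked = select_queries(pool, model, k)
        # Pop from the highest index down so earlier indices stay valid.
        for i in sorted(picked, reverse=True):
            sentence = pool.pop(i)
            labelled.append((sentence, oracle(sentence)))
        model = train(labelled)
    return model, labelled
```

In this sketch the labelled set grows by at most `k` sentences per round, so the annotation budget is `rounds * k`. Other query strategies (for example entropy or margin sampling) could be swapped in by replacing `sentence_uncertainty`.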