Abstract
Named Entity Recognition (NER) is the task of automatically recognising
and classifying named entities in natural language input.
The majority of studies on Named Entity Recognition focus
on English. Named Entity Recognition in Indian languages is
challenging because of their complex grammar and morphology
and the scarcity of good-quality labelled data. The
dissimilarity between the literary and spoken forms of these
languages further limits the usability of NER models.
Kannada is one such Indian language, for which NER remains an
active area of research. Deep learning, and in particular transfer
learning with Transformer models, has substantially improved performance
on many NLP tasks at scale. However, transfer learning still requires
a considerable amount of data for the target task. For a low-resource
language like Kannada, very few labelled datasets are publicly available,
and creating one from scratch is expensive in terms of time and
labour. Active learning (AL) aims to tackle the labelled data acquisition
problem by having the learning model and an oracle cooperate. Active
learning iteratively builds an optimally labelled and sufficiently large
dataset. This study focuses on Named Entity Recognition in the Kannada
language. We explore the application of active learning to the
Named Entity Recognition problem in Kannada, a low-resource language.
Results show that AL can be used to boost a multilingual model's
performance when fine-tuning for NER. We also try to mitigate the gap
between the formal and colloquial dialects of Kannada in NER datasets.
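
To make the cooperation between the learning model and the oracle concrete, the query step of one active learning round can be sketched as below. This is a minimal illustration only, assuming a least-confidence uncertainty criterion; the abstract does not state which query strategy the study uses. The names `sentence_uncertainty` and `select_queries`, the sentence ids, and the probability values are all hypothetical.

```python
from typing import Dict, List


def sentence_uncertainty(token_probs: List[List[float]]) -> float:
    """Least-confidence score: 1 minus the mean top-label probability per token.

    Assumed criterion for illustration, not necessarily the study's strategy.
    """
    if not token_probs:
        return 0.0
    top = [max(p) for p in token_probs]
    return 1.0 - sum(top) / len(top)


def select_queries(pool: Dict[str, List[List[float]]], k: int) -> List[str]:
    """Return the ids of the k unlabelled sentences the model is least sure about."""
    ranked = sorted(
        pool,
        key=lambda sid: sentence_uncertainty(pool[sid]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical per-token label distributions predicted by the current model
# for three unlabelled sentences (two tokens each, three labels).
pool = {
    "s1": [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],
    "s2": [[0.4, 0.35, 0.25], [0.5, 0.3, 0.2]],
    "s3": [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
}

# The selected sentences would be sent to the oracle (a human annotator),
# added to the labelled set, and the model fine-tuned again.
queries = select_queries(pool, k=2)
```

Repeating this selection after each round of fine-tuning is what lets the labelled dataset grow with the examples the model finds hardest, rather than with randomly chosen ones.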