Indian Legal Text Summarization: A Text Normalization-based Approach
  • Satyajit Ghosh
  • Mousumi Dutta
  • Tanaya Das


Case backlog is a long-standing problem in the Indian court system, where more than 4 crore (40 million) cases are pending. Legal stakeholders draft case summaries by hand, and manually summarizing hundreds of documents is time-consuming and tedious. Advances in machine learning have produced many state-of-the-art text summarization models, but domain-independent models perform poorly on legal texts, and fine-tuning them for the Indian legal system is difficult because publicly available datasets are scarce. To improve the performance of domain-independent models, the authors propose a methodology for normalizing legal texts in the Indian context. They experiment with two state-of-the-art domain-independent models, BART and PEGASUS, and evaluate both on extractive and abstractive summarization to gauge the effectiveness of the text normalization approach. Domain experts evaluate the summarized texts on multiple parameters. The results show that the proposed text normalization approach is effective for summarizing legal texts with domain-independent models.