TechRxiv

Indian Legal Text Summarization: A Text Normalization-based Approach

preprint
posted on 07.06.2022, 20:13 authored by Satyajit Ghosh, Mousumi Dutta, Tanaya Das

Pending cases have long been a problem in the Indian court system, with more than 4 crore (40 million) cases outstanding. Manually summarizing hundreds of documents is time-consuming and tedious for legal stakeholders. As machine learning has progressed, many state-of-the-art text summarization models have emerged. However, domain-independent models perform poorly on legal texts, and fine-tuning them for the Indian legal system is difficult because publicly available datasets are scarce. To improve the performance of domain-independent models, the authors propose a methodology for normalizing legal texts in the Indian context. They experiment with two state-of-the-art domain-independent models, BART and PEGASUS, for legal text summarization. Both models are evaluated on extractive and abstractive summarization to assess the effectiveness of the text normalization approach. The summarized texts are evaluated by domain experts on multiple parameters and with ROUGE metrics. The results indicate that the proposed text normalization approach is effective when domain-independent models are applied to legal texts.
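
For readers unfamiliar with ROUGE, the following is a minimal illustrative sketch of ROUGE-1 (unigram overlap) between a generated summary and a reference summary. It is not the authors' evaluation code: the abstract does not say which ROUGE variants or which implementation were used, and the function name `rouge_1`, the whitespace tokenization, and the example sentences are assumptions made here for illustration only.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Return ROUGE-1 precision, recall and F1 for two texts.

    Assumption: tokens are lowercased, whitespace-separated words.
    Published ROUGE toolkits typically add stemming and other preprocessing.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per token,
    # which gives the clipped number of matching unigrams.
    overlap = sum((cand_counts & ref_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the paper's data.
    generated = "the court dismissed the appeal"
    reference = "the court dismissed the appeal with costs"
    print(rouge_1(generated, reference))
```

Recall rewards covering the reference's words, precision penalizes extra words in the generated summary, and F1 balances the two.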

Email Address of Submitting Author

satyajit.ghosh@stu.adamasuniversity.ac.in

ORCID of Submitting Author

0000-0003-2791-5780

Submitting Author's Institution

Adamas University

Submitting Author's Country

India
