
LexSUS: A Hybrid Lexical-Graph Salience based Text Summarization Technique using PEGASUS
  • Wazib Ansar,
  • Saptarsi Goswami,
  • Amlan Chakrabarti
Wazib Ansar
A. K. Choudhury School of IT

Corresponding Author: [email protected]


Abstract

The ever-growing volume of textual content necessitates automated text summarization. Summaries that retain salient information with minimal redundancy can be processed faster than the original text in further analysis. Contemporary methods struggle to capture all significant topics and to eliminate redundancy. Moreover, the computational complexity of state-of-the-art transformer-based techniques makes them prohibitive for long sequences. To address these issues, this paper proposes LexSUS, a hybrid (extractive and abstractive) summarization methodology. Its extractive stage generates variable-length pre-summaries using a novel topic-supervised, graph-based context-matching mechanism. Its abstractive stage uses PEGASUS, a pre-trained transformer encoder-decoder model, fine-tuned on the pre-summaries. The generated pre-summaries capture salience and eliminate redundancy effectively, enabling a sequence length reduction of over 58% and an 82% improvement in efficiency compared with vanilla PEGASUS. Overall, LexSUS achieves a 30% improvement over the state-of-the-art baseline on the CNN/Daily Mail and XSum datasets.
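The abstract does not spell out the topic-supervised context-matching mechanism, so the sketch below only illustrates the general idea of a graph-based extractive pre-summary with redundancy elimination. It swaps in a generic TextRank-style approach: sentences are graph nodes, edges are weighted by lexical (bag-of-words cosine) similarity, salience comes from PageRank-style power iteration, and near-duplicate sentences are skipped. The stopword list, the `budget` and `redundancy` thresholds, and all function names are hypothetical choices for illustration, not part of LexSUS, and the abstractive PEGASUS stage is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list, for illustration only (not from LexSUS).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
             "was", "were", "for", "with", "that", "this", "it", "as", "by", "at"}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence: str) -> Counter:
    """Lowercased content-word counts for one sentence."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def salience_scores(vectors, damping=0.85, iterations=50):
    """PageRank-style power iteration over the weighted sentence-similarity graph.
    Sentences with no lexical overlap only receive the teleport term."""
    n = len(vectors)
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sums[j] * scores[j]
                            for j in range(n) if out_sums[j])
            for i in range(n)
        ]
    return scores


def pre_summarize(text, budget=0.5, redundancy=0.5):
    """Greedy extractive pre-summary: take the most salient sentences first, skip
    near-duplicates, and stop adding once about `budget` of the words are used."""
    sentences = split_sentences(text)
    vectors = [bag_of_words(s) for s in sentences]
    scores = salience_scores(vectors)
    total_words = sum(len(s.split()) for s in sentences)
    chosen, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda k: -scores[k]):
        if any(cosine(vectors[i], vectors[j]) > redundancy for j in chosen):
            continue  # redundancy elimination: too similar to an already chosen sentence
        length = len(sentences[i].split())
        if chosen and used + length > budget * total_words:
            continue  # would exceed the length budget
        chosen.append(i)
        used += length
    # Keep the selected sentences in their original document order.
    return " ".join(sentences[i] for i in sorted(chosen))


doc = (
    "The river flooded the town after heavy rain. "
    "Heavy rain caused the river to flood the town. "
    "Officials opened shelters for residents. "
    "The town council will meet next week to discuss flood defenses."
)
summary = pre_summarize(doc)
reduction = 1 - len(summary.split()) / len(doc.split())
print(summary)
print(f"length reduction: {reduction:.0%}")
```

In this toy example the first two sentences are near-paraphrases, so at most one of them survives into the pre-summary. Because selection stops at a word budget rather than a fixed sentence count, the pre-summary length varies with the input, which loosely mirrors the variable-length pre-summaries described above; in a hybrid pipeline, the resulting shorter text would then be passed to an abstractive model such as PEGASUS.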