Prompt scoring system for dialogue summarization using GPT-3
- George Prodan
- Elena Pelican
Abstract
Recent results in language processing show that language models are
capable of performing several natural language tasks without the need
for supervised learning. A challenging task for pre-trained language
models is dialogue summarization. One way of generating summaries is
to engineer prompt templates for few-shot training. However, a static
approach to creating prompts leads to inconsistent outcomes across
different classes of dialogues. Focusing on the structural properties
of dialogues, we propose a scoring system that improves few-shot
training performance by building tuned prompts composed of the
highest-scoring dialogue samples. Our evaluation, based on ROUGE
scores and human judgment, shows improvement in the experiments that
use the scoring system. These positive results hold across all three
large-scale datasets used in testing. All experiments were performed
using the GPT-3 API.