Prompt scoring system for dialogue summarization using GPT-3
Preprint posted on 23.09.2021, 00:56 by George Prodan and Elena Pelican
Recent results in language processing show that language models can perform several natural language tasks without supervised learning. Dialogue summarization remains a challenging task for pre-trained language models. One way of generating summaries is to engineer prompt templates for few-shot training. However, a static approach to creating prompts leads to unreliable outcomes across different classes of dialogues. Focusing on the structural properties of dialogues, we propose a scoring system that improves few-shot training performance. We build tuned prompts composed of the highest-scored dialogue samples. Our evaluation, based on ROUGE scores and human judgment, shows an improvement in the experiments that use the scoring system. All experiments are performed within the framework of the GPT-3 API, using different engines for comparison. Moreover, the human evaluation we conducted showed that the number of failures decreased by 11% after applying our scoring system.
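The idea of composing a few-shot prompt from the highest-scored dialogue samples can be sketched as follows. This is a minimal illustration, not the paper's method: the `score_dialogue` heuristic (turn count and average turn length) is a hypothetical stand-in for the authors' structural scoring criteria, and the prompt template is likewise an assumption.

```python
# Hedged sketch of score-based few-shot prompt construction.
# score_dialogue is a hypothetical structural heuristic, NOT the paper's
# actual scoring system; it merely illustrates ranking samples by structure.

def score_dialogue(dialogue: str) -> float:
    """Score a dialogue by simple structural properties (illustrative only)."""
    turns = [t for t in dialogue.splitlines() if t.strip()]
    n_turns = len(turns)
    avg_len = sum(len(t.split()) for t in turns) / max(n_turns, 1)
    # Favor dialogues with more turns of moderate length (arbitrary example).
    return n_turns - abs(avg_len - 12) * 0.1

def build_prompt(samples: list[tuple[str, str]], target: str, k: int = 3) -> str:
    """Compose a few-shot prompt from the k highest-scored (dialogue, summary) pairs."""
    ranked = sorted(samples, key=lambda s: score_dialogue(s[0]), reverse=True)[:k]
    shots = "\n\n".join(f"Dialogue:\n{d}\nSummary: {s}" for d, s in ranked)
    return f"{shots}\n\nDialogue:\n{target}\nSummary:"
```

The resulting string would then be sent to a GPT-3 engine as the completion prompt; only the sample-selection step differs between the static and scored setups.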