Prompt scoring system for dialogue summarization using GPT-3
Recent results in language processing show that language models can perform several natural language tasks without supervised learning. Dialogue summarization remains a challenging task for pre-trained language models. One way of generating summaries is to engineer prompt templates for few-shot learning. However, a static approach to creating prompts leads to unreliable outcomes across different classes of dialogues. Focusing on the structural properties of dialogues, we propose a scoring system that improves few-shot performance by building tuned prompts composed of the highest-scoring dialogue samples. Our evaluation, based on ROUGE scores and human evaluation, shows improvements in the experiments that use the scoring system. These positive results hold across all three large-scale datasets we used for testing. All experiments were performed using the GPT-3 API.