
Can innovative prompt engineering with ChatGPT address imbalances in machine learning datasets?
  • Mateusz Kochanek
  • Przemysław Kazienko
  • Jan Kocoń
  • Igor Cichecki
  • Oliwier Kaszyca
  • Dominika Szydło
Corresponding author: Mateusz Kochanek



Large language models (LLMs) are attracting considerable attention and developing rapidly, largely owing to the release of OpenAI's ChatGPT models, GPT-3.5-turbo and GPT-4. This article presents a new approach to synthetic data generation and knowledge distillation based on prompt engineering. Specifically, we focus on three methods: basic prompts, composite prompts, and similarity prompts. The research investigates whether these techniques can mitigate imbalanced datasets, a common problem in machine learning applications. Experimental results show that none of the prompt-based strategies matches the scores obtained using the entire dataset. However, the similarity prompts method shows promise, outperforming the other approaches. The study suggests considerable room to develop these techniques further so that they generate more diverse synthetic data. Although the results are preliminary, they open up possibilities for future research in this area, including integrating more advanced versions of large language models and exploring other machine learning domains.
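The abstract does not give the prompt templates, so the sketch below is only a hypothetical illustration of how synthetic data generation could target class imbalance. It assumes that "balancing" means generating enough synthetic texts to bring every class up to the majority-class count, and that the three strategies differ in how much context the prompt carries: the label alone (basic), several combined constraints (composite), or real examples from the class (similarity). The function names `synthetic_quota`, `basic_prompt`, `composite_prompt`, and `similarity_prompt` and all template wording are our own assumptions, not the authors' prompts, and no request is sent to ChatGPT.

```python
from collections import Counter


def synthetic_quota(labels):
    """Synthetic examples needed per class to reach the majority-class count.

    Assumption: "balanced" means every class ends up as large as the largest one.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def basic_prompt(label, n):
    # Hypothetical template: only the class label and the number of texts.
    return f"Write {n} short texts that belong to the class '{label}'."


def composite_prompt(label, n, attributes):
    # Hypothetical template: several constraints (e.g. topic, tone) combined in one request.
    constraints = "; ".join(f"{key}: {value}" for key, value in attributes.items())
    return (f"Write {n} short texts that belong to the class '{label}' "
            f"and satisfy these constraints: {constraints}.")


def similarity_prompt(label, n, examples):
    # Hypothetical template: real texts from the class are shown as references.
    shown = "\n".join(f"- {text}" for text in examples)
    return (f"Here are texts from the class '{label}':\n{shown}\n"
            f"Write {n} new texts similar in content and style, without copying them.")


if __name__ == "__main__":
    labels = ["positive"] * 8 + ["negative"] * 2  # toy imbalanced label list
    quota = synthetic_quota(labels)
    print(quota)  # {'positive': 0, 'negative': 6}
    minority_examples = ["The battery died after a day.", "Support never replied."]
    for label, n in quota.items():
        if n == 0:
            continue  # the majority class needs no synthetic data
        print(basic_prompt(label, n))
        print(composite_prompt(label, n, {"topic": "electronics", "tone": "informal"}))
        print(similarity_prompt(label, n, minority_examples))
```

In this sketch only the similarity prompt grounds generation in real minority-class texts; how the prompts are sent to the model, and how the returned texts are filtered before training, is left open.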
15 Dec 2023: Submitted to TechRxiv
22 Dec 2023: Published in TechRxiv