A Generalize Hardware Debugging Approach for Large Language Models Semi-Synthetic Datasets
  • Weimin Fu,
  • Shijie Li,
  • Yifang Zhao,
  • Kaichen Yang,
  • Xuan Zhang,
  • Yier Jin,
  • Xiaolong Guo
Weimin Fu

Corresponding Author:[email protected]

Abstract

Large Language Models (LLMs) are driving a trend toward intelligent automation. However, integrating LLMs into hardware debugging is challenging: hardware datasets for LLMs suffer from both scarcity and poor quality. Traditional hardware debugging approaches, which rely on experienced engineers to write detailed prompts, do not scale cheaply. Likewise, strategies that depend on existing LLMs and randomly generated prompts fail to achieve sufficient reliability. We propose a directed, semi-synthetic data synthesis method that combines version control information from hardware projects with the 5W1H (Who, What, When, Where, Why, How) principles of journalistic event description to produce high-quality data. The method allows dataset volume to scale linearly without depending on skilled labor. We applied it to a collected dataset of open-source hardware designs and fine-tuned fifteen general-purpose LLMs for hardware debugging tasks, thereby validating the efficacy of our approach.
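
As a rough illustration of the idea in the abstract, the sketch below maps the fields of a single version control commit onto 5W1H slots and turns the result into a prompt/completion pair. The abstract does not specify the authors' pipeline, so everything here is an assumption made for illustration: the `Commit` container, the slot mapping in `to_5w1h`, the prompt wording, and the example commit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical container for metadata read from a version control log."""
    author: str
    date: str
    message: str
    files: list[str]
    diff: str


def to_5w1h(commit: Commit) -> dict[str, str]:
    """Map commit fields onto 5W1H slots (this mapping is an assumption)."""
    # Treat the first line of the message as the summary, the rest as rationale.
    subject, _, body = commit.message.partition("\n")
    return {
        "who": commit.author,
        "when": commit.date,
        "where": ", ".join(commit.files),
        "what": subject.strip(),
        "why": body.strip() or "not stated in the commit message",
        "how": commit.diff,
    }


def to_training_pair(commit: Commit) -> dict[str, str]:
    """Build one prompt/completion pair; the prompt wording is illustrative."""
    record = to_5w1h(commit)
    prompt = (
        f"Files: {record['where']}\n"
        f"Issue: {record['what']}\n"
        f"Reason: {record['why']}\n"
        "Propose a fix."
    )
    # The diff that resolved the issue serves as the target output.
    return {"prompt": prompt, "completion": record["how"]}


# Made-up commit used only to exercise the functions above.
example = Commit(
    author="dev@example.org",
    date="2024-01-15",
    message="Fix off-by-one in FIFO read pointer\n\nPointer wrapped one entry late.",
    files=["rtl/fifo.v"],
    diff="-  if (rd_ptr == DEPTH)\n+  if (rd_ptr == DEPTH - 1)",
)
pair = to_training_pair(example)
```

Because each commit yields one record without manual prompt writing, the number of training pairs in a sketch like this grows with the number of commits collected, which is one way to read the linear scaling claimed above.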
06 May 2024: Submitted to TechRxiv
09 May 2024: Published in TechRxiv