Measuring the IQ of Mainstream Large Language Models in Chinese Using the Wechsler Adult Intelligence Scale
  • Jingjing Huang,
  • Ou Li
Jingjing Huang
Jiangnan Xinrui Co., Ltd.
Ou Li
Jiangnan Xinrui Co., Ltd.

Corresponding Author: [email protected]

Abstract

Artificial intelligence continues to revolutionize various domains, with large language models (LLMs) pushing the boundaries of what machines can understand and generate. Evaluating the intellectual and linguistic capabilities of LLMs using standardized intelligence tests like the Wechsler Adult Intelligence Scale (WAIS) provides a novel and significant approach to understanding their cognitive strengths and limitations. This research presents a comprehensive evaluation of Baidu Ernie and OpenAI ChatGPT, comparing their performance in IQ tests and Chinese linguistic tasks. The IQ assessments revealed that OpenAI ChatGPT achieved a marginally higher composite IQ score, excelling particularly in verbal comprehension and working memory. Baidu Ernie demonstrated superior performance in cultural appropriateness and linguistic accuracy, reflecting its strong alignment with the Chinese language and cultural context. The study involved translating the WAIS into Chinese, integrating multimodal inputs, and applying rigorous statistical analyses to ensure robust and reliable results. The findings demonstrate the distinct strengths of each model, with OpenAI ChatGPT showing versatility in handling diverse textual data and Baidu Ernie excelling in culturally relevant and grammatically precise responses. The implications for future development of LLMs emphasize the importance of contextually relevant training data and the integration of multimodal capabilities to enhance cognitive and linguistic performance. This evaluation framework offers valuable insights for advancing artificial intelligence, guiding future research and development towards more intelligent, adaptable, and culturally aware language models.
03 Jun 2024: Submitted to TechRxiv
07 Jun 2024: Published in TechRxiv