AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories
Perspectives on Psychological Science (IF 10.5), Pub Date: 2024-01-02, DOI: 10.1177/17456916231214460
Max Pellert 1 , Clemens M Lechner 2 , Claudia Wagner 2, 3, 4 , Beatrice Rammstedt 2 , Markus Strohmaier 1, 2, 4

We illustrate how standard psychometric inventories originally designed for assessing noncognitive human traits can be repurposed as diagnostic tools to evaluate analogous traits in large language models (LLMs). We start from the assumption that LLMs, inadvertently yet inevitably, acquire psychological traits (metaphorically speaking) from the vast text corpora on which they are trained. Such corpora contain sediments of the personalities, values, beliefs, and biases of the countless human authors of these texts, which LLMs learn through a complex training process. The traits that LLMs acquire in such a way can potentially influence their behavior, that is, their outputs in downstream tasks and applications in which they are employed, which in turn may have real-world consequences for individuals and social groups. By eliciting LLMs’ responses to language-based psychometric inventories, we can bring their traits to light. Psychometric profiling enables researchers to study and compare LLMs in terms of noncognitive characteristics, thereby providing a window into the personalities, values, beliefs, and biases these models exhibit (or mimic). We discuss the history of similar ideas and outline possible psychometric approaches for LLMs. We demonstrate one promising approach, zero-shot classification, for several LLMs and psychometric inventories. We conclude by highlighting open challenges and future avenues of research for AI Psychometrics.

Updated: 2024-01-02