Cognitive Evaluation of Machine Learning Agents
Cognitive Systems Research (IF 2.1). Pub Date: 2021-03-01. DOI: 10.1016/j.cogsys.2020.11.003
Suvarna Kadam , Vinay Vaidya

Abstract: Advances in applying statistical Machine Learning (ML) have led to several claims of human-level or near-human performance on tasks such as image classification and speech recognition. Such claims are unscientific for two main reasons: (1) they incorrectly treat task-specific performance as a manifestation of general intelligence, and (2) they are not verifiable, as there is currently no established benchmark for measuring human-like cognition in a machine learning agent. Moreover, an ML agent's performance is influenced by the knowledge its human designers build into it, so its performance may not necessarily reflect its true cognition. In this paper, we propose a framework that draws parallels from human cognition to measure a machine's cognition. Human cognitive learning is well studied in developmental psychology, with frameworks and metrics in place to measure actual learning. To either support or refute claims of human-level performance by a machine learning agent, we need a scientific methodology for measuring its cognition. Our framework formalizes the incremental implementation of human-like cognitive processes in ML agents, with the implicit goal of measuring them. The framework offers guiding principles for measuring (1) task-specific machine cognition and (2) general machine cognition that spans tasks. It also provides guidelines for building domain-specific task taxonomies to cognitively profile tasks. We demonstrate the application of the framework with a case study in which two ML agents performing vision and NLP tasks are cognitively evaluated.
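The abstract describes the framework only at the level of guiding principles. As a purely illustrative sketch, and not the authors' implementation, the idea of cognitively profiling tasks and then aggregating task-specific scores into a cross-task (general) score could be modeled as follows; the `TaskProfile` structure, process names, and weighting scheme are all assumptions introduced here for illustration:

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    """A task annotated with the cognitive processes it exercises.

    `processes` maps a (hypothetical) cognitive-process label, e.g.
    "perception" or "memory", to a weight reflecting how strongly the
    task exercises that process.
    """
    name: str
    domain: str                 # e.g. "vision" or "nlp"
    processes: dict[str, float]

def task_cognition_score(profile: TaskProfile,
                         process_scores: dict[str, float]) -> float:
    """Task-specific cognition: weighted average of the agent's measured
    scores over the processes this task exercises (illustrative metric)."""
    total_weight = sum(profile.processes.values())
    return sum(w * process_scores.get(p, 0.0)
               for p, w in profile.processes.items()) / total_weight

def general_cognition_score(profiles: list[TaskProfile],
                            process_scores: dict[str, float]) -> float:
    """General cognition across tasks: mean of the task-specific scores.
    A real framework would likely use a richer aggregation."""
    return sum(task_cognition_score(p, process_scores)
               for p in profiles) / len(profiles)

if __name__ == "__main__":
    # A toy taxonomy: one vision task and one NLP task, as in the case study.
    taxonomy = [
        TaskProfile("image classification", "vision",
                    {"perception": 1.0, "memory": 0.5}),
        TaskProfile("question answering", "nlp",
                    {"language": 1.0, "reasoning": 1.0}),
    ]
    # Hypothetical per-process scores for some evaluated agent, in [0, 1].
    scores = {"perception": 0.8, "memory": 0.6,
              "language": 0.7, "reasoning": 0.4}
    for t in taxonomy:
        print(t.name, round(task_cognition_score(t, scores), 3))
    print("general:", round(general_cognition_score(taxonomy, scores), 3))
```

The point of the sketch is only the separation the abstract draws: task-specific cognition is scored against the processes a task is profiled with, while general cognition is an aggregate over a taxonomy that spans domains.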

Updated: 2021-03-01