Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality, IEEE Transactions on Games

IEEE Transactions on Games (IF 1.7), Pub Date: 2020-06-01, DOI: 10.1109/tg.2018.2883773
Fernando Martinez-Plumed, Jose Hernandez-Orallo

To better analyze the results of artificial intelligence (AI) benchmarks, we present two indicators on the side of the AI problems (difficulty and discrimination) and two indicators on the side of the AI systems (ability and generality). The first three are adapted from psychometric models in item response theory (IRT), whereas generality is defined as a new metric that evaluates whether an agent is consistently good at easy problems and bad at difficult ones. We illustrate how these key indicators give us more insight into the results of two popular AI benchmarks, the Arcade Learning Environment (Atari 2600 games) and the General Video Game AI competition, and we include some guidelines for estimating and interpreting these indicators in other AI benchmarks and competitions.
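
As a rough illustration of the IRT side of these indicators, the sketch below assumes the standard two-parameter logistic (2PL) model, in which each benchmark problem has a discrimination `a` and a difficulty `b`, and an agent of ability `theta` solves it with probability 1 / (1 + exp(-a(theta - b))). It also assumes each agent's result on each problem has already been reduced to binary success/failure and that item parameters are known. The paper's exact model and estimation procedure may differ. The function names (`p_success`, `estimate_ability`, `consistency_proxy`) are hypothetical, and `consistency_proxy` is only an illustrative stand-in for "good at easy problems and bad at difficult ones", not the paper's definition of generality.

```python
import math
from typing import Sequence


def p_success(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: probability that an agent with ability
    theta solves an item with discrimination a and difficulty b.
    Written in a numerically stable form to avoid exp() overflow."""
    z = a * (theta - b)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def estimate_ability(responses: Sequence[int],
                     items: Sequence[tuple[float, float]],
                     lo: float = -4.0, hi: float = 4.0,
                     steps: int = 801) -> float:
    """Maximum-likelihood ability estimate by grid search over [lo, hi],
    treating each item's (a, b) as already estimated."""
    eps = 1e-12  # clamp probabilities so log() never sees 0
    best_theta, best_ll = lo, -math.inf
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        ll = 0.0
        for r, (a, b) in zip(responses, items):
            p = min(max(p_success(theta, a, b), eps), 1.0 - eps)
            ll += math.log(p) if r else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


def consistency_proxy(responses: Sequence[int],
                      difficulties: Sequence[float]) -> float:
    """Hypothetical proxy (NOT the paper's generality metric): among pairs of
    items with different difficulty where the agent solved exactly one, the
    fraction in which the solved item is the easier one."""
    agree = total = 0
    n = len(responses)
    for i in range(n):
        for j in range(i + 1, n):
            if responses[i] == responses[j] or difficulties[i] == difficulties[j]:
                continue
            total += 1
            i_is_easier = difficulties[i] < difficulties[j]
            # The easier item was the solved one iff "i is easier" matches "i solved".
            agree += i_is_easier == bool(responses[i])
    return agree / total if total else 1.0


if __name__ == "__main__":
    items = [(1.5, -1.0), (1.0, 0.0), (2.0, 1.0)]  # (a, b) per problem, made up
    print(estimate_ability([1, 1, 0], items))
    print(consistency_proxy([1, 1, 0], [b for _, b in items]))
```

Note that if an agent solves every item (or none), the maximum-likelihood estimate does not exist and the grid search simply returns a boundary value; practical IRT tooling usually estimates item and agent parameters jointly and regularizes such cases.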

Updated: 2020-06-01
