Can we trust online crowdworkers? Comparing online and offline participants in a preference test of virtual agents
arXiv - CS - Multimedia, Pub Date: 2020-09-22, DOI: arxiv-2009.10760
Patrik Jonell, Taras Kucherenko, Ilaria Torre, Jonas Beskow

Conducting user studies is a crucial component in many scientific fields. While some studies require participants to be physically present, others can be conducted either in person (e.g. in-lab) or online (e.g. via crowdsourcing). Inviting participants to the lab can be time-consuming and logistically difficult, and research groups are sometimes unable to run in-lab experiments at all, for example because of a pandemic. Crowdsourcing platforms such as Amazon Mechanical Turk (AMT) or Prolific can therefore be a suitable alternative for running certain experiments, such as evaluating virtual agents. Although previous studies have investigated the use of crowdsourcing platforms for running experiments, it remains uncertain whether the results are reliable for perceptual studies. Here we replicate a previous experiment in which participants evaluated a gesture generation model for virtual agents. The experiment was conducted across three participant pools (in-lab, Prolific, and AMT), with similar demographics between the in-lab and Prolific participants. Our results show no difference between the three participant pools in their evaluations of the gesture generation models or in their reliability scores. The results indicate that online platforms can successfully be used for perceptual evaluations of this kind.

Updated: 2020-10-26