The agency of computer vision models as optical instruments
Visual Communication (IF 1.2), Pub Date: 2021-03-19, DOI: 10.1177/1470357221992097
Thomas Smits, Melvin Wevers

Industry and governments have deployed computer vision models to make high-stakes decisions in society. While they are often presented as neutral and objective, scholars have recognized that bias in these models might lead to the reproduction of racial, social, cultural and economic inequity. A growing body of work situates the provenance of bias in the collection and annotation of datasets that are needed to train computer vision models. This article moves from studying bias in computer vision models to the agency that is commonly attributed to them: the fact that they are universally seen as being able to make biased decisions. Building on the work of Bruno Latour and Jonathan Crary, the authors discuss computer vision models as agential optical instruments in the production of contemporary visuality. They analyse five interconnected research steps – task selection, category selection, data collection, data labelling and evaluation – of six widely cited benchmark datasets, published during a critical stage in the development of the field (2004–2020): Caltech 101, Caltech 256, PASCAL VOC, ImageNet, MS COCO and Google Open Images. First, they found that, despite all sorts of justifications, the selection of categories is not based on any general notion of visuality, but depends heavily upon perceived practical applications and the availability of downloadable images and, in conjunction with data collection, favours categories that can be unambiguously described by text. Second, the reliance on Flickr for data collection introduces a temporal bias in computer vision datasets. Third, by comparing aggregate accuracy rates and ‘human’ performance, the dataset papers introduce a false dichotomy between the agency of computer vision models and human observers. In general, the authors argue that the agency of datasets is produced by obscuring the power and subjective choices of their creators and the countless hours of highly disciplined labour of crowd workers.




Updated: 2021-03-21