A Survey of Dataset Refinement for Problems in Computer Vision Datasets, ACM Computing Surveys



ACM Computing Surveys (IF 16.6), Pub Date: 2024-04-09, DOI: 10.1145/3627157
Zhijing Wan, Zhixiang Wang, CheukTing Chung, Zheng Wang


Large-scale datasets have played a crucial role in the advancement of computer vision. However, they often suffer from problems such as class imbalance, noisy labels, dataset bias, or high resource costs, which can inhibit model performance and reduce trustworthiness. With the rise of data-centric research, various data-centric solutions have been proposed to address these problems. These solutions improve dataset quality by reorganizing the data, a process we call dataset refinement. In this survey, we provide a comprehensive and structured overview of recent advances in dataset refinement for problematic computer vision datasets. First, we summarize and analyze the various problems encountered in large-scale computer vision datasets. Then, we classify dataset refinement algorithms into three categories based on the refinement process: data sampling, data subset selection, and active learning. In addition, we organize these methods according to the data problems they address and provide a systematic comparative description. We point out that the three types of dataset refinement have distinct advantages and disadvantages for different dataset problems, which informs the choice of the data-centric method appropriate to a particular research objective. Finally, we summarize the current literature and propose potential future research topics.
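To make the three refinement categories named in the abstract more concrete, the following is a minimal Python sketch on a toy label list. It is an illustration under stated assumptions, not the survey's own method: the function names (`oversample_to_balance`, `select_subset`, `query_most_uncertain`) are hypothetical, random oversampling stands in for data sampling, random per-class selection stands in for subset selection, and entropy-based uncertainty sampling stands in for active learning.

```python
import math
import random
from collections import Counter, defaultdict


def _indices_by_class(labels):
    """Group sample indices by their class label."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    return by_class


def oversample_to_balance(labels, rng):
    """Data sampling (illustrative): duplicate minority-class indices at random
    until every class is as large as the largest class."""
    by_class = _indices_by_class(labels)
    target = max(len(idx) for idx in by_class.values())
    out = []
    for idx in by_class.values():
        out.extend(idx)
        out.extend(rng.choices(idx, k=target - len(idx)))
    return out


def select_subset(labels, per_class, rng):
    """Subset selection (illustrative): keep at most `per_class` randomly
    chosen samples from each class."""
    out = []
    for idx in _indices_by_class(labels).values():
        out.extend(rng.sample(idx, min(per_class, len(idx))))
    return sorted(out)


def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_most_uncertain(pred_probs, budget):
    """Active learning (illustrative): pick the `budget` unlabeled samples
    whose predictions have the highest entropy."""
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]),
                    reverse=True)
    return ranked[:budget]


# Toy data (assumed for illustration): an imbalanced two-class label list.
rng = random.Random(0)
labels = ["cat"] * 8 + ["dog"] * 2

balanced = oversample_to_balance(labels, rng)
subset = select_subset(labels, per_class=2, rng=rng)
queries = query_most_uncertain([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]], budget=1)

print(Counter(labels[i] for i in balanced))
print(Counter(labels[i] for i in subset))
print(queries)
```

The sketch only operates on indices, so the same functions could wrap any image dataset that exposes labels or model predictions; practical methods would likely replace the random choices with informed selection criteria.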


Updated: 2024-04-09
