Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach
IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2020-12-01, DOI: 10.1109/tkde.2019.2916074
Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Hyun Tae, Steven Euijong Whang

As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools are still primitive when it comes to helping users trace model performance problems all the way back to the data. We focus on the particular problem of slicing data to identify subsets of the validation data where the model performs poorly. This is an important problem in model validation because the overall model performance can fail to reflect that of the smaller subsets, and slicing allows users to analyze the model performance at a more granular level. Unlike general techniques (e.g., clustering) that can find arbitrary slices, our goal is to find interpretable slices (which are easier to act on than arbitrary subsets) that are problematic and large. We propose SliceFinder, an interactive framework for identifying such slices using statistical techniques. Applications include diagnosing model fairness and fraud detection, where identifying slices that are interpretable to humans is crucial. This research is part of a larger trend of big data and artificial intelligence (AI) integration and opens many opportunities for new research.
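To make the idea of "problematic and large" slices concrete, here is a minimal sketch in Python. It is not the SliceFinder implementation: the abstract says only that statistical techniques are used, so the standardized-difference score, the thresholds, the function names (`effect_size`, `find_slices`), and the toy data are all assumptions chosen for illustration. Each candidate slice is a single `feature == value` predicate, which keeps the result interpretable.

```python
from math import sqrt
from statistics import mean, pvariance


def effect_size(slice_losses, rest_losses):
    """Standardized gap between mean loss inside the slice and outside it.

    Assumption: this particular score is illustrative, not taken from the paper.
    """
    pooled = sqrt((pvariance(slice_losses) + pvariance(rest_losses)) / 2)
    if pooled == 0:
        return 0.0
    return (mean(slice_losses) - mean(rest_losses)) / pooled


def find_slices(rows, losses, min_size=2, min_effect=0.5):
    """Return (feature, value, size, effect) tuples, largest slices first.

    rows   : list of dicts mapping feature name -> categorical value
    losses : per-example model loss, aligned with rows
    """
    candidates = {(f, v) for row in rows for f, v in row.items()}
    found = []
    for feature, value in sorted(candidates, key=lambda fv: (fv[0], str(fv[1]))):
        inside = [l for r, l in zip(rows, losses) if r.get(feature) == value]
        outside = [l for r, l in zip(rows, losses) if r.get(feature) != value]
        # Skip slices that are too small, or that cover the whole dataset.
        if len(inside) < min_size or not outside:
            continue
        effect = effect_size(inside, outside)
        if effect >= min_effect:
            found.append((feature, value, len(inside), effect))
    # Prefer large slices; break ties by how much worse the model does on them.
    return sorted(found, key=lambda s: (-s[2], -s[3]))


# Hypothetical validation data: the model does badly on country == "DE".
rows = [
    {"country": "DE", "device": "mobile"},
    {"country": "DE", "device": "desktop"},
    {"country": "DE", "device": "mobile"},
    {"country": "US", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "US", "device": "desktop"},
]
losses = [0.9, 0.8, 1.0, 0.1, 0.2, 0.1]

for feature, value, size, effect in find_slices(rows, losses, min_effect=1.0):
    print(f"{feature} = {value}: size={size}, effect={effect:.2f}")
```

A real tool would also need to handle numeric features, conjunctions of predicates, and the multiple-comparison problem that arises when many slices are tested; this sketch leaves all of that out.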

Updated: 2020-12-01