Informal Data Transformation Considered Harmful
arXiv - CS - Databases Pub Date : 2020-01-02 , DOI: arxiv-2001.00338
Eric Daimler, Ryan Wisnesky

In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than by the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and validate data ex post facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as the data is transformed (migrated, integrated, composed, queried, viewed, etc.) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.

Updated: 2020-01-03