AI Data Wrangling with Associative Arrays
arXiv - CS - Databases Pub Date : 2020-01-18 , DOI: arxiv-2001.06731
Jeremy Kepner, Vijay Gadepally, Hayden Jananthan, Lauren Milechin, Siddharth Samsi

The AI revolution is data driven. AI "data wrangling" is the process by which unusable data is transformed to support AI algorithm development (training) and deployment (inference). Significant time is devoted to translating diverse data representations supporting the many query and analysis steps found in an AI pipeline. Rigorous mathematical representations of these data enable data translation and analysis optimization within and across steps. Associative array algebra provides a mathematical foundation that naturally describes the tabular structures and set mathematics that are the basis of databases. Likewise, the matrix operations and corresponding inference/training calculations used by neural networks are also well described by associative arrays. More surprisingly, a general denormalized form of hierarchical formats, such as XML and JSON, can be readily constructed. Finally, pivot tables, which are among the most widely used data analysis tools, naturally emerge from associative array constructors. A common foundation in associative arrays provides interoperability guarantees, proving that their operations are linear systems with rigorous mathematical properties, such as associativity, commutativity, and distributivity, that are critical to reordering optimizations.
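As a rough illustration of the ideas in the abstract, the following minimal Python sketch models an associative array as a dictionary keyed by (row, column) pairs. It is not the authors' implementation or any library's API: the helper names `assoc`, `add`, and `matmul`, the sample records, and the choice of ordinary integer addition and multiplication are all assumptions made for this example. It shows a pivot table falling out of the constructor, and checks associativity and distributivity on small inputs.

```python
from collections import defaultdict

def assoc(triples):
    """Hypothetical constructor: build {(row, col): value} from
    (row, col, value) triples, summing values that share a key."""
    out = defaultdict(int)
    for row, col, val in triples:
        out[(row, col)] += val
    return {k: v for k, v in out.items() if v != 0}

def add(a, b):
    """Element-wise sum over the union of keys."""
    out = dict(a)
    for key, val in b.items():
        out[key] = out.get(key, 0) + val
    return {k: v for k, v in out.items() if v != 0}

def matmul(a, b):
    """Array product: columns of `a` are matched to rows of `b` by key."""
    b_rows = defaultdict(list)
    for (k, j), w in b.items():
        b_rows[k].append((j, w))
    out = defaultdict(int)
    for (i, k), v in a.items():
        for j, w in b_rows.get(k, ()):
            out[(i, j)] += v * w
    return {k: v for k, v in out.items() if v != 0}

# Assumed sample data: flat records, as might come from a table or JSON.
records = [
    {"region": "east", "product": "apples", "sales": 3},
    {"region": "east", "product": "pears", "sales": 2},
    {"region": "west", "product": "apples", "sales": 5},
    {"region": "east", "product": "apples", "sales": 1},
]

# A pivot table (region by product, summed sales) is just the constructor.
pivot = assoc((r["region"], r["product"], r["sales"]) for r in records)
print(pivot)
# {('east', 'apples'): 4, ('east', 'pears'): 2, ('west', 'apples'): 5}

A = assoc([("x", "a", 1), ("x", "b", 2), ("y", "a", 3)])
B = assoc([("a", "p", 4), ("b", "p", 5), ("b", "q", 6)])
C = assoc([("p", "m", 7), ("q", "m", 8)])
D = assoc([("a", "q", 9), ("b", "p", 1)])

# Associativity and distributivity permit reordering the computation.
print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))        # True
print(matmul(A, add(B, D)) == add(matmul(A, B), matmul(A, D)))   # True
```

Because the two groupings of the product give the same result, an optimizer is free to pick whichever ordering is cheaper; integer values are used here so the equality checks are exact.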

Updated: 2020-01-22