TADOC: Text analytics directly on compression
The VLDB Journal (IF 4.2), Pub Date: 2020-09-19, DOI: 10.1007/s00778-020-00636-3
Feng Zhang, Jidong Zhai, Xipeng Shen, Dalin Wang, Zheng Chen, Onur Mutlu, Wenguang Chen, Xiaoyong Du

This article provides a comprehensive description of text analytics directly on compression (TADOC), which enables document analytics to run directly on compressed textual data. The article explains the concept of TADOC and the challenges to realizing it effectively. It then presents a series of guidelines and technical solutions that address those challenges, including the adoption of a hierarchical compression method and a set of novel algorithms and data structure designs. Experiments on six data analytics tasks of varying complexity show that TADOC can save 90.8% of storage space and 87.9% of memory usage while halving data processing time.
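The abstract does not spell out the compressed format, so the following is only a minimal sketch under a stated assumption: the text is compressed into a hierarchical grammar (Sequitur-style), where each rule expands to a sequence of words and references to other rules, and rule 0 is the root. It illustrates the general idea of analytics on compression. Instead of expanding the grammar back into text, a word count visits each rule once in topological order and propagates a weight (how many times the rule occurs in the full text) down to its sub-rules. The names `word_count`, `expand`, `topological_order`, and the example grammar are hypothetical and not taken from the article.

```python
from collections import Counter
from typing import Dict, List, Union

# Hypothetical hierarchical grammar: rule id -> sequence of symbols.
# A symbol is either a word (str) or a reference to another rule (int).
Grammar = Dict[int, List[Union[int, str]]]


def topological_order(grammar: Grammar, root: int = 0) -> List[int]:
    """Order rules so that every rule precedes the rules it references."""
    visited: set = set()
    postorder: List[int] = []

    def visit(rule: int) -> None:
        if rule in visited:
            return
        visited.add(rule)
        for sym in grammar[rule]:
            if isinstance(sym, int):
                visit(sym)
        postorder.append(rule)

    visit(root)
    return postorder[::-1]  # reverse postorder: parents before children


def word_count(grammar: Grammar, root: int = 0) -> Counter:
    """Count words without decompressing, by propagating rule weights."""
    weight = {rule: 0 for rule in grammar}  # occurrences of each rule
    weight[root] = 1
    counts: Counter = Counter()
    for rule in topological_order(grammar, root):
        w = weight[rule]  # final: all parents were processed earlier
        for sym in grammar[rule]:
            if isinstance(sym, int):
                weight[sym] += w  # sub-rule occurs w more times
            else:
                counts[sym] += w  # word occurs w times via this rule
    return counts


def expand(grammar: Grammar, rule: int = 0) -> List[str]:
    """Fully decompress a rule; used here only to check the result."""
    out: List[str] = []
    for sym in grammar[rule]:
        if isinstance(sym, int):
            out.extend(expand(grammar, sym))
        else:
            out.append(sym)
    return out


# "a b c a b c a b d" compressed as R0 -> R2 R2 R1 d, R1 -> a b, R2 -> R1 c
example: Grammar = {0: [2, 2, 1, "d"], 1: ["a", "b"], 2: [1, "c"]}

if __name__ == "__main__":
    for word, n in sorted(word_count(example).items()):
        print(word, n)  # prints: a 3, b 3, c 2, d 1 (one per line)
```

The design choice here is that each rule is processed once regardless of how often it repeats in the original text, so the work scales with the size of the compressed grammar rather than the decompressed text. This is the kind of saving the abstract attributes to operating directly on compression.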




Updated: 2020-09-20