Computing (IF 3.7) Pub Date: 2021-06-13, DOI: 10.1007/s00607-021-00964-4 Feng Wu, Hongwei Lv, Tongrang Fan, Wenbin Zhao, Jiaqi Wang
Data reuse is an effective way to save storage space and improve data utilization in data management. Motivated by the successful application of deep learning to text mining, this paper proposes a deep-learning-based data reuse strategy that exploits the pattern similarity and instance similarity of high-dimensional data. Pattern similarity between data dimensions is analyzed by combining traditional feature analysis with a convolutional neural network model, so as to optimize the similar dimension pairs among high-dimensional data sets. For instance similarity, a semantic similarity model, IA-LSTM, is designed by incorporating an inner-attention mechanism; it builds association mappings among data entities by computing the similarity of short texts. Based on the pattern and instance similarity in the proposed strategy, reusable data entities are discovered, and column storage is designed to improve data reuse efficiency.
Title: A data reuse strategy based on deep learning for high dimensional data's pattern and instance similarity
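The abstract gives no equations for IA-LSTM, so the following is only a minimal sketch of the general idea behind the instance-similarity step: pool per-token vectors with attention weights, then compare two short texts by cosine similarity. It is not the paper's model. The function names, the scoring rule `w . tanh(h_t)`, the weight vector `w`, and the toy vectors standing in for LSTM hidden states are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """Collapse per-token vectors into one text vector.

    Assumed scoring rule (not taken from the paper):
    score_t = w . tanh(h_t), weights = softmax(scores).
    """
    scores = [
        sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
        for h in hidden_states
    ]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    return [
        sum(a * h[d] for a, h in zip(alphas, hidden_states))
        for d in range(dim)
    ]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical stand-ins for the hidden states an LSTM would produce
# for three short texts; a real system would compute these from tokens.
text_a = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
text_b = [[0.85, 0.15, 0.05], [0.7, 0.3, 0.0], [0.9, 0.0, 0.1]]
text_c = [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]]
w = [0.5, -0.2, 0.3]  # hypothetical attention weight vector

sim_ab = cosine(attention_pool(text_a, w), attention_pool(text_b, w))
sim_ac = cosine(attention_pool(text_a, w), attention_pool(text_c, w))
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

In a reuse pipeline of the kind the abstract describes, pairs of entities whose similarity exceeds some chosen threshold could be recorded as an association mapping; the threshold and the mapping format are not specified in the abstract.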