Abstract
Data reuse is an effective way to save storage space and improve data utilization in data management. Motivated by the successful application of deep learning to text mining, we propose a deep-learning-based data reuse strategy that exploits both pattern similarity and instance similarity in high-dimensional data. Pattern similarity between data dimensions is analyzed by combining traditional feature analysis with a convolutional neural network, so as to optimize the matching of similar dimension pairs across high-dimensional data sets. For instance similarity, a semantic similarity model, IA-LSTM, is designed by incorporating an inner-attention mechanism; it builds association mappings among data entities by computing the similarity of short texts. Based on the pattern and instance similarity obtained by the proposed strategy, reusable data entities are discovered, and column storage is designed to improve data reuse efficiency.
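To make the notion of similar dimension pairs concrete, the following is a minimal Python sketch of the feature-analysis side only. It is not the authors' method: the three column features, the cosine comparison, the 0.9 threshold, and the names `column_features`, `cosine`, and `similar_dimension_pairs` are all assumptions chosen for illustration, and the convolutional network described in the abstract is not modeled.

```python
import math


def column_features(values):
    """Describe a column of strings by three ratios in [0, 1] (illustrative choice)."""
    total_chars = sum(len(v) for v in values)
    if not values or total_chars == 0:
        return [0.0, 0.0, 0.0]
    digit_ratio = sum(ch.isdigit() for v in values for ch in v) / total_chars
    alpha_ratio = sum(ch.isalpha() for v in values for ch in v) / total_chars
    distinct_ratio = len(set(values)) / len(values)
    return [digit_ratio, alpha_ratio, distinct_ratio]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_dimension_pairs(table_a, table_b, threshold=0.9):
    """Return (column_a, column_b, score) for column pairs scoring at or above threshold.

    Each table is a dict mapping a column name to a list of string values.
    The threshold is an assumed value, not one taken from the paper.
    """
    feats_a = {name: column_features(col) for name, col in table_a.items()}
    feats_b = {name: column_features(col) for name, col in table_b.items()}
    pairs = []
    for name_a, fa in feats_a.items():
        for name_b, fb in feats_b.items():
            score = cosine(fa, fb)
            if score >= threshold:
                pairs.append((name_a, name_b, score))
    # Highest-scoring candidate pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Calling `similar_dimension_pairs` on two tables stored column by column returns candidate dimension pairs ranked by score; in a fuller pipeline these candidates would then be checked at the instance level, for example by a short-text similarity model.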
Acknowledgements
The authors acknowledge the National Natural Science Foundation of China (61373160); the research project “Research on Repetition Detection Technology of High Dimensional Data based on Deep Learning” of the Hebei Science and Technology Information Processing Laboratory; the research project “Research on recognition method of knowledge evolution path for sequential associated text based on graph neural network” of the Natural Science Foundation of Hebei Province; and the research project “Knowledge Graph Construction of Multi-Source Domain data based on Knowledge Representation learning” of the Education Department of Hebei Province.
Cite this article
Wu, F., Lv, H., Fan, T. et al. A data reuse strategy based on deep learning for high dimensional data’s pattern and instance similarity. Computing (2021). https://doi.org/10.1007/s00607-021-00964-4
DOI: https://doi.org/10.1007/s00607-021-00964-4