Case2vec: joint variational autoencoder for case text embedding representation

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics (2021)

Abstract

Text embedding represents a case text as a vector that preserves rich information from the original document. Existing embedding methods usually rely on either statistical features or content features alone. Case texts, however, share characteristics such as similar structure, repeated words, and varying lengths, so neither statistical features nor content features alone can represent them effectively. In this paper, we propose a joint variational autoencoder (VAE) for case text embedding. We consider the statistical features and content features of case texts together and use a VAE to align the two feature types in the same latent space. We compare our representations with existing methods in terms of quality, relationship, and efficiency. The experimental results show that our method achieves good performance, outperforming models that use a single feature.
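
The abstract describes the architecture only at a high level. As a rough illustration of the stated idea, the PyTorch sketch below encodes two views of a case text, a statistical vector (for example, TF-IDF) and a content vector (for example, averaged word embeddings), into one shared latent space, using cross-view reconstruction plus an explicit posterior-alignment term. Everything here (the class name JointVAE, the particular loss terms, the dimensions) is an illustrative assumption rather than the authors' implementation; their actual code is linked in the Notes below.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointVAE(nn.Module):
        """Two-view VAE: statistical and content features share one latent space."""

        def __init__(self, stat_dim, content_dim, hidden_dim=256, latent_dim=64):
            super().__init__()
            # One encoder per view, each producing Gaussian posterior parameters.
            self.enc_stat = nn.Linear(stat_dim, hidden_dim)
            self.enc_content = nn.Linear(content_dim, hidden_dim)
            self.mu_stat = nn.Linear(hidden_dim, latent_dim)
            self.logvar_stat = nn.Linear(hidden_dim, latent_dim)
            self.mu_content = nn.Linear(hidden_dim, latent_dim)
            self.logvar_content = nn.Linear(hidden_dim, latent_dim)
            # One decoder per view, both reading from the shared latent space.
            self.dec_stat = nn.Linear(latent_dim, stat_dim)
            self.dec_content = nn.Linear(latent_dim, content_dim)

        @staticmethod
        def reparameterize(mu, logvar):
            # Standard VAE reparameterization trick.
            return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

        def forward(self, x_stat, x_content):
            h_s = F.relu(self.enc_stat(x_stat))
            h_c = F.relu(self.enc_content(x_content))
            mu_s, lv_s = self.mu_stat(h_s), self.logvar_stat(h_s)
            mu_c, lv_c = self.mu_content(h_c), self.logvar_content(h_c)
            z_s = self.reparameterize(mu_s, lv_s)
            z_c = self.reparameterize(mu_c, lv_c)
            return z_s, z_c, (mu_s, lv_s), (mu_c, lv_c)

    def joint_vae_loss(model, x_stat, x_content):
        z_s, z_c, (mu_s, lv_s), (mu_c, lv_c) = model(x_stat, x_content)
        # Cross-view reconstruction: each latent must explain the *other* view,
        # which pushes the two posteriors into a common space.
        recon = (F.mse_loss(model.dec_stat(z_c), x_stat)
                 + F.mse_loss(model.dec_content(z_s), x_content))
        # KL terms regularize both posteriors toward a standard normal prior.
        kl = (-0.5 * torch.mean(1 + lv_s - mu_s.pow(2) - lv_s.exp())
              - 0.5 * torch.mean(1 + lv_c - mu_c.pow(2) - lv_c.exp()))
        # Explicit alignment term pulls the two posterior means together.
        align = F.mse_loss(mu_s, mu_c)
        return recon + kl + align

    # Toy usage: 8 documents, 1000-dim TF-IDF view, 300-dim embedding view.
    model = JointVAE(stat_dim=1000, content_dim=300)
    loss = joint_vae_loss(model, torch.rand(8, 1000), torch.randn(8, 300))
    loss.backward()

At inference time, the mean of either posterior (or an average of the two) could serve as the document embedding under this setup.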

Notes

  1. https://github.com/Maxpa1n/case2vec.

Acknowledgements

This work was supported by the National Key Research and Development Plan (Grant Nos. 2018YFC0830101, 2018YFC0830105, 2018YFC0830100), the National Natural Science Foundation of China (Grant Nos. 61972186, 61761026, 61732005, 61672271 and 61762056), the Yunnan high-tech industry development project (Grant No. 201606), the Yunnan provincial major science and technology special plan projects: digitization research and application demonstration of Yunnan characteristic industry (Grant No. 202002AD080001-5), the Yunnan Basic Research Project (Grant Nos. 202001AS070014, 2018FB104), and the Talent Fund for Kunming University of Science and Technology (Grant No. KKSY201703005).

Author information

Corresponding author

Correspondence to Shengxiang Gao.

About this article

Cite this article

Song, R., Gao, S., Yu, Z. et al. Case2vec: joint variational autoencoder for case text embedding representation. Int. J. Mach. Learn. & Cyber. 12, 2517–2528 (2021). https://doi.org/10.1007/s13042-021-01335-3
