
Label-Embedding Bi-directional Attentive Model for Multi-label Text Classification

Neural Processing Letters

Abstract

Multi-label text classification is a critical task in the field of natural language processing. As a recent language representation model, BERT achieves new state-of-the-art results on classification tasks. Nevertheless, BERT's text classification framework does not make full use of token-level text representations and label embeddings, since it uses only the final hidden state of the [CLS] token as the sequence-level text representation for classification. We hypothesize that finer-grained token-level text representations and label embeddings contribute to classification. Consequently, in this paper we propose a Label-Embedding Bi-directional Attentive model to improve the performance of BERT's text classification framework. In particular, we extend BERT's text classification framework with label embeddings and bi-directional attention. Experimental results on five datasets show that our model yields notable improvements over both baselines and state-of-the-art models.
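The abstract describes the mechanism only at a high level. The PyTorch sketch below illustrates one plausible reading of it: bi-directional attention computed between BERT's token-level hidden states and a learned label-embedding matrix, followed by per-label scoring for the multi-label setting. This is a minimal illustration under stated assumptions, not the authors' implementation; all module and variable names (LabelEmbeddingBiAttention, label_emb, and so on) are hypothetical.

```python
# Minimal sketch (not the authors' released code) of bi-directional attention
# between BERT's token-level hidden states and learned label embeddings,
# followed by a per-label classifier suited to multi-label prediction.
import torch
import torch.nn as nn

class LabelEmbeddingBiAttention(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # One learned embedding per label, same width as BERT hidden states.
        self.label_emb = nn.Parameter(torch.randn(num_labels, hidden_size) * 0.02)
        self.classifier = nn.Linear(2 * hidden_size, 1)

    def forward(self, token_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden) -- BERT's final hidden states.
        # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
        scores = torch.einsum("bsh,lh->bsl", token_states, self.label_emb)
        scores = scores.masked_fill(attention_mask.unsqueeze(-1) == 0, -1e9)

        # Label-to-token attention: each label attends over the tokens,
        # producing a label-specific summary of the text.
        l2t = torch.softmax(scores, dim=1)                       # over tokens
        label_ctx = torch.einsum("bsl,bsh->blh", l2t, token_states)

        # Token-to-label attention: each token attends over the labels; the
        # resulting token representations are mean-pooled over real tokens.
        t2l = torch.softmax(scores, dim=2)                       # over labels
        token_ctx = torch.einsum("bsl,lh->bsh", t2l, self.label_emb)
        mask = attention_mask.unsqueeze(-1).float()
        text_ctx = (token_ctx * mask).sum(1) / mask.sum(1).clamp(min=1)

        # Combine both attention directions and score each label independently.
        features = torch.cat(
            [label_ctx, text_ctx.unsqueeze(1).expand_as(label_ctx)], dim=-1
        )
        return self.classifier(features).squeeze(-1)             # (batch, num_labels) logits
```

In a multi-label setting, the per-label logits produced here would typically be trained with an independent sigmoid per label, e.g. via torch.nn.BCEWithLogitsLoss, rather than a softmax over labels.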


Notes

  1. https://slashdot.org/.

  2. http://www.daviddlewis.com/resources/testcollections/reuters21578/.

  3. https://github.com/google-research/bert.


Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (No. U1711263).

Author information


Corresponding author

Correspondence to Jiangtao Ren.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, N., Wang, Q. & Ren, J. Label-Embedding Bi-directional Attentive Model for Multi-label Text Classification. Neural Process Lett 53, 375–389 (2021). https://doi.org/10.1007/s11063-020-10411-8


