Abstract
An Internet of Things (IoT) based voice interaction system, as a new artificial intelligence application, provides a new mode of human–computer interaction. This more intelligent and efficient form of communication places greater demands on the system's semantic understanding module. Faced with the complex and diverse interaction scenarios of practical applications, academia and industry urgently need more powerful Natural Language Understanding (NLU) methods. The joint task of Intent Detection and Slot Filling, one of the core sub-tasks of NLU, has been widely used in different human–computer interaction scenarios. In the current era of deep learning, approaches to this joint task have shifted from rule-based methods to deep learning-based methods. An important open problem is how to design refined, task-specific models for these tasks, and how to connect the upstream and downstream tasks so that Intent Detection helps improve the precision of Slot Filling. Addressing it is of great significance for building a more human-friendly IoT voice interaction system. In this study, we design two joint models for the joint task of Intent Detection and Slot Filling. For Intent Detection, one model is based on BiGRU-Att-CapsuleNet (a hybrid-based model) and the other is based on the RCNN model. Both use the BiGRU-CRF model for Slot Filling. The hybrid-based model enhances the semantic capture capability of a single model. Moreover, combining specialized models built independently for each task into a complete joint model makes it easier to reach the best performance on each individual task. We also carried out detailed comparative experiments on the individual tasks and on the joint task across multiple datasets. The experiments show that, compared with other models, the joint models achieve competitive results on 7 typical datasets covering multiple scenarios in English and Chinese.
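To make the output of the joint task concrete, the sketch below shows how one intent label and a sequence of per-token slot labels could be merged into a single semantic frame for a voice command. It is illustrative only and is not the authors' implementation: the BIO tagging scheme, the function names `decode_bio_slots` and `build_frame`, and the example utterance, intent, and slot names are all assumptions introduced here. The two inputs stand in for the predictions of an Intent Detection model and a Slot Filling model.

```python
from typing import Dict, List, Tuple


def decode_bio_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (slot_name, text) spans from BIO tags aligned with tokens.

    Assumes the BIO scheme (B-x opens a slot, I-x continues it, O is outside).
    An I-x tag that does not continue an open slot of the same name is ignored.
    """
    spans: List[Tuple[str, str]] = []
    current_name = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; close any slot that is still open.
            if current_name is not None:
                spans.append((current_name, " ".join(current_tokens)))
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # Continue the currently open slot.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag: close any open slot.
            if current_name is not None:
                spans.append((current_name, " ".join(current_tokens)))
            current_name, current_tokens = None, []
    if current_name is not None:
        spans.append((current_name, " ".join(current_tokens)))
    return spans


def build_frame(intent: str, tokens: List[str], tags: List[str]) -> Dict[str, object]:
    """Merge a predicted intent and predicted slot tags into one frame."""
    return {"intent": intent, "slots": decode_bio_slots(tokens, tags)}


# Hypothetical predictions for one utterance.
tokens = ["turn", "on", "the", "living", "room", "light"]
tags = ["O", "O", "O", "B-room", "I-room", "B-device"]
frame = build_frame("turn_on", tokens, tags)
print(frame)
```

Here the frame keeps slots as an ordered list of pairs rather than a dictionary, so an utterance that mentions the same slot type twice does not lose a value.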
Funding
Funding was provided by VC Research (Grant No. VCR 0000021).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest related to this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Ni, P., Li, Y., Li, G. et al. Natural language understanding approaches based on joint task of intent detection and slot filling for IoT voice interaction. Neural Comput & Applic 32, 16149–16166 (2020). https://doi.org/10.1007/s00521-020-04805-x
DOI: https://doi.org/10.1007/s00521-020-04805-x