Natural language understanding approaches based on joint task of intent detection and slot filling for IoT voice interaction

  • S.I.: Applying Artificial Intelligence to the Internet of Things
Neural Computing and Applications

Abstract

Internet of Things (IoT) based voice interaction systems, a new artificial intelligence application, provide a new mode of human–computer interaction. This more intelligent and efficient mode of communication places greater demands on the system's semantic understanding module. Faced with the complex and diverse interaction scenarios of practical applications, both academia and industry urgently need more powerful Natural Language Understanding (NLU) methods. The joint task of Intent Detection and Slot Filling, one of the core sub-tasks of NLU, has been widely used in many human–computer interaction scenarios. In the current era of deep learning, approaches to this joint task have shifted from rule-based methods to deep learning-based methods. An important problem is how to design refined, task-specific models for each sub-task and how to connect the preceding and following tasks so that Intent Detection helps improve the precision of Slot Filling. Addressing this problem is of great significance for building more human-centered IoT voice interaction systems. In this study, we designed two joint models for the Intent Detection and Slot Filling joint task. For Intent Detection, one model is based on BiGRU-Att-CapsuleNet (a hybrid model) and the other is based on RCNN. Both models use BiGRU-CRF for Slot Filling. The hybrid model enhances the semantic capture capability of a single model. Moreover, by combining specialized models built independently for each task into a complete joint task, each task can more readily reach its best performance. This study also carried out detailed comparative experiments on the individual tasks and on the joint task across multiple datasets. The experiments show that, compared with other models, the joint models achieve competitive results on 7 typical English and Chinese datasets covering multiple scenarios.
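Both joint models use BiGRU-CRF for Slot Filling. As a rough illustration of what the CRF layer contributes on top of the BiGRU encoder, the following is a minimal Python sketch of linear-chain CRF Viterbi decoding. It is not the authors' implementation. The emission scores stand in for per-token BiGRU outputs, the transition and start scores are hand-set assumptions (a trained CRF learns them), and the BIO tag set and the utterance "fly to new york" are hypothetical examples.

```python
from typing import List, Tuple


def viterbi_decode(
    emissions: List[List[float]],
    transitions: List[List[float]],
    start: List[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score for a linear-chain CRF.

    emissions[t][j]   : score of tag j at token t (e.g. produced by a BiGRU encoder)
    transitions[i][j] : score of moving from tag i to tag j
    start[j]          : score of starting the sequence with tag j
    """
    n_tags = len(start)
    # score[j] = best score of any path that ends in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # best previous tag i for reaching tag j at token t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag to recover the path
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical BIO tag set for a single "city" slot.
TAGS = ["O", "B-city", "I-city"]
NEG = -100.0  # large penalty standing in for a forbidden transition
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-city is invalid in BIO tagging
    [0.0, 0.0, 0.0],  # from B-city
    [0.0, 0.0, 0.0],  # from I-city
]
start = [0.0, 0.0, NEG]  # a sequence cannot start inside a slot
# Toy emission scores for "fly to new york"; a trained BiGRU would produce these.
emissions = [
    [2.0, 0.1, 0.0],  # fly
    [2.0, 0.1, 0.0],  # to
    [0.5, 1.0, 1.2],  # new: per-token argmax would wrongly pick I-city
    [0.2, 0.3, 1.5],  # york
]
path, best = viterbi_decode(emissions, transitions, start)
print([TAGS[i] for i in path], best)  # ['O', 'O', 'B-city', 'I-city'] 6.5
```

Picking the best tag for each token independently would label "new" as I-city directly after O, which is invalid under BIO tagging; the transition scores let the decoder select the consistent sequence O, O, B-city, I-city instead. This sequence-level consistency is the usual motivation for placing a CRF after a recurrent encoder.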

Funding

Funding was provided by VC Research (Grant No. VCR 0000021).

Author information

Corresponding author

Correspondence to Victor Chang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest regarding this work.

About this article

Cite this article

Ni, P., Li, Y., Li, G. et al. Natural language understanding approaches based on joint task of intent detection and slot filling for IoT voice interaction. Neural Comput & Applic 32, 16149–16166 (2020). https://doi.org/10.1007/s00521-020-04805-x

  • DOI: https://doi.org/10.1007/s00521-020-04805-x
