Elsevier

Future Generation Computer Systems

Volume 115, February 2021, Pages 610-618

Estimation of laryngeal closure duration during swallowing without invasive X-rays

https://doi.org/10.1016/j.future.2020.09.040

Highlights

  • Estimation of laryngeal closure duration of swallowing based on neck-sensor signals.

  • A hybrid deep learning structure is employed for modeling temporal dynamics.

  • The model was trained and validated on 588 swallows and tested on 45 swallows.

  • Our method achieved 78.94% and 74.89% accuracies for these two datasets.

Abstract

Laryngeal vestibule (LV) closure is a critical physiologic event during swallowing, since it is the first line of defense against food bolus entering the airway. Identifying the laryngeal vestibule status, including closure, reopening and closure duration, provides indispensable references for assessing the risk of dysphagia and neuromuscular function. However, commonly used radiographic examinations, known as videofluoroscopy swallowing studies, are highly constrained by their radiation exposure and cost. Here, we introduce a non-invasive sensor-based system that acquires high-resolution cervical auscultation signals from the neck and applies advanced deep learning techniques to detect LV behaviors. The deep learning algorithm, which combines convolutional and recurrent neural networks, was developed with a dataset of 588 swallows from 120 patients with suspected dysphagia and further clinically tested on 45 swallows from 16 healthy participants. For classifying the LV closure and opening statuses, our method achieved accuracies of 78.94% and 74.89% for these two datasets, suggesting the feasibility of using sensor signals for LV prediction without traditional videofluoroscopy screening methods. The sensor-supported system offers a broadly applicable computational approach for clinical diagnosis and biofeedback purposes in patients with swallowing disorders without the use of radiographic examination.

Introduction

In humans, the respiratory and digestive systems share the same entrance; therefore, protecting the airway from food bolus entering the trachea or lungs is a fundamental requirement for safe swallowing. Laryngeal vestibule (LV) closure has been considered the first line of defense against swallowed material entering the airway [1], [2], [3]. Likewise, the duration of LV closure is a predictor of airway invasion during swallowing. If laryngeal closure is absent or its duration is too short, aspiration or penetration can result [4], [5]. Aspiration is considered a major concern for individuals with dysphagia (swallowing disorders), especially in neurologic and neurodegenerative diseases, where aspiration-related respiratory infections are a leading cause of death [6]. Therefore, proper evaluation of LV closure and its duration could provide an objective outcome measure to improve the assessment of swallowing safety, provide clinical evidence of increased risk of airway compromise during swallowing, and guide the initiation of appropriate compensatory interventions.

The videofluoroscopy swallowing study (VFSS) is the only instrumental assessment technique that can visualize the event of LV closure and determine its duration during swallowing through the kinematic analysis of radiographic images [6], [7], [8]. However, practical issues arise when the VFSS is implemented: it exposes patients to radiation, and it is not feasible in facilities that lack x-ray departments or qualified clinicians to perform and interpret the VFSS images [9], [10]. Additionally, it is not suitable when patients prefer not to undergo x-ray testing or are unable to participate in the examination protocols [10], [11], [12], [13].

Furthermore, certain limitations of the ordinary clinical setting prevent more frequent temporal analysis of swallowing events using VFSS images, which would provide quantification of LV closure at baseline and assessment of treatment efficacy. Frame-by-frame review of VFSS video is time-consuming, and some clinicians may not be able to record VFSS images for secondary review due to lack of equipment or limited access to archived materials. Clinicians tend to comment on whether, and at what phase of the swallow, material enters the laryngeal vestibule without determining whether LV closure itself was shortened, further limiting the inferences that can guide treatment decisions [14].

Because of these drawbacks and limitations of VFSS in the detection of LV closure and reopening, it would be practically beneficial for patients and clinicians to investigate an alternative, non-invasive tool. High-resolution cervical auscultation (HRCA) is a promising non-invasive method for dysphagia screening, assessment, and management [15]. It uses high-resolution accelerometers and microphones attached to patients’ necks to record vibratory and acoustic signals during swallowing [16], [17]. The advantages of such a sensor-supported approach include mobility, cost-effectiveness, non-invasiveness, and suitability for day-to-day and even minute-to-minute monitoring [15], [18]. To explain the relationship between the signals and LV status changes, previous studies postulated the cardiac analogy hypothesis, which addresses the elusive physiologic cause of swallowing sounds [19]. This theory suggests that cervical auscultation acoustic signals are generated via vibrations caused by valve and pump systems within the vocal tract. Moreover, HRCA signal features have been found to be associated with LV closure onset and LV reopening [20]. The slapping of the epiglottis and aryepiglottic fold may provide the valve activity that generates the swallowing sounds and neck vibrations that can be recorded with HRCA.

All these studies indicated the possibility of detecting LV closure and reopening, and of determining the closure duration, solely based on HRCA signals. However, no study has attempted to implement this idea quantitatively. The main challenge was that the explicit dependencies between the signal features and the LV behaviors had not been established mathematically. In this study, we sought to investigate the ability of HRCA signals to identify LV status with an advanced deep learning model, which approximated the relationship from training examples. We hypothesized that a computer-aided algorithm applied to HRCA signals acquired from the neck would be able to detect LV switching events (closure and reopening) and estimate the duration of LV closure.
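As a rough illustration of how a sequence of predicted LV statuses could be turned into a closure duration, the sketch below takes one binary label per time step (1 = closed, 0 = open) and measures the longest closed run. The function names, the longest-run rule, and the `step_seconds` parameter are assumptions made for this example; the text above does not specify how the duration is derived from the predicted statuses.

```python
from typing import List, Optional, Tuple


def closure_interval(status: List[int]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest run of 1s (closed); end is exclusive.

    Assumption: the longest closed run is taken as the LV closure event.
    Returns None when no step is labeled closed.
    """
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    # A trailing 0 acts as a sentinel so a run reaching the end is closed off.
    for i, s in enumerate(list(status) + [0]):
        if s == 1 and start is None:
            start = i  # closure onset
        elif s != 1 and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)  # reopening at step i
            start = None
    return best


def closure_duration_seconds(status: List[int], step_seconds: float) -> float:
    """Duration of the longest closed run, given the time between labels."""
    interval = closure_interval(status)
    if interval is None:
        return 0.0
    return (interval[1] - interval[0]) * step_seconds
```

With hypothetical labels `[0, 0, 1, 1, 1, 0, 1, 0]`, the longest closed run spans steps 2 to 4, so closure onset is step 2 and reopening is step 5. An isolated closed label such as the one at step 6 is ignored by this rule, which is one simple way to tolerate spurious predictions.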

Machine learning and deep learning methods have already become powerful tools in health-care applications and are widely employed in computer-assisted diagnosis of swallowing, laryngeal and neck disorders or diseases [21]. Using larynx contact endoscopic video images, Esmaeili et al. applied support vector machine, k-nearest neighbor, and random forest classifiers to distinguish benign from malignant lesions on the superficial layers of the laryngeal mucosa [22]. For early-stage diagnosis of laryngeal squamous cell carcinoma, Moccia et al. implemented a support vector machine classifier with features extracted from laryngeal endoscopic frames and achieved 93% sensitivity [23]. For a similar purpose, Araújo et al. applied transfer learning with pre-trained Convolutional Neural Network (CNN) models to process laryngoscopy images and achieved state-of-the-art performance [24]. Our previous study also applied multi-scale CNN filters for hyoid bone detection on VFSS images [25]. All of those studies used images as model input. Only a few studies have used time-series signals as input to deep learning models for swallowing or laryngeal applications. Our previous research verified that on-neck sensor signals can, with the help of deep learning, effectively identify swallowing activity, track the hyoid bone, and segment upper esophageal sphincter opening [26], [27], [28]. However, more studies are needed to analyze LV closure.

In this study, we used a deep learning architecture combining a CNN and a Recurrent Neural Network (RNN), which forms a highly nonlinear topology, to model the relationship between the sensor signals and the LV closure (LVC) duration. To verify the efficacy of the proposed method, we developed the model with a dataset of 588 swallowing samples, using a 10-fold subject cross-validation technique. We then applied the model to an independent dataset of 45 new samples as a testing set to analyze its capacity for generalization.
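A subject-wise split of the kind described above can be sketched as follows: whole subjects, rather than individual swallows, are assigned to folds, so no participant contributes samples to both the training and validation sides of a fold. This is a minimal sketch under assumptions; the function name, the round-robin assignment, and the fixed seed are illustrative choices and are not taken from the paper.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def subject_kfold(
    subject_ids: Sequence[Hashable], k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k folds so each subject stays in one fold.

    subject_ids[i] is the (hypothetical) participant label of sample i.
    Returns a list of (train_indices, validation_indices) pairs.
    """
    subjects = sorted(set(subject_ids), key=str)
    if k < 2 or k > len(subjects):
        raise ValueError("k must be between 2 and the number of subjects")
    random.Random(seed).shuffle(subjects)
    # Round-robin assignment keeps the number of subjects per fold balanced.
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    splits = []
    for fold in range(k):
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        val = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        splits.append((train, val))
    return splits
```

Balancing by subject count does not guarantee an equal number of swallows per fold, since participants may contribute different numbers of swallows; that trade-off is accepted here to keep the split leakage-free.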

Data collection and equipment

This research aimed to estimate LV closure duration using HRCA signals by temporally identifying the LV status, including LV closure and reopening. We collected two sets of data. The first dataset was composed of 588 swallows from 120 enrolled patients with suspected dysphagia. The second dataset was composed of 45 swallows from 16 healthy participants. All the enrolled participants underwent VFSS at the University of Pittsburgh Medical Center Presbyterian Hospital. Participants in the first

Results

This research aimed to estimate LV closure duration using HRCA signals by identifying the LV status, including LV closure and reopening. The deep learning architecture with combined CNN and RNN was applied to fulfill this goal. To further investigate the efficacy of the proposed model, comparative studies were also conducted from two aspects: ablation studies and overlapping rate tuning for the sliding window.
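To make the role of the overlapping rate concrete, the sketch below shows one common way to cut a signal into fixed-length windows whose overlap is set by a ratio. The function name, the parameters, and the rounding of the hop size are assumptions for illustration; the actual window length and overlap values used in the study are not given in this excerpt.

```python
from typing import List, Tuple


def sliding_windows(n_samples: int, window: int, overlap: float) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of fixed-length windows; end is exclusive.

    overlap is the fraction of each window shared with the next one,
    e.g. 0.5 means consecutive windows share half of their samples.
    """
    if window <= 0 or not 0.0 <= overlap < 1.0:
        raise ValueError("window must be positive and overlap in [0, 1)")
    # Hop between window starts; at least 1 so the loop always advances.
    hop = max(1, int(round(window * (1.0 - overlap))))
    return [(s, s + window) for s in range(0, n_samples - window + 1, hop)]
```

A higher overlap yields more windows from the same recording, and therefore a finer time resolution for the predicted LV status, at the cost of more computation per swallow. Samples at the end of the signal that do not fill a whole window are dropped in this sketch.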

Besides the proposed C-RNN model, we also implemented three types of baseline models in

Discussion

The VFSS has long been thought to be the only assessment tool for detecting laryngeal activities, despite its limitations. To overcome these limitations, the primary aim of this study was to determine the feasibility of using HRCA signals to predict LV status (opening or closure) with an advanced computer-aided approach and thus estimate the duration of LVC non-invasively. The HRCA signal features acquired by the sensors are strongly associated with several swallowing kinematic events including

Conclusion

In this research, we proposed a new method for detecting the LV closure and opening status based on HRCA signals and a hybrid deep learning algorithm. The results revealed that it is possible to identify LV status and further calculate LV closure duration solely based on the information provided by HRCA signals. This study demonstrates the feasibility of using the sensor as a potential non-invasive swallow screening method to assess swallowing function. The sensor supported approach

CRediT authorship contribution statement

Shitong Mao: Methodology, Formal analysis, Software, Writing - original draft. Aliaa Sabry: Data curation, Resources, Investigation, Writing - original draft. Yassin Khalifa: Data curation, Methodology, Investigation. James L. Coyle: Conceptualization, Data curation, Resources, Writing - review & editing. Ervin Sejdić: Writing - review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health, USA under Award no. R01HD092239, while the data were collected under Award no. R01HD074819. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Shitong Mao received the B.Sc. and M.Sc. degrees from Harbin Institute of Technology, China, in 2008 and 2010, respectively. He is working toward the Ph.D. degree at the Swanson School of Engineering, University of Pittsburgh. His current research interests include pattern recognition, machine learning, biomedical signal processing and computer vision.

References (44)

  • Kurosu A. et al. Detection of swallow kinematic events from acoustic high-resolution cervical auscultation signals in patients with stroke. Arch. Phys. Med. Rehabil. (2019)
  • Fawcett T. An introduction to ROC analysis. Pattern Recognit. Lett. (2006)
  • Vose A. et al. Hidden in plain sight: A descriptive review of laryngeal vestibule closure. Dysphagia (2018)
  • Ekberg O. Closure of the laryngeal vestibule during deglutition. Acta Otolaryngol. (1982)
  • Ekberg O. et al. Cineradiography of the pharyngeal stage of deglutition in 150 individuals without dysphagia. Br. J. Radiol. (1982)
  • Park T. et al. Initiation and duration of laryngeal closure during the pharyngeal swallow in post-stroke patients. Dysphagia (2010)
  • Shibata S. et al. The effect of bolus volume on laryngeal closure and UES opening in swallowing: Kinematic analysis using 320-row area detector CT study. J. Oral Rehabil. (2017)
  • Power M.L. et al. Deglutitive laryngeal closure in stroke patients. J. Neurol. Neurosurg. Psychiatry (2007)
  • Vilardell N. et al. Videofluoroscopic assessment of the pathophysiology of chronic poststroke oropharyngeal dysphagia. Neurogastroenterol. Motility (2017)
  • Coyle J.L. et al. Assessment and behavioral management of oropharyngeal dysphagia. Curr. Opin. Otolaryngol. Head Neck Surg. (1997)
  • Mahesh M. Fluoroscopy: Patient radiation exposure issues. Radiographics (2001)
  • Zammit-Maempel I. et al. Radiation dose in videofluoroscopic swallow studies. Dysphagia (2007)
  • Nierengarten M.B. Evaluating dysphagia: Current approaches. Oncol. Times (2009)
  • Steele C. et al. Dysphagia service delivery by speech-language pathologists in Canada: Results of a national survey. Can. J. Speech Lang. Pathol. Audiol. (2007)
  • Bonilha H.S. et al. Radiation exposure time during MBSS: Influence of swallowing impairment severity, medical diagnosis, clinician experience, and standardized protocol use. Dysphagia (2013)
  • Bonilha H.S. et al. Preliminary investigation of the effect of pulse rate on judgments of swallowing impairment and treatment recommendations. Dysphagia (2013)
  • Sejdić E. et al. Computational deglutition: Using signal- and image-processing methods to understand swallowing and associated disorders [Life sciences]. IEEE Signal Process. Mag. (2019)
  • Movahedi F. et al. Anatomical directional dissimilarities in tri-axial swallowing accelerometry signals. IEEE Trans. Neural Syst. Rehabil. Eng. (2017)
  • Dudik J.M. et al. Characteristics of dry chin-tuck swallowing vibrations and sounds. IEEE Trans. Biomed. Eng. (2015)
  • Dudik J.M. et al. Dysphagia and its effects on swallowing sounds and vibrations in adults. Biomed. Eng. Online (2018)
  • Cichero J.A. et al. The physiologic cause of swallowing sounds: Answers from heart sounds and vocal tract acoustics. Dysphagia (1998)
  • Resteghini C. et al. Big data in head and neck cancer. Curr. Treat. Options Oncol. (2018)
  • Cited by (15)

    • Automated pharyngeal phase detection and bolus localization in videofluoroscopic swallowing study: Killing two birds with one stone?

      2022, Computer Methods and Programs in Biomedicine
      Citation Excerpt :

      Several artificial intelligence applications in swallowing science have been proposed over the past few years, fostering novel approaches aimed at improving and automating swallowing and dysphagia assessment [7]. Most of these approaches involve the use of computer vision - for example, to analyze VFSS recordings [9,14–17,19,21–23] - or signal processing techniques to extract and analyze acoustic, accelerometric, and/or electromyographic signals [24–30]. Despite emerging evidence that wearable sensors may be able to detect key outcomes when VFSS is not available [26,27,31], VFSS still remains the most widely accepted imaging technique for assessing dysphagia [7].

    • Machine learning in the evaluation of voice and swallowing in the head and neck cancer patient

      2024, Current Opinion in Otolaryngology and Head and Neck Surgery
    • A Review of Recurrent Neural Network-Based Methods in Computational Physiology

      2023, IEEE Transactions on Neural Networks and Learning Systems

    Aliaa Sabry is a phoniatrician and a first-year postdoctoral research fellow at the Krembil Research Institute, University Health Network. She graduated from Mansoura University, Egypt, with a Bachelor of Medicine and Surgery (MBBCh) in 2006. She earned an M.Sc. in speech-language pathology from Ain Shams University, Egypt, in 2011. She received her Ph.D., with specialization in dysphagia, from Mansoura University, Egypt, in 2019, and practiced clinically as a phoniatrician with people with swallowing disorders. Before starting her postdoctoral research fellowship at the University Health Network, Toronto, Ontario, Canada, she worked as a researcher at the swallowing lab in the Department of Communication Science and Disorders at the University of Pittsburgh (from 2016 to 2019). Her current research interests include non-invasive instrumental methods of swallow screening and management through signal processing of patients’ voice and vibratory signals generated during swallowing.

    Yassin Khalifa received his B.S. and M.S. degrees in 2010 and 2013, respectively, from the Biomedical Engineering Department, Cairo University, Egypt.

    He is currently a Research Assistant at the Department of Electrical and Computer Engineering, Swanson School of Engineering, University of Pittsburgh, PA, USA.

    From 2011 to 2016, Yassin worked as a teaching and research assistant for Cairo University and Nile University, Egypt, where he helped teach several courses at both graduate and undergraduate levels and participated in multiple research projects focusing on big data analytics, machine learning and statistical analysis. His current research interests include biomedical signal processing, data science (especially big data analytics), and the applications of deep learning.

    James L. Coyle received the Ph.D. degree in rehabilitation science from the University of Pittsburgh in 2008 with a focus in neuroscience. He is currently a Professor of Communication Sciences and Disorders at the School of Health and Rehabilitation Sciences, and Professor of Otolaryngology in the School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA. He maintains an active clinical practice in the Department of Otolaryngology, Head and Neck Surgery and the Speech Language Pathology Service, University of Pittsburgh Medical Center. Dr. Coyle is Board Certified by the American Board of Swallowing and Swallowing Disorders. He is a Fellow of the American Speech Language and Hearing Association.

    Ervin Sejdić received the B.E.Sc. and Ph.D. degrees in electrical engineering from the University of Western Ontario, London, ON, Canada, in 2002 and 2008, respectively.

    From 2008 to 2010, he was a Post-Doctoral Fellow with the University of Toronto, Toronto, ON, Canada, with a cross-appointment with Bloorview Kids Rehab, Toronto, ON, Canada, Canada’s largest children’s rehabilitation teaching hospital. From 2010 to 2011, he was a Research Fellow with the Harvard Medical School, Boston, MA, USA, with a cross-appointment with the Beth Israel Deaconess Medical Center. In 2011, he joined the Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA, as a tenure-track Assistant Professor. In 2017, he was promoted to a tenured Associate Professor. He holds secondary appointments with the Department of Bioengineering, Swanson School of Engineering, with the Department of Biomedical Informatics, School of Medicine, and with the Intelligent Systems Program, School of Computing and Information, University of Pittsburgh. His current research interests include biomedical signal processing, gait analyses, swallowing difficulties, advanced information systems in medicine, rehabilitation engineering, assistive technologies, and anticipatory medical devices.

    Dr. Sejdić was a recipient of many awards. As a graduate student, he was awarded two prestigious awards from the Natural Sciences and Engineering Research Council of Canada. In 2010, he was the recipient of the Melvin First Young Investigator’s Award from the Institute for Aging Research at Hebrew Senior Life, Boston, MA, USA. In 2016, President Obama named Prof. Sejdić as a recipient of the Presidential Early Career Award for Scientists and Engineers, the highest honor bestowed by the U.S. Government on science and engineering professionals in the early stages of their independent research careers. In 2017, he was the recipient of the National Science Foundation CAREER Award, which is the National Science Foundation’s most prestigious award in support of career-development activities of those scholars who most effectively integrate research and education within the context of the mission of their organization.
