Multimodal temporal machine learning for Bipolar Disorder and Depression Recognition

  • Original Article
Pattern Analysis and Applications

Abstract

Mental disorders are a serious public health concern affecting the lives of millions of people worldwide. Early diagnosis is essential to ensure timely treatment and to improve the well-being of those affected. In this paper, we present a novel multimodal framework to perform mental disorder recognition from videos. The proposed approach combines the audio, video and textual modalities. Using recurrent neural network architectures, we incorporate temporal information into the learning process and model the dynamic evolution of the features extracted for each patient. For multimodal fusion, we propose an efficient late fusion strategy based on a simple feed-forward neural network, which we call the adaptive nonlinear judge classifier. We evaluate the proposed framework on two mental disorder datasets. On both, the experimental results show that it outperforms state-of-the-art approaches. We also study the importance of each modality for mental disorder recognition and draw conclusions about the temporal nature of each modality. Our findings show that careful consideration of the temporal evolution of each modality is crucial for accurate mental disorder recognition.
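
The late fusion idea described in the abstract can be illustrated with a minimal sketch: each modality produces its own class probabilities, and a small feed-forward network (the "judge") maps their concatenation to a final prediction. Everything specific here is an assumption made for illustration and is not taken from the paper: the three-class setup, the hidden size of 8, the function names, the example probabilities, and the randomly initialised (untrained) weights. The authors' actual implementation is in the repository listed under Code availability.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def dense(x, weights, bias):
    """Fully connected layer: one row of weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (hypothetical, untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def judge_forward(modality_probs, params):
    """Late fusion: concatenate per-modality outputs, then apply a
    one-hidden-layer feed-forward network with a softmax output."""
    x = [p for probs in modality_probs for p in probs]
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(x, w1, b1))
    return softmax(dense(hidden, w2, b2))


rng = random.Random(0)
n_classes, n_modalities, hidden_size = 3, 3, 8  # assumed sizes
params = [init_layer(n_classes * n_modalities, hidden_size, rng),
          init_layer(hidden_size, n_classes, rng)]

# Made-up outputs of three unimodal classifiers for one recording.
audio = [0.6, 0.3, 0.1]
video = [0.2, 0.5, 0.3]
text = [0.3, 0.3, 0.4]

fused = judge_forward([audio, video, text], params)
print(fused)  # a probability distribution over the assumed classes
```

In practice the judge's weights would be learned on held-out unimodal predictions, which is what lets the fusion weight each modality nonlinearly rather than by a fixed average.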

Data availability

The Bipolar Disorder Corpus was part of the AVEC 2018 challenge and can be accessed by contacting the authors. The Well-Being dataset is a private dataset collected at the University of Cambridge by Dr Marwa Mahmoud.

Code availability

Code can be found at https://github.com/cecca46/MentalDisorderRecogntion

Notes

  1. The test set is not accessible, as it is reserved for evaluation in the AVEC 2018 Workshop and Challenge [14].

  2. https://cloud.google.com/speech-to-text

  3. Analogously to the concept of thin-slice, we use the word “action” to refer to a piece of relevant information (or change) about a modality, for instance an eyebrow raise or a head shake, which can be captured in a small fragment of a video. We use “temporal interval” to refer to the minimum amount of time the video fragment must last in order to capture that information.
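
To make the notion of a temporal interval in note 3 concrete, the following minimal sketch cuts a sequence of per-frame features into consecutive fragments of a fixed duration. The function name, the frame rate and the interval length are hypothetical values chosen for illustration; the paper's own segmentation procedure may differ.

```python
def split_into_intervals(frames, fps, interval_seconds):
    """Split a frame sequence into non-overlapping fragments that each
    span `interval_seconds`; a trailing partial fragment is dropped."""
    window = max(1, int(round(fps * interval_seconds)))
    return [frames[i:i + window]
            for i in range(0, len(frames) - window + 1, window)]


# Ten frames at an assumed 2 frames per second, 2-second intervals.
frames = list(range(10))
intervals = split_into_intervals(frames, fps=2.0, interval_seconds=2.0)
print(intervals)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```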

References

  1. Ritchie Hannah, Roser Max (2020) Mental health. Our World in Data. https://ourworldindata.org/mental-health

  2. Lisa Dixon, Leticia Postrado, Janine Delahanty, Fischer Pamela J, Anthony Lehman (1999) The association of medical comorbidity in schizophrenia with poor physical and mental health. J Nerv Ment Dis 187(8):496–502

  3. Francine Cournos, McKinnon Karen M, Greer Sullivan (2005) Schizophrenia and comorbid human immunodeficiency virus or hepatitis c virus. J Clin Psychiatry 66:2005

  4. Ferrari Alize J, Charlson Fiona J, Norman Rosana E, Patten Scott B, Freedman Greg, Murray Christopher JL, Vos Theo, Whiteford Harvey A (2013) Burden of depressive disorders by country, sex, age, and year: findings from the global burden of disease study 2010. PLoS medicine, 10(11)

  5. Lucio Ghio, Simona Gotelli, Maurizio Marcenaro, Mario Amore, Werner Natta (2014) Duration of untreated illness and outcomes in unipolar depression: a systematic review and meta-analysis. J Affect Disord 152:45–51

  6. Carlo Altamura A, Bernardo Dell’Osso, Berlin Heather A, Massimiliano Buoli, Roberta Bassetti, Emanuela Mundo (2010) Duration of untreated illness and suicide in bipolar disorder: a naturalistic study. Eur Arch Psychiatry Clin Neurosci 260(5):385–391

  7. Cheung Ricky, O’Donnell Siobhan, Madi Nawaf et al (2017) Factors associated with delayed diagnosis of mood and/or anxiety disorders. Health promotion and chronic disease prevention in Canada: research, policy and practice 37(5):137

  8. Kazdin Alan E, Blase Stacey L (2011) Rebooting psychotherapy research and practice to reduce the burden of mental illness. Perspectives on psychological science 6(1):21–37

  9. Wang Philip S, Patricia Berglund, Mark Olfson, Pincus Harold A, Wells Kenneth B, Kessler Ronald C (2005) Failure and delay in initial treatment contact after first onset of mental disorders in the national comorbidity survey replication. Arch Gen Psychiatry 62(6):603–613

  10. Williamson James R, Quatieri Thomas F, Helfer Brian S, Horwitz Rachelle, Yu Bea, Mehta Daryush D (2013) Vocal biomarkers of depression based on motor incoordination. In Proceedings of the 3rd ACM international workshop on Audio/visual emotion challenge, pages 41–48

  11. Kaya Heysem, Salah Albert Ali (2014) Eyes whisper depression: A cca based multimodal approach. In Proceedings of the 22nd ACM international conference on Multimedia, pages 961–964

  12. Çiftçi Elvan, Kaya Heysem, Güleç Hüseyin, Salah Albert Ali (2018) The turkish audio-visual bipolar disorder corpus. In 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), pages 1–6. IEEE

  13. Yang Le, Li Yan, Chen Haifeng, Jiang Dongmei, Oveneke Meshia Cédric, Sahli Hichem (2018) Bipolar disorder recognition with histogram features of arousal and body gestures. In Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, pages 15–21

  14. Ringeval Fabien, Schuller Björn, Valstar Michel, Cowie Roddy, Kaya Heysem, Schmitt Maximilian, Amiriparian Shahin, Cummins Nicholas, Lalanne Denis, Michaud Adrien, et al. (2018) Avec 2018 workshop and challenge: Bipolar disorder and cross-cultural affect recognition. In Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, pages 3–13. ACM

  15. Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency (2018) Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell 41(2):423–443

  16. Ngiam Jiquan, Khosla Aditya, Kim Mingyu, Nam Juhan, Lee Honglak, Ng Andrew Y (2011) Multimodal deep learning. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 689–696

  17. Snoek Cees GM, Worring Marcel, Smeulders Arnold WM (2005) Early versus late fusion in semantic video analysis. In Proceedings of the 13th annual ACM international conference on Multimedia, pages 399–402

  18. Song Yale, Morency Louis-Philippe, Davis Randall (2013) Learning a sparse codebook of facial and body microexpressions for emotion recognition. In Proceedings of the 15th ACM on International conference on multimodal interaction, pages 237–244

  19. Hardoon David R, Sandor Szedmak, John Shawe-Taylor (2004) Canonical correlation analysis: An overview with application to learning methods. Neural Comput 16(12):2639–2664

  20. Dibeklioğlu Hamdi, Hammal Zakia, Yang Ying, Cohn Jeffrey F (2015) Multimodal detection of depression in clinical interviews. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pages 307–310

  21. Hanchuan Peng, Fuhui Long, Chris Ding (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238

  22. Alghowinem Sharifa, Goecke Roland, Cohn Jeffrey F, Wagner Michael, Parker Gordon, Breakspear Michael (2015) Cross-cultural detection of depression from nonverbal behaviour. In 2015 11th IEEE International conference and workshops on automatic face and gesture recognition (FG), volume 1, pages 1–8. IEEE

  23. Corinna Cortes, Vladimir Vapnik (1995) Support-vector networks. Machine learning 20(3):273–297

  24. Huang Jian, Li Ya, Tao Jianhua, Lian Zheng, Wen Zhengqi, Yang Minghao, Yi Jiangyan (2017) Continuous multimodal emotion prediction based on long short term memory recurrent neural network. In Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, pages 11–18

  25. Awad Mariette, Khanna Rahul (2015) Support Vector Regression, pages 67–80. Apress, Berkeley, CA

  26. Ringeval Fabien, Schuller Björn, Valstar Michel, Gratch Jonathan, Cowie Roddy, Scherer Stefan, Mozgai Sharon, Cummins Nicholas, Schmitt Maximilian, Pantic Maja (2017) Avec 2017: Real-life depression, and affect recognition workshop and challenge. In Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, pages 3–9

  27. Ma Xingchen, Yang Hongyu, Chen Qiang, Huang Di, Wang Yunhong (2016) Depaudionet: An efficient deep model for audio based depression classification. In Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, pages 35–42

  28. Szegedy Christian, Ioffe Sergey, Vanhoucke Vincent, Alemi Alexander A (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence

  29. Szegedy Christian, Vanhoucke Vincent, Ioffe Sergey, Shlens Jon, Wojna Zbigniew (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826

  30. Du Zhengyin, Li Weixin, Huang Di, Wang Yunhong (2018) Bipolar disorder recognition via multi-scale discriminative audio temporal representation. In Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, pages 23–30

  31. Xiaofen Xing, Bolun Cai, Yinhu Zhao, Shuzhen Li, Zhiwei He, Weiquan Fan (2018) Multi-modality hierarchical recall based on gbdts for bipolar disorder classification. In Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, pages 31–37

  32. Jerome H Friedman (2001) Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232

  33. Zafi Sherhan Syed, Kirill Sidorov, David Marshall (2018) Automated screening for bipolar disorder from audio/visual modalities. In Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, pages 39–45

  34. Weiwei Zong, Guang-Bin Huang, Yiqiang Chen (2013) Weighted extreme learning machine for imbalance learning. Neurocomputing 101:229–242

  35. Ziheng Zhang, Weizhe Lin, Mingyu Liu, Marwa Mahmoud (2020) Multimodal deep learning framework for mental disorder recognition. In 2020 15th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2020). IEEE

  36. Orton Indigo JD (2020) Vision based body gesture meta features for affective computing. arXiv preprint arXiv:2003.00809

  37. Robert C Young, Jeffery T Biggs, Veronika E Ziegler, Dolores A Meyer (1978) A rating scale for mania: reliability, validity and sensitivity. The British journal of psychiatry 133(5):429–435

  38. Kurt Kroenke, Tara W Strine, Robert L Spitzer, Janet BW Williams, Joyce T Berry, Ali H Mokdad (2009) The phq-8 as a measure of current depression in the general population. Journal of affective disorders 114(1-3):163–173

  39. Kurt Kroenke, Robert L Spitzer, Janet BW Williams (2001) The phq-9: validity of a brief depression severity measure. Journal of general internal medicine 16(9):606–613

  40. Robert L Spitzer, Kurt Kroenke, Janet BW Williams, Bernd Löwe (2006) A brief measure for assessing generalized anxiety disorder: the gad-7. Archives of internal medicine 166(10):1092–1097

  41. Benjamin Gierk, Sebastian Kohlmann, Kurt Kroenke, Lena Spangenberg, Markus Zenger, Elmar Brähler, Bernd Löwe (2014) The somatic symptom scale-8 (sss-8): a brief measure of somatic symptom burden. JAMA internal medicine 174(3):399–407

  42. Sheldon Cohen, T Kamarck, R Mermelstein, et al. (1994) Perceived stress scale. Measuring stress: A guide for health and social scientists, 10

  43. Hamdi Dibeklioğlu, Zakia Hammal, Jeffrey F Cohn (2017) Dynamic multimodal measurement of depression severity using deep autoencoding. IEEE journal of biomedical and health informatics 22(2):525–536

  44. Nalini Ambady, Robert Rosenthal (1992) Thin slices of expressive behavior as predictors of interpersonal consequences: a meta-analysis. Psychol Bull 111(2):256

  45. Nalini Ambady, Heather M Gray (2002) On being sad and mistaken: Mood effects on the accuracy of thin-slice judgments. Journal of personality and social psychology 83(4):947

  46. Nalini Ambady, Mark Hallahan, Brett Conner (1999) Accuracy of judgments of sexual orientation from thin slices of behavior. J Pers Soc Psychol 77(3):538

  47. Jacqueline NW Friedman, Thomas F Oltmanns, Eric Turkheimer (2007) Interpersonal perception and personality disorders: Utilization of a thin slice approach. Journal of Research in Personality 41(3):667–688

  48. Jorge Sánchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek (2013) Image classification with the fisher vector: theory and practice. Int J Comput Vision 105(3):222–245

  49. Florent Perronnin, Jorge Sánchez, Thomas Mensink (2010) Improving the fisher kernel for large-scale image classification. In European conference on computer vision, pages 143–156. Springer

  50. Quoc Le, Tomas Mikolov (2014) Distributed representations of sentences and documents. In International conference on machine learning, pages 1188–1196

  51. Tadas Baltrusaitis, Amir Zadeh, Yao Chong Lim, Louis-Philippe Morency (2018) Openface 2.0: Facial behavior analysis toolkit. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pages 59–66. IEEE

  52. Florian Eyben, Martin Wöllmer, Björn Schuller (2010) Opensmile: the munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th ACM international conference on Multimedia, pages 1459–1462

  53. Jonathan T Foote (1997) Content-based retrieval of music and audio. In Multimedia Storage and Archiving Systems II, volume 3229, pages 138–147. International Society for Optics and Photonics

  54. Beth Logan et al (2000) Mel frequency cepstral coefficients for music modeling. Ismir 270:1–11

  55. Florian Eyben, Klaus R Scherer, Björn W Schuller, Johan Sundberg, Elisabeth André, Carlos Busso, Laurence Y Devillers, Julien Epps, Petri Laukka, Shrikanth S Narayanan, et al. (2015) The geneva minimalistic acoustic parameter set (gemaps) for voice research and affective computing. IEEE transactions on affective computing 7(2):190–202

  56. Jey Han Lau, Timothy Baldwin (2016) An empirical evaluation of doc2vec with practical insights into document embedding generation. arXiv preprint arXiv:1607.05368

  57. Weizhe Lin, Indigo Orton, Mingyu Liu, Marwa Mahmoud (2020) Automatic detection of self-adaptors for psychological distress. In 2020 15th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2020). IEEE

Funding

Part of this research is funded by King’s College Cambridge.

Author information

Corresponding author

Correspondence to Francesco Ceccarelli.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Ceccarelli, F., Mahmoud, M. Multimodal temporal machine learning for Bipolar Disorder and Depression Recognition. Pattern Anal Applic 25, 493–504 (2022). https://doi.org/10.1007/s10044-021-01001-y
