Harnessing emotions for depression detection

  • Original Article
  • Published:
Pattern Analysis and Applications

Abstract

Human emotions, conveyed through textual cues, speech patterns, and facial expressions, can give insight into a person's mental state. Although there are several uni-modal datasets for emotion recognition, there are very few labeled datasets for multi-modal depression detection. Uni-modal emotion recognition datasets can be harnessed, using transfer learning, for multi-modal binary depression detection through video, audio, and text. We propose a deep learning-based emotion transfer for mood indication framework to address the task of binary classification of depression using a one-of-three scheme: if the network's prediction for at least one modality is the depressed class, we consider the final output to be depressed. Such a scheme is beneficial because it detects an abnormality in any one of the modalities and can alert a user to seek help well in advance. Long short-term memory networks are used to combine the temporal aspects of the audio and video modalities and the context of the text. The network is then fine-tuned on a binary depression detection dataset that has been independently labeled using a standard questionnaire administered by psychologists. Data augmentation techniques are used to improve generalization and to resolve class imbalance. Our experiments show that our method for binary depression classification, using an ensemble of the three modalities, achieves higher accuracy on the Distress Analysis Interview Corpus—Wizard of Oz dataset than other benchmark methods.
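The one-of-three fusion rule described above reduces to a logical OR over per-modality predictions. Below is a minimal sketch in Python, assuming each modality network already yields a binary label (1 = depressed); the function and variable names are illustrative, not the authors' released code (which, per the code availability statement, has not been published).

```python
from typing import Dict

def one_of_three_decision(predictions: Dict[str, int]) -> int:
    """Fuse per-modality binary predictions with the one-of-three scheme.

    predictions maps a modality name ("video", "audio", "text") to a
    binary label, where 1 denotes the depressed class. The ensemble
    returns 1 (depressed) if at least one modality predicts the
    depressed class, otherwise 0.
    """
    return int(any(label == 1 for label in predictions.values()))

# Example: only the audio model flags the depressed class, so the
# ensemble output is "depressed", allowing an early alert.
preds = {"video": 0, "audio": 1, "text": 0}
assert one_of_three_decision(preds) == 1
```

A rule of this form trades precision for recall: a positive prediction in any single modality flags the subject, which is the intended behavior for an early-warning screening tool.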

Availability of data and material (data transparency)

We used publicly available data for this work.

Code availability (software application or custom code)

We have not released the code.

Acknowledgements

We thank the reviewers for their detailed comments, which have greatly enhanced the presentation of the paper.

Funding

No external funding was received for this work.

Author information

Contributions

All authors have contributed to this work in the order in which their names are listed.

Corresponding author

Correspondence to Sahana Prabhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Prabhu, S., Mittal, H., Varagani, R. et al. Harnessing emotions for depression detection. Pattern Anal Applic 25, 537–547 (2022). https://doi.org/10.1007/s10044-021-01020-9
