Abstract
This article presents a study on personality recognition using video data from different scenarios. Our goal is to jointly model nonverbal behavioral cues and contextual information to build a robust, multi-scenario personality recognition system. To this end, we build a novel multi-stream Convolutional Neural Network (CNN) framework that considers multiple sources of information. From a given scenario, we extract spatio-temporal motion descriptors for every individual in the scene, spatio-temporal motion descriptors encoding social group dynamics, and proxemics descriptors encoding the interaction with the surrounding context. All the proposed descriptors are mapped to the same feature space, which facilitates the overall learning effort. Experiments on two public datasets demonstrate the effectiveness of jointly modeling the mutual Person-Context information, outperforming state-of-the-art results for personality recognition in two different scenarios. Finally, we present CNN class activation maps for each personality trait, shedding light on behavioral patterns linked with personality attributes.
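The abstract does not specify layer sizes or the fusion scheme, so the following is only a toy sketch of the general idea rather than the authors' architecture: three streams (individual motion, group dynamics, proxemics) produce feature maps of the same shape, each is global-average-pooled, the pooled vectors are concatenated and passed through one linear layer, and a class activation map is obtained as the weighted sum of one stream's channels. All function names, the random toy data, the map sizes, and the choice of five outputs (assumed here to stand for five personality traits) are illustrative assumptions.

```python
import random


def global_average_pool(feature_maps):
    """Collapse each (height x width) channel to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def fuse_streams(streams, weights, bias):
    """Concatenate the pooled features of all streams, then apply one linear layer."""
    pooled = [value for stream in streams for value in global_average_pool(stream)]
    return [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]


def class_activation_map(feature_maps, class_weights):
    """Weighted sum over channels, giving one (height x width) map for a class."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            sum(w * channel[i][j] for w, channel in zip(class_weights, feature_maps))
            for j in range(width)
        ]
        for i in range(height)
    ]


def toy_stream(rng, channels=4, height=3, width=3):
    """Hypothetical stand-in for the feature maps produced by one CNN stream."""
    return [
        [[rng.random() for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]


rng = random.Random(0)
individual, group, proxemics = (toy_stream(rng) for _ in range(3))

n_traits = 5  # assumption: one output per personality trait
weights = [[rng.uniform(-1.0, 1.0) for _ in range(12)] for _ in range(n_traits)]
bias = [0.0] * n_traits

# One score per trait, computed from all three streams jointly.
scores = fuse_streams([individual, group, proxemics], weights, bias)

# Activation map for the first trait, restricted to the individual-motion stream
# (its 4 channels correspond to the first 4 fusion weights).
cam = class_activation_map(individual, weights[0][:4])
```

Because every stream is pooled to a vector in the same space before fusion, the spatial mean of a stream's activation map equals that stream's contribution to the trait score, which is what makes such maps readable as evidence for a prediction.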
Index Terms
- Being the Center of Attention: A Person-Context CNN Framework for Personality Recognition