Multi-modal aggression identification using Convolutional Neural Network and Binary Particle Swarm Optimization

https://doi.org/10.1016/j.future.2021.01.014

Highlights

  • Automatic feature extraction from text and images using deep learning.

  • Feature reduction using Binary Particle Swarm Optimization algorithm.

  • Classification of comments into high-aggressive, medium-aggressive and non-aggressive.

Abstract

Aggressive posts containing symbolic and offensive images and inappropriate gestures, along with provocative textual comments, are growing exponentially in social media with the availability of inexpensive data services. These posts have numerous negative impacts on the reader and need an immediate technical solution to filter out aggressive comments. This paper presents a model based on a Convolutional Neural Network (CNN) and Binary Particle Swarm Optimization (BPSO) to classify social media posts containing images with associated textual comments into non-aggressive, medium-aggressive and high-aggressive classes. A dataset containing symbolic images and the corresponding textual comments was created to validate the proposed model. The framework employs a pre-trained VGG-16 to extract the image features and a three-layered CNN to extract the textual features in parallel. The hybrid feature set obtained by concatenating the image and the text features was optimized using the BPSO algorithm to retain the more relevant features. The proposed model with optimized features and a Random Forest classifier achieves a weighted F1-Score of 0.74, an improvement of around 3% over unoptimized features.

Introduction

Social media networks, such as Facebook, Instagram and Vine, are platforms to share opinions, ideas and information. These platforms help businesses to grow by spreading information about their products and services in a relatively short time. Government agencies are using them as a feedback mechanism in making their policies and regulations. Social media is also helping the economy by providing a potential pathway for sustainable societies to assure their citizens’ equality, freedom and a healthy standard of living [1], [2].

Along with these positive uses that help economic growth and societal development, social media is also impacting our society negatively by spreading hate speech, fake news, negativity about government orders, anti-national activities, defamatory postings and so on. These activities have grown exponentially in the past few years [3]. The circulation of offensive and unacceptable comments on social media is a massive threat to our society. Cyber-aggression [4], hate speech, cyberstalking and cyberbullying [5] are among the most disturbing barriers to a sustainable society. To ensure the flourishing of an open society, the need of the hour is to find an appropriate way to respond to such material without enforcing strict censorship.

Cyber-aggression is characterized as hostile or violent behaviour with the aim of harming others by using electronic media. It comprises sending, posting or sharing threatening, negative or nasty information about an individual or a group, causing character assassination, humiliation, emotional stress, depression, anxiety and suicidal thoughts in the victim or victims. Such aggression occurs in many forms including textual aggression (instant messaging, e-mail, chatting), verbal aggression (verbal posts, phone calls) and visual aggression (sending, posting or sharing embarrassing videos or images). Although cyber-aggression can affect any age group of social networking users, adolescents and youths are the most affected groups. Recent studies have concluded that teens generally make frequent use of online sites for video and image sharing (e.g., Instagram and Vine) and are more vulnerable to these behaviours [6]. Visual content (video and image), which accounts for more than 70% of all Internet traffic, is making cyber-aggression more chaotic and damaging [7].

The severity of the problem requires immediate technical attention given that manual monitoring is not practically scalable and also very time-consuming. Hence, it is necessary to develop automated tools to detect these kinds of aggression in the very first instance to minimize mental and physical health problems of Internet users [8].

Most of the earlier works [4], [9], [10], [11], [12], [13] to distinguish between aggressive and non-aggressive posts were concentrated on the text content of posts whereas the posts also very often contain images along with text [14]. A few recent works [14], [15], [16] included images with text to identify the cases of cyberbullying. The images and text of the comments together with the user features like the number of followers and followees were used by Hosseinmardi et al. [15] to predict cyberbullying instances on the Instagram network. Another model to identify the cases of cyberbullying was proposed by Singh et al. [16] using visual and text characteristics for posts on the Instagram network. Kumari et al. [17] presented a model to detect cyber-aggressive posts using symbolic images. They considered images only and ignored the textual part of the post.

To the best of our knowledge symbolic images together with the text of the post were not considered in any of the earlier works for the detection of cyber-aggression. The current work concentrated on the identification of multi-modal aggressive posts containing symbolic images and associated comments. To achieve this task, we created a multi-modal dataset containing 3600 images with associated comments.

We utilized the pre-trained VGG-16 [18] to extract features from images and a three-layered CNN to extract features from the text of a post. The BPSO algorithm was used to optimize the feature space by eliminating redundant features. Then we applied the Random Forest classifier to the optimized features to classify the multi-modal cyber-aggressive posts. Our major contributions can be summarized as:

  • We created a multi-modal cyber-aggressive dataset containing symbolic images with composed text comments. From this dataset, we found a peculiar case in which the image and the text of a post separately appear non-aggressive but together they make the post highly aggressive as shown in Fig. 1a.

  • We proposed a hybrid model to extract features from text and symbolic images in parallel to get combined features for the multi-modal posts.

  • We employed the BPSO algorithm to reduce the feature space and improve classification performance. The proposed work improves the weighted F1-Score by around 3% compared to the unoptimized feature set.
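The feature-reduction step named in the last contribution can be sketched in a few lines. The following is a minimal, generic BPSO in plain Python, not the authors' implementation: the function name `bpso_select`, the parameter values (`w`, `c1`, `c2`, `v_max`, swarm size, iteration count) and the toy fitness are all assumptions made for illustration. Each particle is a 0/1 mask over the feature columns, and a sigmoid of the velocity gives the probability that a bit is set. In a real pipeline the fitness would presumably score a classifier trained on the selected columns of the concatenated image and text features.

```python
import math
import random


def bpso_select(fitness, n_features, n_particles=20, n_iters=50,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Return (best_mask, best_score), maximizing fitness(mask) over 0/1 masks."""
    rng = random.Random(seed)
    # Random initial bit masks and zero velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vel[i][d] = v
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            score = fitness(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score


# Hypothetical toy fitness: reward three "informative" columns and
# penalize every selected column slightly, so smaller masks win ties.
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in RELEVANT)
    return hits - 0.1 * sum(mask)


mask, score = bpso_select(toy_fitness, n_features=12)
selected = [i for i, bit in enumerate(mask) if bit]
print(selected, score)
```

The returned mask would then be used to keep only the selected columns before training the final classifier. The search is stochastic, so different seeds can return different masks.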

The rest of the paper is organized as follows. In Section 2, the related works are discussed. The proposed framework for the detection of cyber-aggression is presented in Section 3. The findings of the current system are illustrated in Section 4. In Section 5, a discussion about the findings is provided. Finally, the article is concluded in Section 6 by outlining some future research directions.

Related works

Automatic detection and prevention of cyber-aggressive posts have attracted a lot of attention in recent years [9], [17], [19], [20], [21], [22], [23]. Burnap and Williams [21] presented an ensemble learning-based solution to identify hate-related tweets. They used ensembles of three popular classifiers: (i) Logistic Regression, (ii) Support Vector Machine (SVM) and (iii) Random Forest with n-gram text features to get an F1-Score of 0.77. Al-garadi et al. [24] also employed four classifiers:

Methodology

This section presents a model for automatic cyber-aggression detection of multi-modal social media posts. In the following subsection, we describe details of collection, labelling and description of our datasets. The proposed model is described next in Section 3.2.

Results

Results obtained from the proposed model are presented in this section. For better presentation, the results are grouped into three subsections: (i) selection of the pre-trained deep neural network model, (ii) selection of the size of the features from image and text and (iii) BPSO feature selection and classification. To evaluate the performance of the current system, we have used three different performance metrics: Precision (P), Recall (R) and F1-Score. These performance metrics for High-aggressive
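As an illustration of the three metrics named above, the sketch below computes per-class Precision, Recall and F1-Score and then a weighted F1-Score. It assumes that "weighted" means averaging the per-class F1 values by class support (the convention used, for example, by scikit-learn's `average="weighted"`); the excerpt does not spell this out. The function names and the six example labels are made up for illustration and are not results from the paper.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    support = Counter(y_true)
    stats = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[c] = (prec, rec, f1, support[c])
    return stats


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 values, weighted by class support."""
    stats = per_class_prf(y_true, y_pred)
    return sum(f1 * n for _, _, f1, n in stats.values()) / len(y_true)


# Illustrative labels only.
y_true = ["high", "high", "medium", "medium", "non", "non"]
y_pred = ["high", "medium", "medium", "medium", "non", "high"]

for label, (p, r, f1, n) in per_class_prf(y_true, y_pred).items():
    print(f"{label}: P={p:.2f} R={r:.2f} F1={f1:.2f} support={n}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.2f}")
```

Weighting by support means that a class with more test examples contributes more to the reported score, which matters when the three aggression classes are not equally represented.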

Discussion and implication

The major finding of the current research is that BPSO is able to optimize the feature space, and this improves the classification performance with features obtained from the deep neural network model. Another major finding is that VGG-16 is a better model for extracting features from symbolic images. The next finding of this research is that a 512 sized image feature vector obtained by using CNN and 512 sized text feature vector jointly performed better to distinguish Non-aggressive, Medium-aggressive

Conclusion

Social media is affecting our society badly through hate speech, cyber-aggression and cyberbullying. To control and minimize the spread of online aggressive comments, the current research presents a hybrid model to detect aggressive posts containing images and text on social media. The proposed system used VGG-16 and a CNN for the extraction of the features from image and text, respectively. The model also extracts the optimized feature set from hybrid features of images and text using the Binary

CRediT authorship contribution statement

Kirti Kumari: Experimentations, Drafting of manuscript. Jyoti Prakash Singh: Reviewing, Editing, Finalizing the manuscript, Helped in experimentation. Yogesh K. Dwivedi: Helped in formulation of idea and problem statement. Nripendra P. Rana: Helped in formulation of idea and problem statement.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The first author would like to acknowledge the Ministry of Electronics and Information Technology (MeitY), Government of India, for financial support during the research work through the Visvesvaraya Ph.D. Scheme for Electronics and IT.

References (47)

  • J.A. Pater, et al., This digital life: A neighborhood-based study of adolescents’ lives online
  • J. Kornblum, Cyberbullying grows bigger and meaner with photos, video, USA Today, dated July 17, 2008 (2008)
  • K. Raiyani, T. Gonçalves, P. Quaresma, V.B. Nogueira, Fully connected neural network with advance preprocessor to...
  • S. Modha, P. Majumder, T. Mandl, Filtering aggression from the multilingual social media feed, in: Proceedings of the...
  • N.S. Samghabadi, D. Mave, S. Kar, T. Solorio, Ritual-uh at TRAC 2018 shared task: Aggression identification, in:...
  • I. Arroyo-Fernández, D. Forest, J.-M. Torres-Moreno, M. Carrasco-Ruiz, T. Legeleux, K. Joannette, Cyberbullying...
  • J. Risch, R. Krestel, Aggression identification using deep learning and data augmentation, in: Proceedings of the First...
  • K. Kumari, et al., Towards cyberbullying-free social media in smart cities: a unified multi-modal approach, Soft. Comput. (2020)
  • H. Hosseinmardi, et al., Prediction of Cyberbullying incidents in a media-based social network
  • V.K. Singh, et al., Toward multi-modal Cyberbullying detection
  • K. Kumari, et al., Aggressive social media post detection system containing symbolic images
  • K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint...
  • K. Kumari, J.P. Singh, AI_ML_NIT_Patna @ TRAC - 2: Deep learning approach for multi-lingual aggression identification,...

    Kirti Kumari did her B. Tech. in Computer Science and Engineering from Bhagalpur College of Engineering in 2011. She did her M. Tech. in Computer Science and Engineering from M. S. Ramaiah Institute of Technology, Bangalore, University of Visvesvaraya Technological University, Belgaum in 2014. She is currently pursuing her Ph.D. in the Department of Computer Science and Engineering, National Institute of Technology Patna, India. She has 6 research publications in journals and international conferences. Her research interests include data analytics, machine learning, natural language processing, text mining, deep learning and mobile social networks. She is a student member of IEEE.

    Jyoti Prakash Singh is an assistant professor in the Department of Computer Science and Engineering at the National Institute of Technology Patna. He did his B.Tech. in Computer Science and Technology and M. Tech. in Information Technology in 2000 and 2005 respectively. He completed his Ph.D. in 2015 from the University of Calcutta. He has co-authored seven books in the area of C programming, Data Structures, Operating Systems and Ad Hoc Networks. Apart from this, he has around 25 international journal publications and more than 40 international conference proceedings. His research interests include text mining, deep learning, social networks and information security. He is a senior member of IEEE, a life member of the Computer Society of India (CSI) and the Indian Society for Technical Education (ISTE), and a member of ACM. He is an Associate Editor of the International Journal of Electronic Government Research (IJEGR).

    Yogesh K. Dwivedi is a Professor of Digital Marketing and Innovation, and Director of the Emerging Markets Research Centre (EMaRC) in the School of Management at Swansea University, Wales, UK. His research interests are in the area of Information Systems (IS) including digital and social media marketing, particularly in the context of emerging markets. He has published more than 250 articles in a range of leading academic journals and conferences. He has co-edited/co-authored more than 20 books; acted as co-editor of fifteen journal special issues; organized tracks, mini-tracks and panels in leading conferences; and served as programme co-chair of the 2013 IFIP WG 8.6 Conference on Grand Successes and Failures in IT: Public and Private Sectors and Conference Chair of the IFIP WG 6.11 I3E2016 Conference on Social Media: The Good, the Bad, and the Ugly. He is Chief Editor of the International Journal of Information Management, an Associate Editor of the European Journal of Marketing and Government Information Quarterly and Senior Editor of the Journal of Electronic Commerce Research.

    Nripendra P. Rana is a Professor in Digital Marketing and Head of International Business, Marketing and Branding at the School of Management at the University of Bradford, UK. His current research interests focus primarily on adoption and diffusion of emerging ICTs and digital and social media marketing. He has published more than 160 articles in a range of leading academic journals and conferences. He has co-edited five books on digital and social media marketing, emerging markets and supply and operations management. He has also co-edited special issues, organized tracks, mini-tracks and panels in leading conferences. He is Chief Editor of the International Journal of Electronic Government Research (IJEGR) and an Associate Editor of the International Journal of Information Management.

    1 All the authors of the current manuscript contributed equally.