
A multi-platform dataset for detecting cyberbullying in social media

  • Original Paper
Language Resources and Evaluation

Abstract

Recent work on cyberbullying detection relies on machine learning models that use text and metadata from small datasets, mostly drawn from individual social media platforms. Such models succeed in predicting cyberbullying when posts contain both the text and the metadata structure found on that platform. Instead, we develop a multi-platform dataset that consists purely of the text of posts gathered from seven social media platforms. We present a multi-stage, multi-technique annotation system that first uses crowdsourcing to annotate posts and hashtags and then uses machine-learning methods to identify additional posts for annotation. This process selects posts for annotation that are significantly more likely than chance to constitute clear cases of cyberbullying, without limiting the sample to posts that contain predetermined features (as happens when hashtags alone are used to select posts for annotation). We show that, despite the diversity of examples in the dataset, models trained on datasets produced in this manner can achieve good performance. This is a clear advantage over traditional methods of post selection and labeling: it increases the number of positive examples that can be produced with the same resources, and it broadens the range of communication media to which the models can be applied.
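The abstract describes the second annotation stage only at a high level: a model trained on the crowdsourced labels picks further posts to annotate. The sketch below illustrates one way such a step could look, assuming a multinomial naive Bayes classifier, a whitespace tokenizer, and top-k selection by predicted probability. The class `NaiveBayes`, the function `select_for_annotation`, and all toy posts are hypothetical and are not taken from the paper, which does not name its model in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical stand-in classifier for illustration only.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = math.log(self.doc_counts[c] / n_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    s += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            scores[c] = s
        return scores

    def prob(self, text, positive=1):
        """Posterior probability of the positive (bullying) class."""
        scores = self.log_scores(text)
        m = max(scores.values())  # subtract max for numerical stability
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        return exp[positive] / sum(exp.values())


def select_for_annotation(model, pool, k):
    """Return the k unlabeled posts the model rates most likely to be bullying."""
    return sorted(pool, key=model.prob, reverse=True)[:k]


# Toy seed labels standing in for the crowdsourced first stage (1 = bullying).
seed_texts = ["you are so stupid", "nobody likes you loser", "shut up loser",
              "great game last night", "see you at lunch", "happy birthday friend"]
seed_labels = [1, 1, 1, 0, 0, 0]
model = NaiveBayes().fit(seed_texts, seed_labels)

pool = ["you stupid loser", "lunch was great", "what a loser", "happy friday"]
batch = select_for_annotation(model, pool, k=2)
print(batch)  # → ['you stupid loser', 'what a loser']
```

Ranking the unlabeled pool by predicted probability concentrates annotation effort on likely positives, which matches the effect the abstract describes; a hashtag filter, by contrast, would only ever return posts that carry the chosen tags.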


Fig. 1


Notes

  1. https://cyberbullying.org/.

  2. http://www.bbc.co.uk/news/10302550.

  3. http://caw2.barcelonamedia.org.

  4. https://goo.gl/z8YiRf.

  5. https://goo.gl/4gmC2m.

  6. Questions 3 and 4 elicited responses on two Likert scales, each with 5 gradations. Respectively, these were (i) ‘Never’, ‘Seldom’, ‘Sometimes’, ‘Often’, and ‘Always’ and (ii) ‘Not concerned’, ‘Concerned a little’, ‘Moderately concerned’, ‘Concerned’, and ‘Very concerned’.

  7. The app is currently available from https://www.safetonet.com/, through the Apple App store, or through the Google Play store.

  8. https://liwc.wpengine.com/


Author information


Correspondence to David Van Bruwaene.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This project was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Ontario Centers of Excellence (OCE), and SafeToNet Ltd.


About this article


Cite this article

Van Bruwaene, D., Huang, Q. & Inkpen, D. A multi-platform dataset for detecting cyberbullying in social media. Lang Resources & Evaluation 54, 851–874 (2020). https://doi.org/10.1007/s10579-020-09488-3


  • DOI: https://doi.org/10.1007/s10579-020-09488-3
