Prediction of risk factors of cyberbullying-related words in Korea: Application of data mining using social big data

https://doi.org/10.1016/j.tele.2020.101524

Abstract

This study applied decision tree analysis to social big data to build a prediction model of the risk factors associated with each type of cyberbullying in Korea. It analyzed 103,212 buzzes that mentioned causes of cyberbullying, drawn from data collected from 227 online channels such as news websites, blogs, online groups, social network services, and online bulletin boards. Using opinion mining and decision tree analysis, the types of cyberbullying were classified in SPSS 25.0. The results indicated that the total rate of types of cyberbullying in Korea was 44%, consisting of 32.3% victims, 6.4% perpetrators, and 5.3% bystanders. The impulse factor had the greatest influence on the prediction of risk factors, and the propensity for dominance factor had the second greatest. In particular, the impulse factor had the most significant effect on bystanders, and the propensity for dominance factor significantly influenced online perpetrators. Programs are needed to reduce impulsive behavior among bystanders as well as victims and perpetrators, because many bystanders tend to aggravate impulsive cyberbullying behaviors.

Introduction

Smart media have become widely available across the world, and the use of the mobile Internet and Social Network Services (hereafter SNS) has rapidly increased in daily life. SNS, which connects individuals, groups, and society in a network, operates in real time and accelerates communication; as a result, issues propagate faster on SNS than in any other medium. The Internet and SNS have positive effects, such as information search and online chatting, as well as negative effects, including cyberbullying, Internet addiction, and preoccupation with games. In particular, SNS serves as a venue where adolescents express and relieve the feelings, stress, and worries of their daily lives. However, it has also become a site of serious social problems, where teenagers exposed to cyberbullying have chosen to commit suicide or have become bullies who inflict harm on others.

Furthermore, the volume of data transmitted on SNS has grown exponentially, and data are now recognized as an economic asset; accordingly, attempts are being made to actively utilize “big data” in various areas. In Korea, an enormous volume of big data is managed and stored by the portals and SNSs of the government and of public and private organizations; however, the utilization and analysis of big data have not made sufficient progress, largely because of the difficulties in accessing and analyzing the information. Studies that use existing cross-sectional or longitudinal data to determine the causes of cyberbullying and its related factors are useful for determining the relationships between an individual and a group with respect to pre-determined variables. However, such studies are limited in showing how and to what extent individual-level buzz posted in cyberspace is related to a given social phenomenon. Thus, decision tree analysis, a data mining technique, applied to social big data is a useful tool for analyzing the interactions among the various causes that arise from complex and dynamic human behaviors such as cyberbullying, because it reveals new correlations or patterns according to decision rules without requiring special statistical hypotheses. This study proposes a predictive model and related rules that can explain the causes of cyberbullying by type, based on social big data collected from domestic news sites, blogs, cafes, SNS, bulletin boards, and other channels. The purpose of this research is to propose a predictive model of risk factors by cyberbullying type in Korea through a decision tree analysis of social big data. Its specific aims are as follows: first, to classify the types of cyberbullying and determine the factors that affect each type; second, to develop a decision tree that can determine the risk factors for each type of cyberbullying.

Literature Review

Bullying refers to a situation in which a student is exposed to repeated and persistent negative behavior by one or more students and in which that student is a victim of bullying subject to negative behavior, including psychological and physical harassment (Olweus, 1994). Cyberbullying is an aggressive behavior that is conducted using electronic means repeatedly over time by a group or an individual against a victim who cannot easily defend himself or herself (Hinduja and Patchin, 2008; Slonje

Research target

This research targeted social big data collected from the Internet, such as domestic online news sites, blogs, cafes, SNS, and bulletin boards. In this research, social big data is defined as text-based web documents (buzz) that can be collected from a total of 227 online channels, i.e., 214 online news sites, 4 blogs, 3 cafes, 2 SNSs, and 4 bulletin boards. A total of 435,563 cases of cyberbullying-related topics were collected from each channel during the period between January 1, 2011 and

Research tools

The collected buzz related to cyberbullying was coded into structured data through text mining and opinion mining.
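As a rough illustration of how free-text buzz might be coded into structured indicators, the sketch below uses a simple keyword dictionary. The factor names "impulse" and "dominance" come from the abstract; the keyword lists, the sample sentences, and the function name code_buzz are hypothetical and do not represent the dictionary or the software used in the study.

```python
import re

# Hypothetical keyword dictionary: these terms are placeholders for
# illustration, not the coding scheme used in the study.
FACTOR_KEYWORDS = {
    "impulse": {"impulse", "impulsive", "anger", "rage"},
    "dominance": {"dominate", "dominance", "control", "power"},
}

def code_buzz(text):
    """Return a dict of 0/1 indicators, one per factor, for one document."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {
        factor: int(bool(tokens & words))
        for factor, words in FACTOR_KEYWORDS.items()
    }

# Made-up example documents (not taken from the collected data).
documents = [
    "He posted it in a fit of rage.",
    "They wanted power and control over the group chat.",
    "The school announced a new policy.",
]
rows = [code_buzz(doc) for doc in documents]
```

Each document becomes one row of binary variables, which is the kind of structured input a decision tree procedure can work with.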

Descriptive statistics of major factors

Of the total of 435,565 buzz cases in which cyberbullying topics were commented on, 23.7% (103,212 cases) commented on the causes of cyberbullying. Among these cases, the types of cyberbullying commented on were classified as follows: 56% of documents (57,817 cases) did not contain any emotion, 32.3% concerned victims (33,361 cases), 6.4% perpetrators (6,587 cases), and 5.3% bystanders (5,447 cases). The impulse factor ranked first among the causal factors of
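The shares quoted in this section can be checked directly from the reported counts. The short sketch below recomputes them; the variable names are illustrative, and the counts are copied from the paragraph above.

```python
# Counts as reported in this section.
total_buzz = 435_565   # buzz commenting on cyberbullying topics
cause_buzz = 103_212   # buzz commenting on causes of cyberbullying
type_counts = {
    "no emotion": 57_817,
    "victims": 33_361,
    "perpetrators": 6_587,
    "bystanders": 5_447,
}

# The four categories should account for every cause-related document.
assert sum(type_counts.values()) == cause_buzz

# Share of cause-related buzz within all cyberbullying buzz, in percent.
cause_share = round(100 * cause_buzz / total_buzz, 1)

# Share of each type within the cause-related buzz, in percent.
type_shares = {
    name: round(100 * count / cause_buzz, 1)
    for name, count in type_counts.items()
}

# Combined share of victims, perpetrators, and bystanders.
typed_share = round(
    100 * (cause_buzz - type_counts["no emotion"]) / cause_buzz, 1
)
```

The combined share of the three typed categories corresponds to the 44% total rate given in the abstract.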

Discussion

In this research, the decision tree analysis was performed using social big data to verify the predictive model related to risk factors for each cyberbullying type. In contrast to regression analysis or structural equations, the decision tree analysis model performs analysis and prediction in a tree-structure diagram according to decision rules without involving a special statistical hypothesis; as such, it is useful in determining patterns or relations between factors that have a great effect
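To make the idea of splitting by decision rules concrete, the following minimal sketch selects the single binary factor that best separates cyberbullying types in a toy data set. It uses Gini impurity as the split criterion, which is an assumption made for illustration; the text does not state which criterion the SPSS analysis used. The data rows and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Pick the 0/1 feature whose split has the lowest weighted impurity."""
    best_feature, best_score = None, float("inf")
    for f in features:
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_feature, best_score = f, score
    return best_feature, best_score

# Made-up toy data, not taken from the study.
rows = [
    {"impulse": 1, "dominance": 0},
    {"impulse": 1, "dominance": 1},
    {"impulse": 0, "dominance": 1},
    {"impulse": 0, "dominance": 0},
    {"impulse": 1, "dominance": 0},
    {"impulse": 0, "dominance": 0},
]
labels = ["bystander", "bystander", "perpetrator", "victim", "bystander", "victim"]

feature, score = best_split(rows, labels, ["impulse", "dominance"])
```

On this toy data the function selects "impulse", because every document with that indicator set belongs to a single class. A full tree would repeat the same selection recursively inside each branch.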

Conclusion

This study was performed to classify cyberbullying types through a data mining technique applied to cyberbullying-related buzz in social big data collected in Korea, to determine the causes that affect each type, and to develop a decision tree that can predict risk factors. Of the total of 435,565 buzz cases in which cyberbullying topics were commented on, 23.7% (103,212 cases) commented on the causes of cyberbullying. With respect to these cases of

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
