Prediction of risk factors of cyberbullying-related words in Korea: Application of data mining using social big data
Introduction
Smart media have become widely available across the world, and the use of the mobile Internet and social network services (hereafter SNS) has rapidly increased in daily life. SNS, which connect individuals, groups, and society within a network, operate in real time and accelerate communication; therefore, an issue propagates faster on SNS than in any other medium. The Internet and SNS have positive effects, such as information search and online chatting, as well as negative effects, including cyberbullying, Internet addiction, and preoccupation with games. In particular, SNS serve as a venue where adolescents express and relieve the feelings, stress, and worries of their daily lives. However, SNS have also emerged as a site of serious social problems, where teenagers exposed to cyberbullying have chosen to commit suicide or have become bullies who inflict harm on others.
Furthermore, the volume of data transmitted on SNS has soared exponentially, and data are now recognized as an economic asset; accordingly, attempts are being made to actively utilize “big data” in various areas. In Korea, an enormous volume of big data is managed and stored by the portals and SNS of the government and of public and private organizations; however, the utilization and analysis of big data have not made sufficient progress, largely because of the difficulties in accessing and analyzing the information. Studies that use existing cross-sectional or longitudinal data to determine the causes of cyberbullying and its related factors are useful in determining the relationships between an individual and a group with respect to pre-determined variables. However, such studies are limited in showing how and to what extent individual-level buzz posted in cyberspace is related to a given social phenomenon. Thus, decision tree analysis, a data mining technique applied here to social big data, is a useful tool for analyzing the interactions among the various causes that arise from complex and dynamic human behaviors such as cyberbullying, because it reveals new correlations or patterns according to decision rules without requiring special statistical hypotheses. This study proposes a predictive model and related rules that can explain the causes of cyberbullying by type, as indicated by social big data collected from domestic news sites, blogs, cafes, SNS, bulletin boards, and other channels. The purpose of this research is to propose a predictive model of risk factors by cyberbullying type in Korea through decision tree analysis of social big data. The specific purposes are as follows: first, to classify the types of cyberbullying and determine the factors that affect each type; second, to develop a decision tree that can determine the risk factors for each type of cyberbullying.
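As a rough illustration of how a decision tree derives rules from coded buzz, the following minimal sketch grows a small tree over binary factor indicators. It is not the authors' procedure: the Gini splitting criterion, the "stress" factor, and the six toy records are assumptions made only for this example, and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the binary feature with the lowest weighted Gini, or None if no split helps."""
    best, best_score = None, gini(labels)
    for f in features:
        left = [y for r, y in zip(rows, labels) if r[f] == 0]
        right = [y for r, y in zip(rows, labels) if r[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best, best_score = f, score
    return best

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow a tree; a leaf is a class label, an inner node is (feature, {0: subtree, 1: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    f = best_split(rows, labels, features) if depth < max_depth else None
    if f is None:
        return majority
    rest = [g for g in features if g != f]
    branches = {}
    for v in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        branches[v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 rest, depth + 1, max_depth)
    return (f, branches)

def predict(tree, row):
    """Follow the decision rules from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, branches = tree
        tree = branches[row[f]]
    return tree

# Toy, invented records: 1 means the factor was mentioned in the document.
rows = [
    {"impulse": 1, "stress": 0}, {"impulse": 1, "stress": 1},
    {"impulse": 0, "stress": 1}, {"impulse": 0, "stress": 1},
    {"impulse": 0, "stress": 0}, {"impulse": 0, "stress": 0},
]
labels = ["perpetrator", "perpetrator", "victim", "victim", "bystander", "bystander"]
tree = build_tree(rows, labels, ["impulse", "stress"])
```

On this toy data the root splits on the impulse indicator, and each path from root to leaf reads as one decision rule. Real analyses of this kind typically rely on a statistical package rather than hand-written code.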
Literature Review
Bullying refers to a situation in which a student is repeatedly and persistently exposed to negative behavior, including psychological and physical harassment, by one or more other students (Olweus, 1994). Cyberbullying is aggressive behavior carried out repeatedly over time through electronic means by a group or an individual against a victim who cannot easily defend himself or herself (Hinduja and Patchin, 2008; Slonje
Research target
This research targeted social big data collected from the Internet, such as domestic online news sites, blogs, cafes, SNS, and bulletin boards. In this research, social big data is defined as text-based web documents (buzz) that can be collected from a total of 227 online channels, i.e., 214 online news sites, 4 blogs, 3 cafes, 2 SNS, and 4 bulletin boards. A total of 435,563 cases of cyberbullying-related topics were collected from each channel during the period between January 1, 2011 and
Research tools
The collected cyberbullying-related buzz was coded into structured data through text mining and opinion mining.
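The coding step can be pictured with a minimal sketch that turns a free-text document into binary factor indicators by keyword matching. This is only an illustration: the lexicon below, its English keywords, the "stress" factor, and the function name `code_document` are all hypothetical, and the source does not describe the actual dictionaries or tools used.

```python
import re

# Hypothetical keyword lexicon; factor names and keywords are illustrative only.
LEXICON = {
    "impulse": {"anger", "rage", "impulse"},
    "stress": {"stress", "pressure"},
}

def code_document(text, lexicon=LEXICON):
    """Code one document as {factor: 1 if any of its keywords appears, else 0}."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {factor: int(bool(tokens & words)) for factor, words in lexicon.items()}

coded = code_document("He lashed out in anger after school.")
```

Each coded document becomes one row of structured data, which is the form a decision tree analysis needs as input.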
Descriptive statistics of major factors
Of the total of 435,565 buzz cases in which cyberbullying topics were commented on, 23.7% (103,212 cases) commented on the causes of cyberbullying. Among these cases, the types of cyberbullying commented on were classified as follows: 56% of documents (57,817 cases) did not contain any emotion, 32.3% concerned victims (33,361 cases), 6.4% perpetrators (6,587 cases), and 5.3% bystanders (5,447 cases). The impulse factor ranked first among the causal factors of
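The percentages reported above can be re-derived from the case counts. The short check below uses only the figures quoted in this section; the variable names are chosen for illustration.

```python
# Case counts as quoted in the text.
counts = {
    "no emotion": 57817,
    "victims": 33361,
    "perpetrators": 6587,
    "bystanders": 5447,
}
total_buzz = 435565

# Buzz that commented on causes is the sum of the four categories.
cause_buzz = sum(counts.values())

# Share of each category among cause-related buzz, in percent to one decimal.
shares = {k: round(100 * v / cause_buzz, 1) for k, v in counts.items()}

# Share of cause-related buzz among all collected buzz.
cause_rate = round(100 * cause_buzz / total_buzz, 1)
```

The four category counts sum to the 103,212 cause-related cases, and the rounded shares match the reported 56%, 32.3%, 6.4%, and 5.3%.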
Discussion
In this research, decision tree analysis was performed on social big data to verify a predictive model of the risk factors for each cyberbullying type. In contrast to regression analysis or structural equation modeling, the decision tree model performs analysis and prediction in a tree-structured diagram according to decision rules, without requiring a special statistical hypothesis; as such, it is useful in determining patterns or relations between factors that have a great effect
Conclusion
This research was performed to classify cyberbullying types through a data mining technique based on cyberbullying-related buzz in social big data collected in Korea, as well as to determine the causes that affect each type and to develop a decision tree that can predict risk factors. Of the total of 435,565 buzz cases in which cyberbullying topics were commented on, 23.7% (103,212 cases) commented on the causes of cyberbullying. With respect to these cases of
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (34)
- et al. (2014). Cyberbullying on social network sites. An experimental study into bystanders’ behavioural intentions to help the victim or reinforce the bully. Comput. Hum. Behav.
- et al. (2008). The role of bystanders in students' perception of bullying and sense of safety. J. Sch. Psychol.
- et al. (2009). Impulsivity, attribution and prison bullying: Bully-category and perpetrator–victim mutuality. Int. J. Law Psychiatry
- et al. (2014). Does the offline bully-victimization influence cyberbullying behavior among youths? Application of General Strain Theory. Comput. Hum. Behav.
- et al. (2016). Bullying victimization among school-aged immigrant youth in the United States. J. Adolesc. Health
- et al. (2013). The nature of cyberbullying, and strategies for prevention. Comput. Hum. Behav.
- et al. (2020). Social big data analysis of future signals for bullying in South Korea: Application of general strain theory. Telemat. Inform.
- et al. (2017). Mediating the bullying victimization–delinquency relationship with anger and cognitive impulsivity: A test of general strain and criminal lifestyle theories. J. Crim. Justice
- et al. (2015). Automatic monitoring of cyberbullying on social networking sites: From technological feasibility to desirability. Telemat. Inform.
- et al. (2019). Predicting cyberbullying on social media in the big data era using machine learning algorithms: Review of literature and open challenges. IEEE Access
December). Detection of cyberbullying in social networks using machine learning methods
- Bullying experience of racial and ethnic minority youth in South Korea. J. Early Adolesc.
- Forecasts of violence to inform sentencing decisions. J. Quant. Criminol.
- Cyber bullying in schools and the law: Is there an effective means of addressing the power imbalance. eLaw J.
- Principles of Data Mining. The MIT Press. A comprehensive, highly technical look at the math and science behind extracting useful information from large databases