Abstract

Universities are among the most active and knowledge-intensive institutions and stand at the forefront of the use of network information technology. Their education, management, and service modes, as well as the ideology, learning methods, and behavioural habits of teachers and students, will be profoundly affected by the era of big data. The current form of student management in universities is outdated and needs to be replaced by a new management system. With the construction of smart campuses, the rapid development of big data technology has made innovation in student management in colleges and universities possible. This paper takes students of Wuhan University as the research object. Using various application data collected from the campus information system, the K-means algorithm in cluster analysis is used to classify students’ campus behavioural characteristics, and the Apriori algorithm is then used to associate these behavioural characteristics with academic performance. The experimental results show that the consumption behaviour, work and rest behaviour, and study behaviour of different student groups are closely related to their academic performance. Using these results, universities can adopt differentiated management measures for different categories of students, which can help improve students’ academic performance and further enhance the efficiency of student management.

1. Introduction

With the abundance of Internet applications, every student and teacher produces a variety of data every day. Once the amount of data reaches a certain level, it can be analysed and mined in a meaningful way, and the analysis of massive amounts of data will have a profound impact on education information technology. At the same time, schools have more and more live data at their disposal, which opens new research directions for finding the reasons behind observed patterns. For example, the campus network platform that students use to communicate generates a huge amount of data every day, consisting of photos, videos, conversations, messages, emails, etc. These datasets reflect students’ thinking, emotional direction, and behavioural dynamics and contain rich information and regularities. How to store and manage such huge amounts of data scientifically, and later analyse and use them effectively, is of great importance to student management in universities.

Big data has become the microscope through which people can analyse their thoughts and observe their behaviour, as it goes beyond the description of the nature of things and can be mined and analysed to concretely quantify all aspects of people’s thoughts and behaviour. Similarly, student management in universities needs not only experience to guide it but also science to lead it. In this respect, if student management in universities can use data more effectively to analyse problems and make decisions, the quality of work will be greatly improved. With the rapid development of the mobile Internet, students are already making greater use of information networks in their activities. For example, video filming and audio and photographic recordings are used extensively for webcasting while being published in conjunction with traditional formats. The links between these heterogeneous sources of information data, obtained in different forms, are so complex that they are difficult to count and describe in a traditional, simple way. Therefore, the scientific and rational analysis, collation and aggregation of information and data, and presentation of the results in an intuitive and decision-friendly form have become a major issue for student management in universities.

Student management often makes use of instant messaging devices such as mobile phones to gather information about current events on campus and to understand the dynamics of student thinking. This also generates a large, continuously changing data flow with the characteristics of dynamic data. The best way to understand students’ needs and feedback is to use the data they generate to analyse their behaviour, but one of the problems is how to sort these data and store them in a scientific way. At the same time, the security of information data, which is closely related to the physical and mental safety of students, must also be taken into account to ensure that information about students is not leaked and is not used maliciously by other institutions and people.

As big data technology continues to evolve, the construction of intelligent university campuses is undergoing a profound shift towards the Internet and big data, while student management problems, also under the influence of Internet+, are becoming increasingly prominent. The ubiquitous mobile network lets students obtain information through the Internet at all times, but it also inevitably harms their studies to varying degrees. For example, some students fail to develop good study and living habits and use their mobile phones during class to watch TV series, play games, and browse microblogs and short-video apps, which leads to low efficiency in classroom learning, lack of interest in new technologies and knowledge, insufficient motivation to learn, and a preference for shortcuts and easy choices [1, 2].

There are three major problems with university education in the new information technology environment. First, students are not sufficiently self-motivated to learn. In the relaxed learning atmosphere of university, with little external pressure or effective supervision, many students often miss classes or leave early. This situation is mostly seen in freshmen, so it is sometimes called "freshman sickness". In the long run, the lack of effort leads to academic decline. Second, students study inappropriately. Many students keep the study habits of their high school days after entering university and do not adapt to new teaching modes, such as MOOCs and the flipped classroom, so they put in twice the effort for half the result and their academic performance is unsatisfactory. Third, students’ learning difficulties have not been corrected. This is due both to teachers’ teaching methods and to students’ own lack of effort. Of the three types of situation (not wanting to learn, not knowing how to learn, and wanting to learn but not being able to learn), the first two account for the vast majority of cases [3]. In response, universities urgently need to innovate their student management methods and promote refined and differentiated management. Big data technology provides new ideas for personalized management and intelligent decision making.

Chinese scholars have begun to study the application of big data technology in the education sector and the impact and changes it has brought [4], with research findings focusing on the use of big data technology to analyse student behaviour [5–7] and transform management thinking and models [8–10]. However, few research results have been published on the use of big data technology to fine-tune and differentiate the management of college students. In this paper, from the perspective of college student management, we use the data from the digital platform of education to explore the hidden information value and provide new ideas and methods for college student management.

At present, most of the student management workers in universities are able to adjust their working concepts, methods, and approaches in a timely manner and make full use of the convenient conditions brought by network information technology to improve the efficiency and quality of their work. However, in the era of big data, the existing technical means in universities can only analyse a small number of information data with similar categories and structured shapes and cannot yet collect, store, analyse, and visualise the results of big data in the modern sense, and data analysis techniques are not yet widely used and not familiar to most staff. Moreover, it is difficult to invest significant effort and financial resources in the current student management in universities to use such techniques to analyse the individual needs of students in terms of learning and development. At the same time, professional data mining, integration, and analysis skills are scarce. The value of big data is undeniable, but the limitations in the use of analytical talent and technology have made it difficult to fully realise the value of data for the time being.

The advent of the big data era is undeniable, and data, as an important asset, are changing the decision-making model of governments and enterprises. Student management in higher education must face the challenges of this change, change its thinking, and use all possible methods to fully exploit the value of big data and support its work. The era of big data requires more interdisciplinary talent with data technology skills. To make full use of big data technology in student management, teams with big data analysis skills need to be formed. These teams should receive more specific technical training; master the theories, methods, and tools for data analysis on big data platforms; and build a cloud-based student management system. Through real-time tracking and analysis of internal campus communication forums, such teams can keep abreast of the dynamics of students’ thoughts, actively guide public opinion on hot issues and emergencies that students discuss and care about, and maintain campus security and stability.

Given the current team structure of student management in China’s colleges and universities, it is difficult to build, in a short period of time, a workforce that has both a background in big data technology and an understanding of the laws of student management. Therefore, to meet the challenges and realise the innovation of student management in universities against the background of big data, this paper collects various application data from the campus information system, classifies students’ campus behavioural characteristics using the K-means algorithm in cluster analysis, and then uses the Apriori algorithm to associate student behavioural characteristics with academic performance. The experimental results show that the consumption behaviour, work and rest behaviour, and study behaviour of different student groups are closely related to their academic performance. Using these results, universities can adopt differentiated management measures for different categories of students, which can help to improve students’ academic performance and further enhance the efficiency of student management.

2. The Application of Big Data in Student Management in Universities

In the management of students in colleges and universities, scientific and reasonable classification of students’ behaviour in school, the formulation of corresponding management service methods for different types of students, and the provision of personalized support measures can improve the refinement of student management and teaching services and enhance the quality level of talent cultivation. In the era of big data on the Internet, students’ one card consumption data, classroom card attendance data, book borrowing data, and Internet access data can dynamically and accurately map out students’ behavioural characteristics and the behavioural habits hidden behind these data.

2.1. Experimental Data

In this paper, undergraduate students of Wuhan University Class 2021 were taken as the research object. The behavioural data of four grades of undergraduate students for one academic year were selected, and the research variables of student behavioural characteristics and student performance were set; the specific relationship is shown in Figure 1. The consumption behaviour data mainly came from the One Card system. The work and rest behaviour data mainly came from the Sunshine Attendance System, the One Card system, and the Internet authentication system. The learning behaviour data mainly came from the teaching attendance system, the book lending system, and the Internet authentication system. The raw data extracted include 8,426,392 consumption records from the One Card system, 741,904 attendance records from the Sunshine Attendance System, and 929,088 book borrowing records. Because the data contained redundancy and structural inconsistencies, student samples with seriously missing information and some outliers were eliminated during data processing, leaving a final sample of 12,904 students.

Specific indicator codes for the student behavioural characteristics and student achievement research variables are given in Table 1.

2.2. Clustering Analysis of Student Behaviour Using the K-Means Algorithm

The K-means algorithm was proposed by MacQueen in 1967 and is one of the classical clustering algorithms [11–13]. The core idea of the K-means algorithm is to find K cluster centres $c_1, c_2, \ldots, c_K$ and then, through continuous iteration, make each data point $x$ as close as possible to the centre of the cluster it belongs to, while keeping data in different clusters as far away from each other as possible. The calculation of cluster centres varies depending on the clustering algorithm; the K-means algorithm determines each cluster centre as the average of all data within the cluster.

Since the K-means algorithm always converges, it reaches a steady state in a finite number of steps, i.e., the clustering centres no longer change [14–16]. Because most changes in the clustering centres occur in the early iterations, the iterative process can usually be stopped and the results output directly once more than 99% of the data points belong to clusters that no longer change. This approach is effective in reducing the running time of K-means when dealing with larger data [17, 18]. In general, we can use the inverse of the spatial distance as the expression for calculating similarity, as shown in the following equation:

$$ s(x, c) = \frac{1}{d(x, c)}, \tag{1} $$

where $c$ represents the cluster centre, $x$ represents a particular data point, and $d(x, c)$ represents the Euclidean distance. The expression for calculating the Euclidean distance on an N-dimensional continuous space is given in (2):

$$ d(x, y) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}, \tag{2} $$

where $x$ and $y$ are two points in the N-dimensional space, and $x_i$ represents the value of point $x$ in the $i$-th coordinate.

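In general, the cluster centre is the mean or median of all the data in the cluster; for K-means it is the mean, i.e.,

$$ c_k = \frac{1}{n_k} \sum_{x \in C_k} x, \tag{3} $$

where $c_k$ is the cluster centre of cluster $C_k$ and $n_k$ is the number of data points in cluster $C_k$.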
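The iteration described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: the function names, the random choice of initial centres, and the toy data are all assumptions made here. The `stable_fraction` parameter mirrors the 99% stopping rule mentioned in the text.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def kmeans(points, k, stable_fraction=0.99, max_iter=100, seed=0):
    """Cluster `points` into `k` groups and return (centres, labels).

    Iteration stops once at least `stable_fraction` of the points keep
    their previous cluster label, or after `max_iter` rounds.
    """
    rng = random.Random(seed)
    # Assumption: initial centres are k distinct points drawn at random.
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centres[j])) for p in points
        ]
        unchanged = sum(a == b for a, b in zip(labels, new_labels))
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
        if unchanged >= stable_fraction * len(points):
            break
    return centres, labels


# Toy two-dimensional data (hypothetical, not from the campus dataset).
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centres, labels = kmeans(data, k=2)
```

On the toy data the two well-separated groups end up in different clusters. In practice the result depends on the initial centres, so several runs with different seeds are usually compared.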

For two different clustering results obtained by K-means on the same dataset, the better result can be identified by comparing their sums of squared errors (SSE), with a smaller SSE indicating a more desirable result. A smaller SSE means that the data lie closer, in total, to their cluster centres, so each clustering centre represents the data in its cluster better.
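The text does not spell out the SSE formula. Under the usual definition, and writing $C_k$ for the $k$-th cluster, $c_k$ for its centre, and $d$ for the Euclidean distance (notation assumed here for illustration), it can be written as:

```latex
\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x \in C_k} d(x, c_k)^2
```

Because SSE generally decreases as K grows, it is most useful for comparing runs with the same K, or for looking for the value of K beyond which the decrease levels off.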

Clustering analysis is descriptive in the context of data mining tasks and is characterised by the fact that the output of the algorithm cannot be described in terms of correctness or incorrectness, i.e., there is no unique solution. Although there is no unique solution for this type of task, the evaluation of its results is important.

2.3. Analysis of the Association between Student Behaviour and Academic Performance at School Using the Apriori Algorithm

The Apriori algorithm mines frequent itemsets for Boolean association rules [19–21] and belongs to unsupervised learning in machine learning [22, 23]. A minimum support level determines which itemsets count as frequent, and a minimum confidence level must be set to constrain the confidence of the association rules. The core of the Apriori algorithm is a two-stage recursive mining process: the first stage finds all combinations of factors that occur with high frequency in the dataset (the frequent itemsets), and the second stage finds the association rules that satisfy the requirements among these frequent itemsets. The formulae for support [24] and confidence [25] are as follows:

$$ \mathrm{support}(A \Rightarrow B) = P(A \cap B), $$

$$ \mathrm{confidence}(A \Rightarrow B) = P(B \mid A), $$

where support indicates the probability of event A and event B occurring simultaneously and confidence indicates the probability of event B occurring given the occurrence of event A. The two quantities are related by the following equation:

$$ \mathrm{confidence}(A \Rightarrow B) = \frac{\mathrm{support}(A \Rightarrow B)}{P(A)}. $$
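The two-stage process can be illustrated with a small pure-Python sketch. It is not the SPSS Modeler implementation used later in the paper; the function names, the item labels such as `rest=2`, and the five toy transactions are hypothetical and chosen only to show how support and confidence are computed.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Stage 1: level-wise search for all itemsets meeting `min_support`."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        kept = {}
        for s in level:
            sup = support(s, transactions)
            if sup >= min_support:
                kept[s] = sup
        frequent.update(kept)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        level = [
            c for c in candidates
            if all(frozenset(sub) in kept for sub in combinations(c, len(c) - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Stage 2: derive rules A => B whose confidence meets `min_confidence`."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules


# Hypothetical transactions: one set of behaviour and grade labels per student.
toy = [
    frozenset({"rest=2", "grade=low"}),
    frozenset({"rest=2", "grade=low"}),
    frozenset({"rest=2", "grade=average"}),
    frozenset({"rest=1", "grade=good"}),
    frozenset({"rest=1", "grade=average"}),
]
freq = frequent_itemsets(toy, min_support=0.4)
rules = association_rules(freq, min_confidence=0.6)
```

Each rule is returned as a tuple of antecedent, consequent, support, and confidence. The sketch recomputes support by scanning all transactions, which is acceptable for illustration but would be too slow for millions of records.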

3. K-Means Clustering Analysis

Before clustering, the data are normalised so that all indicators are dimensionless. A new data stream was then created, and the data were clustered using the K-means algorithm.
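The paper does not state which normalisation was applied. Two common choices are min-max scaling and z-score standardisation; the sketch below shows both under that assumption, with hypothetical function names and values.

```python
import statistics


def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: map every value to 0.0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column):
    """Centre a list of numbers on 0 and scale it to unit standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    if sd == 0:  # constant column: map every value to 0.0
        return [0.0 for _ in column]
    return [(v - mean) / sd for v in column]


# Hypothetical monthly spending values, for illustration only.
spend = [300.0, 450.0, 600.0]
scaled = min_max_scale(spend)   # [0.0, 0.5, 1.0]
standardised = z_score(spend)
```

Either transformation removes the units, so that an indicator measured in currency does not dominate one measured in counts when Euclidean distances are computed.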

3.1. Analysis of Student Behaviour
3.1.1. Clustering Analysis of Student Consumption Behaviour

According to the evaluation criteria of the clustering algorithm, the best clustering effect was obtained when the number of clusters was set to 5. The resulting student groups were analysed against the actual situation of student consumption by comparing the mean of each cluster with the overall student mean for each indicator: H denotes a cluster mean above the overall mean and L a cluster mean below it. The results of the cluster analysis of student consumption behaviour for each cluster are shown in Table 2.
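The H/L marking can be expressed as a simple comparison. The sketch below is illustrative only: the function name, indicator names, and numbers are hypothetical and are not taken from Table 2, and a value exactly equal to the overall mean is marked L here by assumption, since the text does not cover that case.

```python
def label_against_overall(cluster_means, overall_means):
    """Mark each cluster indicator 'H' if above the overall mean, else 'L'."""
    return {
        cluster: {
            indicator: "H" if value > overall_means[indicator] else "L"
            for indicator, value in means.items()
        }
        for cluster, means in cluster_means.items()
    }


# Hypothetical numbers for illustration only.
overall = {"monthly_spend": 500.0, "monthly_frequency": 60.0}
clusters = {
    1: {"monthly_spend": 320.0, "monthly_frequency": 75.0},
    4: {"monthly_spend": 810.0, "monthly_frequency": 82.0},
}
hl_labels = label_against_overall(clusters, overall)
print(hl_labels[1])  # {'monthly_spend': 'L', 'monthly_frequency': 'H'}
```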

The average monthly spending and the corresponding percentage of students in this category are shown in Figure 2.

The data on the average monthly consumption frequency and the average monthly consumption peak are shown in Figure 3.

Figures 2 and 3 show that the characteristics of student consumption behaviour fall into five categories. The first group of students has the lowest monthly consumption level and the lowest monthly consumption peak but consumes frequently. The second group has a medium monthly consumption level, a high monthly consumption peak, and a lower consumption frequency. The third group has a medium to high monthly consumption level, a high monthly consumption frequency, and a medium monthly consumption peak. The fourth group has the highest monthly consumption level, a high monthly consumption frequency, and the highest monthly consumption peak. The fifth group has a low monthly consumption level, the lowest monthly consumption frequency, and a low monthly consumption peak.

3.1.2. Clustering Analysis of Students’ Work and Rest Behaviour

The K-means algorithm cluster analysis was conducted on the eating and drinking habits, Internet habits, early waking habits, and physical exercise habits among the indicators of students’ work and rest behaviour. According to the clustering average criterion, when the number of clusters is 3, the percentage of students in each type of cluster and the average value of the indicators are shown in Table 3.

Table 3 shows that the first group of students woke up early more often each month, ate more regularly in the school canteen, spent more time online, and participated in physical activity more often. The second group of students woke up early less often each month, ate irregularly in the school canteen, spent the longest time online, and participated in very little physical exercise. The third group of students woke up early more often each month but ate irregularly in the school canteen, spent more time online, and participated in physical activity less often.

A comparison of the data on the number of times students ate regularly and woke up early is shown in Figure 4.

A comparison of the data information on the number of hours spent online and the number of physical activity sessions for these three groups of students is shown in Figure 5.

3.1.3. Clustering Analysis of Students’ Learning Behaviour

K-means algorithm cluster analysis was conducted on four indicators of students’ learning behaviour: class attendance, library borrowing, number of visits to the library, and length of study. The percentage of students in each category of clusters and the mean values of the indicators are shown in Table 4.

Table 4 shows that category 1 students had the highest class attendance, borrowed fewer books, visited the library the most, and spent the most time studying. Category 2 students had the lowest class attendance, borrowed the fewest books, visited the library the least, and spent the shortest amount of time studying. Category 3 students had the highest attendance rate, borrowed the most books, visited the library more often, and studied for longer periods of time. Category 4 students had an average attendance rate, borrowed fewer books, visited the library more often, and spent less time studying. The comparisons are shown in Figure 6.

3.2. Student Group Characteristics and Management Suggestions

Based on the cluster analysis of the three categories of student behaviour above, we believe that the behavioural characteristics of different student groups can serve as the basis for personalized management of university students. Suggestions for implementing personalized management are summarized in Tables 5–7, and some of the recommendations are as follows. Firstly, given the wide variation in student spending in the school canteen, particular attention should be paid to students who spend too much or too little, and a poverty assistance policy should be developed for students in financial difficulty. Secondly, students generally spend a lot of time on the Internet during the school year. Administrators should recognise that, in the Internet era, obtaining information online is more convenient than by other means and that students are becoming more and more dependent on the Internet. For the sake of students’ health, however, a reasonable code of conduct for Internet use should be formulated, and the management and supervision of students’ Internet use should be strengthened. Thirdly, because students do not invest enough in their studies, visit the library less often, and borrow fewer books, comprehensive measures to motivate university students to study should be formulated, such as setting up scholarships for excellence and launching reading day activities.

4. Apriori Algorithm Analysis

4.1. Analysis of the Association between Students’ School Behaviour and Academic Performance

The above clustering analysis has classified students’ school behaviour within three categories: consumption behaviour, work and rest behaviour, and study behaviour. To further study the relationship between students’ behavioural characteristics and their academic performance, and to find out whether the two are systematically connected, the Apriori algorithm was chosen to conduct the association analysis and to explore the hidden associations and patterns in the big data.

For the Apriori association analysis, the student behaviour indicator data and the academic achievement indicator data were used as data sources. A new data stream was created in SPSS Modeler software, the Apriori algorithm model was constructed, and the relevant parameters were set. In the model, the five types of consumption behaviour, the three types of work and rest behaviour, the four types of learning behaviour, and the three types of academic achievement were set as the antecedent and consequent variables of the association rules. A minimum support of 10% and a minimum confidence of 80% were set for the association rule analysis, and a total of 24 association rules were obtained. In accordance with the objectives of this study, redundant rules were eliminated or merged, and the rules with a lift greater than 1 whose consequent is student academic achievement were selected, as shown in Table 8.
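The elevation of a rule, more commonly called lift, compares the confidence of the rule with the overall frequency of its consequent. Under the standard definition (the notation here is assumed for illustration), it is:

```latex
\mathrm{lift}(A \Rightarrow B)
  = \frac{\mathrm{confidence}(A \Rightarrow B)}{P(B)}
  = \frac{P(A \cap B)}{P(A)\,P(B)}
```

A value greater than 1 means that students who show the antecedent behaviour are more likely to have the consequent level of achievement than students in general, which is why only such rules are retained.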

As can be seen from Table 8, the first rule reflects that 10.265% of the student population is characterised by not waking up early, eating irregularly, spending a lot of time online, and exercising less often, and 82.423% of these students are likely to be academically low achievers. The rule support of 8.461% indicates that 8.461% of all students have a work and rest behaviour type of 2 and poor academic performance. The second rule reflects that 11.841% of the student population is characterised by low monthly consumption, a low consumption peak, and a low consumption frequency, together with often waking up early, eating irregularly, spending more time online, and exercising less often, and that 83.246% of students in this category are likely to have average academic performance. The rule support of 9.857% indicates that 9.857% of all students have a consumption behaviour type of 5, a work and rest behaviour type of 3, and average academic performance. The third rule reflects that 53.156% of students are characterised as early risers and irregular eaters who spend more time on the Internet and exercise less often, and 84.068% of students in this group are likely to be average achievers. The rule support of 44.687% indicates that 44.687% of all students have a work and rest behaviour type of 3 and average academic performance.

The fourth rule reflects that 23.683% of students are characterised by average attendance, infrequent visits to the library, low book borrowing, short study time, early rising, regular eating, long Internet access, and regular exercise, and that 80.366% of these students are likely to be of average academic achievement. The rule support of 19.033% indicates that 19.033% of all students have a study behaviour type of 4, a work and rest behaviour type of 1, and average academic performance. The fifth rule reflects that 10.308% of the student population is characterised by waking up early, eating more regularly, spending more time online, exercising regularly, being present in class more often, going to the library more often, borrowing more books, and studying for longer periods of time, and that 81.343% of students in this category are likely to be doing well academically. The rule support of 8.385% indicates that 8.385% of all students have a work and rest behaviour type of 1, a study behaviour type of 3, and good academic performance. A detailed comparison of the data from the analysis of the association rules is shown in Figure 7.
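The three figures quoted for each rule are linked: the rule support equals the share of students matching the antecedent multiplied by the confidence. The short check below uses only the percentages quoted in the text; the variable names are illustrative.

```python
# (antecedent share %, confidence %, reported rule support %) for rules 1 to 5,
# copied from the figures quoted in the text.
reported_rules = [
    (10.265, 82.423, 8.461),
    (11.841, 83.246, 9.857),
    (53.156, 84.068, 44.687),
    (23.683, 80.366, 19.033),
    (10.308, 81.343, 8.385),
]

# Rule support = antecedent share x confidence, both expressed as percentages.
computed = [ant * conf / 100 for ant, conf, _ in reported_rules]

for (ant, conf, reported), value in zip(reported_rules, computed):
    # Every reported rule support matches the product to three decimal places.
    assert abs(value - reported) < 0.001
```

This also shows that the 10% threshold applies to the antecedent share, since several rule supports fall below 10%.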

Using big data technology to classify students by behavioural characteristics helps administrators carry out targeted interventions and counselling for different types of student groups. This supports personalized management of university students and further improves the efficiency of student management in universities. It can also remind administrators to focus especially on those groups of students whose behaviour patterns are associated with a high probability of poor academic performance and to develop corresponding student management systems. The study also found that regular early rising and exercise, regular meals, and book borrowing were associated with better academic performance. Conversely, students who lacked good work and rest behaviour were more likely to have poor academic performance.

4.2. Innovation of Big Data Technology in the Management of University Students

Through the K-means clustering analysis and the Apriori association analysis, the clustering results of students’ school behaviours and the association rules between these behaviours and academic performance were obtained. This not only links students’ daily school behaviours with their academic performance but also distinguishes groups of students with different characteristics. Based on the results of the clustering analysis of student behaviour data, universities can develop differentiated management measures for different types of student groups to further improve student management efficiency and contribute to the innovation of student management in universities.

(1) The use of big data technology can transform the student management mode from unified management to personalized management.

In the context of the current high-quality development of higher education, the traditional management mode of college students can no longer meet the needs of the times. Against the background of information technology, the smart campus construction of universities is changing day by day, the ways for students to obtain knowledge and information have become diversified, and students receive more and more information from the outside world every day. Through big data technology, universities can grasp the behavioural habits of students in daily consumption, work and rest, study, etc., understand the personalized needs of different student groups, and formulate corresponding management measures, so that the student management mode can change from unified management to personalized management.

(2) The use of big data technology can transform student management from passive management to active management.

As the cradle of talent training, universities need to keep pace with the development of society. The university’s One Card system, digital campus system, student teaching system, book lending system, Sunshine Attendance System, and other applications provide rich data resources for school management. By mining these data, we can obtain information on the learning and work habits of students across the university, thus changing student management from trying to solve problems after they arise to taking the initiative to identify problems, solve potential problems immediately, or set up preventive measures in advance. For example, individual students with abnormal behavioural characteristics often go undetected until consequences have already occurred, whereas predictive big data algorithms can detect abnormal student behaviour in time so that tutors can arrange early intervention, thus transforming student management from passive management to active management.

(3) The use of big data technology can make the evaluation of students and of student management performance shift from a single approach to a diversified approach.

At present, the evaluation of students and of student management performance focuses mainly on academic performance ranking, which is one-sided in some ways. For one thing, the design of evaluation methods and indicators is not very reasonable; for another, there is a serious information mismatch between evaluators and students, and it is impossible to grasp students’ daily performance and effort level in school systematically through effective technical means. Big data technology can collect data on the behaviour of university students in their school life and study and can analyse the behaviour of students in classes, grades, and even the whole university. This makes it possible to judge comprehensively and systematically the state of students’ school life and study as well as the effectiveness of student management work, thus shifting the evaluation of students and of student management performance from singularity to diversity.

5. Conclusion

Given the current team structure of student management in universities, it is difficult to build, in a short period of time, a workforce with both a background in big data technology and an understanding of the laws of student management, so it is necessary to give full play to the role of big data technology in student management. In this paper, we selected various types of behavioural data from the campus information system for four grades of undergraduate students of Wuhan University Class 2021 over one academic year, used cluster analysis to classify students’ behavioural characteristics in school, and analysed the association between these behavioural characteristics and academic performance. The experiment found that the consumption behaviour, work and rest behaviour, and study behaviour of different student groups are closely related to their academic performance, which provides a basis for schools to adopt differentiated management measures for different types of students. On this basis, suggestions are given for improving student management methods by using big data technology in the context of smart campus construction. Universities should make full use of the achievements of education informatization construction and process the large amount of data obtained through big data technology and informatization means, so as to provide rich data support for school decision making and development.

Data Availability

The datasets used during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This study was supported by the Youth Project of Liaoning Provincial Department of Education (project no. lj2017qw001) in 2017.