Abstract

Network information technology and distance learning technology make it convenient for college students to take online courses, but practice has also revealed problems, and schools need to pay attention to improving students’ learning quality and supervision. Because online courses are not taught face to face and teachers and students are in different places, this paper studies how to detect students’ learning fatigue and learning concentration in online classrooms. The paper first designs a lightweight convolutional neural network model for eye state classification and verifies the performance of the model. The designed model has a compact structure and a high recognition rate. Combined with the human eye positioning algorithm, the recognition of the open and closed states of the eyes is realized. Finally, experiments verify the feasibility of using the PERCLOS value for fatigue detection and show that the Euler pitch and yaw angles can be used to detect student attention. The method can strengthen collaborative supervision together with other cooperative methods, thus improving the quality and effectiveness of online learning for college students, promoting the development of digital modern teaching and learning management models, and exploring possible future technologies and corresponding changes in teaching methods and management models.

1. Introduction

Online courses have obvious advantages, such as convenience, a wide audience, ease of use, and comprehensive and diverse course resources [1]. The need for epidemic prevention and control in China has driven demand for online courses and their rapid development [2]. Chinese universities, European and American universities, and social institutions are all promoting the construction of online courses [3]. In recent years, typical online open course resources for college students organized by the Chinese Ministry of Education, or built with its participation, include the National Higher Education Smart Education Platform [4, 5], according to the data released by the Chinese Ministry of Education [6, 7].

Online learning, also known as network learning, is a learning method in which an education platform is established on the Internet and learners use the Internet to freely choose the learning time, place, and method and finally complete certain learning tasks [8]. Online learning enables students to fully personalize their learning. In the traditional classroom environment, it is difficult to realize the ideal of individualized learning and individualized teaching [9]. With the help of the Internet, personalized learning may become a reality, making learning a process in which students get what they need [10]. Learners can freely choose suitable learning resources from the Internet and choose the learning methods that suit them [11].

Due to the lack of face-to-face real-time monitoring, many students have not adapted to the learning method of online courses. In particular, their self-control does not meet the learning requirements of online courses, so low learning initiative and the absence of monitoring lead to poor performance [12]. Therefore, it is necessary to design a method to monitor the learning quality of middle school students’ online learning in order to promote the desired learning effect and quality [13].

When students log in to the online learning platform, they must first make a study plan according to their own situation and then choose the courses they want to study [14, 15]. The supervision process of students’ online learning starts when the student logs in to the learning platform and ends when the student exits the learning platform [16–18]. How to effectively supervise the online learning of college students and improve their learning quality is a matter in which the government, universities, teachers, students, and society all have great interests or responsibilities, and it is also an important aspect of moral education teaching in Chinese universities [19]. Therefore, the technological novelty of online courses should focus on realizing and strengthening supervision and management through technological means [20–24].

Since the 1990s, a large number of articles and papers on the integration of IT and the curriculum have been published in international conferences and in the relevant literature. However, few of these studies have real theoretical depth, and even fewer give a comprehensive and profound discussion of the three issues of integration (the goal of integration, the connotation of integration, and the method of integration). There are two important works that discuss the theory and methods of integrating information technology and curriculum in a more systematic and complete way from these three aspects. Reference [8] is recognized by the international education community as the most authoritative and representative literature on the integration of information technology and curriculum. The three main integration models are described in detail in [10]. In order to help teachers solve the problem of how to effectively implement the integration of information technology and subject teaching, Wei et al. [11] put forward specific steps and methods for effective integration and provide clear answers to the three major problems faced by the theory of integration of information technology and curriculum. Although the results are not ideal, they provide a good reference for future scholars to explore the integration of information technology and curriculum.

Most scholars believe that the essence of integrating information technology with the teaching of civics and political science in universities is to change the traditional teacher-centered teaching model [12] and to give rise to new teaching concepts and new teaching modes, rather than simply overlaying information technology on civics and political science. Yang and Dazhi [14] proposed that, in the teaching of ideological and political science classes, we cannot rely on the relationship between “objectives and content” to deal with the “general approach,” but must consider the nature and characteristics of ideological and political theory classes and analyse in depth the confused border between multimedia teaching and multimedia courseware and the theoretical and technical logic of the conflict between learning from technology and learning with technology. Cavanaugh et al. [15] specifically described several aspects, including the specific performance of application, the principles of application, and the ideas of application. They mentioned that it is necessary to fully follow the basic principles, fully clarify the ideas, change the ideology, fully apply information technology to ideological and political education, and enhance the vitality of education.

In [16], the authors argue that it is necessary to take the cultivation of students’ core literacy as the core, to guide ideological and political education workers to break out of traditional thinking stereotypes, to establish “big data thinking,” to follow and grasp “the law of microcommunication,” and to innovate the ideas, channels, and methods of publicity and education. Chandra et al. [17] said that the work needs to be divided into clear targets and implemented in steps and stages. Secondly, it is crucial to choose the right information technology tools for teaching: making appropriate use of multimedia courseware and using motion pictures to raise questions at the right time provokes students to think, deepens their understanding of knowledge, and achieves twice the result with half the effort.

3. Learning Fatigue Detection Method

In this paper, we actively strengthen the supervision and management of online course learning by means of technology such as machine learning and face recognition. Specific supervision includes (1) supervision and management of learning behavior: whether students choose courses online, whether they attend class, how well they complete class assignments, random roll call in class, and detection to prevent negligence in learning; (2) supervision and management of examination behavior: supervision of substitute examination, examination violations, and cheating; and (3) ensuring data security.

In order to use the data effectively, we have designed a data analysis scheme, which will be described in detail in this section.

This paper focuses on student fatigue detection based on video images. First, the face is extracted from the video image frame; then, the eye area is localized within the face area. The implementation principles of face localization and human eye localization have been described in this paper. The flow of learning fatigue detection is shown in Figure 1.

In this paper, we choose the CEW dataset collected by the team of [6], which contains 2425 subjects. The human eye dataset was automatically extracted from the images by the face detector and eye localizer, respectively. The face images were resized to 100 × 100 pixels size, and the extracted eye images were uniformly resized to 24 × 24 pixels size.

3.1. Eye State Recognition

To perform eye state recognition, it is necessary to locate the face region first, then the eye region, and then invoke the designed eye state classification model. The algorithm for recognizing open and closed eyes is shown in Figure 2.
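The three stages described above (face location, eye location, eye state classification) can be sketched as a small pipeline. This is a minimal illustration, not the paper's implementation: the function names, the `Box` layout, the "open"/"closed" labels, and the rule that a frame counts as closed only when every located eye is closed are all assumptions, and the detector, locator, and classifier are passed in as placeholders.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical layout: (x, y, width, height)


def recognize_eye_states(
    frame: Any,
    detect_face: Callable[[Any], Optional[Box]],
    locate_eyes: Callable[[Any, Box], Sequence[Box]],
    classify_eye: Callable[[Any, Box], str],
) -> Optional[List[str]]:
    """Run face location, eye location, then eye state classification.

    Returns one label per located eye, or None when no face or no eye
    is found in the frame.
    """
    face = detect_face(frame)
    if face is None:
        return None
    eyes = locate_eyes(frame, face)
    if not eyes:
        return None
    return [classify_eye(frame, eye) for eye in eyes]


def frame_is_closed(states: Sequence[str]) -> bool:
    """Assumed frame-level rule: closed only if every located eye is closed."""
    return all(state == "closed" for state in states)
```

Passing the three stages as callables keeps the sketch independent of any particular face detector or classification model.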

The eye state recognition algorithm performs the detection of open and closed eyes, and the final recognition results are shown in Figure 3. The eye state recognition algorithm proposed in this paper can better determine the open and closed state of students’ eyes in the online classroom.

The pose detection model is adapted to the data distribution of a particular deployment scenario by adapting the model parameters in two phases: offline learning and online learning, without adding additional network branches or changing the network structure. In the offline learning phase, the method trains the pose detection metamodel and adaptation optimizer parameters based on various classroom data; in the online learning phase, the method achieves fast domain adaptation of the pose detection model under small sample conditions by loading the metamodel as the initialization parameters of the pose detection model and using the adaptation optimizer to guide the pose detection model parameter training process. As shown in Figure 4, this section will focus on the offline learning phase and introduce the method in four aspects, namely, the selection of the basic framework of the pose detection model, the training method of the metamodel, the training and application of the domain adaptation optimizer, and the design of the external training optimizer in the two-layer training.

The multiscene student pose detection method using meta-learning is shown in Figure 4.

4. Basic Model Framework

The COCO dataset is an authoritative dataset in computer vision, and it defines targets with an area smaller than 32 × 32 pixels as small targets. In the teaching scenario, back-row students are relatively far from the surveillance camera, so back-row human targets in the surveillance images are usually small and their pose categories are difficult to distinguish. In the multiscene student pose detection dataset used in this paper, the proportion of imaging targets categorized as small targets reaches 20%, which poses a difficulty for accurate detection. As shown in Figure 5, the target detection branch of Mask Region-CNN (Mask R-CNN) [8] is used as the framework of the pose detection model in this paper. The feature extraction network of Faster Region-CNN (Faster R-CNN) [10] is replaced by Residual Network 50 (ResNet-50) [11], and a feature pyramid network (FPN) [13] is added to predict human pose from feature maps at different scales, which addresses the problem of severe information loss of small targets on high-dimensional feature maps. Experiments show that the target detection branch of Mask R-CNN has significantly improved the detection of small rear-row targets compared to Faster R-CNN.
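As a small illustration of the small-target criterion, the sketch below computes the share of targets whose area falls under the 32 × 32 pixel threshold. It assumes bounding-box width and height are used to approximate target area; the function names and the sample boxes are hypothetical and are not data from the paper.

```python
from typing import Iterable, Tuple

SMALL_AREA = 32 * 32  # COCO threshold for a "small" target, in pixels


def is_small_target(width: float, height: float) -> bool:
    """True when the box area is below the small-target threshold."""
    return width * height < SMALL_AREA


def small_target_fraction(boxes: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (width, height) boxes that count as small targets."""
    sizes = list(boxes)
    if not sizes:
        return 0.0
    small = sum(1 for w, h in sizes if is_small_target(w, h))
    return small / len(sizes)


# Made-up example boxes as (width, height) in pixels.
example_boxes = [(10, 10), (40, 40), (31, 33), (32, 32), (20, 60)]
example_fraction = small_target_fraction(example_boxes)
```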

The pose detection metamodel is an initial domain adaptation model designed for domain adaptation training for different teaching scenarios in the online learning phase. In this paper, the method for offline training of the pose detection metamodel is based on model-agnostic meta-learning (MAML) algorithm [14]. By combining the test loss of the model after training in various teaching scenarios, the model parameter gradients are guided to the positions most suitable for subsequent domain adaptation. As shown in Figure 6, the pose detection metamodel is not aimed at a specific teaching scenario, but aims at good domain adaptation for all types of teaching scenarios.

N single-scene pose detection datasets are constructed based on the training data of the multiscene pose detection dataset. Each single-scene dataset consists of a support set and a query set that are mutually exclusive and follow the same distribution. Taking the pose detection metamodel parameters as initial values, the optimization objective of the metamodel parameters on the N single-scene pose detection datasets is as follows:

In this case, the single scenario test loss function is defined as follows:

Here, the loss term is the test loss of the pose detection model on the support set with the pose detection model parameters. The test loss of the pose detection model Mask R-CNN consists of four components: the positive and negative sample classification loss and the border regression loss of the region proposal network (RPN) [10] in Figure 3, and the pose classification loss and candidate frame regression loss of the final model output.

The single-scene test loss in the paper requires that the metamodel parameters be trained once on the support set of that scenario before being measured on the query set. Training before testing the loss, as opposed to testing directly on the dataset, means that the metamodel will focus more on the performance of the pose detection model after it has been trained for domain adaptation to a specific scene. This operation essentially uses the approximate second-order derivative of the model to optimize the subsequent gradient descent of the model parameters [15], resulting in a better performance of the metamodel after domain adaptation for a specific teaching scene.
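For orientation, the objective and the single-scene test loss described in this section can be written in standard MAML notation. This is a hedged sketch rather than the paper's own formulas: the symbols \(\theta\) (metamodel parameters), \(\alpha\) (inner step size), \(S_i\) and \(Q_i\) (support and query sets of the ith single-scene dataset), and \(L\) (detection loss) are assumptions introduced here.

```latex
% Assumed notation: theta = metamodel parameters, alpha = inner step size,
% S_i / Q_i = support / query set of scene i, L = detection loss.
\[
  \min_{\theta} \; \sum_{i=1}^{N} \mathcal{L}_{i}(\theta),
  \qquad
  \mathcal{L}_{i}(\theta)
    = L_{Q_i}\!\bigl(\theta - \alpha \nabla_{\theta} L_{S_i}(\theta)\bigr).
\]
```

In this form, the gradient of \(\mathcal{L}_{i}\) with respect to \(\theta\) passes through the inner update, which is where the second-order derivative mentioned above enters.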

5. Domain Adaptation Optimizer

Under small sample conditions, it is usually difficult for the model parameters to converge to the ideal position when they are optimized directly by gradient descent. Therefore, a domain adaptation optimizer with learnable parameters is designed. By embedding the training of the domain adaptation optimizer in the two-layer training mode of MAML in the offline learning phase, the domain adaptation optimizer can provide guidance for the optimization direction and step size of each parameter of the metamodel in the subsequent domain adaptation process [16].

Figure 7 illustrates the two-layer training process after adding the domain adaptation optimizer. A single training step proceeds as follows: first, n single-scene datasets are randomly selected; then, with the metamodel parameters as the initial value, the detection model is trained internally on the support set of the ith dataset to obtain the updated model parameters; next, the updated model parameters are tested on the query set to obtain the internal training loss of that scene, and the internal training losses of the n scenes are averaged to obtain the multiscene training loss; finally, based on this multiscene training loss, the metamodel parameters and the weight parameters of the domain adaptation optimizer are externally trained.
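The two-layer step described above can be illustrated on a toy problem. The sketch below is not the paper's pose detection setup: it assumes a linear regression model in place of Mask R-CNN, a learnable per-parameter step size `alpha` as a stand-in for the domain adaptation optimizer (in the style of Meta-SGD), a single internal training step, and a fixed set of scenes instead of random sampling. All names and constants are hypothetical.

```python
import numpy as np


def mse_and_grad(theta, X, y):
    """Mean squared error of a linear model and its gradient."""
    r = X @ theta - y
    return float(r @ r) / len(y), 2.0 * X.T @ r / len(y)


def make_scene(rng, w, n=20):
    """Toy 'scene': one linear task split into support and query sets."""
    X = rng.normal(size=(n, len(w)))
    y = X @ w + 0.01 * rng.normal(size=n)
    half = n // 2
    return (X[:half], y[:half]), (X[half:], y[half:])


def meta_step(theta, alpha, scenes, lr=0.005):
    """One external update of theta and of the per-parameter steps alpha."""
    g_theta = np.zeros_like(theta)
    g_alpha = np.zeros_like(alpha)
    total = 0.0
    for (Xs, ys), (Xq, yq) in scenes:
        _, gs = mse_and_grad(theta, Xs, ys)         # internal training gradient
        adapted = theta - alpha * gs                # one internal step
        loss_q, gq = mse_and_grad(adapted, Xq, yq)  # test on the query set
        hessian = 2.0 * Xs.T @ Xs / len(ys)         # exact for a linear model
        # Chain rule through the internal step: d adapted / d theta = I - diag(alpha) H
        jacobian = np.eye(len(theta)) - alpha[:, None] * hessian
        g_theta += jacobian.T @ gq
        g_alpha += -gs * gq                         # d adapted_j / d alpha_j = -gs_j
        total += loss_q
    n = len(scenes)
    # The returned loss is the multiscene loss measured before this update.
    return theta - lr * g_theta / n, alpha - lr * g_alpha / n, total / n


rng = np.random.default_rng(0)
base = np.array([2.0, -1.0])
scenes = [make_scene(rng, base + 0.3 * rng.normal(size=2)) for _ in range(5)]

theta, alpha = np.zeros(2), np.full(2, 0.05)
history = []
for _ in range(300):
    theta, alpha, loss = meta_step(theta, alpha, scenes)
    history.append(loss)
```

Because the model is linear, the second-order term is available in closed form; a deep detection model would rely on automatic differentiation or on an approximation instead.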

Assume that the total number of image frames per unit time is M and that the number of image frames per unit time in which the eyes are detected to be closed by some detection criterion (one of P80, P70, or EM) is m. The PERCLOS value is the ratio of the number of image frames in which the eyes are closed per unit time to the total number of image frames per unit time, as defined by the following equation:

\[ \mathrm{PERCLOS} = \frac{m}{M} \]

Fatigue detection can be valuable in classroom learning. For example, as fatigue increases over the course of a class, some students may fall asleep or show similar behaviors. Figure 8 shows the algorithm for obtaining the PERCLOS parameters, and the P80 standard is used for PERCLOS in this paper. Since the number of image frames captured by the camera per unit of time is relatively fixed, the time needed to capture 450 image frames is defined as one unit of time.

As the algorithm flow for PERCLOS parameter acquisition designed in this paper shows, each captured image is first checked to determine whether it is a face image frame, and only then does it enter the PERCLOS statistics. When the number of captured images reaches 450 frames, the system automatically calculates the PERCLOS value over the current 450 frames, saves it, and resets the captured-image statistics to 0 to wait for the next statistical cycle. At the end of the class, if fewer than 450 frames have been collected, they are discarded to ensure that every PERCLOS value covers a fixed unit of time. In this algorithm, the statistics recorded are the total number of image frames and the number of open-eye image frames; the number of closed-eye image frames is the total number of image frames minus the number of open-eye image frames.
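The windowed counting described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, frames without a detected face are assumed to be skipped rather than counted, and the open or closed decision for each frame is assumed to come from an upstream eye state classifier.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WINDOW = 450  # frames per statistical unit, as stated in the text


@dataclass
class PerclosCounter:
    window: int = WINDOW
    total: int = 0        # face frames counted in the current window
    open_frames: int = 0  # open-eye frames counted in the current window
    values: List[float] = field(default_factory=list)

    def update(self, has_face: bool, eyes_open: bool) -> Optional[float]:
        """Count one frame; return a PERCLOS value when a window completes."""
        if not has_face:
            return None  # assumption: non-face frames are not counted
        self.total += 1
        if eyes_open:
            self.open_frames += 1
        if self.total < self.window:
            return None
        closed = self.total - self.open_frames  # closed = total - open
        value = closed / self.total
        self.values.append(value)
        self.total = 0
        self.open_frames = 0
        return value

    def finish(self) -> List[float]:
        """End of class: discard any partial window and return saved values."""
        self.total = 0
        self.open_frames = 0
        return list(self.values)
```

For example, a window of 450 face frames in which 90 are classified as closed yields a PERCLOS value of 90 / 450 = 0.2.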

6. Class Discussion and Q&A

As shown in Table 1, there are three types of data in the classroom discussion and Q&A: the target values set by the teacher, the statistics of learner participation, and the experience values gained by the learners. The target values set by the teacher include the total number of discussion and Q&A sessions and the total experience value of those sessions. The learner participation statistics include the number of discussion and Q&A sessions attended, the total number of speeches, and the number of times a learner was liked by the teacher. The experience value data include the experience value from being liked by the teacher, the experience value from participation, and the total experience value.

Table 2 shows the statistics of the frequency and number of questions and answers. With the instructor setting up 4 discussion and Q&A sessions, the majority of the learners participated in the interactive Q&A activities, while 30 students were more passive and did not actively participate. The largest shares of students participated once or four times, so there is still much room for improvement in the motivation of learners.

Table 3 shows the distribution statistics of the number of Q&A sessions, which indicate a positive correlation between the number of Q&A sessions and the experience value obtained. Generally speaking, the more often learners participated in Q&A, the higher the experience value they gained; conversely, the less often they participated, the lower the experience value they gained. For example, the majority of learners who participated in only one Q&A session received an experience value of less than 5. Interestingly, there is always a small percentage of learners whose experience value exceeds the average total experience for their number of Q&A sessions. For example, among the learners who participated in two discussions, one learner had an experience value greater than 10 and less than 15, and one had an experience value greater than 25, which is the same as the total experience value of the two learners who participated in four discussions and the six learners who participated in four discussions.

7. Conclusion

Combined with the human eye positioning algorithm, this paper realizes the recognition of the open and closed states of the eyes and verifies through experiments the feasibility of using the PERCLOS value for fatigue detection. Because online courses are not taught face to face and participants do not share the same space, the paper studies how to detect students’ learning fatigue and learning attention in online classrooms. The method can strengthen collaborative supervision together with other cooperative methods, thereby promoting the quality and effect of online learning for college students and promoting the development of digital modern teaching management models.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.