Abstract

Reading and writing are the foundations of English learning as well as an important method of instruction. With the advancement of network technology and the onset of the information age, an increasing number of students have lost interest in traditional classroom instruction in English reading and writing. Flipped classrooms emerged in response to this situation and quickly became a focus of research. However, flipped classroom research at home and abroad has primarily focused on theory and practical application, and application practice has mostly addressed the classroom as a whole, with few separate discussions of the effect of students’ self-learning in flipped classrooms. We therefore developed a recurrent neural network-based intelligent assisted learning algorithm for English flipped classrooms. The model has two main characteristics. First, it is built on the gated recurrent unit (GRU), a variant structure of the recurrent neural network. Its double-gating mechanism fully considers the context and selects what to remember through weight assignment; on this basis, it integrates the LeakyReLU function to improve the model’s training convergence efficiency. Second, adopting the connectionist temporal classification (CTC) algorithm eliminates the need for time-consuming prior alignment of speech and text data, which directly speeds up model training. The experimental results show that in the English flipped classroom’s intelligent learning mode, students explore and discover knowledge independently, their enthusiasm and interest in learning are greatly increased, and the flipped classroom’s teaching effect is greatly improved.

1. Introduction

In the context of the new curriculum reform [1–3], education experts advocate a new teaching model in which students are the main body and teachers take the leading role. In practice, however, this concept cannot be fully reflected in classroom teaching. The flipped classroom [4–6] came into being under these circumstances and is in line with the requirements of contemporary educational concepts. The flipped classroom advocates autonomous learning. In flipped classroom teaching, teachers should not only teach students professional knowledge but also provide personalized guidance in the classroom so that students learn how to learn. This is both a requirement of the times and the direction of the future development of national education.

In addition, the current state of literacy class teaching is unsatisfactory, which leaves students’ literacy skills at a worrying level [7, 8]. First, in the process of English reading and writing teaching [9], teachers and students do not pay sufficient attention to reading ability, writing ability, and communication ability. Second, the atmosphere in English reading and writing classes is tense and fast-paced. The English books that students can access every day are extremely limited: apart from English textbooks, there are only exercise books and sets of exercises. As a result, the content of English reading and writing courses at the current stage is significantly limited and has failed to broaden the scope of students’ knowledge effectively [10]. Many English reading and writing teachers do not know how to organize classroom activities, and for those who do, the results clearly fall short of expectations. To carry out classroom activities in English reading and writing classes, it is necessary to incorporate the flipped classroom teaching mode [11–13].

Reading and writing ability is one of the main abilities in English learning and occupies a vital position. For many years, the traditional teaching of English reading and writing has been based on students’ passive acceptance, which has left many students bored with reading and writing and has reduced their learning confidence [5, 14], making it difficult for them to improve their English scores. Many scholars are studying how to change this situation. In English reading and writing teaching, modern information technology can be used as an auxiliary teaching means to improve students’ learning confidence and interest; class activities can use reading to promote writing and writing to promote reading, integrate speaking, reading, and writing as a whole, and guide students to learn actively.

However, the speech signal of English read aloud [15] is complex. It must take into account students’ different accents, volume, speed, pauses, and other factors. Because the input data are specific to each user, the speech input becomes more complicated and more diverse, which adds considerable difficulty to the recognition and training process of the model. According to the type of speech data processed, speech recognition can be divided into isolated word recognition [16, 17], small vocabulary continuous speech recognition [18, 19], and large vocabulary continuous speech recognition [20, 21]. This article focuses mainly on medium and long sentences with a large vocabulary. In long sentences with a large vocabulary, the context is strongly correlated, and this correlation has a great effect on the learning process of the model. Many training models ignore this correlation and learn only the current state; such a single mapping relationship leads to low efficiency. Drawing on the characteristics of recurrent neural networks in machine learning [22], this article holds that assigning a certain “memory” weight to the relevant context information helps the judgment of the current output and converts the time sequence information of speech into the correct characters more accurately. This article focuses on the research and analysis of the GRU, a variant structure in the recurrent neural network family [23, 24], makes some improvements to its internal calculation structure, and combines it with a popular time series classification method [25, 26] to reduce the error rate of the recognition results.

The main innovations of this paper are as follows:
(1) We built an intelligent assisted learning algorithm for the English flipped classroom based on a recurrent neural network. The experimental results show that in the intelligent learning mode of the English flipped classroom, students explore and discover knowledge autonomously, their learning enthusiasm and interest are greatly enhanced, and the teaching effect of the flipped classroom is also greatly improved.
(2) To address existing problems in the field of English speech recognition, this paper proposes a hybrid recurrent neural network model incorporating the LeakyReLU function. The model is based mainly on the gated recurrent unit, a variant structure of the recurrent neural network. The relevance of the context is fully considered through the double-gating mechanism, selective memory is carried out through weight assignment, and the connectionist temporal classification algorithm is introduced to improve the speed of model training.

The remainder of this paper is organized as follows: Section 2 analyzes some related work. Some details of the principles of the proposed algorithm and related submodules are introduced in Section 3. Section 4 provides details of experimental results. In Section 5, the conclusion based on this study is given.

2. Related Work

Robert modeled the flipped classroom teaching process and clearly divided it into two stages: before class and during class. The preclass stage mainly consists of students watching instructional videos on the target content and completing well-designed preclass exercises [27–29]; the in-class stage consists of four steps: a small amount of teacher evaluation, student problem-solving, problem feedback, and evaluation. An analysis of the currently popular flipped classroom teaching models shows that most of the flipped classroom process models built at home and abroad take Robert’s structural model diagram as their main prototype and then combine it with different learning theories, adapting it to specific teaching situations as the starting point for research. Aaron Sams believes that teachers can use not only the excellent teaching videos of other teachers but also Camtasia Studio and other screen recording tools to make their own teaching videos, provided that the videos are short, vivid, and clear. With the deepening of flipped classroom research and practice, educators believe that the effect of students’ learning from preclass teaching videos largely determines the effect of classroom teaching. As a result, more attention is being paid to assessment and to communication between teachers and students about the effect of preclass video learning.

Chung et al. used the “first guiding principle” of meta-design theory to design a flipped classroom method and conducted a two-stage study in two middle schools. The participants were 382 students and 5 teachers from four subject areas, namely, mathematics, physics, Chinese, and information and communication technology (ICT). Based on the experience of the pilot study, the authors improved the flipped classroom model and tested its effect through a quasiexperimental design in the main study. The results showed that the learning outcomes of the students in the flipped ICT course were similar to those in the nonflipped ICT course, but student performance in the other three courses (namely, mathematics, physics, and Chinese) improved after flipping, with small to medium effect sizes. Finally, the article puts forward some practical suggestions and ideas for further research.

In addition, Sarah et al. looked at effective classroom activities in English as a foreign language (EFL) and proposed a student response system (SRS) to support teachers in organizing classroom activities in flipped classes. To investigate the effectiveness of this method, a quasiexperiment was conducted in an EFL classroom of an engineering school. The experimental group used the SRS for classroom activities, while the control group followed the usual method. The results show that the use of the SRS can improve students’ motivation and self-efficacy in learning English grammar and increase their participation in classroom activities during flipped learning. The results of the questionnaire survey also show that students accept the SRS as a teaching method in the EFL flipped classroom. Hernan et al. [30] described a teaching intervention that included a range of creative activities aimed at improving students’ spoken and written English, particularly for students who showed a lack of interest or attention. At first, the participants did not appear to be particularly interested in learning the language. In the end, after hearing about the proposed approach, they were more willing and motivated to participate in chain games, creative writing, and screenwriting exercises. These activities helped students improve their spoken and written English fluency as well as their comprehension of English grammar and structure.

3. Methodology

3.1. Definition of Flipped Classroom

The so-called flipped classroom is defined relative to the current teaching method, in which teachers explain, students listen to lectures, and students do homework after class. It refers to the use of information technology to let teachers record their explanations of knowledge points as short, succinct teaching videos. The videos, accompanied by other learning materials and advance homework, are sent to students through a learning management platform. Under the teacher’s guidance, students study on their own and complete the advance homework. Having grasped the learning situation in detail from the information on the learning platform, the teacher then uses class time for targeted explanation, solving problems together with the students, and completing homework. In the flipped classroom teaching model, the acquisition of knowledge relies mainly on students’ self-study before class and no longer takes up valuable classroom time for lecturing. Students can use a variety of channels, such as online discussions with other students, watching related video materials, and reading required image materials. During class, students can learn more actively on the basis of the information obtained before class and gain a deeper understanding. Teachers can also make full use of the time freed up to communicate with everyone. After class, students plan their own learning content, learning pace, and learning style and adopt appropriate ways of presenting knowledge; teachers use cooperation and instruction to meet the needs of students, promote students’ personalized inquiry learning, and let students gain a real learning experience through self-inquiry.

The essence of the flipped classroom is to return control of learning to the students, to adopt teaching methods that treat students as the main subject, and to lay the groundwork for future-oriented education. As a teaching concept and teaching mode, the flipped classroom is influencing and changing traditional classroom teaching. It makes use of Internet technology and information technology to break down traditional classroom barriers, expand classroom teaching time and space, optimize students’ learning processes, improve students’ learning abilities, and realize the in-depth integration of information technology and curriculum teaching, all of which promote students’ deep learning. At the same time, the flipped classroom model, as part of the education reform movement, will completely subvert the traditional print-based classroom teaching structure and teaching process and trigger a series of changes in the role of teachers, curriculum models, and management models.

Autonomous learning, in contrast to traditional acceptance learning, places a greater emphasis on students’ ability to learn independently. As the subject of learning, students achieve their own learning goals through independent analysis, exploration, practice, and creation. We should change the current situation in which curriculum implementation overemphasizes acceptance learning, rote memorization, and mechanical training; encourage students to participate actively, to be willing to explore, and to be diligent in hands-on work; and cultivate students’ ability to collect and process information, acquire new knowledge, analyze and solve problems, and communicate and cooperate. The concept of autonomous learning is based on inquiry-based learning, in which students are presented with situations in which they must conduct their own research, solve problems, and gain knowledge in the field. As a form of discovery learning, it provides students in the classroom with more opportunities to “experience and interact” with knowledge. Using the currently popular neural network technology [31–33] to assist English learning is therefore promising.

3.2. English Speech Recognition
3.2.1. Language Model

There are often multiple combinations of output sequences that can satisfy the original signal input in the speech recognition process. An external tool is required to help the machine constrain the candidate words when judging the output, so that the resulting word collocations are more logical. The role of the language model is to determine the most probable text sequence from the output of the acoustic model, much like a dictionary. For large-vocabulary continuous speech recognition, the commonly used language model is the $n$-gram language model, which uses the relationship between adjacent words in the context to calculate the current maximum-probability output. The $n$-gram model is based on the assumption that the current output is related only to the previous $n-1$ words and has no relationship with other words. Assuming a word sequence is $W = (w_1, w_2, \ldots, w_m)$, its probability is calculated as follows:

$$P(W) = \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}),$$

where $m$ is the number of words in the sequence and $w_i$ is the $i$-th word in the sequence $W$.

There are certain restrictions on the choice of the value $n$ in the $n$-gram model. Generally, a value from 1 to 5 is appropriate; if the value is too large, it causes data sparseness. The most commonly used models are the binary language model (bi-gram) and the ternary language model (tri-gram). The bi-gram considers only the impact of the previous word on the current judgment, and its probability is calculated as follows:

$$P(W) = \prod_{i=1}^{m} P(w_i \mid w_{i-1}),$$

where $P(w_i \mid w_{i-1})$ can be calculated by maximum likelihood estimation:

$$P(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i)}{C(w_{i-1})},$$

where $C(w_{i-1} w_i)$ is the frequency of occurrence of the phrase $w_{i-1} w_i$ in the transcription text set and $C(w_{i-1})$ is the frequency of occurrence of the word $w_{i-1}$. Similarly, the tri-gram considers the influence of the previous two words on the current judgment.
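The maximum likelihood estimate for the bi-gram model can be illustrated with a minimal Python sketch. The toy corpus and the function names (`train_bigram`, `bigram_prob`, `sentence_prob`) are hypothetical and are not taken from the paper's implementation. The sketch uses no sentence-start marker, so the first word of a sentence is treated as given, and it applies no smoothing, so unseen word pairs receive probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count single words and adjacent word pairs over tokenized sentences."""
    unigram = Counter()
    bigram = Counter()
    for words in sentences:
        unigram.update(words)
        bigram.update(zip(words, words[1:]))
    return unigram, bigram

def bigram_prob(prev, word, unigram, bigram):
    """MLE estimate P(word | prev) = C(prev word) / C(prev); 0.0 if prev is unseen."""
    if unigram[prev] == 0:
        return 0.0
    return bigram[(prev, word)] / unigram[prev]

def sentence_prob(words, unigram, bigram):
    """Product of bi-gram probabilities; the first word is treated as given."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word, unigram, bigram)
    return p

# Hypothetical toy transcription set, used only for illustration.
corpus = [
    ["i", "read", "a", "book"],
    ["i", "read", "a", "letter"],
    ["i", "write", "a", "letter"],
]
unigram, bigram = train_bigram(corpus)
print(bigram_prob("i", "read", unigram, bigram))
print(sentence_prob(["i", "read", "a", "letter"], unigram, bigram))
```

In practice a smoothing scheme would usually be added so that word pairs missing from the training text do not force the probability of a whole sentence to zero.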

3.2.2. Pronunciation Dictionary

The pronunciation dictionary records the various characters to which each pronunciation can correspond. It is generally compiled from official linguistic records and can be used directly in recognition methods to aid learning. In Chinese, the correspondence is between pinyin and Chinese characters. For example, when learning a Chinese character, we may know how it is pronounced but not which character it corresponds to, and we need tools like the “Xinhua Dictionary” to help. Similarly, the process of voice recognition also needs such a tool. Following the previous example, the pinyin “ta” in the pronunciation dictionary can correspond to the Chinese characters for “it,” “he,” “she,” etc. (它, 他, 她), which narrows the recognition results to a constrained set of candidates. The model then matches the most likely character by using the context information it has learned. From this perspective, the function of the pronunciation dictionary is to connect the phoneme information from the acoustic model with the word information from the language model. It is a highly effective tool for acoustic recognition.
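The role of the pronunciation dictionary as a constraint on candidates can be sketched in a few lines of Python. Everything here is a hypothetical toy: the two-entry `LEXICON`, the `PAIR_COUNTS` table standing in for a language model, and the function names are assumptions for illustration rather than part of the described system.

```python
# Hypothetical toy lexicon: one pronunciation maps to several candidate characters.
LEXICON = {
    "ta": ["他", "她", "它"],   # he, she, it
    "shi": ["是", "十", "时"],  # is, ten, time
}

# Hypothetical counts of (previous character, character) pairs,
# standing in for the context information of a language model.
PAIR_COUNTS = {("他", "是"): 5, ("他", "十"): 1}

def candidates(pinyin):
    """Return the characters a pronunciation may correspond to (empty if unknown)."""
    return LEXICON.get(pinyin, [])

def pick(pinyin, prev_char, pair_counts):
    """Choose the candidate that most often follows prev_char in the toy counts."""
    options = candidates(pinyin)
    if not options:
        return None
    return max(options, key=lambda ch: pair_counts.get((prev_char, ch), 0))

print(candidates("ta"))
print(pick("shi", "他", PAIR_COUNTS))
```

The lexicon restricts the search to a handful of characters per pronunciation, and the context counts then decide among them, mirroring the division of labor between the dictionary and the language model.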

3.3. Our Model
3.3.1. Gated Recurrent Unit

The intuitive improvement of the gated recurrent unit (GRU) is that it merges the three gate units of the LSTM network into two gate units, which gives a more concise and refined structure. GRU is a modified structure of LSTM: LSTM can solve the long-term dependency problem of RNNs, and so can GRU. On this basis, GRU also reduces the number of parameters, avoiding the slower training that LSTM suffers as its parameters grow during longer-term training and learning.

In Figure 1, $h_{t-1}$ is the unit state of the previous unit and $x_t$ is the input of the current unit; together they form the calculation input of the current unit, and $h_{t-1}$ carries the information selectively passed on by the previous unit. GRU reduces the input gate, forget gate, and output gate of LSTM to two gates: the update gate $z_t$ and the reset gate $r_t$. The update gate is mainly responsible for determining the degree to which the information transferred from the previous unit at the previous moment is carried into the current state: the larger the value of the update gate, the more of the previously transferred state information is carried into the current state. The reset gate mainly controls how much information from the previous unit state is written to the candidate set of the current unit: the smaller the reset gate, the less information from the previous state is written. Finally, the unit state and the output are combined into a single state $h_t$, which gives the model fewer parameters and higher processing efficiency.

GRU is more efficient than LSTM in that it completes the two actions of selective memory and selective forgetting in a single step.
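A single GRU step can be sketched in plain Python to make the two gates concrete. This is a minimal sketch under stated assumptions: it follows the commonly used GRU formulation in which the new state is the update gate times the previous state plus one minus the update gate times the candidate state, which agrees with the description that a larger update gate keeps more of the previous state. The parameter names (`Wz`, `Uz`, `bz`, and so on), the helper `affine`, and the pluggable `act` argument for the candidate activation are hypothetical; the text does not state exactly where an alternative activation is applied.

```python
import math

def sigmoid(v):
    """Logistic function used by both gates."""
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, x, U, h, b):
    """Return W @ x + U @ h + b for plain Python lists."""
    return [
        sum(wi * xi for wi, xi in zip(W[k], x))
        + sum(ui * hi for ui, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]

def gru_step(x, h_prev, p, act=math.tanh):
    """One GRU step; p holds weights and biases for gates z, r and candidate c."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h_prev, p["bz"])]  # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h_prev, p["br"])]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]  # reset gate scales the previous state
    cand = [act(v) for v in affine(p["Wc"], x, p["Uc"], gated, p["bc"])]    # candidate state
    # One interpolation both keeps (z) and overwrites (1 - z) information.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, cand)]

def toy_params(bz):
    """Hypothetical 1-input, 1-unit parameters; bz biases the update gate."""
    return {"Wz": [[0.0]], "Uz": [[0.0]], "bz": [bz],
            "Wr": [[0.0]], "Ur": [[0.0]], "br": [0.0],
            "Wc": [[1.0]], "Uc": [[0.0]], "bc": [0.0]}

print(gru_step([0.5], [0.3], toy_params(20.0)))   # update gate near 1
print(gru_step([0.5], [0.3], toy_params(-20.0)))  # update gate near 0
```

With the update gate pushed toward 1 the step returns almost exactly the previous state, and with the gate pushed toward 0 it returns the candidate state, which shows how a single interpolation handles both remembering and forgetting.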

3.3.2. Connectionist Temporal Classification

The connectionist temporal classification (CTC) algorithm was first proposed by Graves et al. as a temporal classification algorithm. Used mainly for aligning speech signals with text labels, CTC saves the time cost of manual alignment and improves training efficiency. The traditionally used cross-entropy loss requires the labels of the training data to be aligned at the frame level before the speech sequence data are processed. This alignment operation requires a considerable amount of work, and the model needs to know the label of each frame before training, which reduces work efficiency. During training, CTC automatically optimizes the mapping between the input sequence and the output sequence so that the model can be trained directly, which greatly improves the speed of decoding.

We use CTC to process the GRU output. The output length of the GRU is the same as the input length: for $T$ frames of voice signal input, there are $T$ probability vector outputs, each containing the probability of every character in the dictionary. When the voice signal $x$ is input,

$$l^{*} = \arg\max_{l} P(l \mid x),$$

where $l^{*}$ is the output with the highest conditional probability and $P(l \mid x)$ is the posterior probability of $l$. During training, the goal is to maximize this posterior probability, that is, to obtain the most likely output. In recognition processing, the closest matching output can then be found more quickly.

As shown in Figures 2 and 3, the CTC algorithm does not require the input and output to be strictly aligned, and multiple output paths may correspond to one output result. Understanding the correspondence between input and output helps in understanding how the loss function is calculated and which calculation is used at test time. The most prominent feature of CTC is the introduction of a placeholder blank node, used mainly to model the parts that carry no effective information, such as silence and pauses, and to represent the output state of the network when it predicts uncertain information. After alignment, the blank nodes need to be deleted. This process is called transformation. If an output sequence of the network can be mapped to the correct annotated sequence through transformation, then the output sequence is a CTC path. The transformation proceeds as follows: first, merge consecutive repeated labels that are not separated by a blank node, and then delete the blank nodes.
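The transformation and the many-to-one relationship between paths and outputs can be sketched as follows. The blank symbol `-`, the two-symbol alphabet, and the function names are hypothetical choices for illustration. `label_prob` enumerates every path by brute force, which is only feasible for toy inputs; practical CTC implementations compute the same sum with a dynamic programming (forward-backward) recursion instead.

```python
from itertools import product

BLANK = "-"  # hypothetical symbol for the CTC blank node

def collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

def greedy_decode(probs, alphabet):
    """Pick the most probable symbol in each frame, then collapse the path."""
    path = [alphabet[max(range(len(frame)), key=frame.__getitem__)] for frame in probs]
    return collapse(path)

def label_prob(probs, alphabet, label):
    """Brute-force P(label | x): sum over all paths that collapse to label."""
    total = 0.0
    for idx in product(range(len(alphabet)), repeat=len(probs)):
        if collapse(alphabet[i] for i in idx) == label:
            p = 1.0
            for t, i in enumerate(idx):
                p *= probs[t][i]
            total += p
    return total

# Two frames over the alphabet (blank, "a"); each row is one frame's distribution.
alphabet = [BLANK, "a"]
probs = [[0.4, 0.6], [0.4, 0.6]]
print(collapse("aa-b"), collapse("a-a"))
print(greedy_decode(probs, alphabet))
print(label_prob(probs, alphabet, "a"))
```

In the toy input, the three paths `a-`, `-a`, and `aa` all collapse to the single label `a`, so its probability is the sum of the three path probabilities. A blank between two identical labels, as in `a-a`, is what allows a genuinely repeated character to survive the merge step.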

3.3.3. LeakyReLU Improved GRU

The LeakyReLU function is calculated as follows:

$$f(x) = \begin{cases} x, & x > 0, \\ \alpha x, & x \le 0, \end{cases}$$

where $\alpha$ is a fixed slope value selected before training, whose purpose is to give the nonpositive part a small slope. Compared with the training characteristics of the sigmoid and tanh functions, it is more consistent with the convergence behavior that training aims to achieve, so that the training process of the model can be stable and efficient.
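The difference in gradient behavior between LeakyReLU and tanh can be checked with a short Python sketch. The slope value 0.01 is an assumption chosen for illustration (the text does not report the value used), and the function names are hypothetical.

```python
import math

def leaky_relu(x, alpha=0.01):
    """Return x for positive inputs and alpha * x otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Derivative of leaky_relu: 1 on the positive side, alpha elsewhere."""
    return 1.0 if x > 0 else alpha

def tanh_grad(x):
    """Derivative of tanh, which shrinks toward 0 as |x| grows."""
    return 1.0 - math.tanh(x) ** 2

for x in (-5.0, -1.0, 1.0, 5.0):
    print(x, leaky_relu(x), leaky_relu_grad(x), tanh_grad(x))
```

For inputs of large magnitude, the tanh derivative becomes very small, whereas the LeakyReLU derivative stays at 1 on the positive side and at the fixed slope on the nonpositive side, which is one common explanation for its more stable convergence.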

4. Experiments

4.1. Parameter Settings

All experiments were performed on a computer equipped with a single NVIDIA GTX 1080 Ti GPU (11 GB). The models were implemented in Python with the PyTorch deep learning library, and the batch size was 100 samples.

4.2. Datasets

The THCHS-30 database can be used to build a comprehensive Chinese speech recognition benchmark system, including performance under high noise conditions. The voice data in the dataset were recorded by 40 people in a quiet office environment through a carbon particle microphone, and the total duration is 35 hours. Most of the people involved in the recording were college students who spoke Mandarin. The recorded text was drawn from a large volume of news, the sampling frequency is 16 kHz, the sample size is 16 bits, and the format is WAV. The composition of the dataset is shown in Table 1.

4.3. Evaluation Methods

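The evaluation methods we use are character error rate (CER), sentence error rate (SER), and word error rate (WER). CER and WER are calculated as follows:

$$\mathrm{CER} = \frac{S_c + D_c + I_c}{N_c} \times 100\%,$$

$$\mathrm{WER} = \frac{S_w + D_w + I_w}{N_w} \times 100\%,$$

where $S_c$ is the number of replacement characters, $D_c$ is the number of deleted characters, $I_c$ is the number of inserted characters, and $N_c$ is the number of characters; $S_w$ is the number of replacement words, $D_w$ is the number of deleted words, $I_w$ is the number of inserted words, and $N_w$ is the number of words. SER is the proportion of sentences that contain at least one recognition error.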
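These error rates are usually obtained from the edit distance between each reference transcription and the recognized output, since the minimum number of replacements, deletions, and insertions is exactly the Levenshtein distance. The following Python sketch assumes that convention; the function names and example sentences are hypothetical, spaces are ignored when counting characters, and rates are returned as fractions rather than percentages.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum replacements + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h from the hypothesis
                prev[j - 1] + (r != h),  # replace r with h, or match
            ))
        prev = cur
    return prev[-1]

def error_rate(refs, hyps, unit):
    """Total edit distance over total reference length; unit is 'char' or 'word'."""
    if unit == "word":
        split = lambda s: s.split()
    else:
        split = lambda s: list(s.replace(" ", ""))
    errors = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return errors / total

def sentence_error_rate(refs, hyps):
    """Fraction of sentences whose recognized words differ from the reference."""
    wrong = sum(r.split() != h.split() for r, h in zip(refs, hyps))
    return wrong / len(refs)

# Hypothetical reference and recognized sentences.
refs = ["the cat sat", "good morning"]
hyps = ["the cat sit", "good morning"]
print(error_rate(refs, hyps, "char"))
print(error_rate(refs, hyps, "word"))
print(sentence_error_rate(refs, hyps))
```

Because a single wrong character makes both its word and its sentence count as wrong, the sentence-level rate is typically the highest of the three and the character-level rate the lowest on the same output.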

4.4. Experimental Results

To test the performance advantage of GRU relative to other speech recognition models, LSTM-HMM, LSTM-CTC, and GRU-CTC were selected as the models for the comparative experiment. The models were compared by the CER and SER of their results on the same dataset in the same experimental environment; the smaller the value, the better the performance in the experiment.

According to the data in Table 2, the CER and SER of the hybrid models with CTC are lower than those of the traditional HMM hybrid model. This comparison shows that introducing the CTC method improves the recognition performance of the model to a certain extent, mainly because the CTC method can model the entire sentence directly when the input and output are not aligned, which improves the completeness and relevance of whole-sentence recognition. The GRU-CTC model, which like the control group LSTM-CTC is combined with the CTC approach, achieves the best CER and SER among the three deep learning-based models selected in the experiment; both its CER and SER are reduced compared with LSTM-CTC. Combining these results with the characteristics of CTC and GRU, it can be concluded that CTC, as an efficient method that does not require manual label alignment, achieves considerable results when combined with LSTM, GRU, and other recurrent neural network variants. The improved double-gate variant structure is more reasonable in how it prunes and optimizes the structure and controls the relevance of sequence context information more efficiently, and the recognition performance of the model is improved. In summary, the hybrid recurrent neural model of GRU and CTC is a relatively efficient method of speech recognition.

4.5. Ablation Experiment of Activation Function

To further verify the role played by the activation function in the proposed algorithm, we added an ablation experiment with three activation functions: LeakyReLU, ReLU, and ELU. The experimental results are shown in Table 3.

Table 3 clearly shows that, of the three activation functions, LeakyReLU performs best, followed by ReLU, which in turn performs better than ELU. Therefore, it is reasonable and effective to choose the LeakyReLU activation function in this paper.

5. Conclusion

In this paper, we propose a recurrent neural network-based intelligent assisted learning algorithm for the English flipped classroom. The model has two main characteristics. First, it is built on the gated recurrent unit, a variant structure of the recurrent neural network. On this basis, the dual-gating mechanism fully considers the relevance of the context, selects what to remember through weight allocation, and integrates the LeakyReLU function, which improves the model’s training convergence efficiency. Second, the connectionist temporal classification algorithm addresses the time-consuming alignment of input and output required by conventional models, which increases model training speed. The experimental results show that in the English flipped classroom’s intelligent learning mode, students explore and discover knowledge independently, and their enthusiasm and interest in learning are greatly enhanced, as is the flipped classroom’s teaching effect.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author does not have any possible conflicts of interest.