Full length article
Early prediction of undergraduate Student's academic performance in completely online learning: A five-year study

https://doi.org/10.1016/j.chb.2020.106595

Highlights

  • Prediction of the academic performance of 802 undergraduate students in a completely online learning setting.

  • Exploratory factor analysis, a multiple regression model, and clustering are used to predict academic performance.

  • The prediction model is mainly based on variables of interaction data in Moodle.

  • Age has been identified as a factor that is inversely proportional to the academic performance.

Abstract

In this decade, e-learning systems have offered instructors and students more interactivity than traditional systems and have made completely online (CO) education possible. However, instructors cannot tell whether a CO student is engaged in a course, nor can they predict the student's academic performance. This work provides a collection of models (exploratory factor analysis, multiple linear regressions, cluster analysis, and correlation) for the early prediction of students' academic performance. The models are built from the Moodle interaction data, characteristics, and grades of 802 undergraduate students from a CO university. The results indicate that four factors contribute most to the prediction of academic performance: Access, Questionnaire, Task, and Age. The Access factor is composed of variables related to students' accesses to Moodle, including visits to forums and glossaries. The Questionnaire factor summarizes variables related to visits to and attempts at questionnaires. The Task factor is composed of variables related to consulted and submitted tasks. The Age factor contains the student's age. Notably, Age was identified as a negative predictor of student performance, indicating that performance is inversely proportional to age. In addition, the cluster analysis found five groups and supported the finding that the number of interactions with Moodle is closely related to student performance.
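As a rough illustration of the regression component summarized above, the following sketch fits an ordinary least-squares model of a performance score on four factor scores. It is not the authors' model: the data are synthetic, the coefficient values in `TRUE_COEFS` are invented for the example, and the names `fit_ols`, `access`, `questionnaire`, `task`, and `age` are assumptions. The only property it is meant to mirror is a negative coefficient on the Age factor.

```python
import random


def fit_ols(rows, targets):
    """Return least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    # Augmented system [X^T X | X^T y].
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * t for d, t in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic, noise-free data: standardized factor scores for 200 hypothetical students.
random.seed(0)
NAMES = ["intercept", "access", "questionnaire", "task", "age"]
TRUE_COEFS = [5.0, 0.5, 0.3, 0.2, -0.1]  # invented values; Age is negative by construction
rows = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
targets = [
    TRUE_COEFS[0] + sum(c * x for c, x in zip(TRUE_COEFS[1:], r)) for r in rows
]

coefs = fit_ols(rows, targets)
for name, coef in zip(NAMES, coefs):
    print(f"{name:>13}: {coef:+.3f}")
```

Because the synthetic targets contain no noise, the fit recovers the invented coefficients; with real LMS data the estimates would carry uncertainty and would need the usual diagnostics.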

Author contribution

Javier Bravo-Agapito: Conceptualization, Software, Investigation, Writing - original draft. Sonia Janeth Romero: Methodology, Formal analysis, Data curation, Visualization, Writing - original draft. Sonia Pamplona: Writing - original draft, Validation, Resources.

Theory

Several recent works have sought to predict academic performance from LMS data. One noted challenge is the difficulty of finding a set of variables that consistently predicts student performance across multiple courses (Conijn, Snijders, Kleingeld, & Matzat, 2017). One reason for this difficulty is that instructional conditions can influence predictions of academic success based on LMS log files (Gašević, Dawson, Rogers, & Gasevic, 2016). The students may

Sample

The sample was composed of 802 students: 377 females and 425 males. All were students at UDIMA in Spain. Data on students' interaction with the LMS were collected from four courses in the academic year 2012–2013. In addition, to perform the early prediction, longitudinal data on academic achievement were gathered for the years 2013–2014, 2014–2015, 2015–2016, and 2016–2017. The courses selected were: Knowledge Management (N = 151, which is 18.8% of the sample), General Sociology

Distributions

As can be seen in Table 2, the independent variables exhibit great dispersion and positive skewness, and they are leptokurtic (except total of assignments).
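For readers less familiar with these shape statistics, the sketch below computes moment-based skewness and excess kurtosis for a small sample. The `visits` values are hypothetical and are not taken from Table 2; the function names are likewise illustrative. A positive skewness indicates a long right tail, and a positive excess kurtosis indicates a leptokurtic distribution.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical per-student visit counts: most students visit rarely, one visits a lot.
visits = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

print(f"skewness:        {skewness(visits):.2f}")
print(f"excess kurtosis: {excess_kurtosis(visits):.2f}")
```

Interaction counts from an LMS often look like this hypothetical sample, with many low values and a few very active students, which is one common reason such variables are summarized or transformed before modeling.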

Correlations

Table 3 shows almost consistently significant correlations. For that reason, and also because of the skewed and leptokurtic form of the distributions presented in Table 2, we decided to perform an EFA. This EFA checks in advance whether some of the variables extracted from the log files could be better represented in a series of combined factors, more
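The kind of correlation table that motivates a factor analysis can be sketched as follows. The three variables and their values are hypothetical and do not reproduce Table 3; `pearson` and `correlation_matrix` are illustrative helper names. When several log-file variables correlate strongly with one another, as the first two do here, combining them into a single factor becomes a reasonable thing to test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical per-student counts extracted from LMS logs.
columns = {
    "forum_visits": [2, 4, 5, 7, 9, 12],
    "glossary_visits": [1, 2, 2, 4, 5, 6],
    "quiz_attempts": [3, 1, 4, 2, 5, 3],
}

matrix = correlation_matrix(columns)
for row_name, row in matrix.items():
    print(row_name.ljust(16), " ".join(f"{value:+.2f}" for value in row.values()))
```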

Discussion

The results obtained provide information to meet the objectives outlined in the introduction of the present paper. On the one hand, we have found a group of variables that makes it possible to predict the academic performance of a sample of undergraduate students using data collected from an LMS during an academic semester (G1 and G2). These variables may be treated as EWI for carrying out preventive support measures. On the other hand, we analyzed the relationship between variables and developed

Conclusions

This work proposed a collection of models that could be useful for consistently predicting the academic performance of students at the end of a degree. These models used variables from two data sources: students' LMS interaction data and institutional data, which included information on student enrollment, the age and sex of students, and the GPA for each academic year from 2012 to 2017. The models presented in this work make an early prediction using students' LMS interaction data of the first semester

References (29)

  • A. Sandoval et al. Centralized student performance prediction in large courses based on low-cost variables in an institutional context. The Internet and Higher Education (2018).
  • J.W. You. Identifying significant indicators using LMS data to predict course achievement in online learning. The Internet and Higher Education (2016).
  • F. Chen et al. Using handheld devices for tests in classes. CMU-CS-00-152 (2000).
  • R. Conijn et al. Predicting student performance from LMS data: A comparison of 17 blended courses using Moodle LMS. IEEE Transactions on Learning Technologies (2017).