HeTROPY: Explainable learning diagnostics via heterogeneous maximum-entropy and multi-spatial knowledge representation
Introduction
The proliferation of e-learning technologies, such as Massive Open Online Courses (MOOCs) and Intelligent Tutoring Systems (ITS), has given many learners flexibility in where and when they learn. However, diagnosing learning problems in these systems is challenging because few human teachers can be allocated to assist learners during the e-learning process. The problem becomes harder still when numerous e-learners follow personalized learning paths at their own pace. In traditional classrooms, diagnosing students' learning problems is the human teacher's job. In this paper, we seek to use an AI teacher to autonomously diagnose student learning problems in e-learning systems from learning performance data.
Prior efforts have shown that AI technologies can tackle the learning assessment problem effectively. Knowledge Tracing (KT) is the task of estimating a student's knowledge state from sequences of problem-solving attempts on knowledge points (or skills) [1], [2]. More recently, Recurrent Neural Network (RNN)-based KT models have markedly improved assessment accuracy [3], [4], [5], and the state-of-the-art results in KT are achieved with RNNs [4]. However, little effort has been made to tackle the learning diagnostics challenge.
The act of diagnosis involves inspection and causal reasoning. For example, when a student makes a problem-solving mistake, learning diagnostics uncovers the reason behind the mistake and answers the question 'Why did this mistake occur?' Explainable learning diagnostics helps the student learn because decisions (predictions) are not only made but also accompanied by an account of how they were made. Specifically, it requires that (1) the decision model generates accurate decisions; (2) the decisions are interpretable and understandable by humans; and (3) reliable explanations enhance the trustworthiness of the model [6]. However, conventional KT methods are black boxes [7] that generate unexplainable decisions, making the models unreliable. This flaw also raises critical safety concerns, since false decisions have negative impacts on student learning [8]. Explainable KT models are therefore key to solving the autonomous learning diagnostics problem and to alleviating these decision-safety concerns.
Beyond accuracy, explanations drive research into faithful models as model complexity increases, especially for deep models [9], [10]. By explanation we refer to the qualitative measurement of the contributions of features to a model decision, as defined by [10]. Various lines of methods have been devised to shed light on opaque models. Local proximity-based sampling and weighting (LIME) [11], gradient-based Layer-wise Relevance Propagation [12], [13], rule-based anchor explanations [14], and extractive generator–encoder rationale selection of short, sufficient explanatory segments [15] are representative examples of several important explanatory perspectives. Interestingly, although not initially aimed at explanation [16], attention-based methods that rely on saliency masks to highlight parts of texts or images are often conceptualized as a way of opening up black boxes [17], [18]. Cumulative Learning Theory in cognitive science states that the hierarchical building blocks of knowledge points disclose how learners apply previously acquired (subordinate) skills to solve subsequent problems [19]. This implies a chain of dependencies in which the mastery or non-mastery of a skill depends on other related skills, which in turn depend on further skills. This dependency chain can be back-traced to an initial point, and any element on the trail may uncover the reason for the final decision, i.e., the explanation. The methods above are limited in capturing such a chain of dependencies, which is inherently cumulative.
Additionally, successful explainable learning diagnostics drives more robust and accurate downstream applications, such as explainable recommendation [20], [21], [22]. As illustrated in Fig. 1, if a student fails to solve a 'matrix–matrix multiplication' problem, conventional algorithms would recommend more 'matrix–matrix multiplication' problems (homogeneous recommendation) [5]. However, they do not trace the latent heterogeneous knowledge points 'vector inner product' and 'matrix–vector multiplication' as potential causal factors, even though both are prerequisites of 'matrix–matrix multiplication'.
Therefore, we propose to tackle the learning diagnostics problem by 'back-tracing' the sequentially cumulative learning process to identify the salient sources that explain the current prediction (the target).
Specifically, we generate an attention distribution by introducing a heterogeneity mechanism with a maximum-entropy regularizer (HeTROPY). As contrasted in Figs. 2(a) and 2(b), when making a prediction, HeTROPY excludes the homogeneous knowledge points in a sequence and encourages the model to attend to the heterogeneous, relevant ones. At the same time, maximizing the attention entropy encourages the model to learn multiple and multi-hop knowledge relations, facilitating deeper diagnostics into the origin of the sequence. HeTROPY achieves learning diagnostics by modeling the problem as a non-standard knowledge relation prediction problem in knowledge graph embedding [23]. It also provides explanations that reveal why the model 'thinks' a student would make a correct or wrong attempt. For example, in predicting a mistake, HeTROPY learns to capture the relevant knowledge points that may be the origin of the mistake. These discovered explanations can be interpreted and verified by humans, enhancing the reliability of the model.
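To make the mechanism concrete, below is a minimal sketch assuming a PyTorch setup; the function name, tensor shapes, and the exact masking and regularization details are our illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def hetropy_attention(query, keys, target_kp, source_kps, lam=0.1):
    """Hypothetical sketch of heterogeneity-masked attention with a
    maximum-entropy regularizer (not the authors' exact implementation).

    query:      (d,)   hidden state of the target attempt
    keys:       (T, d) hidden states of past attempts (the sources)
    target_kp:  scalar knowledge-point id of the target
    source_kps: (T,)   knowledge-point ids of the sources
    """
    scores = keys @ query                              # (T,) relevance scores
    # Heterogeneity mask: exclude homogeneous sources, i.e. attempts
    # on the same knowledge point as the target.
    heterogeneous = source_kps != target_kp
    scores = scores.masked_fill(~heterogeneous, float('-inf'))
    alpha = F.softmax(scores, dim=0)                   # attention distribution

    # Maximum-entropy regularizer: adding -lam * H(alpha) to the task loss
    # pushes alpha toward a flatter distribution over heterogeneous sources,
    # so multiple and multi-hop relations can surface instead of one spike.
    entropy = -(alpha * torch.log(alpha + 1e-9)).sum()
    reg = -lam * entropy                               # add this to the task loss

    context = alpha @ keys                             # (d,) attended summary
    return context, alpha, reg
```

During training, `reg` would be added to the prediction loss; a larger `lam` flattens the attention over the heterogeneous sources, while the returned `alpha` is what would be inspected as the explanation.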
We also construct a multi-spatial knowledge representation by factorizing the discrete knowledge-problem space and the student performance space into a single continuous vector space. The resulting representation has higher expressive power, enabling HeTROPY to effectively compute latent correlations between the target and source knowledge points during training; this acts as a key booster for knowledge relevance computation and prediction performance.
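A hedged sketch of one plausible way to realize such a multi-spatial representation follows; the class name, dimensions, and the concatenation choice are our assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class MultiSpatialEmbedding(nn.Module):
    """Illustrative sketch: instead of one-hot (knowledge point, correctness)
    pairs of size 2*K, embed each factor in its own continuous space and
    combine them into one compact vector per interaction."""

    def __init__(self, n_kps, d_kp=64, d_perf=16):
        super().__init__()
        self.kp_emb = nn.Embedding(n_kps, d_kp)    # knowledge-problem space
        self.perf_emb = nn.Embedding(2, d_perf)    # performance space: 0/1

    def forward(self, kp_ids, correct):
        # Concatenate the two factor embeddings into a single vector space;
        # summation or a learned projection would be equally plausible.
        return torch.cat([self.kp_emb(kp_ids), self.perf_emb(correct)], dim=-1)
```

The design intuition is that factorizing the spaces keeps the parameter count linear in the number of knowledge points while still letting the model relate any (skill, outcome) pair in the shared continuous space.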
In summary, the main contributions of this paper include:
1. To the best of our knowledge, this work is the first to tackle the explainable learning diagnostics problem with explanation mechanisms by performing target–source relation prediction. HeTROPY segregates the impact of homogeneous elements while promoting heterogeneous relatives by maximizing their probability in knowledge relation discovery.
2. We propose a canonical multi-spatial representation of knowledge that expresses knowledge relations at finer granularity and in low dimensionality, and that can be readily generalized to other data-driven educational tasks.
3. We provide a different perspective on knowledge graph completion and construction: we exploit interaction data to uncover inner knowledge relations (links) where no existing relation data are available to learn from. Our method is effective in settings where the knowledge space is relatively small while the interaction space is very large.
Related work
Attentive recurrent neural models that rely on saliency masks to visualize explanations relate to several important recent works [16], [18]. Examples in this domain include reasoning towards textual entailment using word-by-word attention [24], [25], visualizing saliency in intermediate layers for the natural language inference task [26], and providing explanations for a list of common NLP tasks [27]. Despite its popularity, several recent works point out that standard attention mechanisms do not necessarily provide faithful explanations.
HeTROPY for explainable learning diagnostics
In this section, we formally define the problem and introduce HeTROPY in detail. Our goal is two-fold: (1) to estimate students’ proficiency on knowledge points; and (2) to solve the explainable learning diagnostics problem.
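The formal definition is abridged in this snippet; for orientation, a standard KT formulation consistent with the description above reads as follows (notation ours, not necessarily the paper's):

```latex
% Standard knowledge-tracing prediction target; notation is illustrative.
\hat{p}_{t+1} \;=\; p\bigl(a_{t+1}=1 \,\big|\, q_{t+1},\, \{(q_i, a_i)\}_{i=1}^{t}\bigr)
```

where $q_i$ denotes the knowledge point attempted at step $i$ and $a_i \in \{0,1\}$ its correctness; explainable learning diagnostics additionally asks the model to attribute $\hat{p}_{t+1}$ to salient source steps among $\{(q_i, a_i)\}_{i=1}^{t}$.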
Experiments
In this section, we evaluate the model on the explainable learning diagnostics task and measure its prediction accuracy. We are particularly interested in the interpretability and explanation aspects of the proposed model, which are central to explainable learning diagnostics.
Discussion
The trade-off between model interpretability and accuracy is worth discussing. Jointly learning towards an overall optimization goal spanning the two is a possible remedy, achievable through additional constraints or an additional attention layer. To define this optimization problem, a quantitative measure of interpretability power needs to be established (e.g., scoring the attention weights against labeled relation scores). The operating point on the trade-off curve can then be chosen.
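As a hypothetical illustration of the joint objective suggested above (the symbols are ours, not taken from the paper):

```latex
% Illustrative joint objective balancing prediction and interpretability.
\min_{\theta}\; \mathcal{L}_{\mathrm{pred}}(\theta) \;+\; \lambda\,\mathcal{L}_{\mathrm{interp}}(\theta)
```

Here $\mathcal{L}_{\mathrm{interp}}$ could, for instance, penalize divergence between the attention weights and labeled relation scores, and sweeping $\lambda$ would trace out the accuracy–interpretability trade-off curve.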
Conclusion
In this paper, we presented HeTROPY, which segregates the impact of homogeneous elements while promoting the heterogeneous 'close relatives.' We also employed a maximum-entropy regularizer that encourages a uniform weight distribution in order to detect multiple and multi-hop relations between knowledge points in a sequential model. Finally, we presented a continuous, low-dimensional multi-spatial representation of knowledge and showed its effectiveness in context embedding and its superiority over complex feature
CRediT authorship contribution statement
Yujia Huo: Conceptualization, Methodology, Validation, Writing - original draft. Derek F. Wong: Validation, Writing - review & editing, Formal analysis, Funding acquisition. Lionel M. Ni: Supervision, Funding acquisition. Lidia S. Chao: Visualization, Investigation, Project administration. Jing Zhang: Resources, Data curation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors wish to acknowledge the support of the National Natural Science Foundation of China (Grant No. 61672555), the Science and Technology Development Fund, Macau SAR (Grant Nos. 045/2017/AFJ, 0101/2019/A2), the Multi-year Research Grant from the University of Macau (Grant No. MYRG2017-00087-FST), the Ministry of Education Key Laboratory of China (Grant No. 2018-477), the Educational Development Fund of Guizhou (Grant No. 2017520072), the Guizhou Key Laboratory of Big Data Statistics
References (55)
- A three learning states Bayesian knowledge tracing model, Knowl. Based Syst. (2018)
- Knowledge modeling via contextualized representations for LSTM-based personalized exercise recommendation, Inform. Sci. (2020)
- Methods for interpreting and understanding deep neural networks, Digit. Signal Process. (2018)
- A learner oriented learning recommendation approach based on mixed concept mapping and immune algorithm, Knowl. Based Syst. (2016)
- A multi-constraint learning path recommendation algorithm based on knowledge map, Knowl. Based Syst. (2018)
- Bilingual recursive neural network based data selection for statistical machine translation, Knowl. Based Syst. (2016)
- Consensus and complementarity based maximum entropy discrimination for multi-view classification, Inform. Sci. (2016)
- Knowledge tracing: Modeling the acquisition of procedural knowledge, User Model. User-Adapt. Interact. (1994)
- Deep knowledge tracing
- Exercise-enhanced sequential modeling for student performance prediction
- Right for the right reasons: Training differentiable models by constraining their explanations
- A survey of methods for explaining black box models, ACM Comput. Surv.
- Generative grading: Neural approximate parsing for automated student feedback
- The mythos of model interpretability, Commun. ACM
- "Why should I trust you?": Explaining the predictions of any classifier
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS One
- Uncovering convolutional neural network decisions for diagnosing multiple sclerosis on conventional MRI using layer-wise relevance propagation
- Anchors: High-precision model-agnostic explanations
- Rationalizing neural predictions
- Neural machine translation by jointly learning to align and translate
- Effective approaches to attention-based neural machine translation
- Show, attend and tell: Neural image caption generation with visual attention
- Contributions of learning to human development, Psychol. Rev.
- Dynamic explainable recommendation based on neural attentive models
- Explainable reasoning over knowledge graphs for recommendation
- Learning attention-based embeddings for relation prediction in knowledge graphs