HeTROPY: Explainable learning diagnostics via heterogeneous maximum-entropy and multi-spatial knowledge representation
Introduction
The proliferation of e-learning technologies, such as Massive Open Online Courses (MOOCs) and Intelligent Tutoring Systems (ITS), has given many learners flexibility in where and when they learn. However, diagnosing learning problems in these systems is challenging because few human teachers can be allocated to assist learners during the e-learning process. The problem becomes harder still when numerous e-learners follow personalized learning paths at their own pace. In traditional classrooms, diagnosing students' learning problems is the human teacher's job. In this paper, we seek to use an AI teacher to autonomously diagnose student learning problems in e-learning systems from learning performance data.
Prior efforts have shown that AI technologies can tackle the learning assessment problem effectively. Knowledge Tracing (KT) is the task of estimating a student's knowledge state from sequences of problem-solving attempts on knowledge points (or skills) [1], [2]. More recently, Recurrent Neural Network (RNN)-based KT models have markedly improved assessment accuracy [3], [4], [5], and the state-of-the-art results in KT are achieved with RNNs [4]. However, little effort has been made to tackle the learning diagnostics challenge.
The act of diagnosis involves inspection and causal reasoning. For example, when a student makes a problem-solving mistake, learning diagnostics uncovers the reason behind the mistake and answers the question 'Why did this mistake occur?' Explainable learning diagnostics helps the student learn because decisions (predictions) are not only made but also accompanied by an account of how they were made. Specifically, it requires that (1) the decision model generates accurate decisions; (2) the decisions are interpretable and understandable by humans; and (3) reliable explanations enhance the trustworthiness of the model [6]. However, conventional KT methods are black boxes [7] that generate unexplainable decisions, making the models unreliable. This flaw also raises critical safety concerns, since false decisions have negative impacts on student learning [8]. Explainable KT models are therefore key to solving the autonomous learning diagnostics problem and to alleviating these decision-safety concerns.
Beyond accuracy, explanations drive research into faithful models as model complexity increases, especially for deep models [9], [10]. By explanation we refer to the qualitative measurement of the contributions of features to a model decision, as defined by [10]. Various lines of methods have been devised to shed light on opaque models. Local proximity-based sampling and weighting (LIME) [11], gradient-based Layer-wise Relevance Propagation [12], [13], rule-based anchor explanations [14], and extractive generator–encoder rationale selection of short, sufficient explanatory segments [15] are representative examples of several important explanatory perspectives. Interestingly, although not initially aimed at explanation [16], attention-based methods that rely on saliency masks to highlight parts of texts or images are often conceptualized as a way of opening up black boxes [17], [18]. Cumulative Learning Theory in cognitive science states that the hierarchical building blocks of knowledge points disclose how learners apply previously acquired (subordinate) skills to solve subsequent problems [19]. This implies a chain of dependencies in which the mastery or non-mastery of a skill depends on other related skills, which in turn depend on further skills. This dependency chain can be back-traced to an initial point, and any element on the trail may uncover the reason for the final decision, i.e., the explanation. The methods above are limited in capturing such a chain of dependencies, which is inherently cumulative.
Additionally, successful explainable learning diagnostics drives more robust and accurate downstream applications, such as explainable recommendation [20], [21], [22]. As illustrated in Fig. 1, if a student fails to solve a 'matrix–matrix multiplication' problem, conventional algorithms would recommend more 'matrix–matrix multiplication' problems (homogeneous recommendation) [5]. However, they do not trace the latent heterogeneous knowledge points 'vector inner product' and 'matrix–vector multiplication' as potential causal factors, even though both are prerequisites of 'matrix–matrix multiplication'.
Therefore, we propose to tackle the learning diagnostics problem by 'back-tracing' the sequentially cumulative learning process to identify the salient sources that explain the current prediction (the target).
Specifically, we generate an attention distribution by introducing a heterogeneity mechanism with a maximum-entropy regularizer (HeTROPY). As contrasted in Figs. 2(a) and 2(b), when making a prediction, HeTROPY excludes the homogeneous knowledge points in a sequence and encourages the model to attend to the heterogeneous, relevant ones. At the same time, maximizing the attention entropy encourages the model to learn multiple and multi-hop knowledge relations, facilitating deeper diagnostics into the origin of the sequence. HeTROPY achieves learning diagnostics by modeling the problem as a non-standard knowledge relation prediction problem in knowledge graph embedding [23]. It also provides explanations that reveal why the model 'thinks' a student would make a correct or wrong attempt. For example, in predicting a mistake, HeTROPY learns to capture the relevant knowledge points that may be the origin of the mistake. These discovered explanations can be interpreted and verified by humans, enhancing the reliability of the model.
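To make the mechanism concrete, below is a minimal sketch assuming a PyTorch setup; the function name, tensor shapes, and the exact masking and regularization details are our illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def hetropy_attention(query, keys, target_kp, source_kps, lam=0.1):
    """Hypothetical sketch of heterogeneity-masked attention with a
    maximum-entropy regularizer (not the authors' exact implementation).

    query:      (d,)   hidden state of the target attempt
    keys:       (T, d) hidden states of past attempts (the sources)
    target_kp:  scalar knowledge-point id of the target
    source_kps: (T,)   knowledge-point ids of the sources
    """
    scores = keys @ query                              # (T,) relevance scores
    # Heterogeneity mask: exclude homogeneous sources, i.e. attempts
    # on the same knowledge point as the target.
    heterogeneous = source_kps != target_kp
    scores = scores.masked_fill(~heterogeneous, float('-inf'))
    alpha = F.softmax(scores, dim=0)                   # attention distribution

    # Maximum-entropy regularizer: adding -lam * H(alpha) to the task loss
    # pushes alpha toward a flatter distribution over heterogeneous sources,
    # so multiple and multi-hop relations can surface instead of one spike.
    entropy = -(alpha * torch.log(alpha + 1e-9)).sum()
    reg = -lam * entropy                               # add this to the task loss

    context = alpha @ keys                             # (d,) attended summary
    return context, alpha, reg
```

During training, `reg` would be added to the prediction loss; a larger `lam` flattens the attention over the heterogeneous sources, while the returned `alpha` is what would be inspected as the explanation.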
We also construct a multi-spatial knowledge representation by factorizing the discrete knowledge-problem space and the student performance space into a single continuous vector space. The resulting representation has higher expressive power, enabling HeTROPY to effectively compute latent correlations between the target and source knowledge points during training; this acts as a key booster for knowledge relevance computation and prediction performance.
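A hedged sketch of one plausible way to realize such a multi-spatial representation follows; the class name, dimensions, and the concatenation choice are our assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class MultiSpatialEmbedding(nn.Module):
    """Illustrative sketch: instead of one-hot (knowledge point, correctness)
    pairs of size 2*K, embed each factor in its own continuous space and
    combine them into one compact vector per interaction."""

    def __init__(self, n_kps, d_kp=64, d_perf=16):
        super().__init__()
        self.kp_emb = nn.Embedding(n_kps, d_kp)    # knowledge-problem space
        self.perf_emb = nn.Embedding(2, d_perf)    # performance space: 0/1

    def forward(self, kp_ids, correct):
        # Concatenate the two factor embeddings into a single vector space;
        # summation or a learned projection would be equally plausible.
        return torch.cat([self.kp_emb(kp_ids), self.perf_emb(correct)], dim=-1)
```

The design intuition is that factorizing the spaces keeps the parameter count linear in the number of knowledge points while still letting the model relate any (skill, outcome) pair in the shared continuous space.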
In summary, the main contributions of this paper include:
1. To the best of our knowledge, this work is the first to tackle the explainable learning diagnostics problem with explanation mechanisms by performing target–source relation prediction. HeTROPY segregates the impact of homogeneous elements while promoting heterogeneous relatives by maximizing their probability in knowledge relation discovery.
2. We propose a canonical multi-spatial representation of knowledge that expresses knowledge relations at finer granularity and in low dimensionality, and that can be readily generalized to other data-driven educational tasks.
3. We provide a different perspective on knowledge graph completion and construction: we exploit interaction data to uncover inner knowledge relations (links) where no existing relation data are available to learn from. Our method is effective in settings where the knowledge space is relatively small while the interaction space is very large.
Related work
Attentive recurrent neural models that rely on saliency masks to visualize explanations relate to several important recent works [16], [18]. Examples in this domain include reasoning towards textual entailment using word-by-word attention [24], [25], visualizing saliency in intermediate layers for the natural language inference task [26], and providing explanations for a list of common NLP tasks [27]. Despite its popularity, several recent works point out that standard attention mechanisms do not necessarily provide faithful explanations.
HeTROPY for explainable learning diagnostics
In this section, we formally define the problem and introduce HeTROPY in detail. Our goal is two-fold: (1) to estimate students’ proficiency on knowledge points; and (2) to solve the explainable learning diagnostics problem.
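The formal definition is abridged in this snippet; for orientation, a standard KT formulation consistent with the description above reads as follows (notation ours, not necessarily the paper's):

```latex
% Standard knowledge-tracing prediction target; notation is illustrative.
\hat{p}_{t+1} \;=\; p\bigl(a_{t+1}=1 \,\big|\, q_{t+1},\, \{(q_i, a_i)\}_{i=1}^{t}\bigr)
```

where $q_i$ denotes the knowledge point attempted at step $i$ and $a_i \in \{0,1\}$ its correctness; explainable learning diagnostics additionally asks the model to attribute $\hat{p}_{t+1}$ to salient source steps among $\{(q_i, a_i)\}_{i=1}^{t}$.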
Experiments
In this section, we evaluate the model on the explainable learning diagnostics task and measure its prediction accuracy. We are particularly interested in the interpretability and explanation aspects of the proposed model, which are central to explainable learning diagnostics.
Discussion
The trade-off between model interpretability and accuracy is worth discussing. Jointly learning towards an overall optimization goal spanning the two is a possible remedy, achievable through additional constraints or an additional attention layer. To define this optimization problem, a quantitative measure of interpretability power needs to be established (e.g., scoring the attention weights against labeled relation scores). The operating point on the trade-off curve can then be chosen.
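As a hypothetical illustration of the joint objective suggested above (the symbols are ours, not taken from the paper):

```latex
% Illustrative joint objective balancing prediction and interpretability.
\min_{\theta}\; \mathcal{L}_{\mathrm{pred}}(\theta) \;+\; \lambda\,\mathcal{L}_{\mathrm{interp}}(\theta)
```

Here $\mathcal{L}_{\mathrm{interp}}$ could, for instance, penalize divergence between the attention weights and labeled relation scores, and sweeping $\lambda$ would trace out the accuracy–interpretability trade-off curve.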
Conclusion
In this paper, we presented HeTROPY, which segregates the impact of homogeneous elements while promoting the heterogeneous 'close relatives.' We also employed a maximum-entropy regularizer that encourages a uniform weight distribution in order to detect multiple and multi-hop relations between knowledge points in a sequential model. Finally, we presented a continuous, low-dimensional multi-spatial representation of knowledge and showed its effectiveness in context embedding and its superiority over complex feature
CRediT authorship contribution statement
Yujia Huo: Conceptualization, Methodology, Validation, Writing - original draft. Derek F. Wong: Validation, Writing - review & editing, Formal analysis, Funding acquisition. Lionel M. Ni: Supervision, Funding acquisition. Lidia S. Chao: Visualization, Investigation, Project administration. Jing Zhang: Resources, Data curation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors wish to acknowledge the support of the National Natural Science Foundation of China (Grant No. 61672555), the Science and Technology Development Fund, Macau SAR (Grant Nos. 045/2017/AFJ, 0101/2019/A2), the Multi-year Research Grant from the University of Macau (Grant No. MYRG2017-00087-FST), the Ministry of Education Key Laboratory of China (Grant No. 2018-477), the Educational Development Fund of Guizhou (Grant No. 2017520072), the Guizhou Key Laboratory of Big Data Statistics
References (55)
- A three learning states Bayesian knowledge tracing model, Knowl. Based Syst. (2018)
- Knowledge modeling via contextualized representations for LSTM-based personalized exercise recommendation, Inform. Sci. (2020)
- Methods for interpreting and understanding deep neural networks, Digit. Signal Process. (2018)
- A learner oriented learning recommendation approach based on mixed concept mapping and immune algorithm, Knowl. Based Syst. (2016)
- A multi-constraint learning path recommendation algorithm based on knowledge map, Knowl. Based Syst. (2018)
- Bilingual recursive neural network based data selection for statistical machine translation, Knowl. Based Syst. (2016)
- Consensus and complementarity based maximum entropy discrimination for multi-view classification, Inform. Sci. (2016)
- Knowledge tracing: Modeling the acquisition of procedural knowledge, User Model. User-Adapt. Interact. (1994)
- Deep knowledge tracing
- Exercise-enhanced sequential modeling for student performance prediction
- Right for the right reasons: Training differentiable models by constraining their explanations
- A survey of methods for explaining black box models, ACM Comput. Surv.
- Generative grading: Neural approximate parsing for automated student feedback
- The mythos of model interpretability, Commun. ACM
- "Why should I trust you?": Explaining the predictions of any classifier
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS One
- Uncovering convolutional neural network decisions for diagnosing multiple sclerosis on conventional MRI using layer-wise relevance propagation
- Anchors: High-precision model-agnostic explanations
- Rationalizing neural predictions
- Neural machine translation by jointly learning to align and translate
- Effective approaches to attention-based neural machine translation
- Show, attend and tell: Neural image caption generation with visual attention
- Contributions of learning to human development, Psychol. Rev.
- Dynamic explainable recommendation based on neural attentive models
- Explainable reasoning over knowledge graphs for recommendation
- Learning attention-based embeddings for relation prediction in knowledge graphs