VtNet: A neural network with variable importance assessment
Stat (IF 1.7), Pub Date: 2020-10-30, DOI: 10.1002/sta4.325
Lixiang Zhang, Lin Lin, Jia Li

The architectures of many neural networks rely heavily on the underlying grid associated with the variables, for instance, the lattice of pixels in an image. For general biomedical data without a grid structure, the multi-layer perceptron (MLP) and the deep belief network (DBN) are often used. However, these networks treat all variables homogeneously in their structure, which makes it difficult to assess the importance of individual variables. In this paper, we propose a novel neural network called Variable-block tree Net (VtNet), whose architecture is determined by an underlying tree in which each node corresponds to a subset of variables. The tree is learned from the data to best capture the causal relationships among the variables. VtNet contains a long short-term memory (LSTM)-like cell for every tree node. The input and forget gates of each cell control the information flow through the node, and they are used to define a significance score for the variables. To validate this significance score, VtNet is retrained on smaller trees from which the variables with low scores have been removed. Hypothesis tests show that variables with higher scores influence classification more strongly. From the perspective of variable selection, the score is compared with the variable importance score defined in Random Forest. Our experiments demonstrate that VtNet is highly competitive in classification accuracy and can often improve accuracy when variables with low significance scores are removed.
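The gating idea can be sketched in NumPy. The abstract does not specify how states travel through the tree, how the gates are parameterized, or the exact formula for the significance score, so everything below is an assumption for illustration only and is not the authors' VtNet. The sketch assumes states flow top-down (each node's cell receives its own variables plus its parent's hidden and cell state), uses the standard LSTM gate equations, and defines a hypothetical node score as the share of mean gate activation on the input gate versus the forget gate. The names `GatedNodeCell`, `run_tree`, and the score formula are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GatedNodeCell:
    """Hypothetical LSTM-like cell attached to one tree node (not the paper's exact parameterization)."""

    def __init__(self, n_vars, hidden, rng):
        d = n_vars + hidden  # the cell sees its node's variables plus the parent's hidden state
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (c)
        self.W = {g: rng.normal(scale=0.1, size=(d, hidden)) for g in "ifoc"}
        self.b = {g: np.zeros(hidden) for g in "ifoc"}

    def forward(self, x, h_parent, c_parent):
        z = np.concatenate([x, h_parent], axis=1)
        i = sigmoid(z @ self.W["i"] + self.b["i"])  # input gate: how much of this node's variables enters
        f = sigmoid(z @ self.W["f"] + self.b["f"])  # forget gate: how much of the parent's state is kept
        o = sigmoid(z @ self.W["o"] + self.b["o"])  # output gate
        g = np.tanh(z @ self.W["c"] + self.b["c"])  # candidate cell update
        c = f * c_parent + i * g
        h = o * np.tanh(c)
        return h, c, i, f

def run_tree(X, tree, cells, hidden):
    """tree: list of (node, parent, var_idx) in root-first order; parent is None for the root."""
    n = X.shape[0]
    states, scores = {}, {}
    for node, parent, var_idx in tree:
        if parent is None:
            h0, c0 = np.zeros((n, hidden)), np.zeros((n, hidden))
        else:
            h0, c0 = states[parent]
        h, c, i, f = cells[node].forward(X[:, var_idx], h0, c0)
        states[node] = (h, c)
        # Hypothetical score: share of gate activation spent admitting new input vs. retaining state.
        # All variables in the node's subset share this score.
        scores[node] = float(i.mean() / (i.mean() + f.mean()))
    return states, scores

# Usage: 6 variables split over a 4-node tree (structure chosen by hand, not learned)
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 6))
tree = [("root", None, [0, 1]), ("a", "root", [2, 3]), ("b", "root", [4]), ("c", "a", [5])]
hidden = 4
cells = {node: GatedNodeCell(len(v), hidden, rng) for node, _, v in tree}
states, scores = run_tree(X, tree, cells, hidden)
ranking = sorted(scores, key=scores.get)  # lowest-scoring variable blocks first
```

With untrained random weights the scores sit near 0.5 and carry no meaning; in a trained network the ranking is what would guide removing low-scoring variable blocks before retraining, as the abstract describes. In a real model the leaf hidden states would feed a classifier, which is omitted here.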

Updated: 2020-10-30