Transitive Entropy - A Rank Ordered Approach for Natural Sequences
IEEE Journal of Selected Topics in Signal Processing (IF 7.5), Pub Date: 2020-02-01, DOI: 10.1109/jstsp.2019.2939998
Andrew D. Back, Daniel Angus, Janet Wiles

Information-theoretic entropy measures are calculated from estimates of the probabilities of the constituent symbolic events. In natural sequences, such as those occurring in human language, the probabilistic structure typically follows a rank-ordering pattern. Entropy has been used to model language by using large data sets to characterize the underlying source. To model the dynamic characteristics of language, methods such as N-gram entropy are normally used, which rely on a very large database from which the statistical samples can be obtained. However, it is of interest to apply entropy-based methods to classifying natural sequence behavior, such as characterizing changes in language due to diseases such as dementia, using only a small number of samples. This approach has multiple problems: the number of samples required and, in addition, degenerative solutions, which we show can occur when using current entropy measures. Such solutions can render these measures less suitable for classifying disorders in natural language, since the disorders may result in rank-reordering variations of the feature-space probability distributions. We propose a new probabilistic measure, termed Transitive Entropy, which overcomes this problem. We examine the properties of the proposed entropy measure and demonstrate its effectiveness in classifying patient dementia by applying it to a probabilistic model of pause length in patients' speech.
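
The sensitivity to rank reordering mentioned above can be illustrated with a well-known property of Shannon entropy: it depends only on the set of probability values, not on which symbol carries which probability. The Python sketch below uses a Zipf-like rank-ordered distribution to show two differently ordered distributions yielding the same entropy. The function names, the eight-symbol alphabet and the exponent are illustrative assumptions, not taken from the paper, and the sketch does not implement Transitive Entropy, which the abstract does not define.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def zipf_distribution(n, s=1.0):
    """Hypothetical rank-ordered source: p(rank) proportional to 1 / rank**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative assumption: 8 symbols with a Zipf exponent of 1.
ranked = zipf_distribution(8)

# Reassign the same probabilities to the symbols in the opposite rank order.
reordered = list(reversed(ranked))

h_ranked = shannon_entropy(ranked)
h_reordered = shannon_entropy(reordered)

print(f"entropy, original rank order:  {h_ranked:.6f} bits")
print(f"entropy, reversed rank order:  {h_reordered:.6f} bits")
print("distributions differ:", ranked != reordered)
print("entropies match:", math.isclose(h_ranked, h_reordered))
```

Because the two values agree up to floating-point rounding, a classifier that sees only this scalar cannot tell the two sources apart, which is one way a rank-reordering change could go undetected.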

Updated: 2020-02-01