Increasing the Inference and Learning Speed of Tsetlin Machines with Clause Indexing
arXiv - CS - Artificial Intelligence Pub Date : 2020-04-07 , DOI: arxiv-2004.03188
Saeed Rahimi Gorji, Ole-Christoffer Granmo, Sondre Glimsdal, Jonathan Edwards, Morten Goodwin

The Tsetlin Machine (TM) is a machine learning algorithm founded on the classical Tsetlin Automaton (TA) and game theory. It further leverages frequent pattern mining and resource allocation principles to extract common patterns in the data, rather than relying on minimizing output error, which is prone to overfitting. Unlike the intertwined nature of pattern representation in neural networks, a TM decomposes problems into self-contained patterns, represented as conjunctive clauses. The clause outputs, in turn, are combined into a classification decision through summation and thresholding, akin to a logistic regression function, though with binary weights and a unit-step output function. In this paper, we exploit this hierarchical structure by introducing a novel algorithm that avoids evaluating the clauses exhaustively. Instead, we use a simple look-up table that indexes the clauses on the features that falsify them. In this manner, we can quickly evaluate a large number of clauses through falsification, simply by iterating through the features and using the look-up table to eliminate those clauses that are falsified. The look-up table is further structured so that it supports constant-time updates, making it usable during learning as well. We report up to 15 times faster classification and three times faster learning on MNIST and Fashion-MNIST image classification, and on IMDb sentiment analysis.

Updated: 2020-04-08