Gradient Descent Learning With Floats
IEEE Transactions on Cybernetics (IF 11.8), Pub Date: 2020-06-11, DOI: 10.1109/tcyb.2020.2997399
Tao Sun, Ke Tang, Dongsheng Li

The gradient descent learning method is the main workhorse of training tasks in artificial intelligence and machine-learning research. Current theoretical studies of gradient descent use only continuous domains, which is unrealistic since electronic computers store and process data with floating-point numbers. Although existing results are sufficient for the extremely small errors of high-precision machines, they need to be improved for low-precision settings. This article presents an understanding of learning algorithms on computers that use floats. The performance of three gradient descent variants over the floating-point domain is investigated when the objective function is smooth. When the function is further assumed to satisfy the Polyak-Łojasiewicz (PŁ) condition, the convergence speed can be improved. We prove that for floating gradient descent to obtain an error of $\epsilon$, the iteration complexity is $O(1/\epsilon)$ in the general smooth case and $O(\ln(1/\epsilon))$ in the PŁ case. However, $\epsilon$ must be larger than the $s$-bit machine epsilon $\delta(s)$: $\epsilon \geq \Omega(\delta(s))$ in the deterministic case, while $\epsilon \geq \Omega(\sqrt{\delta(s)})$ in the stochastic case. Floating stochastic and sign gradient descent can both output an $\epsilon$-noised result in $O(1/\epsilon^{2})$ iterations.
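
As a companion illustration (not taken from the paper), the following minimal NumPy sketch shows the accuracy floor that finite-precision floats impose on gradient descent: a smooth quadratic satisfying the PŁ condition is minimized with every iterate, gradient, and update rounded to float64 and then to float16, and the low-precision run stalls once its error reaches the order of the format's machine epsilon $\delta(s)$. The objective, step size, and the float16/float64 formats are illustrative assumptions, not the paper's exact setting.

```python
# Illustrative sketch (assumption): not the authors' algorithm or code; it only
# demonstrates the machine-epsilon accuracy floor described in the abstract.
import numpy as np

def floating_gradient_descent(b, step, iters, dtype):
    """Gradient descent on f(x) = 0.5 * ||x - b||^2 with all arithmetic rounded to `dtype`."""
    b = b.astype(dtype)                          # target stored in the chosen float format
    x = np.zeros_like(b)                         # start from the origin
    step = np.asarray(step, dtype=dtype)
    for _ in range(iters):
        grad = x - b                             # gradient of the quadratic, computed in `dtype`
        x = x - step * grad                      # rounded update x <- x - step * grad
    r = x.astype(np.float64) - b.astype(np.float64)
    return 0.5 * float(r @ r)                    # report the final objective in high precision

b = np.full(100, 1.0 / 3.0)                      # minimizer whose coordinates are not exactly representable
for dtype in (np.float64, np.float16):
    f_final = floating_gradient_descent(b, step=0.1, iters=300, dtype=dtype)
    # float64 keeps shrinking the error geometrically (the PL, O(ln(1/eps)) regime), while
    # float16 stalls once the error is on the order of its machine epsilon delta(s).
    print(f"{np.dtype(dtype).name}: machine eps = {np.finfo(dtype).eps:.1e}, "
          f"final f(x) = {f_final:.3e}")
```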
