Temporal Interlacing Network, arXiv - CS - Computer Vision and Pattern Recognition

Temporal Interlacing Network
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-01-17, DOI: arxiv-2001.06499
Hao Shao, Shengju Qian, Yu Liu

For a long time, the vision community has tried to learn spatio-temporal representations by combining convolutional neural networks with various temporal models, such as Markov chains, optical flow, RNNs and temporal convolution. However, these pipelines consume enormous computing resources because spatial and temporal information are learned in alternating steps. One natural question is whether the temporal information can be embedded into the spatial information so that the two domains are learned jointly in a single pass. In this work, we answer this question by presenting a simple yet powerful operator, the temporal interlacing network (TIN). Instead of learning temporal features, TIN fuses the two kinds of information by interlacing spatial representations from the past to the future, and vice versa. A differentiable interlacing target can be learned to control the interlacing process. In this way, a heavy temporal model is replaced by a simple interlacing operator. We theoretically prove that, with a learnable interlacing target, TIN performs equivalently to the regularized temporal convolution network (r-TCN), but gains 4% more accuracy with 6x less latency on 6 challenging benchmarks. These results push the state-of-the-art performance of video understanding by a considerable margin. Not surprisingly, the ensemble model of the proposed TIN won first place in the ICCV19 Multi Moments in Time challenge. Code is made available to facilitate further research at https://github.com/deepcs233/TIN
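The interlacing idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `interlace`, the zero padding at the clip boundaries, the use of linear interpolation for fractional offsets, and the hand-set offsets are all assumptions made for illustration. In the paper the offsets are learned; here they are simply passed in. Each channel group of a `(T, C, H, W)` feature tensor is displaced along the time axis, so that each frame's feature map mixes channels from past, present and future frames.

```python
import numpy as np

def interlace(x, offsets):
    """Shift channel groups of x along time (hypothetical helper, illustration only).

    x:       array of shape (T, C, H, W), per-frame feature maps.
    offsets: one float per channel group. A positive offset moves features
             from the past toward the future, a negative one the reverse.
    Fractional offsets blend the two nearest frames linearly (an assumption);
    positions outside the clip are treated as zeros (also an assumption).
    """
    T, C = x.shape[0], x.shape[1]
    out = np.zeros(x.shape, dtype=float)
    # Split the channel indices into len(offsets) roughly equal groups.
    groups = np.array_split(np.arange(C), len(offsets))
    for chans, off in zip(groups, offsets):
        lo = int(np.floor(off))   # integer part of the shift
        frac = off - lo           # fractional part in [0, 1)
        for t in range(T):
            # Output frame t samples the input at time t - off, i.e. a blend
            # of input frames t - lo and t - lo - 1.
            for src, w in ((t - lo, 1.0 - frac), (t - lo - 1, frac)):
                if 0 <= src < T and w > 0.0:
                    out[t, chans] += w * x[src, chans]
    return out

# Usage: 4 frames, 4 channels, 1x1 spatial size; one channel per group.
x = np.arange(16, dtype=float).reshape(4, 4, 1, 1)
y = interlace(x, offsets=[1.0, -1.0, 0.0, 0.5])
print(y.shape)  # (4, 4, 1, 1)
```

Because the output is a weighted sum of neighbouring frames, it varies smoothly with each offset's fractional part, which is one way an offset could be made trainable by gradient descent. Whether this matches the paper's exact formulation should be checked against the linked repository.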

Updated: 2020-01-22
