Multi-Perspective LSTM for Joint Visual Representation Learning
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-05-06 , DOI: arxiv-2105.02802
Alireza Sepas-Moghaddam, Fernando Pereira, Paulo Lobato Correia, Ali Etemad

We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives. Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level. We demonstrate that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks. We validate the performance of our proposed architecture in the context of two multi-perspective visual recognition tasks, namely lip reading and face recognition. Three relevant datasets are considered and the results are compared against fusion strategies, other existing multi-input LSTM architectures, and alternative recognition solutions. The experiments show the superior performance of our solution over the considered benchmarks, both in terms of recognition accuracy and complexity. We make our code publicly available at https://github.com/arsm/MPLSTM.
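To make the idea of "additional gates and memories at the cell level" concrete, the following is a minimal PyTorch sketch of a recurrent cell that processes two perspective streams per time step: each stream keeps its own standard LSTM gates and memory, and an extra inter-perspective gate blends the two cell states into a joint representation. The class name, gate layout, and blending scheme are illustrative assumptions, not the authors' exact MP-LSTM cell; their implementation is available in the repository linked above.

```python
import torch
import torch.nn as nn


class TwoPerspectiveLSTMCell(nn.Module):
    """Illustrative two-perspective recurrent cell (assumed layout, not the
    official MP-LSTM): per-stream LSTM gates plus one inter-perspective gate
    that mixes the two cell memories into a joint output."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # intra-perspective gates (input, forget, candidate, output) per stream
        self.gates_a = nn.Linear(input_size + hidden_size, 4 * hidden_size)
        self.gates_b = nn.Linear(input_size + hidden_size, 4 * hidden_size)
        # additional inter-perspective gate operating at the cell level
        self.cross_gate = nn.Linear(2 * hidden_size, hidden_size)

    @staticmethod
    def _lstm_step(x, h, c, gates):
        # standard LSTM update for a single perspective
        i, f, g, o = gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, c_new

    def forward(self, x_a, x_b, h_a, c_a, h_b, c_b):
        # update each perspective with its own gates and memory
        h_a, c_a = self._lstm_step(x_a, h_a, c_a, self.gates_a)
        h_b, c_b = self._lstm_step(x_b, h_b, c_b, self.gates_b)
        # inter-perspective gate decides how to mix the two memories
        k = torch.sigmoid(self.cross_gate(torch.cat([h_a, h_b], dim=-1)))
        h_joint = torch.tanh(k * c_a + (1.0 - k) * c_b)
        return h_joint, (h_a, c_a), (h_b, c_b)


# usage: two 64-dim feature streams (e.g. two camera views), hidden size 128
cell = TwoPerspectiveLSTMCell(64, 128)
x_a, x_b = torch.randn(8, 64), torch.randn(8, 64)
h_a = c_a = h_b = c_b = torch.zeros(8, 128)
h_joint, _, _ = cell(x_a, x_b, h_a, c_a, h_b, c_b)
print(h_joint.shape)  # torch.Size([8, 128])
```

Stacking such cells over a sequence yields a joint representation per time step that can feed a classifier for tasks such as lip reading or face recognition, while the per-stream states continue to model each perspective on its own.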

Updated: 2021-05-07