ADCCF: Adaptive deep concatenation coder framework for visual question answering, Pattern Recognition Letters


ADCCF: Adaptive deep concatenation coder framework for visual question answering
Pattern Recognition Letters (IF 3.9), Pub Date: 2021-10-28, DOI: 10.1016/j.patrec.2021.10.028
Gunasekaran Manogaran, P. Mohamed Shakeel, M.A. Burhanuddin, S. Baskar, Vijayalakshmi Saravanan, Rubén González Crespo, Oscar Sanjuán Martínez

Multimodal teaching activity faces significant problems in Visual Question Answering (VQA), which requires comprehending an image and a question simultaneously and suffers from reduced performance fidelity. Conventional methods represent images and questions in a fixed, predefined manner and fail to reach the required accuracy. To obtain better image and question representations, this paper proposes an Adaptive Deep Concatenated Coder Framework (ADCCF) that takes in both image and question attributes simultaneously through an optimized residual layer. The coder framework comprises cascaded layers of an encoder-decoder architecture, which captures rich, meaningful question characteristics and image details by using keywords to pick out significant object areas in the picture. Each ADCCF layer has an encoder segment that models the self-recognition of questions, in which questions are concatenated to limit the answers, and a decoder segment that models the commanded-recognition of images. ADCCF is tested on both the VQA 1.0 and VQA 2.0 datasets and shows an improved accuracy of 72.45% on the 1.0 dataset and 73.57% on the 2.0 dataset, supporting the reliability of the proposed framework.
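The cascaded encoder-decoder layers described in the abstract can be illustrated with a small sketch. It rests on assumptions that the abstract does not confirm: "self-recognition" is read here as self-attention over question words, "commanded-recognition" as question-guided attention over image regions, and the "residual layer" as a plain additive skip connection. The names `attend`, `coder_layer`, and `cascade` are hypothetical, there are no learned weights, and this is not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention: each query becomes a weighted mix of values."""
    scale = math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        width = len(values[0])
        output.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)])
    return output

def add_residual(x, fx):
    """Residual connection: add the layer input back onto the layer output."""
    return [[a + b for a, b in zip(rx, rfx)] for rx, rfx in zip(x, fx)]

def coder_layer(question, image):
    """One hypothetical encoder-decoder layer (assumed structure, no learned weights)."""
    # Encoder segment (assumption): question words attend to each other.
    question = add_residual(question, attend(question, question, question))
    # Decoder segment (assumption): image regions attend to themselves,
    # then to the encoded question, so the question guides the image features.
    image = add_residual(image, attend(image, image, image))
    image = add_residual(image, attend(image, question, question))
    return question, image

def cascade(question, image, depth=2):
    """Stack several layers, feeding each layer's output into the next."""
    for _ in range(depth):
        question, image = coder_layer(question, image)
    return question, image

# Toy inputs: 3 question-word vectors and 2 image-region vectors, all of width 4.
question = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
image = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
encoded_question, attended_image = cascade(question, image)
```

In this sketch the sequence lengths are preserved: `encoded_question` keeps one vector per question word and `attended_image` keeps one vector per image region. A real model would add learned projections, multiple attention heads, and normalization, none of which the abstract specifies.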


Updated: 2021-10-28
