Evolutionary Recurrent Neural Network for Image Captioning
Neurocomputing (IF 6) Pub Date: 2020-08-01, DOI: 10.1016/j.neucom.2020.03.087
Hanzhang Wang, Hanli Wang, Kaisheng Xu

Abstract Automatic architecture search is an efficient way to discover novel neural networks, but it has mostly been employed for pure vision or natural-language tasks. Cross-modality tasks, however, place the emphasis on the associative mechanism between the visual and language models rather than merely on a convolutional neural network (CNN) or recurrent neural network (RNN) with the best individual performance. In this work, the intermediary associative connection is approximated by the topological inner structure of the RNN cell, which is then evolved by an evolutionary algorithm with image captioning as the proxy task. On the MSCOCO dataset, the proposed algorithm, starting from scratch, discovers more than 100 RNN variants that all score above 100 on CIDEr and 31 on BLEU-4, with the best reaching 101.4 and 32.6, respectively. In addition, several previously unknown, interesting patterns as well as many existing powerful structures are found in the generated RNNs. The operation and connection patterns of the generated architectures are analyzed to understand how cross-modality language modeling differs from that of general RNNs.
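The search described in the abstract can be pictured as an evolutionary loop over RNN-cell genomes. Below is a minimal, self-contained Python sketch of that idea; the DAG encoding (one input wire and one activation per node), the mutation operators, and the random stand-in for fitness are illustrative assumptions rather than the paper's actual design. In the real system, fitness would be the CIDEr/BLEU-4 score of a captioning model built around the decoded cell.

import random

OPS = ["tanh", "sigmoid", "relu", "identity"]  # assumed per-node activations

def random_cell(num_nodes=8):
    # Node i reads from one earlier index and applies one activation.
    # Indices 0 and 1 stand for the cell inputs x_t and h_{t-1};
    # index i + 2 is node i itself, so sources range over [0, i + 2).
    return [(random.randrange(i + 2), random.choice(OPS))
            for i in range(num_nodes)]

def mutate(cell):
    # Perturb one node: rewire its input connection or swap its operation.
    child = list(cell)
    i = random.randrange(len(child))
    src, op = child[i]
    if random.random() < 0.5:
        src = random.randrange(i + 2)   # rewire one connection
    else:
        op = random.choice(OPS)         # swap one operation
    child[i] = (src, op)
    return child

def fitness(cell):
    # Stand-in for the expensive proxy evaluation: train a captioning
    # model with this cell and score it (e.g., CIDEr on validation data).
    return random.random()

def evolve(pop_size=20, generations=10, tournament=3):
    population = [random_cell() for _ in range(pop_size)]
    scores = [fitness(c) for c in population]
    for _ in range(generations):
        # Tournament selection: mutate the best of a random sample,
        # then replace the current worst individual.
        sample = random.sample(range(pop_size), tournament)
        parent = max(sample, key=lambda i: scores[i])
        child = mutate(population[parent])
        worst = min(range(pop_size), key=lambda i: scores[i])
        population[worst], scores[worst] = child, fitness(child)
    best = max(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best]

if __name__ == "__main__":
    cell, score = evolve()
    print("best cell:", cell, "proxy score: %.3f" % score)

Under these assumptions, each generation costs one proxy evaluation, which is why a cheap but rank-preserving fitness signal matters when the search is expected to explore hundreds of cell variants.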

Updated: 2020-08-01