Chinese Image Caption Generation via Visual Attention and Topic Modeling
IEEE Transactions on Cybernetics (IF 11.8), Pub Date: 2020-06-22, DOI: 10.1109/tcyb.2020.2997034
Maofu Liu¹, Huijun Hu¹, Lingjun Li¹, Yan Yu¹, Weili Guan²

Automatic image captioning performs cross-modal conversion from the visual content of an image to natural-language text. Because it draws on both computer vision (CV) and natural language processing (NLP), it has become one of the most complex research problems in artificial intelligence. The neural image caption (NIC) model, built on deep neural networks, has achieved remarkable performance in image captioning, yet several essential challenges remain: the sentences the model generates can deviate from the content the image actually expresses, scene descriptions are often inaccurate, and the generated sentences tend to be monotonous. In addition, most current datasets and methods for image captioning are in English. Because Chinese and English differ in syntax and semantics, specialized Chinese image caption generation methods are needed to accommodate these differences. To address these problems, we design the NICVATP2L model based on visual attention and topic modeling: the visual attention mechanism reduces the deviation, and the topic model improves the accuracy and diversity of the generated sentences. Specifically, in the encoding phase, a convolutional neural network (CNN) and a topic model extract the visual features and the topic features of the input image, respectively. In the decoding phase, an attention mechanism is applied to the visual features to obtain visual region features. Finally, the topic features and the visual region features are combined to guide a two-layer long short-term memory (LSTM) network that generates the Chinese captions. To validate our model, we conducted experiments on the Chinese AIC-ICC image dataset. The experimental results show that our model automatically generates more informative, descriptive, and natural Chinese captions, and that it outperforms the existing NIC image captioning model.
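The decoding pipeline described above (attention over CNN region features, fused with topic features, feeding a two-layer LSTM) can be illustrated with a minimal, dependency-free Python sketch of a single decoding step. This is not the authors' NICVATP2L implementation: the additive (Bahdanau-style) attention, the choice to feed the topic vector into both LSTM layers, all dimensions, and the helper names (`soft_attention`, `lstm_cell`, `decode_step`, `build_params`) are illustrative assumptions. The CNN region features and the topic distribution are replaced with toy vectors, and weights are random rather than trained.

```python
import math
import random

# Hypothetical sketch, NOT the NICVATP2L authors' code. Weight shapes, feature
# dimensions, and how topic/visual features are wired into the LSTMs are assumed.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    return [dot(row, v) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def soft_attention(regions, query, p):
    """Additive attention: score_i = w_a . tanh(W_r r_i + W_q q); context = sum_i alpha_i r_i."""
    q_proj = matvec(p["W_q"], query)
    scores = []
    for r in regions:
        r_proj = matvec(p["W_r"], r)
        scores.append(dot(p["w_a"], [math.tanh(a + b) for a, b in zip(r_proj, q_proj)]))
    alphas = softmax(scores)  # one weight per region, summing to 1
    context = [sum(a * r[d] for a, r in zip(alphas, regions)) for d in range(len(regions[0]))]
    return alphas, context

def lstm_cell(x, h, c, p):
    """Standard LSTM cell (biases omitted for brevity); every gate reads [x; h]."""
    z = x + h  # list concatenation acts as vector concatenation
    i = [sigmoid(v) for v in matvec(p["W_i"], z)]
    f = [sigmoid(v) for v in matvec(p["W_f"], z)]
    o = [sigmoid(v) for v in matvec(p["W_o"], z)]
    g = [math.tanh(v) for v in matvec(p["W_g"], z)]
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new

def make_lstm(in_dim, hid, rng):
    return {k: rand_matrix(hid, in_dim + hid, rng) for k in ("W_i", "W_f", "W_o", "W_g")}

def build_params(emb, topic_dim, region_dim, hid, att_dim, vocab, seed=0):
    rng = random.Random(seed)
    return {
        "lstm1": make_lstm(emb + topic_dim, hid, rng),
        "lstm2": make_lstm(region_dim + topic_dim + hid, hid, rng),
        "att": {"W_r": rand_matrix(att_dim, region_dim, rng),
                "W_q": rand_matrix(att_dim, hid, rng),
                "w_a": [rng.uniform(-0.1, 0.1) for _ in range(att_dim)]},
        "W_vocab": rand_matrix(vocab, hid, rng),
    }

def decode_step(word_emb, topic, regions, state, params):
    """One step: layer 1 reads [word; topic], its hidden state queries the
    attention, and layer 2 reads [attended context; topic; h1]."""
    (h1, c1), (h2, c2) = state
    h1, c1 = lstm_cell(word_emb + topic, h1, c1, params["lstm1"])
    alphas, context = soft_attention(regions, h1, params["att"])
    h2, c2 = lstm_cell(context + topic + h1, h2, c2, params["lstm2"])
    probs = softmax(matvec(params["W_vocab"], h2))  # next-word distribution
    return probs, alphas, ((h1, c1), (h2, c2))

# Toy usage: 5 region vectors of size 8 stand in for CNN features, and a
# 3-topic distribution stands in for the topic model's output.
rng = random.Random(1)
regions = [[rng.random() for _ in range(8)] for _ in range(5)]
topic = [0.7, 0.2, 0.1]
params = build_params(emb=4, topic_dim=3, region_dim=8, hid=6, att_dim=5, vocab=10)
zeros = [0.0] * 6
state = ((zeros, zeros), (zeros, zeros))
probs, alphas, state = decode_step([0.1] * 4, topic, regions, state, params)
```

Feeding the topic vector into both layers is only one plausible reading of "combined to guide the two-layer LSTM"; injecting it into the second layer alone, or using it to bias the attention scores, would be equally reasonable variants to try.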
Updated: 2020-06-22