Multimodal deep neural networks for attribute prediction and applications to e-commerce catalogs enhancement, Multimedia Tools and Applications

Multimodal deep neural networks for attribute prediction and applications to e-commerce catalogs enhancement
Multimedia Tools and Applications (IF 3.0), Pub Date: 2021-04-24, DOI: 10.1007/s11042-021-10885-1
Luiz Felipe Sales, Artur Pereira, Thales Vieira, Evandro de Barros Costa

Compiling and managing huge e-commerce catalogs is a hard and time-consuming task for a retailer. In particular, deriving standardized and structured descriptions from unstructured data modalities, such as texts and images, is crucial to the performance of search engines and the general organization of virtual store databases. In this paper, we propose methodologies and strategies based on Deep Learning classifiers to structure, update, and inspect large e-commerce catalogs. To this end, we exploit multimodal representations that combine data from images and unstructured textual descriptions to identify relevant labels for e-commerce applications. These data modalities are used to train deep neural network architectures, which are then able to recognize attributes automatically. Three classes of architecture were investigated: variations of the VGG architecture for recognition from images; architectures combining embedding, convolutional, and recurrent layers for text recognition; and hybrid architectures that combine elements from each of the previous architectures. We also propose tools that detect insufficiently descriptive visual and textual data, which can later be improved manually, and that automatically enhance attribute annotations through neural network predictions. Using a database that we collected with a web crawler from a large e-commerce site, we show in our experiments that hybrid architectures achieve better results in the classification task by combining both types of data. Finally, we show the results of a case study performed to demonstrate the potential of our strategy for detecting insufficiently descriptive data. We conclude that the proposed tools are effective for rectifying, enhancing, and efficiently updating e-commerce catalogs.
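
As a rough illustration of how classifier outputs could drive the catalog tools mentioned in the abstract, the following Python sketch flags insufficiently descriptive images or texts and suggests attribute labels. It is not the authors' method: the abstract does not say how modalities are combined or how insufficient data is detected, so the probability averaging (a simple late fusion rather than a trained hybrid network), the confidence threshold of 0.6, and the names `CatalogItem`, `fuse`, and `review` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Probs = Dict[str, float]  # attribute label -> predicted probability


@dataclass
class CatalogItem:
    item_id: str
    image_probs: Probs           # output of a hypothetical image classifier
    text_probs: Probs            # output of a hypothetical text classifier
    label: Optional[str] = None  # attribute currently stored in the catalog


def fuse(image_probs: Probs, text_probs: Probs) -> Probs:
    """Average the two distributions (late fusion; an assumption, not the paper's hybrid network)."""
    labels = set(image_probs) | set(text_probs)
    return {k: 0.5 * (image_probs.get(k, 0.0) + text_probs.get(k, 0.0)) for k in labels}


def review(item: CatalogItem, threshold: float = 0.6) -> List[str]:
    """Return findings for one item; the threshold value is an arbitrary assumption."""
    findings = []
    # A modality whose best prediction is weak is treated as insufficiently descriptive.
    if max(item.image_probs.values()) < threshold:
        findings.append("image insufficiently descriptive")
    if max(item.text_probs.values()) < threshold:
        findings.append("text insufficiently descriptive")
    # A confident fused prediction that differs from the stored label becomes a suggestion.
    fused = fuse(item.image_probs, item.text_probs)
    best = max(fused, key=fused.get)
    if fused[best] >= threshold and best != item.label:
        findings.append(f"suggest label: {best}")
    return findings
```

For an item with no stored label, image probabilities `{"red": 0.9, "blue": 0.1}`, and text probabilities `{"red": 0.45, "blue": 0.55}`, `review` reports the text as insufficiently descriptive and suggests the label `red`, since the fused probability for `red` is 0.675.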

Updated: 2021-04-24