Multimodal deep neural networks for attribute prediction and applications to e-commerce catalogs enhancement

Sales, Luiz Felipe; Pereira, Artur; Vieira, Thales; de Barros Costa, Evandro

doi:10.1007/s11042-021-10885-1

Published: 24 April 2021

Volume 80, pages 25851–25873 (2021)
Multimedia Tools and Applications

Luiz Felipe Sales¹, Artur Pereira², Thales Vieira¹ (ORCID: orcid.org/0000-0001-7775-5258) & Evandro de Barros Costa¹

Abstract

Compiling and managing huge e-commerce catalogs is a hard and time-consuming task for a retailer. In particular, deriving standardized and structured descriptions from unstructured data modalities, such as texts and images, is crucial to the performance of search engines and to the general organization of virtual store databases. In this paper, we propose methodologies and strategies based on Deep Learning classifiers to structure, update, and inspect large e-commerce catalogs. To this end, we exploit multimodal representations that combine images and unstructured textual descriptions to identify labels relevant to e-commerce applications. These data modalities are used to train deep neural network architectures that can then recognize attributes automatically. Three classes of architecture were investigated: variations of the VGG architecture for recognition from images; architectures combining embedding, convolutional, and recurrent layers for recognition from text; and hybrid architectures that combine elements of both. We also propose tools for two tasks: detecting insufficiently descriptive visual and textual data, which can later be improved manually; and automatically enhancing attribute annotations through neural network predictions. Using a database collected with a Web crawler from a large e-commerce site, our experiments show that hybrid architectures, by combining both types of data, achieve better classification results than the image-only and text-only architectures. Finally, we present a case study that demonstrates the potential of our strategy for detecting insufficiently descriptive data. We conclude that the proposed tools are effective for rectifying, enhancing, and efficiently updating e-commerce catalogs.
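The abstract names two catalog tools, detection of insufficiently descriptive visual or textual data and automatic enhancement of attribute annotations, but does not state their decision rules. The following is a minimal Python sketch of one plausible way to drive both tools from classifier outputs. Everything in it is an illustrative assumption rather than the authors' method: the `Product` record, the per-modality probability dictionaries, the function names, the thresholds `min_prob` and `min_conf`, the example attribute values, and in particular the flagging criterion (a modality is suspect when its model rejects the catalog label while the other modality's model supports it).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Probability distribution over the values of one attribute, e.g.
# {"long sleeve": 0.8, "short sleeve": 0.2}. In practice these would come from
# trained image-only, text-only, and hybrid classifiers (not shown here).
Probs = Dict[str, float]


@dataclass
class Product:
    """Hypothetical catalog record holding one attribute and three model outputs."""
    product_id: str
    label: Optional[str]  # current catalog annotation; None if missing
    image_probs: Probs
    text_probs: Probs
    hybrid_probs: Probs


def top(probs: Probs) -> Tuple[str, float]:
    """Return the most likely attribute value and its probability."""
    value = max(probs, key=probs.get)
    return value, probs[value]


def flag_weak_modality(products: List[Product], min_prob: float = 0.5) -> Dict[str, str]:
    """Flag the modality that may be insufficiently descriptive.

    Assumed criterion (not necessarily the paper's): a modality is flagged when
    its model gives the catalog label a probability below min_prob while the
    other modality's model gives it at least min_prob. Unlabeled products are
    skipped.
    """
    flags: Dict[str, str] = {}
    for p in products:
        if p.label is None:
            continue
        img = p.image_probs.get(p.label, 0.0)
        txt = p.text_probs.get(p.label, 0.0)
        if img < min_prob <= txt:
            flags[p.product_id] = "image"
        elif txt < min_prob <= img:
            flags[p.product_id] = "text"
    return flags


def fill_missing_labels(products: List[Product], min_conf: float = 0.9) -> Dict[str, str]:
    """Propose annotations for unlabeled products from confident hybrid predictions."""
    proposals: Dict[str, str] = {}
    for p in products:
        if p.label is None:
            value, conf = top(p.hybrid_probs)
            if conf >= min_conf:
                proposals[p.product_id] = value
    return proposals


# Illustrative, made-up catalog entries.
catalog = [
    Product("A1", "long sleeve",
            image_probs={"long sleeve": 0.2, "short sleeve": 0.8},
            text_probs={"long sleeve": 0.9, "short sleeve": 0.1},
            hybrid_probs={"long sleeve": 0.85, "short sleeve": 0.15}),
    Product("B2", None,
            image_probs={"long sleeve": 0.7, "short sleeve": 0.3},
            text_probs={"long sleeve": 0.6, "short sleeve": 0.4},
            hybrid_probs={"long sleeve": 0.95, "short sleeve": 0.05}),
]

print(flag_weak_modality(catalog))   # {'A1': 'image'}
print(fill_missing_labels(catalog))  # {'B2': 'long sleeve'}
```

In this sketch, a high `min_conf` keeps automatic annotation conservative: products whose hybrid prediction is uncertain are left unlabeled for manual review rather than filled in.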

Notes

https://www.macys.com

Acknowledgements

The authors would like to thank the Alagoas Research Foundation (FAPEAL) for the first author’s scholarship #60030001626/2018.

Author information

Authors and Affiliations

Institute of Computing - Federal University of Alagoas, Maceió, AL, Brazil
Luiz Felipe Sales, Thales Vieira & Evandro de Barros Costa
Systems and Computing Department, Federal University of Campina Grande, Campina Grande, PB, Brazil
Artur Pereira

Corresponding author

Correspondence to Thales Vieira.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sales, L.F., Pereira, A., Vieira, T. et al. Multimodal deep neural networks for attribute prediction and applications to e-commerce catalogs enhancement. Multimed Tools Appl 80, 25851–25873 (2021). https://doi.org/10.1007/s11042-021-10885-1

Received: 13 August 2020
Revised: 20 February 2021
Accepted: 01 April 2021
Published: 24 April 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s11042-021-10885-1

Keywords
