Multimodal deep neural networks for attribute prediction and applications to e-commerce catalogs enhancement

Multimedia Tools and Applications

Abstract

Compiling and managing huge e-commerce catalogs is a hard and time-consuming task for a retailer. In particular, deriving standardized and structured descriptions from unstructured data modalities, such as texts and images, is crucial to the performance of search engines and the general organization of virtual store databases. In this paper, we propose methodologies and strategies based on deep learning classifiers to structure, update, and inspect large e-commerce catalogs. To this end, we exploit multimodal representations combining data from images and unstructured textual descriptions to identify relevant labels for e-commerce applications. These data modalities are used to train deep neural network architectures, which are then able to automatically recognize attributes. Three classes of architecture were investigated: variations of the VGG architecture for recognition from images; architectures combining embedding, convolutional, and recurrent layers for text recognition; and hybrid architectures that combine elements from each of the previous architectures. We also propose tools for detecting insufficiently descriptive visual and textual data, which can later be improved manually, and for automatically enhancing attribute annotations through neural network predictions. Using a database that we collected with a web crawler from a large e-commerce site, we show in our experiments that hybrid architectures achieve better classification results by combining both types of data. Finally, we show results of a case study performed to demonstrate the potential of our strategy for detecting insufficiently descriptive data. We conclude that the proposed tools are effective for rectifying, enhancing, and efficiently updating e-commerce catalogs.
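The abstract does not state how insufficiently descriptive items are detected or how annotations are enhanced. One plausible reading, sketched here purely as an assumption, is to use the classifier's confidence: items whose top predicted attribute has low probability are flagged for manual improvement, missing annotations are filled from confident predictions, and confident disagreements with the stored annotation are listed for inspection. The names `Prediction` and `review_catalog`, the report keys, and the 0.6 threshold are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Prediction:
    """Classifier output for one catalog item (hypothetical structure)."""
    item_id: str
    probs: Dict[str, float]    # attribute label -> predicted probability
    annotated: Optional[str]   # label currently stored in the catalog, if any


def review_catalog(preds: List[Prediction], min_conf: float = 0.6) -> dict:
    """Sort items into three buckets based on prediction confidence.

    min_conf is an arbitrary illustrative threshold, not a value from the paper.
    """
    report = {"needs_review": [], "fill": {}, "conflict": []}
    for p in preds:
        # Most probable label and its probability.
        label, conf = max(p.probs.items(), key=lambda kv: kv[1])
        if conf < min_conf:
            # Low confidence: image/text may be insufficiently descriptive.
            report["needs_review"].append(p.item_id)
        elif p.annotated is None:
            # Confident prediction and no annotation: propose the label.
            report["fill"][p.item_id] = label
        elif p.annotated != label:
            # Confident prediction contradicts the stored annotation.
            report["conflict"].append(p.item_id)
    return report


if __name__ == "__main__":
    sample = [
        Prediction("sku-1", {"dress": 0.9, "skirt": 0.1}, None),
        Prediction("sku-2", {"dress": 0.5, "skirt": 0.5}, "dress"),
        Prediction("sku-3", {"dress": 0.2, "skirt": 0.8}, "dress"),
    ]
    print(review_catalog(sample))
```

Under this assumption, the probabilities could come from any of the three architecture classes; a hybrid model would simply supply one probability vector per item computed from both image and text.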

Notes

  1. https://www.macys.com

Acknowledgements

The authors would like to thank the Alagoas Research Foundation (FAPEAL) for the first author’s scholarship #60030001626/2018.

Author information

Corresponding author

Correspondence to Thales Vieira.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sales, L.F., Pereira, A., Vieira, T. et al. Multimodal deep neural networks for attribute prediction and applications to e-commerce catalogs enhancement. Multimed Tools Appl 80, 25851–25873 (2021). https://doi.org/10.1007/s11042-021-10885-1

  • DOI: https://doi.org/10.1007/s11042-021-10885-1
