
Crowdsourcing aggregation with deep Bayesian learning

  • Research Paper
  • Published in Science China Information Sciences

Abstract

In this study, we consider a crowdsourcing classification problem in which labeling information from crowds is aggregated to infer the latent true labels. We propose a fully Bayesian deep generative crowdsourcing model (BayesDGC), which combines the strength of deep neural networks (DNNs) in automatic representation learning with the interpretable probabilistic structure encoding of probabilistic graphical models. The model comprises a DNN classifier as a prior for the true labels and a probabilistic model of the annotation generation process; the two components share the latent true label variables. To address the inference challenge, we develop a natural-gradient stochastic variational inference algorithm, which combines variational message passing for the conjugate parameters with stochastic gradient descent for the DNN parameters, and learns the distribution of the latent true labels and the workers' confusion matrices via end-to-end training. We illustrate the effectiveness of the proposed model with empirical results on 22 real-world datasets.
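The annotation generation process described in the abstract follows the classical confusion-matrix formulation: each worker's label for an item is drawn conditioned on the item's latent true label through a per-worker confusion matrix. The following is a minimal EM-style sketch of that core aggregation idea only, not the authors' BayesDGC (which adds a DNN prior over true labels and natural-gradient stochastic variational inference); the function name, smoothing constant, and majority-vote initialisation are illustrative assumptions.

```python
import numpy as np

def aggregate_labels(annotations, n_classes, n_iters=50):
    """EM-style label aggregation with per-worker confusion matrices.

    annotations: integer array of (item, worker, label) triples.
    Returns a posterior over true labels, shape (n_items, n_classes).
    """
    n_items = annotations[:, 0].max() + 1
    n_workers = annotations[:, 1].max() + 1

    # Initialise the label posterior with per-item label fractions
    # (a soft majority vote).
    q = np.zeros((n_items, n_classes))
    for i, w, l in annotations:
        q[i, l] += 1.0
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # M-step: re-estimate each worker's confusion matrix,
        # with a small additive smoothing prior.
        conf = np.full((n_workers, n_classes, n_classes), 0.01)
        for i, w, l in annotations:
            conf[w, :, l] += q[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: update the label posterior from the
        # annotation likelihood under the confusion matrices.
        log_q = np.zeros((n_items, n_classes))
        for i, w, l in annotations:
            log_q[i] += np.log(conf[w, :, l])
        log_q -= log_q.max(axis=1, keepdims=True)
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)
    return q
```

In BayesDGC the uniform-style initial prior above is replaced by the DNN classifier's predictive distribution, and the point estimates are replaced by posterior distributions updated via variational message passing.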



Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. NJ2019010), the National Natural Science Foundation of China (Grant No. 61906089), the Jiangsu Province Basic Research Program (Grant No. BK20190408), and the China Postdoctoral Science Foundation (First Pre-station Special Grant).

Author information

Corresponding author

Correspondence to Shao-Yuan Li.


About this article

Cite this article

Li, SY., Huang, SJ. & Chen, S. Crowdsourcing aggregation with deep Bayesian learning. Sci. China Inf. Sci. 64, 130104 (2021). https://doi.org/10.1007/s11432-020-3118-7

