Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing

  • Research Article
  • Frontiers of Computer Science

Abstract

Crowdsourcing has been a helpful mechanism for leveraging human intelligence to acquire useful knowledge. However, aggregating crowd knowledge with currently available voting algorithms often yields only common knowledge, which may not be what is expected. In this paper, we consider the problem of collecting specific knowledge via crowdsourcing. With the help of an external knowledge base such as WordNet, we incorporate the semantic relations between the alternative answers into a probabilistic model to determine which answer is more specific. We formulate the probabilistic model, which accounts for both worker ability and task difficulty, from basic assumptions, and solve it with the expectation-maximization (EM) algorithm. To increase algorithm compatibility, we also extend our method to a semi-supervised setting. Experimental results show that our approach is robust to hyper-parameters and achieves greater improvement than majority voting and other algorithms when more specific answers are expected, especially on sparse data.
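
To make the aggregation idea above concrete, the following is a minimal Python sketch (assuming NLTK with the WordNet corpus installed) of two ingredients the abstract mentions: a WordNet hypernym-path check for deciding whether one candidate answer is more specific than another, and an EM-style loop that alternates between estimating answer posteriors per task and an accuracy score per worker. The function names, the single per-worker ability parameter, and the omission of task difficulty are simplifying assumptions for illustration only; this is not the authors' actual probabilistic model.

    # Illustrative sketch only; names and model structure are assumptions.
    from collections import defaultdict
    from nltk.corpus import wordnet as wn  # requires the NLTK WordNet corpus

    def is_more_specific(candidate, reference):
        """True if some noun sense of `candidate` lies below some sense of
        `reference` in WordNet's hypernym hierarchy (i.e., is more specific)."""
        for cand_syn in wn.synsets(candidate, pos=wn.NOUN):
            ancestors = {s for path in cand_syn.hypernym_paths() for s in path}
            if any(ref_syn in ancestors
                   for ref_syn in wn.synsets(reference, pos=wn.NOUN)):
                return True
        return False

    def em_aggregate(labels, n_iter=20):
        """labels: dict mapping task -> {worker: answer}.
        Returns (estimated answer per task, estimated ability per worker)."""
        ability = defaultdict(lambda: 0.8)  # initial belief in each worker
        posterior = {}
        for _ in range(n_iter):
            # E-step: ability-weighted posterior over candidate answers per task.
            posterior = {}
            for task, answers in labels.items():
                scores = defaultdict(float)
                for worker, ans in answers.items():
                    scores[ans] += ability[worker]
                total = sum(scores.values()) or 1.0
                posterior[task] = {a: s / total for a, s in scores.items()}
            # M-step: a worker's ability is the expected fraction of tasks
            # on which their answer agrees with the current posterior.
            agree, count = defaultdict(float), defaultdict(int)
            for task, answers in labels.items():
                for worker, ans in answers.items():
                    agree[worker] += posterior[task].get(ans, 0.0)
                    count[worker] += 1
            for worker in agree:
                ability[worker] = agree[worker] / count[worker]
        truth = {task: max(dist, key=dist.get) for task, dist in posterior.items()}
        return truth, dict(ability)

For example, is_more_specific("husky", "dog") returns True because the noun sense of "husky" lies below "dog" in WordNet's hypernym hierarchy, so a worker answering "husky" contributes more specific knowledge than one answering "dog".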

Acknowledgements

This work was supported partly by the National Key Research and Development Program of China (2019YFB1705902) and partly by the National Natural Science Foundation of China (Grant Nos. 61932007, 61972013, 61976187, 61421003). We thank Prof. Jinpeng Huai for his valuable support and contributions to this work. The authors also thank the anonymous reviewers for their helpful comments and suggestions for improving this paper.

Author information


Corresponding author

Correspondence to Hailong Sun.

Additional information

Tao Han received the BS degree from the School of Mathematics and Systems Science, Beihang University, China, in 2014. He is currently a PhD candidate in the School of Computer Science and Engineering, Beihang University, China. His research interests mainly include machine learning and human computation/crowdsourcing.

Hailong Sun received the BS degree in computer science from Beijing Jiaotong University, China, in 2001, and the PhD degree in computer software and theory from Beihang University, China, in 2008. He is an associate professor in the School of Computer Science and Engineering, Beihang University, China. His research interests include crowdsourcing, software analytics, and distributed systems.

Yangqiu Song received the BE and PhD degrees from Tsinghua University, China, in July 2003 and January 2009, respectively. He is now an assistant professor in the Department of CSE, with a joint appointment in the Math Department, at HKUST, China, and associate director of the WeChat-HKUST Joint Lab on Artificial Intelligence Technology (WHATLab) and the HKUST-WeBank Joint Lab. His research interests mainly include machine learning, data mining, natural language processing, knowledge graphs, and information networks.

Yili Fang is currently an assistant professor in the School of Computer and Information Engineering at Zhejiang Gongshang University, China. He completed his PhD at Beihang University, China. His research interests mainly include crowd computing/crowdsourcing, social computing, and decision science.

Xudong Liu received the PhD degree in computer application technology from Beihang University, China. He is a professor and doctoral supervisor at Beihang University, China. His research interests mainly include middleware technology and applications, service-oriented computing, trusted network computing, and network software development.


About this article


Cite this article

Han, T., Sun, H., Song, Y. et al. Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing. Front. Comput. Sci. 15, 154315 (2021). https://doi.org/10.1007/s11704-020-9364-x

