Crowdsourcing: Descriptive Study on Algorithms and Frameworks for Prediction

  • Review article
  • Published:
Archives of Computational Methods in Engineering

Abstract

Data mining, data analytics, and data processing are three inter-related processes carried out on large volumes of data. Data can take many forms, including text, numeric, alpha-numeric, ontologies, images, video, and other multi-dimensional datasets; data generated by people is among the most widely studied of these. Crowdsourcing addresses large-scale data problems by enlisting people: input is collected from a large group of contributors and then analysed, an emerging approach that introduces a new model for the big data mining process. Data mining remains the traditional way for analytics experts to characterise the nature of data, but it is expensive and time-consuming. Crowdsourcing has therefore become a very active component in both industry and research; it recruits smartphone users as volunteers who contribute annotations of different types. This paper reviews recent work on big data mining with crowdsourcing. It surveys the opportunities and challenges of crowdsourced data analytics, summarises data analytics frameworks, and discusses the algorithms and concerns that must be considered in crowdsourcing, including applications, cost control, quality control, latency control, and big data mining frameworks. Finally, it outlines the limitations of crowdsourced data mining and offers suggestions for future research in crowdsourced data analytics.
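To make the quality-control theme of the review concrete, the sketch below illustrates the simplest form of crowd answer aggregation: redundant labels collected from several workers for the same task are combined by majority vote, the baseline on which more sophisticated aggregation methods (e.g., EM-based estimation of worker reliability) build. This is a minimal illustration under assumed inputs; the function name, task identifiers, and data are hypothetical and are not taken from any of the reviewed systems.

```python
# Minimal sketch (hypothetical names and data): aggregating redundant,
# possibly noisy crowd labels by majority vote.
from collections import Counter, defaultdict

def majority_vote(worker_labels):
    """worker_labels: iterable of (task_id, worker_id, label) tuples.
    Returns a dict mapping each task_id to its most frequent label."""
    votes = defaultdict(list)
    for task_id, _worker_id, label in worker_labels:
        votes[task_id].append(label)
    # Counter.most_common(1) yields [(label, count)]; keep only the label.
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in votes.items()}

if __name__ == "__main__":
    # Three workers annotate two tasks; aggregation smooths out one worker error.
    annotations = [
        ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
        ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"),
    ]
    print(majority_vote(annotations))  # {'t1': 'cat', 't2': 'dog'}
```

In practice, the quality-control algorithms surveyed in the review replace this simple vote with weighted schemes that also estimate each worker's reliability, trading extra computation for robustness to unreliable annotators.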



Author information

Corresponding author

Correspondence to K. Dhinakaran.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Dhinakaran, K., Nedunchelian, R. & Balasundaram, A. Crowdsourcing: Descriptive Study on Algorithms and Frameworks for Prediction. Arch Computat Methods Eng 29, 357–374 (2022). https://doi.org/10.1007/s11831-021-09577-8

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11831-021-09577-8
