Crowdsourcing: Descriptive Study on Algorithms and Frameworks for Prediction

Dhinakaran, K.; Nedunchelian, R.; Balasundaram, A.

doi:10.1007/s11831-021-09577-8

Review article
Published: 04 April 2021

Volume 29, pages 357–374, (2022)

Archives of Computational Methods in Engineering

K. Dhinakaran¹,
R. Nedunchelian² &
A. Balasundaram³

Abstract

Data mining, data analytics and data processing are three interrelated processes carried out on large volumes of data. The data can take any form, such as text, numeric, ontology, alphanumeric, image, video, and other multi-dimensional datasets. Data contributed by people is one of the most widely used of these. Crowdsourcing uses people to handle large volumes of data: input is collected from a large group of people and then analysed. It is an emerging technology that introduces a new model for the big data mining process. Data mining is a traditional process used by experts in the analytics domain to characterise the nature of data, but it is expensive and takes a long time to complete. Crowdsourcing has become a very active area in both industry and research. It recruits smartphone users as volunteers who share their annotations for different types of contributions. This paper reviews recent work on big data mining from crowdsourcing. It reviews the opportunities and challenges that crowdsourcing presents for data analytics and summarizes the data analytics framework. It then discusses several algorithms, covering applications, cost control, quality control, latency control and the big data mining framework, which must be considered in the field of crowdsourcing. Finally, it concludes with the limitations of data mining and gives some suggestions for future research in crowdsourced data analytics.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Rajalakshmi Institute of Technology, Anna University, Chennai, Tamil Nadu, India
K. Dhinakaran
Department of Electronics and Communication Engineering, Karpaga Vinayaga College of Engineering and Technology, Anna University, Chennai, Tamil Nadu, India
R. Nedunchelian
School of Computer Science and Engineering, Center for Cyber Physical Systems, Vellore Institute of Technology (VIT), Chennai, Tamil Nadu, India
A. Balasundaram

Corresponding author

Correspondence to K. Dhinakaran.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dhinakaran, K., Nedunchelian, R. & Balasundaram, A. Crowdsourcing: Descriptive Study on Algorithms and Frameworks for Prediction. Arch Computat Methods Eng 29, 357–374 (2022). https://doi.org/10.1007/s11831-021-09577-8

Received: 20 September 2020
Accepted: 20 March 2021
Published: 04 April 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s11831-021-09577-8
