General framework, opportunities and challenges for crowdsourcing techniques: A Comprehensive survey

https://doi.org/10.1016/j.jss.2020.110611

Highlights

  • Analyze existing definitions and present a simple definition of crowdsourcing.

  • Introduce a systematic framework, providing the workflow of the crowdsourcing process.

  • Analyze the existing literature including surveys, reviews and general applications.

  • Present emerging techniques and approaches for different components of the framework.

  • Explore significant research challenges and identify possible future research directions.

Abstract

Crowdsourcing, a distributed human problem-solving paradigm, is an active research area that has attracted significant attention in the fields of computer science, business, and information systems. Crowdsourcing offers advantages such as open innovation, scalability, and cost-efficiency. Although considerable research has been performed, a survey of crowdsourcing process technology has not yet been presented. In this paper, we present a systematic survey of crowdsourcing focusing on emerging techniques and approaches for improving conventional and developing future crowdsourcing systems. We first present a simplified definition of crowdsourcing. Then, we propose a framework based on three major components and synthesize a wide spectrum of existing studies across the various dimensions of the framework. Following the framework, we first introduce the initialization step, including task design, task settings, and incentive mechanisms. Next, in the implementation step, we look into task decomposition, crowd and platform selection, and task assignment. In the last step, we discuss different answer aggregation techniques, validation methods and reward tactics, and reputation management. Finally, we identify open issues and suggest possible research directions for the future.

Introduction

Crowdsourcing (CS) is an emerging paradigm that has been receiving great attention since 2006. It is a method of solving specific tasks by outsourcing them to, and utilizing, distributed human computational capabilities through the Internet. Crowdsourcing reveals new approaches to harvesting distributed human intelligence and provides unprecedented opportunities for people to share their observations and knowledge with the rest of the world Brabham (2008). This form of harvesting human wisdom has proven to be a promising problem-solving and business production model Allahbakhsh et al. (2013). The web-based model is capable of leveraging ingenuity, aggregating talent and collaborative intelligence, and reducing cost and processing time. It is now widespread in supporting various applications and has achieved remarkable feats in image and video classification and labeling Krizhevsky et al. (2012), natural language processing Inel et al. (2014), software development Design and Build High-Quality Software with Crowdsourcing (0000), character recognition von Ahn et al. (2008), fabric design Threadless (0000), and many other applications. Different areas making use of crowdsourcing are shown in Fig. 1.

Crowdsourcing is a neologism that refers to the fusion of three key defining elements, i.e., ‘crowd’, ‘outsourcing’ and ‘social web’ Saxton et al. (2013), as shown in Fig. 2. The success of outsourcing and advances in Internet technologies are significant factors behind the attention crowdsourcing has received. The evolution of Internet technology and its ubiquitous access give a new dimension to this phenomenon, providing user interactivity and bringing massive collective intelligence to bear on various problems at an affordable price. A major driver of the tremendous growth of crowdsourcing is its inherent power of parallel processing: multiple tasks can be performed simultaneously, which also reduces overall time. For example, several people working simultaneously on labeling images reduce the overall time required. Another reason is that some tasks that are hard for machines or individuals, such as audio translation and image tagging, can be easily performed using crowdsourcing. People are often willing to perform tasks for micro-payments, and in some instances for no payment at all, which decreases the overall expense of generating quality work.

The notion of crowdsourcing is by no means new; the concept was developed in the 18th century and has been used for many years. In 1714, the British government offered £20,000 through an open call for a solution to the so-called Longitude Problem Sobel (2007), since the inability to determine longitude made sailing dangerous, with large numbers of sailors killed or stranded on unknown islands for lack of navigational parameters. In this first-ever crowdsourcing event, the winner was John Harrison, a self-taught English carpenter and clockmaker Chrum (0000). In 1810, as Napoleon expanded his European empire and large numbers of soldiers were employed in the armies, preserving food became a necessity. The French government offered 12,000 francs to the person who invented a practical method to store food and avoid wastage, and the prize was awarded for the design of canned food Stol et al. (2017).

In 1884, the crowd corrected and updated the catalog of the Oxford English Dictionary Lynch (0000). In 1936, the famous Japanese motor company arranged a public logo-design competition; the new logo ’Toyota’ was chosen from 27,000 submissions received from the crowd From “TOYODA” to “TOYOTA” (0000). In 1955, the Australian government held a contest to design a building for Sydney harbor. The winner among 233 contestants received a prize of £5,000, and the winning design, the Sydney Opera House, remains one of the best-known crowdsourced architectural designs Stol et al. (2017). In early 1998, Eli Lilly, the American multinational pharmaceutical company, created the online platform ‘InnoCentive’, which deals with business intelligence and embraces the power of the crowd InnoCentive (0000). These are a few early crowd-related examples from the past.

The concept of crowdsourcing expanded rapidly and continued to take hold in the 21st century. By 2003, the online encyclopedia Wikipedia had become one of the best illustrations of acquiring collective wisdom and knowledge through crowdsourcing. The online video community YouTube, launched in 2005, is an example of crowdsourced entertainment. Numerous other Fortune 500 companies Fortune Data Store (0000) depend on crowdsourcing to solve different tasks. In 2009, NESTA in the UK announced three winners among 355 groups in the Big Green Challenge for reducing CO2 emissions, and each received a prize of GBP 300,000 Marjanovic et al. (2012). These examples exhibit the power and revolutionary nature of the crowd in performing different tasks.

Since crowdsourcing has been an active research area over the past few years, a variety of surveys have discussed this area. One kind of survey is narrowly focused, concentrating on a single aspect of crowdsourcing such as incentive engineering techniques Muldoon et al. (2018), privacy-preserving issues Alkharashi and Renauld (2018), quality control Jin et al. (2018); Daniel et al. (2018), applications of crowdsourcing in software engineering Mao et al. (2017); Sar et al. (2019), medical image analysis Orting et al. (2019) and data analytics Li et al. (2016), or statistical analysis of crowdsourcing research Tarrell et al. (2013); Aris (2017).

Another kind of survey attempts to discuss crowdsourcing comprehensively from different aspects Yuen et al. (2011); Hetmank (2013); Zhao and Zhu (2014); Chittilappilly et al. (2016); Ghezzi et al. (2017); Nassar and Karray (2019). Yuen et al. (2011) presented a literature survey on different aspects of crowdsourcing systems. In addition to a taxonomy of CS, they categorized the studies in terms of applications (voting systems, information sharing systems, games, and creative systems), algorithms, performance (user participation, quality management, and cheating detection), and available datasets. Hetmank (2013) focuses on system architecture design for supporting the crowdsourcing process. Through a systematic literature review, the author identified components and functions that are implemented in traditional CS systems. Four components along with their functions were presented: user management (registration, evaluation, group formation, and coordination mechanisms), task management (designing and assigning tasks), contribution management (evaluating and selecting contributions), and workflow management (defining and managing the workflow for a task). Zhao and Zhu (2014) and Ghezzi et al. (2017) summarized the current status and future avenues of crowdsourcing research from a global view of information and systems; however, technical details are missing from these studies. Although Ghezzi et al. (2017) introduced a framework for the CS process, process-technology details are likewise absent from that work. Chittilappilly et al. (2016) provided a brief description and the limitations of existing technologies for solving various CS problems. They review existing studies in the areas of worker motivation and engagement, task allocation, and quality control mechanisms. They also discussed the implementation of different CS techniques on a spatial crowdsourcing platform, gMission. Nassar and Karray (2019) introduced a crowdsourcing process that mainly contains five modules: incentives, quality control approaches, collection and verification methods, aggregation, and topical expert discovery. They review different methods used to accomplish each processing step.

The motivation for this research is that existing studies by and large ignore the crowdsourcing process from the technical perspective that is typical of CS endeavours, failing to provide a systematic framework for the design and implementation of CS systems. Therefore, we present a survey of crowdsourcing process technology focusing on emerging techniques and approaches for developing and improving CS systems. We propose a framework of the crowdsourcing process and synthesize a wide spectrum of existing studies across the various dimensions of the framework. The paper is oriented toward the emerging techniques and technical methodologies proposed for the different components and functions of the framework. The work can be seen as an extension of Hetmank (2013) that incorporates the many advances of recent years, so a further survey of the related work is justifiable and meaningful. A summary of the advances and differences of existing survey papers is outlined in Table 1.

Specifically, the key contributions of our paper are summarized as follows.

  • We analyze existing important definitions of crowdsourcing and present a simple definition.

  • We review the existing surveys and summarize the advances and differences in Table 1.

  • We propose a systematic and modifiable framework which provides the complete workflow of the crowdsourcing process.

  • We synthesize a broad spectrum of existing studies and map them through the lenses of our framework.

  • We present a contemporary analysis of emerging techniques for improving conventional CS systems and developing new ones.

  • Based on the analysis, we identify limitations, particular challenges and future research directions.

The survey is organized as follows. First, preliminaries and an overview of the framework along with its workflow are provided (Section 2). The later sections are structured according to the corresponding parts of the framework. The nine dimensions of the framework according to which we analyzed the papers are described in more detail (Sections 3 to 11). Several significant challenges are discussed and emerging research directions are outlined (Section 12). Finally, a brief conclusion of the survey is presented (Section 13).

Section snippets

Preliminaries and framework

We first present the research methodology, review a few definitions and a typical system model, and then refine our focus to the framework of the crowdsourcing process.

Task design

In the proposed framework, the first step is initialization, and task design is the first activity in the crowdsourcing process. Task design is an important aspect of crowdsourcing. It is a model consisting of several components through which the requester explains a task to workers via semantic and visual presentations. The design of a task is central to the success of the CS system. Efforts should be made to design tasks that are simple and unambiguous Gurari et al. (2016). There is no
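To make the notion of a task-design model concrete, the following is a minimal Python sketch that represents a task as a small record with instructions, examples, and constrained answer options, plus a trivial check for common design problems. All field names and thresholds are hypothetical illustrations, not part of any surveyed system.

```python
# Hypothetical task-design record; field names and checks are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskDesign:
    title: str                     # short, unambiguous summary of the task
    instructions: str              # plain-language steps the worker should follow
    example_inputs: List[str] = field(default_factory=list)  # worked examples shown to workers
    answer_options: List[str] = field(default_factory=list)  # constrain answers to reduce ambiguity

    def validate(self) -> List[str]:
        """Return simple design warnings (e.g., instructions too short, no examples)."""
        warnings = []
        if len(self.instructions.split()) < 5:
            warnings.append("Instructions may be too short to be unambiguous.")
        if not self.example_inputs:
            warnings.append("Consider adding at least one worked example.")
        return warnings

design = TaskDesign(
    title="Label the animal in the image",
    instructions="Look at the image and choose the single best matching label.",
    answer_options=["cat", "dog", "other"],
)
print(design.validate())  # ['Consider adding at least one worked example.']
```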

Task settings

The requesters need to set up some parameters and settings during task preparation, depending on the general and particular requirements of the task. These parameters are task attributes, worker estimation and task interference.
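As a rough illustration of such settings, the sketch below bundles hypothetical task attributes and a worker-estimation block into one structure and derives a simple budget estimate from them. All keys and values are made-up examples rather than parameters from any specific platform.

```python
# A minimal, hypothetical task-settings structure; keys and values are illustrative only.
task_settings = {
    "attributes": {
        "reward_per_answer": 0.05,   # USD, example value
        "deadline_minutes": 60,
        "max_answers_per_worker": 1,
    },
    "worker_estimation": {
        "answers_per_task": 5,       # redundancy level chosen by the requester
        "min_approval_rate": 0.95,   # simple qualification threshold
    },
}

def estimated_budget(settings: dict, n_tasks: int) -> float:
    """Rough budget estimate: tasks x redundancy x per-answer reward."""
    a = settings["attributes"]
    w = settings["worker_estimation"]
    return n_tasks * w["answers_per_task"] * a["reward_per_answer"]

print(estimated_budget(task_settings, n_tasks=1000))  # 250.0
```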

Incentives

Incentives are essential for compensating people who perform tasks. Designing suitable incentives and compensation policies can affect the performance of workers and the output quality of CS systems. Incentives are a form of encouragement that motivates people to work hard; without any incentive, a person is unlikely to be interested in performing tasks. During the initialization phase, a requester defines different types of incentives. Numerous studies discussed incentives and different
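Purely as an illustration of how a monetary incentive policy might be encoded, the toy function below combines a base payment with an accuracy-dependent bonus. The amounts and threshold are invented for the example and do not come from the surveyed literature.

```python
# Toy incentive policy: base pay per task plus an accuracy bonus (illustrative values).
def payment(tasks_done: int, accuracy: float,
            base: float = 0.05, bonus: float = 0.02,
            bonus_threshold: float = 0.9) -> float:
    pay = tasks_done * base
    if accuracy >= bonus_threshold:
        pay += tasks_done * bonus   # reward high-quality contributors
    return round(pay, 2)

print(payment(tasks_done=100, accuracy=0.93))  # 7.0
print(payment(tasks_done=100, accuracy=0.80))  # 5.0
```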

Task decomposition

Most real-world problems are complicated, hard to solve, and involve heavy computational operations. Such tasks require considerable effort and dedicated resources, which limits the number of potential workers due to high skill barriers. Thus, decomposition techniques are used to split the main task into a set of subtasks that are processed individually, and the results of the subtasks are recomposed to obtain the final solution. Decomposition techniques are commonly practised in the computing field to reduce
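As a minimal sketch of the decompose-and-recompose pattern described above, the following Python example splits a list of items into fixed-size subtasks, "solves" each one independently, and merges the partial results. The chunk size and the placeholder solve step are purely illustrative.

```python
# Minimal decompose/recompose sketch for a large labeling task.
from typing import List, Dict

def decompose(items: List[str], chunk_size: int) -> List[List[str]]:
    """Split the item list into fixed-size subtasks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def solve_subtask(chunk: List[str]) -> Dict[str, str]:
    """Placeholder for crowd work on one subtask (e.g., labeling each item)."""
    return {item: f"label_for_{item}" for item in chunk}

def recompose(partial_results: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge the partial results back into one final solution."""
    merged: Dict[str, str] = {}
    for part in partial_results:
        merged.update(part)
    return merged

items = [f"img_{i}" for i in range(10)]
subtasks = decompose(items, chunk_size=4)                 # 3 subtasks: 4 + 4 + 2 items
final = recompose([solve_subtask(s) for s in subtasks])
print(len(subtasks), len(final))                          # 3 10
```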

Finding crowd and platform selection

Finding human workers is a significant issue in CS systems; workers are available in existing marketplaces, or requesters can recruit workers by defining their own user-management component. For the execution of tasks, many CS platforms with different services and features exist; based on task requirements and priorities, requesters can select a platform on which to perform their tasks. A brief description of crowd finding and platform selection is given below.
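One simple way to think about platform selection under task requirements and priorities is weighted scoring; the hypothetical sketch below scores two invented platforms against requester-chosen weights. Platform names, feature values, and weights are illustrative assumptions, not recommendations from the survey.

```python
# Illustrative weighted-scoring sketch for platform selection (all data hypothetical).
platforms = {
    "PlatformA": {"cost": 0.8, "worker_pool": 0.9, "quality_tools": 0.6},
    "PlatformB": {"cost": 0.6, "worker_pool": 0.7, "quality_tools": 0.9},
}
# Requester priorities (weights sum to 1); here quality tooling matters most.
weights = {"cost": 0.2, "worker_pool": 0.3, "quality_tools": 0.5}

def score(features: dict, weights: dict) -> float:
    return sum(weights[k] * features[k] for k in weights)

best = max(platforms, key=lambda p: score(platforms[p], weights))
print(best, round(score(platforms[best], weights), 2))  # PlatformB 0.78
```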

Task assignment

Task assignment is an important aspect of crowdsourcing, as poor assignment affects system performance in terms of time and money. The main problems in task assignment are the unavailability of worker information and the continuous arrival of both tasks and workers at the platform. Some platforms do not publish worker information, yet systems require this information during worker recruitment for security reasons. Based on information accessibility, task assignment in crowdsourcing is classified
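To illustrate the online flavour of the problem, the toy loop below greedily assigns each arriving task to the currently available worker with the highest known skill for its category. The worker data and the matching rule are deliberate simplifications for illustration, not a method from the surveyed papers.

```python
# Toy greedy online assignment: match each arriving task to the best free worker.
from typing import Dict, Optional

workers: Dict[str, Dict[str, float]] = {   # worker -> estimated skill per task category
    "w1": {"image": 0.9, "text": 0.4},
    "w2": {"image": 0.5, "text": 0.8},
}
busy = set()

def assign(task_category: str) -> Optional[str]:
    candidates = [(skills.get(task_category, 0.0), w)
                  for w, skills in workers.items() if w not in busy]
    if not candidates:
        return None                         # no worker available right now
    _, worker = max(candidates)             # pick the highest-skill free worker
    busy.add(worker)
    return worker

for category in ["image", "text", "image"]:  # stream of arriving tasks
    print(category, "->", assign(category))
# image -> w1, text -> w2, image -> None (both workers busy)
```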

Answer aggregation

As mentioned before, crowdsourcing leverages collective intelligence and wisdom by outsourcing tasks to potentially large groups of people for contributions. Generally, a task is assigned to several workers to provide redundancy and collect the wisdom of the crowd. The basic idea is to assign a task to multiple workers and infer its answer by integrating the contributions of all workers. Such an assignment not only leads to multiple solutions for the same task but also reduces the impact of wrong answers
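The simplest instance of this idea is majority voting, shown in the short sketch below: each task's final answer is the label returned by the most workers (ties are broken arbitrarily in this toy version).

```python
# Majority voting, the simplest answer-aggregation baseline.
from collections import Counter
from typing import Dict, List

def majority_vote(answers: Dict[str, List[str]]) -> Dict[str, str]:
    """Return, for each task, the most frequently submitted label."""
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in answers.items()}

answers = {
    "task1": ["cat", "cat", "dog"],   # three workers answered task1
    "task2": ["dog", "dog", "dog"],
}
print(majority_vote(answers))          # {'task1': 'cat', 'task2': 'dog'}
```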

Validation and rewards

Once the crowdsourcing process is over, results are reported to the requester by the CS platform. Then the conclusion phase starts, in which the requester needs to validate the answers and compensate workers. The workers receive incentives according to the rewarding strategies. Here we discuss different validation methods and rewarding strategies, whereas the different types of incentives were already discussed in Section 5. Moreover, based on the outcome of the validation phase, a requester or platform
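As a small illustration of such a conclusion step, the sketch below validates a worker against a few gold (known-answer) questions and pays a flat per-answer reward only if an accuracy threshold is met. The threshold, reward amount, and gold questions are invented for the example.

```python
# Sketch of a simple validation-and-reward step using gold questions (illustrative values).
def validate_worker(worker_answers: dict, gold: dict, threshold: float = 0.8) -> bool:
    """Approve the worker if accuracy on the gold questions meets the threshold."""
    checked = [q for q in gold if q in worker_answers]
    if not checked:
        return False
    correct = sum(worker_answers[q] == gold[q] for q in checked)
    return correct / len(checked) >= threshold

gold = {"q1": "cat", "q2": "dog"}
worker_answers = {"q1": "cat", "q2": "dog", "q3": "bird"}

if validate_worker(worker_answers, gold):
    reward = 0.05 * len(worker_answers)   # flat per-answer reward, example rate
    print(f"approved, pay {reward:.2f}")  # approved, pay 0.15
else:
    print("rejected, no payment")
```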

Reputation

In the crowdsourcing process, the trust relationship between requesters and workers reflects the probability that a requester will receive a quality contribution from workers, and that more workers will take part in tasks posted by honest requesters. Below, worker reputation and requester reputation are briefly explained.
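One common way to maintain such a trust score is a beta-reputation-style update over accepted and rejected contributions; the minimal sketch below is one illustrative variant of that idea, not the specific scheme of any surveyed system.

```python
# Minimal beta-reputation-style sketch: score = expected value of Beta(s+1, f+1).
class Reputation:
    def __init__(self):
        self.successes = 0
        self.failures = 0

    def update(self, contribution_accepted: bool) -> None:
        """Record the outcome of one validated contribution."""
        if contribution_accepted:
            self.successes += 1
        else:
            self.failures += 1

    def score(self) -> float:
        """Expected probability that the next contribution is acceptable."""
        return (self.successes + 1) / (self.successes + self.failures + 2)

rep = Reputation()
for outcome in [True, True, False, True]:
    rep.update(outcome)
print(round(rep.score(), 2))   # 0.67  (3 accepted, 1 rejected -> 4/6)
```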

Challenges and future directions

As mentioned before, substantial research work has been performed in crowdsourcing, but problems still remain. In this section, we discuss the challenges related to the different segments of the proposed framework of the crowdsourcing process, covering task management, worker management, unexplored issues, and future opportunities.

Conclusion

Crowdsourcing is an evolving phenomenon, recognized as an effective and efficient mechanism for solving distributed human-powered problems. The main advantage of crowdsourcing is cost reduction, as the crowdsourcer does not need to set up infrastructure and workers are always available to perform tasks. However, crowdsourcing still faces many challenges due to its openness and unreliability. As the entire process is online, the crowdsourcer is unaware of whether a user is genuine or malicious.

CRediT authorship contribution statement

Shahzad Sarwar Bhatti: Conceptualization, Methodology, Writing - original draft. Xiaofeng Gao: Writing - review & editing. Guihai Chen: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China [2018YFB1004703]; the National Natural Science Foundation of China [61872238, 61672353]; the Shanghai Science and Technology Fund [17510740200]; the Huawei Innovation Research Program [HO2018085286]; the State Key Laboratory of Air Traffic Management System and Technology [SKLATM20180X]; the Tencent Social Ads Rhino-Bird Focused Research Program, and the CCF-Huawei Database System Innovation Research Plan [CCF-Huawei DBIR2019002A].


References (249)

  • I. Abraham et al.

    How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels

    Proc. Int. ACM conf. Res. and Dev. Inf. Retriev. (SIGIR)

    (2016)
  • B.T. Adler et al.

    A Content-Driven Reputation System for the Wikipedia

    ACM WWW

    (2007)
  • A. Afuah et al.

    Crowdsourcing as a solution to distant search

    Academy of Manag. Rev.

    (2012)
  • S. Ahmad et al.

    The Jabberwocky Programming Environment for Structured Social Computing

    Proc. ACM Symp. on User Inter. Softw. and Technol. (UIST)

    (2011)
  • L. von Ahn et al.

    Labeling images with a computer game

    Proc. ACM Int. Conf. Human Factors in Comput. Syst. (CHI)

    (2004)
  • L. von Ahn et al.

    Designing games with a purpose

    Commun. ACM

    (2008)
  • L. von Ahn et al.

reCAPTCHA: Human-based character recognition via web security measures

    Science (AAAS)

    (2008)
  • S. Albarqouni et al.

AggNet: Deep learning from crowds for mitosis detection in breast cancer histology images

    IEEE Trans. Med. Imaging

    (2016)
  • A. Alkharashi et al.

    Privacy in Crowdsourcing: A Systematic Review

    Proc. Int. Conf. Information Security (ISC)

    (2018)
  • M. Allahbakhsh et al.

    A Task Decomposition Framework for Surveying the Crowd Contextual Insights

    Proc. IEEE Int. Conf. Service-Oriented Comput. and App. (SOCA)

    (2015)
  • M. Allahbakhsh et al.

    Quality control in crowdsourcing systems

    IEEE Internet Comput.

    (2013)
  • M. Allahbakhsh et al.

    Reputation Management in Crowdsourcing Systems

    Proc. IEEE Int. Conf. Collab. Comput.: Netw. Appl. and Workshar. (CollaborateCom)

    (2012)
  • M. Allahbakhsh et al.

    An analytic approach to people evaluation in crowdsourcing systems

    CoRR

    (2012)
  • V. Almendra et al.

    Fraud Detection by Human Agents: A Pilot Study

    Proc. Int. Conf. on E-Comm. and Web Technol. (EC-Web)

    (2009)
  • B.A. Alqahtani et al.

    Legal and ethical issues of crowdsourcing

    Int. J. of Computer Appl. (IJCA)

    (2017)
  • M.A. AlShehry et al.

    A Taxonomy of Crowdsourcing Campaigns

    Proc. ACM Int. Conf. on World Wide Web (WWW)

    (2015)
  • F. Alt et al.

    Location-based crowdsourcing: Extending Crowdsourcing to the Real World

    Proc. ACM Nordic Conf. Human-Computer Interaction (NordiCHI)

    (2010)
  • A. Amato et al.

    Divide and conquer: Atomizing and Parallelizing a Task in a Mobile Crowdsourcing Platform

    Proc. ACM Int. workshop on Crowdsourcing for multimedia (CrowdMM)

    (2013)
  • Amazon Mechanical Turk Human Intelligence through an API. [Online]. Available:...
  • Y. Amsterdamer et al.

CrowdMiner: Mining association rules from the crowd

    Proceedings of the VLDB Endowment

    (2013)
  • J. Anderson et al.

The crowd is the territory: Assessing quality in peer-produced spatial data during disasters

    Int. J. Human Comput. Interact. (IJHCI)

    (2018)
  • H. Aris

    Current State of Crowdsourcing Taxonomy Research: A Systematic Review

    Proc. Int. Conf. on Comput. and Informatics (ICOCI)

    (2017)
  • H. Aris et al.

    Crowdsourcing evolution: Towards a taxonomy of crowdsourcing initiatives

    Proc. IEEE Int. Conf. on Perv. Comput. and Comm. (PerCom)

    (2016)
  • A. Artikis et al.

    Heterogeneous Stream Processing and Crowdsourcing for Urban Traffic Management

    Proc. Int. Conf. on Extend. Database Technol. (EDBT)

    (2014)
  • S. Assadi et al.

    Online Assignment of Heterogeneous Tasks in Crowdsourcing Markets

    Proc. AAAI Conf. Human Comput. and Crowdsourc. (HCOMP)

    (2015)
  • J. Bauer et al.

    Intellectual property norms in online communities: how user-organized intellectual property regulation supports innovation

    Inf. Syst. Research (ISR)

    (2016)
  • M.S. Bernstein et al.

    Soylent: a word processor with a crowd inside

    Commun. ACM

    (2010)
  • M.S.M. Bernstein et al.

    Crowds in Two Seconds: Enabling Realtime Crowd-powered Interfaces

    Proc. ACM Symp. on User Inter. Softw. and Technol. (UIST)

    (2011)
  • J.P. Bigham et al.

VizWiz: Nearly real-time answers to visual questions

    Proc. ACM Symp. on User Inter. Softw. and Technol. (UIST)

    (2010)
  • R. Boim et al.

    Asking the right questions in crowd data sourcing

    Proc. IEEE Int. Conf. on Data Eng. (ICDE)

    (2012)
  • E. Bonabeau

    Decisions 2.0: the power of collective intelligence

    MIT Sloan Manag. Rev.

    (2009)
  • I. Boutsis et al.

    On task assignment for real-time reliable crowdsourcing

    Proc. IEEE Int. Conf. Dist. Comput. Syst. (ICDCS)

    (2014)
  • Z.A. Bozat

    Crowdsourcing as an open innovation tool for entrepreneurship

    Int. J. of Eco. and Manag. Eng.

    (2017)
  • D.C. Brabham

    Crowdsourcing as a model for problem solving: an introduction and cases

    Convergence: The Int. J. of Res. into New Media Technol.

    (2008)
  • D.C. Brabham

    Crowdsourcing the public participation process for planning projects

    Plan. Th.

    (2009)
  • D.C. Brabham

    Moving the crowd at threadless: motivations for participation in a crowdsourcing application

    Inf. Commun. and Society (iCS)

    (2010)
  • D.C. Brabham

    ”Crowdsourcing”

    (2013)
  • D.C. Brabham

    Using crowdsourcing in government

    (2013)
  • C. Breazeal et al.

Crowdsourcing human-robot interaction: New methods and system evaluation in a public environment

    J. of Human-Robot Interaction (JHRI)

    (2013)
  • A. Carvalho et al.

    How many crowdsourced workers should a requester hire?

    Ann. Math. Artif. Intell.

    (2016)

Shahzad Sarwar Bhatti is a PhD candidate at the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. He received the B.S. degree in 2000 and the M.S. degree in 2015 from Pakistan. He has been affiliated with M/s PTCL (Pakistan Telecommunication Company Limited) as an ICT professional working in various domains. His research interests include crowdsourcing, spatial crowdsourcing, cooperative communication, algorithms, and optimization.

Xiaofeng Gao received the B.S. degree in information and computational science from Nankai University, China, in 2004; the M.S. degree in operations research and control theory from Tsinghua University, China, in 2006; and the Ph.D. degree in computer science from the University of Texas at Dallas, USA, in 2010. She is currently a Professor with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. Her research interests include wireless communications, data engineering, and combinatorial optimization. She has authored more than 100 peer-reviewed papers and 7 book chapters in the related areas, including well-archived international journals such as IEEE TC, IEEE TKDE, IEEE TPDS, and TCS, as well as well-known conference proceedings such as INFOCOM, SIGKDD, and ICDCS. She has also served on the editorial board of Discrete Mathematics, Algorithms and Applications, and as a PC member and peer reviewer for a number of international conferences and journals.

Guihai Chen is a distinguished professor at Shanghai Jiao Tong University. He earned the B.S. degree in computer software from Nanjing University in 1984, the M.E. degree in computer applications from Southeast University in 1987, and the Ph.D. degree in computer science from the University of Hong Kong in 1997. He has been invited as a visiting professor by the Kyushu Institute of Technology in Japan, the University of Queensland in Australia, and Wayne State University in the USA. He has a wide range of research interests with a focus on parallel computing, wireless networks, data centers, peer-to-peer computing, high-performance computer architecture, and data engineering. He has published more than 350 peer-reviewed papers, more than 200 of them in well-archived international journals such as IEEE TPDS, IEEE TC, IEEE TKDE, ACM/IEEE TON and ACM TOSN, as well as in well-known conference proceedings such as HPCA, MOBIHOC, INFOCOM, ICNP, ICDCS, CoNEXT and AAAI. He has won several best paper awards, including the ICNP 2015 best paper award. His papers have been cited more than 10,000 times according to Google Scholar. He is a CCF fellow.
