Crowdsourced top-k queries by pairwise preference judgments with confidence and budget control

Regular Paper · The VLDB Journal

Abstract

Crowdsourced query processing is an emerging technique that tackles computationally challenging problems with human intelligence. The basic idea is to decompose a computationally challenging problem into a set of human-friendly microtasks (e.g., pairwise comparisons) that are distributed to and answered by the crowd. The solution to the problem is then computed (e.g., by aggregation) from the crowdsourced answers to the microtasks. In this work, we revisit the crowdsourced processing of top-k queries, aiming at (1) securing the quality of crowdsourced comparisons at a given confidence level and (2) minimizing the total monetary cost. To secure the quality of each paired comparison, we employ statistical tools to estimate a confidence interval from the collected judgments of the crowd, which then guides the aggregated judgment. We propose two novel frameworks, SPR and SPR\(^+\), for crowdsourced top-k queries. Both SPR and SPR\(^+\) are budget-aware, confidence-aware, and effective in producing high-quality top-k results. SPR requires as input a budget for each paired comparison, whereas SPR\(^+\) requires only a total budget for the whole top-k task. Extensive experiments, conducted on four real datasets, demonstrate that our proposed methods outperform existing top-k processing techniques by a clear margin.
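
As a concrete illustration of the confidence-interval idea in the abstract, the sketch below aggregates the crowd's judgments on a single pair: it computes a Wald (normal-approximation) interval for the preference probability and commits to a winner only when the interval excludes 1/2 at confidence level \(1 - \alpha \). This is a minimal sketch of the general technique, not the paper's SPR estimator; the function names and the z-score table are our own.

```python
import math

# Two-sided normal quantiles for common confidence levels 1 - alpha.
Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def preference_interval(wins_i: int, wins_j: int, level: float = 0.95):
    """Wald confidence interval for p = Pr(crowd prefers i over j),
    estimated from the pairwise judgments collected so far."""
    n = wins_i + wins_j
    if n == 0:
        return 0.0, 1.0                    # no evidence yet
    p_hat = wins_i / n
    half = Z[level] * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

def aggregated_judgment(wins_i: int, wins_j: int, level: float = 0.95):
    """Return 'i', 'j', or None (undecided), depending on whether the
    interval for p lies entirely above or entirely below 1/2."""
    lo, hi = preference_interval(wins_i, wins_j, level)
    if lo > 0.5:
        return "i"                         # confident that i beats j
    if hi < 0.5:
        return "j"                         # confident that j beats i
    return None                            # ambiguous: collect more votes

# 14 of 20 workers prefer i: interval ~ (0.50, 0.90) still straddles 1/2.
print(aggregated_judgment(14, 6))          # None
print(aggregated_judgment(17, 3))          # 'i'
```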

Notes

  1. https://translate.google.com/community.

  2. https://www.duolingo.com/.

  3. https://translate.twitter.com/.

  4. https://www.whoscored.com/PlayerComparison.

  5. https://www.topuniversities.com/university-rankings/world-university-rankings/2020.

  6. The confidence interval of an unobserved variable with confidence level \(1 - \alpha \) means that the variable falls into the interval with probability \(1 - \alpha \).

  7. A small \(\varepsilon >0\) guarantees that the interval excludes 0.

  8. Note that this adaptive budget allocation does not affect the unit price of a single judgment from the crowd, which is assumed to be fixed regardless of the difficulty of the paired comparison (a greedy illustration of such budget spending follows these notes).

  9. https://github.com/yanl2031/Pairwise-Preference-Judgment-Datasets.

  10. https://www.imdb.com/interfaces/.

  11. https://www.figure-eight.com/.

  12. http://www.shanghairanking.com/.

  13. https://www.topuniversities.com/university-rankings/world-university-rankings/2020.

  14. https://www.timeshighereducation.com/world-university-rankings.
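
Note 8 describes budget control under a fixed unit price per judgment. Reusing the two helpers from the sketch after the abstract, the following illustrative policy spends a total budget greedily: each unit-priced judgment goes to the still-undecided pair whose confidence interval is widest. This is only a plausible baseline under those assumptions, not the SPR\(^+\) allocation strategy; `ask_crowd` stands for a hypothetical microtask-posting API.

```python
def allocate_judgments(pairs, total_budget, ask_crowd, level=0.95):
    """Greedily spend `total_budget` unit-priced judgments.
    `pairs` maps a pair id to its (wins_i, wins_j) tally; `ask_crowd(pid)`
    posts one microtask for that pair and returns 'i' or 'j'."""
    for _ in range(total_budget):
        undecided = [(pid, preference_interval(*tally, level))
                     for pid, tally in pairs.items()
                     if aggregated_judgment(*tally, level) is None]
        if not undecided:
            break                          # all pairs resolved under budget
        # The widest interval marks the hardest (least resolved) pair.
        pid, _ = max(undecided, key=lambda x: x[1][1] - x[1][0])
        wi, wj = pairs[pid]
        pairs[pid] = (wi + 1, wj) if ask_crowd(pid) == "i" else (wi, wj + 1)
    return {pid: aggregated_judgment(*tally, level)
            for pid, tally in pairs.items()}

# e.g. allocate_judgments({"AB": (0, 0), "BC": (0, 0)}, 40, my_crowd_api)
# where my_crowd_api is whatever posts the comparison to the platform.
```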

Acknowledgements

This work was supported by the National Key Research and Development Plan of China (No. 2019YFB2102100), the Key-Area Research and Development Program of Guangdong Province (No. 2020B010164003), the Science and Technology Development Fund, Macau SAR (File Nos. SKL-IOTSC-2018-2020 and 0015/2019/AKP), and the University of Macau (File No. MYRG2019-00119-FST).

Author information

Correspondence to Hao Wang or Leong Hou U.

About this article

Cite this article

Li, Y., Wang, H., Kou, N.M. et al. Crowdsourced top-k queries by pairwise preference judgments with confidence and budget control. The VLDB Journal 30, 189–213 (2021). https://doi.org/10.1007/s00778-020-00631-8
