Querying subjective data

  • Special Issue Paper
  • The VLDB Journal

Abstract

Online users are constantly seeking experiences, such as a hotel with clean rooms and a lively bar, or a restaurant for a romantic rendezvous. However, e-commerce search engines only support queries involving objective attributes such as location, price, and cuisine, and any experiential data is relegated to text reviews. In order to support experiential queries, a database system needs to model subjective data. Users should be able to pose queries that specify subjective experiences using their own words, in addition to conditions on the usual objective attributes. This paper introduces OpineDB, a subjective database system that addresses these challenges. We introduce a data model for subjective databases. We describe how OpineDB interprets subjective queries against the subjective database schema by matching the phrases in the user's query to the underlying schema. We also show how the experiential conditions specified by the user can be combined and the results aggregated and ranked. Through experiments with real hotel and restaurant reviews, we demonstrate that subjective databases satisfy user needs more effectively and accurately than alternative techniques.
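To make the idea of combining experiential conditions with objective ones concrete, the following is a minimal sketch and not OpineDB's actual implementation. It assumes each entity already carries a degree of truth in [0, 1] for each subjective attribute, and it combines the requested attributes with a product t-norm, which is one common fuzzy-logic choice that the abstract does not confirm. The names `Hotel`, `fuzzy_and`, `rank`, and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hotel:
    name: str
    price: float   # objective attribute
    scores: dict   # hypothetical degrees of truth in [0, 1] per subjective attribute


def fuzzy_and(degrees):
    """Combine degrees of truth with the product t-norm (an assumed choice)."""
    result = 1.0
    for d in degrees:
        result *= d
    return result


def rank(hotels, max_price, wanted):
    """Filter on the objective condition, then rank by the combined subjective score."""
    candidates = [h for h in hotels if h.price <= max_price]
    scored = [
        (fuzzy_and(h.scores.get(attr, 0.0) for attr in wanted), h.name)
        for h in candidates
    ]
    # Highest combined degree of truth first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up data for illustration only.
HOTELS = [
    Hotel("A", 120.0, {"clean_rooms": 0.9, "lively_bar": 0.4}),
    Hotel("B", 150.0, {"clean_rooms": 0.7, "lively_bar": 0.8}),
    Hotel("C", 300.0, {"clean_rooms": 0.95, "lively_bar": 0.9}),
]

ranking = rank(HOTELS, max_price=200.0, wanted=["clean_rooms", "lively_bar"])
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

In this sketch hotel C is dropped by the price condition before any subjective scoring, and the remaining hotels are ordered by how well they satisfy both experiential conditions at once. A different combination function, such as the minimum, would change the scores and could change the order for other data.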

Notes

  1. https://dictionary.cambridge.org/us/dictionary/english/subjective.

  2. We used an open-sourced implementation available at https://github.com/macanv/BERT-BiLSTM-CRF-NER.

  3. We collected the F1 scores of the SemEval datasets from [62, 63] and retrained their model on the hotel dataset 10 times, reporting the average.

References

  1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: a system for large-scale machine learning. In: OSDI, pp. 265–283 (2016)

  2. Androutsopoulos, I., Ritchie, G.D., Thanisch, P.: Natural language interfaces to databases: an introduction. Nat. Lang. Eng. 1(1), 29–81 (1995)

  3. Aroyo, L., Welty, C.: Truth is a lie: crowd truth and the seven myths of human annotation. AI Mag. 36(1), 15–24 (2015)

  4. Baeza-Yates, R.A.: Bias on the web. Commun. ACM 61(6), 54–61 (2018)

  5. Barker, K.: Combining structured and unstructured knowledge sources for question answering in Watson. In: International Conference on Data Integration in the Life Sciences, pp. 53–55. Springer (2012)

  6. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)

  7. Bird, S., Loper, E.: NLTK: the natural language toolkit. In: Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, p. 31 (2004)

  8. Brody, S., Elhadad, N.: An unsupervised aspect-sentiment model for online reviews. In: NAACL HLT, pp. 804–812 (2010)

  9. Buhrmester, M., Kwang, T., Gosling, S.D.: Amazon’s Mechanical Turk: a new source of inexpensive, yet high-quality, data? Perspect. Psychol. Sci. 6(1), 3–5 (2011)

  10. Chen, B., An, B., Sun, L., Han, X.: Semi-supervised lexicon learning for wide-coverage semantic parsing. In: COLING, pp. 892–904 (2018)

  11. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)

  12. Conneau, A., Kiela, D., Schwenk, H., Barrault, L., Bordes, A.: Supervised learning of universal sentence representations from natural language inference data. In: EMNLP, pp. 670–680 (2017)

  13. CREATE AGGREGATE (Transact-SQL). https://docs.microsoft.com/en-us/sql/t-sql/statements/create-aggregate-transact-sql

  14. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018). arXiv:1810.04805

  15. Evensen, S., Feng, A., Halevy, A., Li, J., Li, V., Li, Y., Liu, H., Mihaila, G., Morales, J., Nuno, N., Pavlovic, E., Tan, W.C., Wang, X.: Voyageur: an experiential travel search engine. In: The World Wide Web Conference, WWW’19, pp. 3511–3515 (2019)

  16. Fagin, R.: Combining fuzzy information from multiple systems. In: PODS, pp. 216–226. ACM (1996)

  17. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)

  18. Fast, E., Chen, B., Bernstein, M.S.: Empath: Understanding Topic Signals in Large-Scale Text. In: CHI, pp. 4647–4657 (2016)

  19. Feng, A., Chen, S., Li, Y., Matsuda, H., Tamaki, H., Tan, W.C.: Towards Productionizing Subjective Search Systems (2020). arXiv:2003.13968

  20. Feng, X., Kumar, A., Recht, B., Ré, C.: Towards a unified architecture for in-RDBMS analytics. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 325–336. ACM (2012)

  21. Ganesan, K., Zhai, C.: Opinion-based entity ranking. Inf. Retriev. 15(2), 116–150 (2012)

  22. Generalized Linear Models: https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/algo_glm.htm

  23. Gormley, C., Tong, Z.: Elasticsearch: The Definitive Guide—A Distributed Real-Time Search and Analytics Engine. O’Reilly Media, Inc., Newton (2015)

  24. Greenplum Database Data Types. https://gpdb.docs.pivotal.io/500/ref_guide/data_types.html

  25. Hamilton, W.L., Clark, K., Leskovec, J., Jurafsky, D.: Inducing domain-specific sentiment lexicons from unlabeled corpora. In: EMNLP, pp. 595–605 (2016)

  26. He, R., Lee, W.S., Ng, H.T., Dahlmeier, D.: An unsupervised neural attention model for aspect extraction. In: ACL, pp. 388–397 (2017)

  27. Hellerstein, J.M., Ré, C., Schoppmann, F., Wang, D.Z., Fratkin, E., Gorajek, A., Ng, K.S., Welton, C., Feng, X., Li, K., et al.: The MADlib analytics library. In: Proceedings of the VLDB Endowment, vol. 5, no. 12 (2012)

  28. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: SIGKDD, pp. 168–177 (2004)

  29. Hu, M., Liu, B.: Mining opinion features in customer reviews. In: AAAI, pp. 755–760 (2004)

  30. Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. (CSUR) 40(4), 11 (2008)

  31. Iyer, S., Konstas, I., Cheung, A., Krishnamurthy, J., Zettlemoyer, L.: Learning a neural semantic parser from user feedback. In: ACL, pp. 963–973 (2017)

  32. Kiros, R., Zhu, Y., Salakhutdinov, R.R., Zemel, R., Urtasun, R., Torralba, A., Fidler, S.: Skip-thought vectors. In: NIPS, pp. 3294–3302 (2015)

  33. Klement, E.P., Mesiar, R., Pap, E.: Book review: “triangular norms”. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 11(02), 257–259 (2003)

  34. Klir, G., Yuan, B.: Fuzzy Sets and Fuzzy Logic, vol. 4. Prentice Hall, Upper Saddle River (1995)

  35. Li, F., Jagadish, H.V.: Understanding natural language queries over relational databases. SIGMOD Rec. 45(1), 6–13 (2016)

  36. Liu, B.: Sentiment Analysis and Opinion Mining. Morgan & Claypool, San Rafael (2012)

  37. MADlib: Neural Network. https://madlib.apache.org/docs/latest/group__grp__nn.html

  38. Makris, C., Panagopoulos, P.: Improving opinion-based entity ranking. In: WEBIST, pp. 223–230 (2014)

  39. Miao, Z., Li, Y., Wang, X., Tan, W.C.: Snippext: semi-supervised opinion mining with augmented data. In: The World Wide Web Conference, WWW’20 (2020)

  40. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013). arXiv:1301.3781

  41. Motro, A.: Vague: a user interface to relational databases that permits vague queries. ACM Trans. Inf. Syst. (TOIS) 6(3), 187–214 (1988)

  42. Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Mohammad, A.S., Al-Ayyoub, M., Zhao, Y., Qin, B., De Clercq, O., et al.: SemEval-2016 task 5: aspect based sentiment analysis. In: SemEval-2016, pp. 19–30 (2016)

  43. Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S., Androutsopoulos, I.: SemEval-2015 task 12: aspect based sentiment analysis. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 486–495 (2015)

  44. Pontiki, M., Galanis, D., Pavlopoulos, J., Papageorgiou, H., Androutsopoulos, I., Manandhar, S.: SemEval-2014 task 4: aspect based sentiment analysis. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 27–35 (2014)

  45. Poon, H.: Grounded unsupervised semantic parsing. In: ACL, pp. 933–943 (2013)

  46. Popescu, A., Etzioni, O., Kautz, H.A.: Towards a theory of natural language interfaces to databases. In: IUI, pp. 149–157 (2003)

  47. PostgreSQL: Arrays. https://www.postgresql.org/docs/current/arrays.html

  48. Qiu, G., Liu, B., Bu, J., Chen, C.: Opinion word expansion and target extraction through double propagation. Comput. Linguist. 37(1), 9–27 (2011)

  49. Rehurek, R., Sojka, P.: Gensim: statistical semantics in Python (2011)

  50. Rothe, S., Ebert, S., Schütze, H.: Ultradense word embeddings by orthogonal transformation. In: NAACL HLT, pp. 767–777 (2016)

  51. Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition (2003). arXiv:cs/0306050

  52. Savenkov, D., Agichtein, E.: When a knowledge base is not enough: question answering over knowledge bases with external text data. In: SIGIR, pp. 235–244 (2016)

  53. Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, San Rafael (2011)

  54. Tai, Y., Kao, H.: Automatic domain-specific sentiment lexicon generation with label propagation. In: IIWAS, p. 53 (2013)

  55. The Booking.com Dataset. https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe

  56. The Yelp Dataset. https://www.yelp.com/dataset

  57. Trummer, I., Halevy, A.Y., Lee, H., Sarawagi, S., Gupta, R.: Mining subjective properties on the web. In: SIGMOD, pp. 1745–1760 (2015)

  58. User-Defined Aggregates. https://www.postgresql.org/docs/current/xaggr.html

  59. Using PL/SQL Collections and Records. https://docs.oracle.com/cd/B28359_01/appdev.111/b28370/collections.htm#CHDEIJHD

  60. Using User-Defined Aggregate Functions. https://docs.oracle.com/cd/B28359_01/appdev.111/b28425/aggr_functions.htm

  61. Vicente, I.S., Agerri, R., Rigau, G.: Simple, robust and (almost) unsupervised generation of polarity lexicons for multiple languages. In: EACL, pp. 88–97 (2014)

  62. Wang, W., Pan, S.J., Dahlmeier, D., Xiao, X.: Recursive neural conditional random fields for aspect-based sentiment analysis. In: EMNLP, pp. 616–626 (2016)

  63. Wang, W., Pan, S.J., Dahlmeier, D., Xiao, X.: Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In: AAAI, pp. 3316–3322 (2017)

  64. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: HLT/EMNLP, pp. 347–354 (2005)

  65. Xin, H., Meng, R., Chen, L.: Subjective knowledge base construction powered by crowdsourcing and knowledge base. In: SIGMOD, pp. 1349–1361. ACM (2018)

  66. Yan, X., Guo, J., Lan, Y., Cheng, X.: A biterm topic model for short texts. In: WWW, pp. 1445–1456 (2013)

  67. Zadeh, L.A.: Fuzzy logic = computing with words. IEEE Trans. Fuzzy Syst. 4(2), 103–111 (1996)

  68. Zhang, L., Wang, S., Liu, B.: Deep learning for sentiment analysis: a survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8(4) (2018)

  69. Zhong, V., Xiong, C., Socher, R.: Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning (2017). arXiv:1709.00103

Acknowledgements

We are grateful to the anonymous reviewers for their thorough reviews and suggestions, which have greatly improved the paper. We would also like to thank Sara Evensen and Natalie Nuno for their help in the development of OpineDB, and George Mihaila, John Morales, and Ekaterina Pavlovic for their contributions to an earlier version of OpineDB.

Author information

Corresponding author

Correspondence to Yuliang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, Y., Feng, A., Li, J. et al. Querying subjective data. The VLDB Journal 30, 115–140 (2021). https://doi.org/10.1007/s00778-020-00634-5

  • DOI: https://doi.org/10.1007/s00778-020-00634-5
