Abstract
Online users are constantly seeking experiences, such as a hotel with clean rooms and a lively bar, or a restaurant for a romantic rendezvous. However, e-commerce search engines support only queries over objective attributes such as location, price, and cuisine, and relegate experiential data to text reviews. To support experiential queries, a database system needs to model subjective data, and users should be able to pose queries that specify subjective experiences in their own words, in addition to conditions on the usual objective attributes. This paper introduces OpineDB, a subjective database system that addresses these challenges. We introduce a data model for subjective databases. We describe how OpineDB translates subjective queries into queries over the subjective database schema by matching the user's query phrases to the underlying schema. We also show how the experiential conditions specified by the user can be combined and the results aggregated and ranked. Through experiments on real hotel and restaurant review data, we demonstrate that subjective databases satisfy user needs more effectively and accurately than alternative techniques.
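As a rough illustration of the pipeline the abstract outlines (match query phrases to the schema, combine the experiential conditions, then rank), the following is a minimal sketch and not OpineDB's actual implementation. The schema, the entity scores, the token-overlap matcher, and the product used to combine conditions are all assumptions made for this example; a real system would likely use learned phrase representations and scores extracted from reviews.

```python
# Hypothetical subjective schema: attribute name -> example phrases for it.
SCHEMA = {
    "room_cleanliness": ["clean rooms", "spotless room", "dirty room"],
    "bar_atmosphere": ["lively bar", "quiet bar", "fun bar scene"],
}

# Hypothetical per-entity degrees in [0, 1], as if pre-aggregated from reviews,
# stored alongside one objective attribute (price).
ENTITIES = {
    "Hotel A": {"room_cleanliness": 0.9, "bar_atmosphere": 0.4, "price": 120},
    "Hotel B": {"room_cleanliness": 0.7, "bar_atmosphere": 0.8, "price": 150},
    "Hotel C": {"room_cleanliness": 0.95, "bar_atmosphere": 0.9, "price": 300},
}


def jaccard(a, b):
    """Token-overlap similarity; a simple stand-in for a learned phrase matcher."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def match_attribute(phrase):
    """Map a user phrase to the schema attribute with the most similar example phrase."""
    return max(
        SCHEMA,
        key=lambda attr: max(jaccard(phrase, example) for example in SCHEMA[attr]),
    )


def rank(subjective_phrases, max_price):
    """Filter on the objective condition, then rank by the combined subjective score."""
    attrs = [match_attribute(p) for p in subjective_phrases]
    results = []
    for name, row in ENTITIES.items():
        if row["price"] > max_price:  # objective condition
            continue
        score = 1.0
        for attr in attrs:  # assumed conjunction: product of the degrees
            score *= row[attr]
        results.append((name, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


ranking = rank(["clean rooms", "lively bar"], max_price=200)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The product is only one possible way to combine conditions; taking the minimum of the degrees is another common choice for a conjunction, and the two can produce different rankings.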
Notes
We used an open-sourced implementation available at https://github.com/macanv/BERT-BiLSTM-CRF-NER.
Acknowledgements
We are grateful to the anonymous reviewers for their thorough reviews and suggestions, which have greatly improved the paper. We would also like to thank Sara Evensen and Natalie Nuno for their help in the development of OpineDB, and George Mihaila, John Morales, and Ekaterina Pavlovic for their contributions to an earlier version of OpineDB.
About this article
Cite this article
Li, Y., Feng, A., Li, J. et al. Querying subjective data. The VLDB Journal 30, 115–140 (2021). https://doi.org/10.1007/s00778-020-00634-5
DOI: https://doi.org/10.1007/s00778-020-00634-5