Querying subjective data

Li, Yuliang; Feng, Aaron; Li, Jinfeng; Chen, Shuwei; Mumick, Saran; Halevy, Alon; Li, Vivian; Tan, Wang-Chiew

doi:10.1007/s00778-020-00634-5

Special Issue Paper
Published: 08 September 2020

Volume 30, pages 115–140, (2021)

The VLDB Journal

Yuliang Li¹ (ORCID: orcid.org/0000-0002-0602-149X), Aaron Feng¹, Jinfeng Li¹, Shuwei Chen¹, Saran Mumick², Alon Halevy³, Vivian Li¹ & Wang-Chiew Tan¹


Abstract

Online users are constantly seeking experiences, such as a hotel with clean rooms and a lively bar, or a restaurant for a romantic rendezvous. However, e-commerce search engines support only queries over objective attributes such as location, price, and cuisine, and any experiential data is relegated to text reviews. To support experiential queries, a database system needs to model subjective data, and users should be able to pose queries that specify subjective experiences in their own words, in addition to conditions on the usual objective attributes. This paper introduces OpineDB, a subjective database system that addresses these challenges. We introduce a data model for subjective databases. We describe how OpineDB translates subjective queries into queries over the subjective database schema by matching the phrases in the user query to the underlying schema. We also show how the experiential conditions specified by the user can be combined and how the results can be aggregated and ranked. Through experiments with real hotel and restaurant review data, we demonstrate that subjective databases satisfy user needs more effectively and accurately than alternative techniques.
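To make the pipeline in the abstract concrete, the following is a minimal sketch, not OpineDB's implementation. It assumes a toy schema of two subjective attributes, matches query phrases to attributes by word overlap, and combines per-attribute degrees of truth with a product t-norm as the fuzzy AND. The attribute names, seed words, scores, and the functions `match_attribute` and `rank` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical subjective attributes with seed words (assumed, not from the paper).
SCHEMA = {
    "room_cleanliness": {"clean", "rooms", "spotless", "tidy"},
    "bar_liveliness": {"lively", "bar", "music", "crowded"},
}


@dataclass
class Hotel:
    name: str
    price: float   # objective attribute
    scores: dict   # subjective attribute -> degree of truth in [0, 1]


# Made-up example data.
HOTELS = [
    Hotel("Hotel A", 120.0, {"room_cleanliness": 0.9, "bar_liveliness": 0.5}),
    Hotel("Hotel B", 150.0, {"room_cleanliness": 0.8, "bar_liveliness": 0.9}),
    Hotel("Hotel C", 300.0, {"room_cleanliness": 1.0, "bar_liveliness": 1.0}),
]


def match_attribute(phrase):
    """Map a query phrase to the attribute sharing the most words, or None."""
    words = set(phrase.lower().split())
    best = max(SCHEMA, key=lambda attr: len(words & SCHEMA[attr]))
    return best if words & SCHEMA[best] else None


def rank(hotels, phrases, max_price):
    """Filter on the objective condition, then rank by the fuzzy conjunction."""
    attrs = [a for a in map(match_attribute, phrases) if a is not None]
    results = []
    for hotel in hotels:
        if hotel.price > max_price:
            continue  # objective condition is a hard filter
        degree = 1.0
        for attr in attrs:
            # Product t-norm as the fuzzy AND over subjective conditions.
            degree *= hotel.scores.get(attr, 0.0)
        results.append((hotel.name, degree))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for name, degree in rank(HOTELS, ["clean rooms", "lively bar"], max_price=200):
        print(name, round(degree, 2))
```

A real system would presumably replace the word-overlap matcher with a learned similarity over phrases and derive the scores from review text; the sketch only shows how hard objective filters and soft subjective conditions can coexist in one ranked query.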


Notes

https://dictionary.cambridge.org/us/dictionary/english/subjective.
We used an open-sourced implementation available at https://github.com/macanv/BERT-BiLSTM-CRF-NER.
We collected the F1 scores of the SemEval datasets from [62, 63] and retrained their model on the hotel dataset (10 times to get the average).

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: Tensorflow: a system for large-scale machine learning. In: OSDI, pp. 265–283 (2016)
Androutsopoulos, I., Ritchie, G.D., Thanisch, P.: Natural language interfaces to databases: an introduction. Nat. Lang. Eng. 1(1), 29–81 (1995)
Aroyo, L., Welty, C.: Truth is a lie: crowd truth and the seven myths of human annotation. AI Mag. 36(1), 15–24 (2015)
Baeza-Yates, R.A.: Bias on the web. Commun. ACM 61(6), 54–61 (2018)
Barker, K.: Combining structured and unstructured knowledge sources for question answering in watson. In: International Conference on Data Integration in the Life Sciences, pp. 53–55. Springer (2012)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Bird, S., Loper, E.: Nltk: the natural language toolkit. In: Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, p. 31 (2004)
Brody, S., Elhadad, N.: An unsupervised aspect-sentiment model for online reviews. In: NAACL HLT, pp. 804–812 (2010)
Buhrmester, M., Kwang, T., Gosling, S.D.: Amazon’s mechanical turk: a new source of inexpensive, yet high-quality, data? Perspect. Psychol. Sci. 6(1), 3–5 (2011)
Chen, B., An, B., Sun, L., Han, X.: Semi-supervised lexicon learning for wide-coverage semantic parsing. In: COLING, pp. 892–904 (2018)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
Conneau, A., Kiela, D., Schwenk, H., Barrault, L., Bordes, A.: Supervised learning of universal sentence representations from natural language inference data. In: EMNLP, pp. 670–680 (2017)
CREATE AGGREGATE (Transact-SQL). https://docs.microsoft.com/en-us/sql/t-sql/statements/create-aggregate-transact-sql
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018). arXiv:1810.04805
Evensen, S., Feng, A., Halevy, A., Li, J., Li, V., Li, Y., Liu, H., Mihaila, G., Morales, J., Nuno, N., Pavlovic, E., Tan, W.C., Wang, X.: Voyageur: an experiential travel search engine. In: The World Wide Web Conference, WWW’19, pp. 3511–3515 (2019)
Fagin, R.: Combining fuzzy information from multiple systems. In: PODS, pp. 216–226. ACM (1996)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)
Fast, E., Chen, B., Bernstein, M.S.: Empath: Understanding Topic Signals in Large-Scale Text. In: CHI, pp. 4647–4657 (2016)
Feng, A., Chen, S., Li, Y., Matsuda, H., Tamaki, H., Tan, W.C.: Towards Productionizing Subjective Search Systems (2020). arXiv:2003.13968
Feng, X., Kumar, A., Recht, B., Ré, C.: Towards a unified architecture for in-RDBMS analytics. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 325–336. ACM (2012)
Ganesan, K., Zhai, C.: Opinion-based entity ranking. Inf. Retriev. 15(2), 116–150 (2012)
Generalized Linear Models: https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/algo_glm.htm
Gormley, C., Tong, Z.: Elasticsearch: The Definitive Guide—A Distributed Real-Time Search and Analytics Engine. O’Reilly Media, Inc., Newton (2015)
Greenplum Database Data Types. https://gpdb.docs.pivotal.io/500/ref_guide/data_types.html
Hamilton, W.L., Clark, K., Leskovec, J., Jurafsky, D.: Inducing domain-specific sentiment lexicons from unlabeled corpora. In: EMNLP, pp. 595–605 (2016)
He, R., Lee, W.S., Ng, H.T., Dahlmeier, D.: An unsupervised neural attention model for aspect extraction. In: ACL, pp. 388–397 (2017)
Hellerstein, J.M., Ré, C., Schoppmann, F., Wang, D.Z., Fratkin, E., Gorajek, A., Ng, K.S., Welton, C., Feng, X., Li, K., et al.: The madlib analytics library. In: Proceedings of the VLDB Endowment, vol. 5, no. 12 (2012)
Hu, M., Liu, B.: Mining and summarizing customer reviews. In: SIGKDD, pp. 168–177 (2004)
Hu, M., Liu, B.: Mining opinion features in customer reviews. In: AAAI, pp. 755–760 (2004)
Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. (CSUR) 40(4), 11 (2008)
Iyer, S., Konstas, I., Cheung, A., Krishnamurthy, J., Zettlemoyer, L.: Learning a neural semantic parser from user feedback. In: ACL, pp. 963–973 (2017)
Kiros, R., Zhu, Y., Salakhutdinov, R.R., Zemel, R., Urtasun, R., Torralba, A., Fidler, S.: Skip-thought vectors. In: NIPS, pp. 3294–3302 (2015)
Klement, E.P., Mesiar, R., Pap, E.: Book review: “triangular norms”. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 11(02), 257–259 (2003)
Klir, G., Yuan, B.: Fuzzy Sets and Fuzzy Logic, vol. 4. Prentice Hall, Upper Saddle River (1995)
Li, F., Jagadish, H.V.: Understanding natural language queries over relational databases. SIGMOD Rec. 45(1), 6–13 (2016)
Liu, B.: Sentiment Analysis and Opinion Mining. Morgan & Claypool, San Rafael (2012)
MADlib: Neural Network. https://madlib.apache.org/docs/latest/group__grp__nn.html
Makris, C., Panagopoulos, P.: Improving opinion-based entity ranking. In: WEBIST, pp. 223–230 (2014)
Miao, Z., Li, Y., Wang, X., Tan, W.C.: Snippext: semi-supervised opinion mining with augmented data. In: The World Wide Web Conference, WWW’20 (2020)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013). arXiv:1301.3781
Motro, A.: Vague: a user interface to relational databases that permits vague queries. ACM Trans. Inf. Syst. (TOIS) 6(3), 187–214 (1988)
Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Mohammad, A.S., Al-Ayyoub, M., Zhao, Y., Qin, B., De Clercq, O., et al.: Semeval-2016 task 5: aspect based sentiment analysis. In: SemEval-2016, pp. 19–30 (2016)
Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S., Androutsopoulos, I.: Semeval-2015 task 12: aspect based sentiment analysis. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 486–495 (2015)
Pontiki, M., Galanis, D., Pavlopoulos, J., Papageorgiou, H., Androutsopoulos, I., Manandhar, S.: Semeval-2014 task 4: aspect based sentiment analysis. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 27–35 (2014)
Poon, H.: Grounded unsupervised semantic parsing. In: ACL, pp. 933–943 (2013)
Popescu, A., Etzioni, O., Kautz, H.A.: Towards a theory of natural language interfaces to databases. In: IUI, pp. 149–157 (2003)
PostgreSQL: Arrays. https://www.postgresql.org/docs/current/arrays.html
Qiu, G., Liu, B., Bu, J., Chen, C.: Opinion word expansion and target extraction through double propagation. Comput. Linguist. 37(1), 9–27 (2011)
Rehurek, R., Sojka, P.: Gensim-statistical semantics in python. (2011)
Rothe, S., Ebert, S., Schütze, H.: Ultradense word embeddings by orthogonal transformation. In: NAACL HLT, pp. 767–777 (2016)
Sang, E.F., De Meulder, F.: Introduction to the Conll-2003 Shared Task: Language-Independent Named Entity Recognition (2003). arXiv:cs/0306050
Savenkov, D., Agichtein, E.: When a knowledge base is not enough: question answering over knowledge bases with external text data. In: SIGIR, pp. 235–244 (2016)
Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, San Rafael (2011)
Tai, Y., Kao, H.: Automatic domain-specific sentiment lexicon generation with label propagation. In: IIWAS, p. 53 (2013)
The Booking.com Dataset. https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe
The Yelp Dataset. https://www.yelp.com/dataset
Trummer, I., Halevy, A.Y., Lee, H., Sarawagi, S., Gupta, R.: Mining subjective properties on the web. In: SIGMOD, pp. 1745–1760 (2015)
User-Defined Aggregates. https://www.postgresql.org/docs/current/xaggr.html
Using PL/SQL Collections and Records. https://docs.oracle.com/cd/B28359_01/appdev.111/b28370/collections.htm#CHDEIJHD
Using User-Defined Aggregate Functions. https://docs.oracle.com/cd/B28359_01/appdev.111/b28425/aggr_functions.htm
Vicente, I.S., Agerri, R., Rigau, G.: Simple, robust and (almost) unsupervised generation of polarity lexicons for multiple languages. In: EACL, pp. 88–97 (2014)
Wang, W., Pan, S.J., Dahlmeier, D., Xiao, X.: Recursive neural conditional random fields for aspect-based sentiment analysis. In: EMNLP, pp. 616–626 (2016)
Wang, W., Pan, S.J., Dahlmeier, D., Xiao, X.: Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In: AAAI, pp. 3316–3322 (2017)
Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: HLT/EMNLP, pp. 347–354 (2005)
Xin, H., Meng, R., Chen, L.: Subjective knowledge base construction powered by crowdsourcing and knowledge base. In: SIGMOD, pp. 1349–1361. ACM (2018)
Yan, X., Guo, J., Lan, Y., Cheng, X.: A biterm topic model for short texts. In: WWW, pp. 1445–1456 (2013)
Zadeh, L.A.: Fuzzy logic = computing with words. IEEE Trans. Fuzzy Syst. 4(2), 103–111 (1996)
Zhang, L., Wang, S., Liu, B.: Deep learning for sentiment analysis: a survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8(4) (2018)
Zhong, V., Xiong, C., Socher, R.: Seq2sql: Generating Structured Queries from Natural Language Using Reinforcement Learning (2017). arXiv:1709.00103

Acknowledgements

We are grateful to the anonymous reviewers for their thorough reviews and suggestions, which have greatly improved the paper. We would also like to thank Sara Evensen and Natalie Nuno for their help in developing OpineDB, and George Mihaila, John Morales, and Ekaterina Pavlovic for their contributions to an earlier version of OpineDB.

Author information

Authors and Affiliations

Megagon Labs, Mountain View, USA
Yuliang Li, Aaron Feng, Jinfeng Li, Shuwei Chen, Vivian Li & Wang-Chiew Tan
University of Pennsylvania, Philadelphia, USA
Saran Mumick
Facebook AI, California, USA
Alon Halevy

Corresponding author

Correspondence to Yuliang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, Y., Feng, A., Li, J. et al. Querying subjective data. The VLDB Journal 30, 115–140 (2021). https://doi.org/10.1007/s00778-020-00634-5

Received: 31 January 2020
Revised: 04 August 2020
Accepted: 28 August 2020
Published: 08 September 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s00778-020-00634-5
