Artificial Intelligence

Volume 310, September 2022, 103744

VoCSK: Verb-oriented commonsense knowledge mining with taxonomy-guided induction

https://doi.org/10.1016/j.artint.2022.103744

Abstract

Commonsense knowledge acquisition is one of the fundamental issues in realizing human-level AI. However, commonsense knowledge is difficult to obtain because it is a human consensus and rarely appears explicitly in texts or other data. In this paper, we focus on the automatic acquisition of a typical kind of implicit verb-oriented commonsense knowledge (e.g., “person eats food”), i.e., the concept-level knowledge of verb phrases. For this purpose, we propose a taxonomy-guided induction method to mine verb-oriented commonsense knowledge from verb phrases with the help of a probabilistic taxonomy. First, we design an entropy-based triplet filter to cope with noisy verb phrases. Then, we propose a joint model based on the minimum description length principle and a neural language model to generate verb-oriented commonsense knowledge. In addition, we introduce two strategies to accelerate the computation: a simulated annealing-based approximate solution and a verb phrase clustering method. Finally, we conduct extensive experiments showing that our solution is more effective than competing methods in mining verb-oriented commonsense knowledge. We construct a commonsense knowledge base called VoCSK, containing 259 verbs and 18,406 pieces of verb-oriented commonsense knowledge. To verify the usefulness of VoCSK, we use its knowledge to improve model performance on two downstream applications.

Introduction

Commonsense is the inherent background knowledge that humans use in the cognitive process [49]. Although current intelligent systems surpass humans in many tasks such as reading comprehension [23] and machine translation [18], intelligent machines still lag behind humans in performing simple tasks. For example, given the sentence “The trophy would not fit in the brown suitcase because it was too big” (Levesque et al. [26]), it is difficult for machines to accurately determine whether “it” refers to “trophy” or “suitcase”. In contrast, this problem is very easy for humans, since they possess a great deal of commonsense knowledge (CSK) and reasoning ability. Unfortunately, modern machines still lack such massive CSK. Thus, it is crucial to endow machines with CSK.

Among various kinds of CSK, verb-oriented CSK is especially important for machines to achieve human-level AI. Verbs, in general, are crucial for the understanding of natural language and thus are widely applicable in NLP tasks such as semantic role labeling [16], word sense disambiguation [11], and query understanding [52]. For example, given a query watch harry potter, the information retrieval (IR) system can understand that harry potter is a movie or a DVD instead of a book through the verb watch. The most fundamental CSK about a specific verb (e.g., eat) is what kind of subjects (e.g., person) will act on what kind of objects (e.g., food). One distinguishing characteristic of verb-oriented CSK is that it is only meaningful when expressed at the concept level. This is because verb-oriented knowledge at the instance level is so specific that it can only be factual knowledge. For example, “John eats bread” is just a trivial fact and cannot be considered CSK, since John and bread can be replaced by other specific words, such as Helen and apple. Humans exhibit intelligence not only because we can understand the meaning of trivial facts such as “John eats bread”, but also because we can understand that of “person eats food”. This means that even given a new person (e.g., Wilbur) and an unfamiliar fruit (e.g., plantain), we still understand what “Wilbur eats plantain” means.

In this paper, we focus on the automatic acquisition of implicit verb-oriented CSK, which is the concept-level knowledge of verb phrases (VPs) consisting of a subject, verb, and object. Specifically, our task accepts instance-level VPs (e.g., “John eats bread”, “Helen eats apple”, etc.) as the input and outputs concept-level assertions (e.g., “person eats food”). VPs at the instance level often emphasize the relation between a specific subject and object, while those at the concept level describe the relation between concepts representing the common characteristics of a group of instances.

We argue that VPs at the concept level are rarely covered in existing verb-oriented knowledge bases (KBs). There are two reasons to support this argument. First, the concepts we mine for each verb are neither too abstract nor too specific (see Section 4.2 for details), while the thematic roles (e.g., Agent and Cause) in existing verb-oriented KBs, such as VerbNet [27], FrameNet [2], and PropBank [36], are often too abstract to differentiate verbs' semantics. For example, PropBank defines some general thematic roles that can be applied to any verb. Second, the coverage of the thematic roles for verbs is often limited, since it is hard to manually identify a large number of high-quality thematic roles. For example, only 23 pre-defined thematic roles are listed in VerbNet. In contrast, the probabilistic taxonomy we use in this paper contains millions of concepts. Hence, it is necessary to mine VPs at the concept level to complement existing verb-oriented KBs. To more clearly distinguish our mined CSK from the knowledge in existing verb-oriented KBs, some examples from these KBs are given in Table 1.

It is difficult for most existing work to automatically acquire verb-oriented CSK, since their methods are manual-based or data-driven. Manual-based methods resort to knowledge engineers or volunteers to obtain CSK by hand [30]. Many traditional commonsense KBs are built in this way, such as WordNet [34], Cyc [25], and ConceptNet [30]. Although hand-crafted CSK is of high quality, its coverage is limited for two reasons. First, manual-based methods are time-consuming and labor-intensive, hurting the recall of CSK acquisition. Second, CSK obtained in this way is often incomplete, since only the CSK that people can think of is collected. To improve the recall, data-driven approaches have been proposed to automatically collect CSK from large corpora. Typical efforts along this line include the discovery of CSK inference rules from text [29], [5], the extraction of concept attributes from query logs [38], the construction of extensible ontologies by linking Wikipedia and WordNet [14], and the mining of specific relations (e.g., the Comparative [47] and Part-Whole [48] relations) from the Web. However, most of these methods focus on acquiring knowledge directly from corpora and thus are limited to extracting explicit CSK, without the ability to harvest implicit knowledge (e.g., “person eats food”) that rarely appears in corpora. Furthermore, the coverage of commonsense KBs is upper-bounded by the number of pre-defined relations. For example, there are only about 10 relations in WordNet and 50 relations in ConceptNet 5.7.

Hence, in this paper, we propose an induction-based approach to automatically acquire verb-oriented CSK from VPs with the help of a large-scale probabilistic taxonomy. The core idea is to conceptualize the subjects and objects in VPs with isA relations in a probabilistic taxonomy. For example, “person eats food” can be induced by conceptualizing the VPs “Helen eats apple”, “John eats bread”, and “Michael eats egg” with isA(Helen/John/Michael, person) and isA(apple/bread/egg, food). There are two reasons why verb-oriented CSK can be obtained through the induction-based approach. On the one hand, great efforts have been devoted to constructing Web-scale probabilistic taxonomies, and most of them (e.g., Probase [54]) are available. On the other hand, with the rapid development of the Web, VPs at the instance level are widely available in the Web corpora, e.g., Google Syntactic N-Grams.1 Hence, such abundant instance VPs and large-scale probabilistic taxonomies enable the induction method to harvest rich conceptual CSK.
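The induction step described above can be sketched in a few lines. This is a toy illustration with hypothetical isA pairs, not the paper's actual pipeline or real Probase data:

```python
# Taxonomy-guided induction sketch: lift instance-level verb phrases to the
# concept level by replacing subjects/objects with their isA hypernyms.
from collections import Counter

# Toy isA relations; a real system would use a Web-scale probabilistic
# taxonomy such as Probase, with thousands of candidate concepts per instance.
ISA = {
    "Helen": "person", "John": "person", "Michael": "person",
    "apple": "food", "bread": "food", "egg": "food",
}

def induce(verb_phrases):
    """Conceptualize (subject, verb, object) triplets and count their support."""
    counts = Counter()
    for subj, verb, obj in verb_phrases:
        if subj in ISA and obj in ISA:
            counts[(ISA[subj], verb, ISA[obj])] += 1
    return counts

vps = [("Helen", "eat", "apple"), ("John", "eat", "bread"), ("Michael", "eat", "egg")]
print(induce(vps))  # ('person', 'eat', 'food') is supported by all three phrases
```

In a real taxonomy each instance has many candidate hypernyms, so the hard part is choosing which concept to lift to; that is exactly the granularity problem addressed later.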

However, it is not easy to mine high-quality verb-oriented CSK, and several challenges need to be solved:

  • How to determine which VPs can be induced as verb-oriented CSK. In fact, we need to carefully select the VPs to conceptualize, since some VPs are already abstract enough to be CSK themselves. For example, the VP “man eats fruit” is already abstract enough to be viewed as verb-oriented CSK.

  • How to select a concept with appropriate granularity for an instance. In general, an instance always has thousands of concepts in a large-scale probabilistic taxonomy. Some concepts are overly specific, and others are overly abstract. It is difficult to find a good trade-off between specificity and abstraction of the concept for a given instance. For example, given an instance bread, it could be conceptualized as staple food, food, or object. However, only food is appropriate since staple food is too specific and object is too abstract.

  • How to measure the semantic plausibility of the induced verb-oriented CSK. For example, although name is an appropriate concept for the instance John, the candidate “name eats food” is implausible.

To solve the above challenges, we design two modules in this paper. The first module is an entropy-based metric used to solve the first challenge. This metric is employed to measure the abstractness of the subject and object in each VP. The second module is a verb-oriented CSK generator used to solve the second and third challenges. The generator is realized as a joint optimization model based on the minimum description length (MDL) principle and a neural language model (NLM). MDL is used to select an appropriate concept for an instance, and the NLM is employed to evaluate the plausibility of the candidate VPs at the concept level. Furthermore, we introduce two strategies to accelerate the computation of the objective function: a simulated annealing-based approximate solution and a verb phrase clustering method.

Contributions. The contributions of this paper are summarized as follows:

  • To the best of our knowledge, we are the first to automatically acquire implicit verb-oriented CSK. The most significant characteristics of the target CSK are that it is expressed at the concept level and rarely stated explicitly in corpora.

  • We propose a joint optimization model based on the MDL principle and NLM to generate high-quality verb-oriented CSK. Besides, we also propose an entropy-based metric to identify noisy input VPs.

  • We conduct extensive experiments on real-world datasets, and the results prove the effectiveness of our approach. Finally, we harvest 259 verbs and 18,406 pieces of verb-oriented CSK to form a commonsense KB called VoCSK. To verify the usefulness of this KB, we utilize the knowledge in VoCSK to improve model performance on two real-world tasks: context-aware conceptualization and commonsense question answering.

The rest of this paper is organized as follows. Section 2 discusses the related work of this paper. Section 3 gives the background of probabilistic taxonomies. Section 4 formulates the problem and briefly introduces our solution for verb-oriented CSK acquisition. Section 5 describes an entropy-based metric to identify noisy input verb triplets. Section 6 details the generation of verb-oriented CSK with a joint optimization model. Section 7 gives two strategies to speed up the computation of the objective function. The experiments are reported in Section 8, and our conclusion and future work are given in Section 9.

This paper is extended from our previous work [31] to provide a more comprehensive analysis. First, we add the related work (Section 2) and preliminary (Section 3). Second, we add more details on using an NLM to measure the plausibility of candidate commonsense triplets (Section 6.2) and introduce two strategies to accelerate the computation of the objective function (Sections 7.1 and 7.2). Third, we also analyze the complexity and feasibility of our method (Section 7.3). Fourth, to verify that most of the knowledge in VoCSK rarely appears in existing KBs and corpora, we add two matching experiments, between 1) VoCSK and existing KBs and 2) VoCSK and corpora (Section 8.2). Fifth, we add extensive comparison experiments to illustrate the rationality of the hyper-parameter settings in this paper (Section 8.3). Sixth, we add comparison experiments on the entropy-based triplet filter, the verb-oriented CSK generator, and the optimization algorithms to evaluate the effectiveness of our methods (Section 8.4). Last and most importantly, we add two downstream applications to assess the usefulness of our mined VoCSK (Sections 8.5 and 8.6).

Section snippets

Related work

Related work in this paper can be divided into three groups: commonsense acquisition, conceptualization, and other topics.

Commonsense Acquisition. The CSK acquisition has attracted a great deal of research interest. These methods can be divided into two categories. First, knowledge engineers and volunteers were asked to collect CSK manually. For example, in Cyc [25], commonsense facts were crafted by human experts using the CycL representation language. A lexical commonsense KB called WordNet

Preliminary

In this section, we describe the background of a large-scale probabilistic taxonomy. Based on it, we further give two definitions. Some typical notations used in this paper are shown in Table 2.

We use a large-scale probabilistic taxonomy (e.g., Probase [54]) to provide massive fine-grained concepts for subjects and objects in VPs. The taxonomy is a large semantic network that consists of isA relations between terms. For example, google isA company where google is the hyponym of company. The
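The taxonomy interface assumed in the rest of the paper can be pictured as a mapping from instances to scored concepts. The numbers below are illustrative toy values, not actual Probase statistics:

```python
# Minimal sketch of a probabilistic taxonomy lookup: each instance maps to
# hypernym concepts with a typicality score P(concept | instance).
ISA_PROB = {
    "google": {"company": 0.72, "search engine": 0.25, "website": 0.03},
    "bread":  {"food": 0.60, "staple food": 0.30, "object": 0.10},
}

def top_concepts(instance, k=1):
    """Return the k most probable hypernyms of an instance."""
    concepts = ISA_PROB.get(instance, {})
    return sorted(concepts, key=concepts.get, reverse=True)[:k]

print(top_concepts("google"))     # ['company']
print(top_concepts("bread", 2))   # ['food', 'staple food']
```

In a real taxonomy the concept list per instance runs into the thousands, which is why concept selection later requires an explicit optimization rather than a simple top-1 lookup.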

Overview

In this section, we first formalize the problem and then outline our solution for verb-oriented CSK generation. The CSK generation of different verbs is independent of each other. Hence, we analyze each verb and its phrases separately. In the following paragraphs, we discuss our solution for a given verb.

Entropy-based triplet filter

In this section, we detail the entropy-based triplet filter, which relies on a probabilistic taxonomy. As mentioned above, we identify noisy verb triplets by measuring the abstractness of their subjects and objects. According to our observation, specific terms (subjects or objects) tend to be positioned at lower levels of a probabilistic taxonomy, while abstract terms are usually located at higher levels, as shown in Fig. 2. In this paper, the level of leaf nodes (i.e., the most
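One plausible way to instantiate an entropy-based abstractness measure is the Shannon entropy of a term's hyponym distribution: an abstract term covers many instances fairly evenly and thus has high entropy. This is a sketch under our own assumptions with toy distributions, not necessarily the paper's exact formula:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical hyponym distributions: "fruit" spreads probability over many
# instances (high entropy), while "apple" concentrates on few (low entropy).
HYPONYM_DIST = {
    "fruit": [0.2, 0.2, 0.2, 0.2, 0.2],
    "apple": [0.9, 0.1],
}

def is_abstract(term, threshold=1.0):
    """Flag a term as abstract when its hyponym entropy exceeds a threshold."""
    return entropy(HYPONYM_DIST.get(term, [1.0])) > threshold

print(is_abstract("fruit"), is_abstract("apple"))  # True False
```

A triplet whose subject or object is already flagged as abstract would then be filtered out before induction, since it needs no further conceptualization.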

Verb-oriented CSK generator

In this section, we elaborate on our generator for acquiring verb-oriented CSK. First, the MDL principle is used to select appropriate concepts for a bag of subjects (objects). An NLM is then employed to measure the plausibility of candidate commonsense triplets. Finally, based on the MDL principle and the NLM, a joint model is proposed to generate verb-oriented CSK from the set of verb triplets remaining after filtering.
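The joint objective can be illustrated with a toy sketch for choosing the object concept of eat from the bag {bread, apple, egg}. The numbers and the NLM stand-in below are hypothetical, not the paper's exact formulation:

```python
import math

# Toy statistics: instances covered by each candidate concept, and how many
# terms the concept subsumes in the taxonomy (a proxy for its abstractness).
N = 3                                                  # instances in the bag
COVERAGE = {"staple food": 1, "food": 3, "object": 3}
CONCEPT_SIZE = {"staple food": 50, "food": 5_000, "object": 1_000_000}
MISS_COST = 20.0   # bits to encode an instance the concept fails to cover

def description_length(concept):
    """Two-part MDL cost: bits to name the concept plus bits for the instances.
    An over-specific concept pays for misses; an over-abstract one is costly to name."""
    covered = COVERAGE[concept]
    return math.log2(CONCEPT_SIZE[concept]) + covered + (N - covered) * MISS_COST

def nlm_log_plausibility(subj_c, verb, obj_c):
    """Stand-in for a neural LM log-probability of the concept-level phrase;
    "name eats food" is grammatical but implausible, so it scores low."""
    scores = {("name", "eat", "food"): -8.0, ("person", "eat", "food"): -1.0}
    return scores.get((subj_c, verb, obj_c), -4.0)

def joint_cost(subj_c, obj_c, verb="eat"):
    """Joint objective (to minimize): description length minus weighted plausibility."""
    return description_length(obj_c) - 2.0 * nlm_log_plausibility(subj_c, verb, obj_c)

best = min(COVERAGE, key=description_length)
print(best)  # 'food': neither too specific ('staple food') nor too abstract ('object')
```

The design intent is that MDL handles granularity (challenge two) while the NLM term penalizes implausible candidates such as “name eats food” (challenge three).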

Optimization algorithms

Unfortunately, the exhaustive enumeration of concepts in optimizing objective Eq. (16) is costly, since an instance in a probabilistic taxonomy always has thousands of concepts. Besides, unrelated triplets are difficult to conceptualize as appropriate verb-oriented CSK. For example, given the triplets (John,eat,apple) and (Robin,eat,wasp), it is hard to acquire appropriate concepts for the subjects and objects, respectively. To solve these problems, we propose two strategies to speed up the
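The simulated annealing strategy can be sketched generically. This is a minimal illustration over a toy candidate set and cost function; the paper's actual search moves operate on concept assignments for the whole triplet set:

```python
import math
import random

def simulated_annealing(candidates, cost, t0=10.0, cooling=0.95, steps=200, seed=0):
    """Approximate minimization over a discrete candidate list: accept worse
    moves with probability exp(-delta/t), cooling the temperature each step."""
    rng = random.Random(seed)
    current = rng.choice(candidates)
    best = current
    t = t0
    for _ in range(steps):
        neighbor = rng.choice(candidates)   # toy neighborhood: any candidate
        delta = cost(neighbor) - cost(current)
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current = neighbor
        if cost(current) < cost(best):
            best = current
        t *= cooling
    return best

# Toy costs per candidate concept (hypothetical values):
costs = {"staple food": 46.6, "food": 15.3, "object": 22.9}
print(simulated_annealing(list(costs), costs.get))  # 'food'
```

As the temperature drops, the search stops accepting uphill moves and settles on a low-cost concept, trading exactness for speed relative to exhaustive enumeration.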

Experiments

In this section, we first report the statistics of the constructed VoCSK and evaluate whether the knowledge in VoCSK rarely appears in corpora and existing commonsense KBs. Then, we conduct extensive experiments to analyze the hyper-parameters in our methods and evaluate the effectiveness of the methods. We finally use the mined verb-oriented CSK to enhance the model performance on two downstream applications.

Conclusion and discussion

In this paper, we focus on the automatic acquisition of a typical kind of implicit verb-oriented CSK that rarely appears in corpora. To this end, we propose a taxonomy-guided induction approach to mine CSK from verb phrases with the help of a probabilistic taxonomy. Specifically, we design two modules to achieve this purpose. The first is an entropy-based metric to identify noisy input phrases. The second is a joint model based on the MDL principle and an NLM to generate verb-oriented

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (57)

  • J. Rissanen

    Modeling by shortest data description

    Automatica

    (1978)
  • A.V. Aho et al.

    Efficient string matching: an aid to bibliographic search

    Commun. ACM

    (1975)
  • C.F. Baker et al.

    The Berkeley FrameNet project

  • A. Barron et al.

    The minimum description length principle in coding and modeling

    IEEE Trans. Inf. Theory

    (1998)
  • Y. Bengio et al.

    A neural probabilistic language model

    J. Mach. Learn. Res.

    (2003)
  • J. Berant et al.

    Global learning of typed entailment rules

  • K. Bollacker et al.

    Freebase: a collaboratively created graph database for structuring human knowledge

  • J. Cheng et al.

    Contextual text understanding in distributional semantic space

  • T.M. Cover et al.

    Elements of Information Theory

    (2012)
  • R. Cummins et al.

    A Pólya urn document language model for improved information retrieval

    ACM Trans. Inf. Syst.

    (2015)
  • H. Dai et al.

    Ultra-fine entity typing with weak supervision from a masked language model

    (2021)
  • H.T. Dang

    Investigations into the Role of Lexical Semantics in Word Sense Disambiguation

    (2004)
  • J. Devlin et al.

    BERT: pre-training of deep bidirectional transformers for language understanding

  • M. Ester et al.

    A density-based algorithm for discovering clusters in large spatial databases with noise

  • M. Fabian et al.

    YAGO: a core of semantic knowledge unifying WordNet and Wikipedia

  • J.L. Fleiss

    Measuring nominal scale agreement among many raters

    Psychol. Bull.

    (1971)
  • D. Gildea et al.

    Automatic labeling of semantic roles

    Comput. Linguist.

    (2002)
  • A. Gupta et al.

    Taxonomy induction using hypernym subsequences

  • H. Hassan et al.

    Achieving human parity on automatic Chinese to English news translation

  • W. Hua et al.

    Short text understanding through lexical-semantic analysis

  • Z. Huang et al.

    AutoName: a corpus-based set naming framework

  • S. Kirkpatrick et al.

    Optimization by simulated annealing

    Science

    (1983)
  • O. Kurland et al.

    PageRank without hyperlinks: structural reranking using links induced by language models

    ACM Trans. Inf. Syst.

    (2010)
  • Z. Lan et al.

    ALBERT: a lite BERT for self-supervised learning of language representations

  • J. Lehmann et al.

    DBpedia: a large-scale, multilingual knowledge base extracted from Wikipedia

    Semant. Web

    (2015)
  • D.B. Lenat

    Cyc: a large-scale investment in knowledge infrastructure

    Commun. ACM

    (1995)
  • H. Levesque et al.

    The Winograd schema challenge

  • B. Levin

    English Verb Classes and Alternations: A Preliminary Investigation

    (1993)
    This work was supported by National Key Research and Development Project (No. 2020AAA0109302), Shanghai Science and Technology Innovation Action Plan (No. 19511120400) and Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103).
