Construction-Based Compositional Grammar

Journal of Logic, Language and Information

Abstract

The paper presents a system for construction classification that represents multiple levels of specification, such as grammatical functions, grammatically reflected actants, and lexical semantics. These levels are aligned with a compositional system of sign combination that mediates between a construction perspective and a valence perspective. The system uses a feature structure formalism based on Head-Driven Phrase Structure Grammar (HPSG), with essential elements from Lexical Functional Grammar (LFG; cf. Bresnan 2001), and its implementation background is large-scale HPSG grammars. At one extreme the system can encode word-level selection in multi-word patterns; at the other it provides a compact format for construction specification, allowing for cross-language comparison of both construction and valence frame inventories. Pivotal to these capacities, and to sign formalization in general, are the grammatical functions. The paper motivates the usefulness of the various functionalities and illustrates how they work together in a formally uniform system.

Notes

  1. Computational grammars of course also have a role to play in NLP, not least in cooperation with statistically based methods; our present focus, however, lies on the design of such grammars per se.

  2. Among essential general introductions, see Pollard and Sag (1994), Sag and Wasow (1999), Sag et al. (2003), Copestake (2002).

  3. This being said, the system can of course be used also for small grammar fragments.

  4. Other initiatives related to such a concept include Grammatical Framework (Ranta 2011), UDS (Nivre and Fang 2017), CoreGram (Müller 2015), and the HPSG Grammar Matrix (Bender et al. 2002, 2010). The latter has already contributed to the existence of many grammars in the DELPH-IN consortium (see note 6) and is especially close to the present project. TypeGram differs from it in pursuing the amendments focused on in this paper, while ‘The Matrix’ keeps to the standard structures; the two have in common the development of ‘libraries’ of types and rules representing a diversity of phenomena from across languages.

  5. It can be found at https://typecraft.org/tc2wiki/TypeGram, a site from which a grammar matrix accommodating phenomena in the verbal syntax of Germanic languages, Kwa, Bantu and Ethio-Semitic can be downloaded, practiced with, and used for the construction of grammars within these families. The test corpora for Ga (Kwa) and Kistaninya (Ethio-Semitic) comprise about 100 sentences; cf. Section 4.4.

  6. Cf. Hellan and Bruland (2015) and https://typecraft.org/tc2wiki/Norwegian_HPSG_grammar_NorSource. The grammar has 90,000 lexical entries, 300–400 grammar rules and 200–300 valence types, analyses corpora of up to 22,000 sentences, and sustains online applications in grammar correction (https://typecraft.org/tc2wiki/A_Norwegian_Grammar_Sparrer), a valence corpus (https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus), a web demo (http://regdili.hf.ntnu.no:8081/linguisticAce/parse), and more. Norsource belongs to the family of grammars in the DELPH-IN consortium (http://www.delph-in.net/wiki/index.php/Home), which counts grammars of considerable size for 8–10 languages, whose formal designs are to a large extent derived from the English Resource Grammar (ERG; cf. Flickinger 2002, 2011), mediated by the HPSG Grammar Matrix (Bender et al. 2002, 2010; cf. also note 4). While this enterprise successfully captures many aspects of what is referred to as ‘multilingual grammar engineering’, it has not yet reached a stage where computational grammars can be juxtaposed for simple search as to what they declare the languages encoded as having in common or as differentiating them.

  7. Jean Chavula, p.c.

  8. ‘1SM’ for ‘subject marker for noun class 1’, ‘1OM’ for ‘object marker for noun class 1’, ‘PST’ for ‘past tense’, ‘Caus’ for ‘causative’, and ‘FV’ for ‘final vowel’, these being the meanings and grammatical features associated with the hyphenated morphs.

  9. Grammatical Functions are assumed widely in traditional grammar, and in theoretical work especially in Tesnière (1959) and in LFG (Bresnan 2001). See 2.3 and 5.1.

  10. See Copestake (2002).

  11. In formal and computational grammars using this notation one can derive predicate logic-like formulas; cf. Copestake et al. (2005).

  12. The Paninian k-system is the earliest in this tradition (cf. Staal 1972). PropBank’s (http://verbs.colorado.edu/~mpalmer/projects/verbnet.html) use of ARG0, ARG1, … is similar to the k-roles in that both represent fixed roles, contrary to the ACTNT system. A less formal convention is ‘role’ indications in the style of ‘kicker’ and ‘kickee’ for a verb like kick, where the -er suffix corresponds to the one who kicks and the -ee suffix to the item kicked. When this convention is applied to any kind of verb (as envisaged, e.g., in Sag et al. 2003), it reduces to a counterpart of the enumeration aspect of ACT1, ACT2, but in a formally unmanageable way, since there must be a supertype corresponding to each verb for each -er/-ee pair. Relative to semantic designs in HPSG, our proposal for situational specification (see below) has perhaps most in common with Davis (2000).

  13. ACTNT specifications can be enriched with role and other semantic information, as explored in Beermann and Hellan (2004), and Hellan and Beermann (2005). The level SIT presently to be developed contains still richer information. Representations at the ACTNT level often closely mirror syntactic structure, but a notable exception is an analysis of comparative constructions implemented in Norsource built on a Montague-style analysis first proposed in Davis and Hellan (1975).

  14. The discussion covers the type of semantic space explored in traditions like Lexical Semantics and Situation Semantics (Barwise and Perry 1983); a division between such a space on the one hand and a more grammatically defined argument structure on the other has been proposed in Melchuk (2004), and in another form in Grimshaw (2005), Levin and Hovav (2005) and Hovav and Levin (1998). Relative to semantic designs in HPSG, our proposal for situational specification has most in common with Davis (2000).

  15. In (5), ‘zip-lock’ refers to interlocking interplay between legs in walking/running.

  16. Example provided by Clement Appah, p.c.

  17. Many phenomena have been characterized in terms of the notion ‘light verb’; see Butt (2010) for a summary of many of them. The usage of the term employed here goes back at least to Jespersen (1965) and, more recently, Grimshaw and Mester (1988), and is a topic of much current attention; see, e.g., Pompei and Piunno (2015) for European languages. It is well recognized that LVCs as understood here constitute a major category in, e.g., Persian (cf. Karimi-Doostan 2011) and in Indic languages.

  18. Among European languages, selection in LVCs is far more pervasive in Norwegian than in, e.g., Romance languages, cf. Hellan (2016, 2017).

  19. Within the DELPH-IN grammars, another feature by which a non-head constituent can select its head is the feature ‘SPEC’, by which a specifier selects the head noun. Once in the system, a feature occurs generally, although it may be restricted to a given part of speech, such as noun in this case.

  20. The attribute COMP in LFG, in contrast, represents a standardly linked complement clause, and will, when acting as a direct object, belong to a CP type introduced by tr, followed by obDECL if declarative—cf. the next subsection on the formalism.

  21. It is clear that the notion ‘Construction Profile’ can be construed as a subtype of sign, but how exactly? The type sign at its most general might be said to have three attributes, PHON, ORTH, and MEANING. Reflecting the assumption that signs express either entities/things, or situations, a first subdivision could be equating MEANING with either PROPT (for Property of thing) or SIT(uation), the latter characterizing the Construction Profile (CP) type of sign, the former the Description Profile (DP) type of sign. The CP type often goes with V as head POS, but not necessarily; it often has a grammar profile as encoded in the attributes GF and ACTNT, although perhaps not always. What the corresponding attributes are in the DP type, we also leave open for now.

  22. As attested by Norsource.

  23. For instance, Bangla, Gurune (Gur), Luganda (Bantu), and further for English and German. See, for instance, https://typecraft.org/tc2wiki/Category:Valence_by_language, and initial discussion of ‘valency pods’ and ‘valency classes’ in Hellan et al. (2017), Hellan (2019b), Dakubu and Hellan (2016).

  24. It may be noted that specific properties of subjects and objects can also be marked in the CL notation. For instance, the MWE (cf. 3.4) ta feil can be encoded for its selection and idiosyncratic semantics with a label like v”ta”-tr-ob”feil”-BEWRONG, where v”ta” and ob”feil” read as verb and (head of the) object consisting of the strings “ta” and “feil”, respectively. Instances of ‘irregularity’ in verbal phraseology can thus also be represented in the system.

  25. An essential principle in the design of such lists is to avoid distinguishing items in terms of their being in different positions in the list, as when numbering items in a list: numbering represents no information about the item itself, and such a system is typically brittle in its lack of accommodation of ‘new’ items occupying places between items with adjacent enumeration codes.

  26. For fuller description see Hellan and Beermann (2014). This architecture is so far instantiated only for smaller size grammars and corpora. A construction/valence profile for Ga can be seen on https://typecraft.org/tc2wiki/Ga_Valence_Profile.

  27. A special test suite facility is one described in Hellan and Beermann (2014), where the sentences are annotated for POS and morphological properties in addition to CL templates, and provided with English glosses in the standard style of Interlinear Glossed Text (IGT), as in (1) above; the corpora are hosted in the online glosser and text database TypeCraft (http://typecraft.org; Beermann and Mihaylov (2014)), from where annotations at all levels can be exported by XML into rule files of computational grammars being constructed for the languages in question, as an aspect of Grammar Induction.

  28. A format for cross-linguistic valence representation is instantiated in the online multilingual valence dictionary MultiVal, which includes not only grammars constructed using the CL notation (Norwegian and Ga) but other DELPH-IN grammars as well; cf. http://regdili.hf.ntnu.no:8081/multilanguage_valence_demo/multivalence and Hellan et al. (2014).

  29. For instance, the CL template for (1) will be ditrCs-suC_obCsu_ob2Cob. This means a ditransitive resulting from Causativization, with root transitive, that the subject has been created through Causativization, that the object has been ‘changed’ from subject through Causativization, and that the ob2 has been ‘changed’ from object through Causativization. Each slot 3 MCU label has an AVM which represents the constituent through its derivational stages and, like all slot 3 MCUs, unifies with the Core slot specification (ditrCs). For details, see Hellan (2019c, chapter 4).

  30. Such as Ackermann and Webelhuth (1997).

  31. The computational perspective allows for a technical observation related to the grammar Norsource’s use of GFs. While the computational grammar formalism in question disallows the specification of a list within the specification of a list, there are cases in Norwegian where one descriptively wants to state dependencies corresponding to such a situation. Thus, in Norwegian it is frequently the case that in the COMPS list of a verb selecting a PP, the verb’s selection concerns not only the head preposition but also whether the item governed by the preposition is a noun, an infinitive, a declarative subordinate clause or an interrogative subordinate clause (the latter as in Jeg lurer på om hun kommer (lit. ‘I wonder about whether she comes’)). Such specifications represent selection by the verb, but must be done inside the COMPS list of the preposition, and are thus a case of specification in a COMPS list inside of a COMPS list. Once GFs are resorted to, there is an alternative feature path to the PP as introduced by the GF OBL in the verb’s GF, and the head of the preposition’s governee is reached by specifications within the GF OBJ in the path initiated by OBL; such ‘double-step’ selection is thus accommodated using GFs. This specification operates in tandem with list cancellation relative to the verb.

  32. A partial motivation behind the adoption of ARG-ST is a claim in Pollard and Sag (1994) that certain linguistic phenomena can be most adequately analyzed by means of a list construct like ARG-ST if this is interpreted as a kind of dominance hierarchy. The main example adduced is the definition of environments where an anaphor can find a licit binder, viz., that a binder must precede the anaphor on the ARG-ST list, assuming that binder and bindee are indeed members of the same argument structure. As argued in Hellan (2005) concerning binding of reflexives in Norwegian, this is patently false, since neither long distance seg nor possessive reflexive sin finds its binder within the domain defined by a common ARG-ST list. Moreover, for ‘locally bound’ reflexives whose binder would be in the same ARG-ST list, even if the relevant properties were defined in the ARG-ST list, identifying the items in the list would face the problem of quantification over members in a list. Further, to the extent that the critical relation were one of dominance in terms of a role hierarchy, establishing such a relation in a list of roles would again necessitate quantification over items in a list, and would thus not be available as a construct in the present framework. How significant these points are relative to possible other considerations favoring the use of ARG-ST is a matter we will not go into here, but it seems that from whatever viewpoint one considers the matter, there is little reason to try to avoid the use of GFs in an HPSG based system.

  33. For suggestions see Hellan (2019b).

  34. As for the issue of lexical entry proliferation, cf. Hellan (2019a).
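
As a rough illustration of the label format mentioned in notes 24 and 29, the following Python sketch splits a CL label into its hyphen-separated segments and underscore-separated units, and extracts any quoted word form attached to a unit. The names parse_cl_label and Unit are hypothetical, as is the assumption that quoted forms contain neither hyphens nor underscores. The sketch does not assign slot numbers or interpret tags such as tr or suC, and it is not taken from the TypeGram implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A unit is an alphanumeric tag optionally followed by a quoted word form,
# e.g. ob"feil". This pattern is an assumption made for illustration.
_UNIT = re.compile(r'^(?P<tag>[A-Za-z0-9]+)(?:"(?P<form>[^"]*)")?$')


@dataclass
class Unit:
    tag: str                    # e.g. "v", "tr", "ob", "suC"
    form: Optional[str] = None  # selected word form, if one is quoted


def parse_cl_label(label: str) -> List[List[Unit]]:
    """Split a CL label into segments ('-') and units ('_')."""
    # Treat typographic double quotes like plain ones.
    normalized = label.replace("\u201c", '"').replace("\u201d", '"')
    segments: List[List[Unit]] = []
    for segment in normalized.split("-"):
        units: List[Unit] = []
        for raw in segment.split("_"):
            match = _UNIT.match(raw)
            if match is None:
                raise ValueError(f"unrecognized unit: {raw!r}")
            units.append(Unit(match.group("tag"), match.group("form")))
        segments.append(units)
    return segments


if __name__ == "__main__":
    for example in ('v"ta"-tr-ob"feil"-BEWRONG', "ditrCs-suC_obCsu_ob2Cob"):
        print(example, "->", parse_cl_label(example))
```

Keeping the quoted form as a separate field means that a word-level selection such as ob"feil" and a plain unit such as tr share one representation, so labels with and without lexical selection can be compared segment by segment.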

References

  • Ackermann, F., & Webelhuth, G. (1997). A theory of predicates. Stanford: CSLI Publications.

  • Barwise, J., & Perry, J. (1983). Situations and attitudes. Cambridge, MA: MIT Press.

  • Beermann, D., & Hellan, L. (2004). A treatment of directionals in two implemented HPSG grammars. In: S. Müller (Ed.), Proceedings of the HPSG04 conference. Katholieke Universiteit Leuven. CSLI Publications, http://csli-publications.stanford.edu/.

  • Beermann, D., & Mihaylov, P. (2014). Collaborative databasing and resource sharing for linguists. Language Resources and Evaluation, 48, 1–23.

  • Bender, E. M., Drellishak, S., Fokkens, A., Poulson, L., & Saleem, S. (2010). Grammar customization. Research on Language and Computation, 8(1), 23–72.

  • Bender, E. M., Flickinger, D., & Oepen, S. (2002). The grammar matrix: An open-source starterkit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In Proceedings of the workshop on grammar engineering and evaluation. Coling 2002, Taipei.

  • Bresnan, J. (2001). Lexical functional syntax. Oxford: Blackwell.

  • Butt, M. (2010). The light verb jungle: Still hacking away. In M. Amberber, M. Harvey, & B. Baker (Eds.), Complex predicates in cross-linguistic perspective (pp. 48–78). Cambridge: Cambridge University Press.

  • Copestake, A. (2002). Implementing typed feature-structure grammars. Stanford: CSLI Publications.

  • Copestake, A., Flickinger, D., Sag, I., & Pollard, C. (2005). Minimal recursion semantics: An introduction. Research on Language and Computation, 3, 281–332.

  • Dakubu, M. E. K., & Hellan, L. (2016). Verb classes and valency classes in Ga. In: Presented at SyWAL II (Symposium on West African Languages), Vienna.

  • Dakubu, M. E. K., & Hellan, L. (2017). A labeling system for valency: Linguistic coverage and applications. In L. Hellan, A. Malchukov, & M. Cennamo (Eds.), Contrastive studies in verbal valency. Amsterdam: J. Benjamins.

  • Davis, A. (2000). The hierarchical lexicon. Stanford: CSLI Publications.

  • Davis, C., & Hellan, L. (1975). An integrated analysis of comparatives. Unpubl., Notre Dame University and University of Trondheim.

  • Dowty, D. (1991). Thematic proto-roles and argument selection. Language, 67(3), 547–619.

  • Flickinger, D. (2002). On building a more efficient grammar by exploiting types. In S. Oepen, D. Flickinger, J. Tsujii, & H. Uszkoreit (Eds.), Collaborative language engineering (pp. 1–17). Stanford: CSLI Publications.

  • Flickinger, D. (2011). Accuracy vs. Robustness in grammar engineering. In E. M. Bender & J. E. Arnold (Eds.), Language from a cognitive perspective: Grammar, usage, and processing (pp. 31–50). Stanford: CSLI Publications.

  • Grimshaw, J. (2005 [1993]). Semantic structure and semantic content in lexical representation. In: Words and structure, Jane Grimshaw (pp. 75–89). Stanford, CA: CSLI.

  • Grimshaw, J., & Mester, A. (1988). Light verbs and θ-marking. Linguistic Inquiry, 19(2), 205–232.

  • Hellan, L. (2005). Implementing Norwegian reflexives in an HPSG grammar. In S. Müller (Ed.), Proceedings of the 12th international conference on head-driven phrase structure grammar. Stanford: CSLI Publications. http://csli-publications.stanford.edu/hand/miscpubsonline.html.

  • Hellan, L. (2016). Light verb constructions as valency modeling. A study of Norwegian. Presented at SLE 2016, Naples. Submitted for proceedings as: Unification and selection in Light Verb Constructions. A study of Norwegian.

  • Hellan, L. (2017). A design for the analysis of bare nominalizations in Norwegian. In A. Malicka-Kleparska & M. Bloch-Trojnar (Eds.), Aspect and valency in nominals. Berlin: Mouton de Gruyter.

  • Hellan, L. (2019a). When a verb has more than one valence frame. In: A. Malicka-Kleparska & M. Bloch-Trojnar (Eds.), Valency in verbs and verb related structures. Peter Lang.

  • Hellan, L. (2019b). Situations in grammar. In: J. Essegbey, D. Kallulli, & A. Bodomo (Eds.), The grammar of verbs and their arguments: A cross-linguistic perspective. Studies in African Linguistics. Berlin: R. Köppe.

  • Hellan, L. (2019c). TypeGram: A platform for cross-linguistic construction description. At: https://typecraft.org/tc2wiki/TypeGram.

  • Hellan, L., & Beermann, D. (2005). Classification of prepositional senses for deep grammar applications. In: V. Kordoni, & A. Villavicencio (Eds.), Proceedings of the second ACL-SIGSEM workshop on the linguistic dimensions of prepositions and their use in computational linguistics formalisms and applications. University of Essex.

  • Hellan, L., & Beermann, D. (2014). Inducing grammars from IGT. In Z. Vetulani & J. Mariani (Eds.), Human language technologies as a challenge for computer science and linguistics. Berlin: Springer.

  • Hellan, L., Beermann, D., Bruland, T., Dakubu, M. E. K., & Marimon, M. (2014). MultiVal: Towards a multilingual valence lexicon. In: LREC 2014.

  • Hellan, L., & Bruland, T. (2015). A cluster of applications around a deep grammar. In: Z. Vetulani, & J. Mariani (Eds.), Proceedings from the Language and Technology Conference (LTC) 2015, Poznan.

  • Hellan, L., & Dakubu, M. E. K. (2010). Identifying verb constructions cross-linguistically. In: Studies in the languages of the Volta Basin 6.3. Legon: Linguistics Dept., University of Ghana. http://www.typecraft.org/w/images/d/db/1_Introlabels_SLAVOB-final.pdf.

  • Hellan, L., Malchukov, A., & Cennamo, M. (2017). Introduction: Issues in contrastive valency studies. In L. Hellan, A. Malchukov, & M. Cennamo (Eds.), Contrastive studies in verbal valency. Amsterdam: John Benjamins Publ. Co.

  • Hovav, M. R., & Levin, B. (1998). Building verb meaning. In M. Butt & W. Geuder (Eds.), The projection of arguments (pp. 97–134). Stanford CA: CSLI.

  • Jespersen, O. (1965). A modern English grammar on historical principles, Part VI, morphology. London: George Allen and Unwin Ltd.

  • Karimi-Doostan, G. (2011). Separability of light verb constructions in Persian. Studia Linguistica, 65, 70–95.

  • Levin, B., & Hovav, M. (2005). Argument realization. Cambridge: Cambridge University Press.

  • Melchuk, I. (2004). Actants in semantics and syntax I: Actants in semantics. Linguistics, 42(1), 1–66.

  • Montague, R. (1974). The proper treatment of quantification in English. In: R. Thomason (Ed.), Formal philosophy. New Haven.

  • Müller, S. (2015). The CoreGram project: Theoretical linguistics, theory development and verification. Journal of Language Modelling, 3(1), 21–86.

  • Nivre, J., & Fang, C.-T. (2017). Universal dependency evaluation. In Proceedings of the NoDaLiDa 2017 workshop on universal dependencies (pp. 86–95). Gothenburg: Association for Computational Linguistics.

  • Pollard, C., & Sag, I. (1994). Head-driven phrase structure grammar. Chicago: Chicago University Press.

  • Pompei, A., & Piunno, V. (2015). Light verb constructions. An interlinguistic analysis to explain systemic irregularities. In: Paper presented at SLE 2015, Leiden.

  • Ranta, A. (2011). Grammatical framework: Programming with multilingual grammars (pp. 8–9). Stanford: CSLI Publications, Center for the Study of Language and Information.

  • Sag, I., & Wasow, T. (1999). Syntactic theory: A formal introduction. Stanford: CSLI Publications.

  • Sag, I., Wasow, T., & Bender, E. (2003). Syntactic theory: A formal introduction. Stanford: CSLI Publications.

  • Smith, C. (1997). The parameter of aspect. Dordrecht: Kluwer.

  • Staal, J. F. (Ed.). (1972). A reader on the Sanskrit grammarians. Cambridge: MIT Press.

  • Tesnière, L. (1959). Éléments de syntaxe structurale. Paris: Klincksieck.

  • Vendler, Z. (1967). Linguistics in philosophy. Ithaca: Cornell University Press.

Author information

Corresponding author

Correspondence to Lars Hellan.

About this article

Cite this article

Hellan, L. Construction-Based Compositional Grammar. J of Log Lang and Inf 28, 101–130 (2019). https://doi.org/10.1007/s10849-019-09284-5

  • DOI: https://doi.org/10.1007/s10849-019-09284-5
