1 Introduction

Pluralia tantum are a type of noun that ought not to exist. Like many other linguistic terms, the term ‘pluralia tantum’ carries the implication that something is not quite right. Thus forms which are syncretic contradict the expectation that every grammatical function has a unique form, periphrastic forms involve multi-word expressions where elsewhere the system involves synthetic exponence, and so on. Pluralia tantum is similarly a label for nouns which have only a plural when, in some sense, this is not expected. English binoculars has no singular; this is worth noting, since it is not predictable, given that binoculars can denote one item or more than one. True, there are other nouns denoting items consisting of two significant parts which behave similarly (spectacles, trousers,…). But there are two reasons to focus on such nouns. First there are many English nouns equally denoting items consisting of two significant parts which are normal in respect of number: bicycle, bigraph, Bactrian camel, couple, duo... And second, there are languages with number systems roughly comparable to that of English in which the equivalent of binoculars is a normal count noun: Russian binokl’ (sg) and binokli (pl). Conversely, Russian sani ‘sledge’ is a plurale tantum noun, unlike its English equivalent. (Which prompts the question of how we talk about one sledge in Russian.) As we shall see, these examples lead to a wide variety of nouns, which lack the full range of number behaviour.

For investigating these different items, considering possible definitions will prove thought-provoking and helpful. It may seem obvious what pluralia tantum are, but while there are linguists with clear intuitions about what should count as pluralia tantum, these intuitions are not necessarily shared. The differences of view can be understood when we examine carefully the nature of the features involved. We shall move between definitions and data, homing in on a fuller typology, a more satisfying definition, and a better account of morphosyntactic features. Let us start with two definitions from dictionaries of linguistic terms:

“pluralia tantum. Latin ‘plurals only’: i.e. nouns, like oats or trousers, which appear only in a plural form.” (Matthews 1997:284)

This is a simple definition, and better than most. Clearly it covers binoculars and the other examples we have noted so far. While this definition is a good starting point, much rests on how we interpret ‘plural form’.

In the same year, another dictionary of linguistic terms offered this definition:

“plurale tantum A noun which is plural in form but singular in meaning, such as scissors, pants or binoculars. The plural of this term is pluralia tantum.” (Trask 1997:172)

This definition already illustrates some of the pitfalls. Scissors is not necessarily singular in meaning (see Sect. 2.1). [Footnote 1]

It is worth asking why pluralia tantum and comparable nouns deserve a new study. Part of the continuing interest is that pluralia tantum are clearly exceptional, yet in some instances partly predictable too (Sect. 5). They are missing something, but they are typically unlike other defectives (Sect. 5, Sect. 8.6). They are problematic for Full Interpretation (at least for strong versions), see Sect. 4.6. While morphologists and syntacticians have done most research on pluralia tantum, formal semanticists are becoming curious, as are others in cognitive science (Wisniewski 2009) and psycholinguistics (Bock et al. 2001; Nenonen and Niemi 2010; and Nickels et al. 2015). Most work in these wider areas, however, is limited to examples like scissors. This is a missed opportunity, given the variety of pluralia tantum. But this is part of a more general issue, the limited focus of much experimental research (see the recent survey by Acuña-Fariña 2016). Research on pluralia tantum has recently been extended to sign languages (Börstell et al. 2016).

The current paper analyses data from a range of languages, and includes pluralia tantum nouns within a broader landscape of nouns which have some sort of limitation in terms of the number feature. This entails both showing the rich variety of types of number-deficient nouns, of which scissors is the tip of a fascinating iceberg, and giving the wider context of limitations within number systems. Some include pluralia tantum under the umbrella ‘lexical plurals’ (Sect. 4.3); pluralia tantum require a specification in their lexical entry, and are ‘lexical’ in that sense. But they are to be strictly delineated from what is generally covered under the lexical plural heading by the characteristic of deficiency, as will be clear as our definitions are refined. I first introduce enough data to show that there is indeed a challenging set of questions here (Sect. 2). Then I show how these instances fit into a bigger picture, by taking a canonical approach. Useful previous work using this approach is laid out in Sect. 3, which leads to an analysis of what we might expect from a noun which is absolutely straightforward in terms of number (Sect. 4). This serves as a baseline against which we calibrate the real examples we find. The related issues of animacy and motivation are examined in Sect. 5; this takes us to the specific motivation which has been claimed for bipartites (like scissors), and from there it is a natural step to look at larger systems, those with duals and other number values (Sect. 6). In Sect. 7 we examine the ways in which pluralia tantum nouns are accommodated (how their lack of number form(s) is managed). Finally we bring together the typological range of pluralia tantum (Sect. 8), consider pluralia tantum constructions (Sect. 9) and draw the main conclusions (Sect. 10).

2 Pluralia tantum: an initial typology

To identify key types of pluralia tantum nouns, let us look at the basic relations between their semantics, syntax and morphology. What we might expect (and this will be the basis of our typology in Sect. 4) is that semantics, syntax and morphology would line up. Thus books denotes more than one entity, it is syntactically plural in that it takes plural agreement, and it is morphologically plural having the plural marker -s. With pluralia tantum nouns the three components typically do not line up neatly, which is the source of their theoretical importance. And this lack of alignment means that we must be careful in our use of terms, in particular what exactly we mean when we say a noun is ‘plural’. This need for care holds equally, whatever our theoretical position.

An approach using the traditional morpheme faces great challenges since, typically, pluralia tantum nouns involve different realizations of the plural (see Sect. 4.6), hence the special factor is not to be located in a plural morpheme but is a featural problem. We shall need to be clear about the type of feature involved, since it may be morphosemantic or morphosyntactic (Corbett 2012:49–50), a distinction examined carefully in Sect. 2.4. I shall therefore take an inferential-realizational approach (Stump 2001:1–30), and present the data as inflectional paradigms. There are good theoretical reasons for this (recapitulated in Corbett 2015a:147–148). Equally, for those who believe paradigms are merely a useful way of presenting data, we shall see that in this particular case they prove a very useful exploratory as well as presentational device.

2.1 Scissors: semantics vs (syntax and morphology)

Consider this example:

  (1) These scissors are blunt

The noun scissors is what most think of as a plurale tantum noun. It fits the definition given by Matthews since it is plural in form. It also takes plural agreement. Hence its morphology and syntax line up. It is a plurale tantum noun because on the obvious reading of examples like (1) it denotes a single entity, so that its semantics is out of step with its syntax and morphology. However, besides denoting one item, it can denote more than one, as is clear in (2):

  (2) All these scissors are blunt.

In both (1) and (2), syntax and morphology are aligned. In the instances where one item is denoted (as can be the case in (1)), the semantics does not align with the syntax and morphology.

While the English examples are well known, [Footnote 2] there are many comparable but less familiar instances. Thus in Cicipu, a Benue-Congo language of northwest Nigeria, with about 20,000 speakers (in 1995), there is one plurale tantum noun. The noun à-húlá ‘name’ has a plural form and takes plural agreements, whether it denotes one name or more than one (McGill 2007:55, 2009:244); for examples see McGill (2009:145, 173, 187). [Footnote 3]

Definitions: Not surprisingly, scissors fits Matthews’ definition given above, but instances like (2) do not fit within Trask’s definition. However, while scissors is the most familiar type of example, there are other possibilities, which are more challenging for the definition, as we shall see. [Footnote 4]

2.2 Tsez xex-bi ‘child(ren)’: (semantics and syntax) vs morphology

The noun xex-bi ‘child(ren)’, from the Dagestanian language Tsez, provides an illuminating contrast with nouns like English scissors, as seen in Comrie (2001). For the essential background, consider first a regular Tsez count noun:

  (3) Regular Tsez noun besuro ‘fish’ (Comrie et al. 1998:6–7)

                    singular     plural
    absolutive      besuro       besuro-bi
    ergative        besur-ā      besuro-z-ā
    genitive 1      besuro-s     besuro-za-s
    dative (a)      besuro-r     besuro-za-r
    …               …            …

    (a) For the relation of dative to (al)lative in Tsez see Comrie and Polinsky (1998:104).

    Note: a vowel is dropped before a following vowel.

These are just some of the case values in Tsez (Comrie and Polinsky 1998); the many further case values are indicated by the dots in (3) and (4). It is against that background that we should consider xexbi ‘child(ren)’ (Comrie 2001:381–383, discussed in Corbett 2007b:31–38).

  (4) Paradigm of Tsez xexbi ‘child(ren)’

                    singular     plural
    absolutive      xex-bi       xex-bi
    ergative        xex-z-ā      xex-z-ā
    genitive 1      xex-za-s     xex-za-s
    dative          xex-za-r     xex-za-r
    …               …            …

Comparing with besuro ‘fish’ in (3) above, we see that xexbi ‘child(ren)’ is plural in form (-bi and -za- are clear plural markers); it has a full set of plural case forms. The paradigm as presented in (4) suggests that xexbi ‘child(ren)’ is both singular and plural. The evidence comes both from semantics: xexbi ‘child(ren)’ may denote one or more children, and from syntax: this noun takes the appropriate agreements, singular for one and plural for more than one. (And at least in terms of number the noun ɣˤanabi ‘woman/women’ behaves similarly.) To confirm this, we look at the agreement system of Tsez: agreement involves four gender values as well as two number values. Assignment of these gender values is by a combination of semantic and formal criteria. The main semantic assignment rules are included in (5), from Polinsky and Comrie (1999:110); more detail can be found in Plaster et al. (2013). Here then are the agreement forms for Tsez verbs:

  (5) Gender and number agreement markers in Tsez verbs

    Gender                                      singular    plural
    I (male humans)                             Ø (a)       b-
    II (female humans, and some inanimates)     y-          r-
    III (animals, and some inanimates)          b-          r-
    IV (residue)                                r-          r-

    (a) Since (5) gives agreement affixes, the contrast with the overt affixes elsewhere in this paradigm means that using a Ø here makes good sense. Where full word forms are given, these will be glossed as bare stems (using [ ] to indicate the relevant values).

The singular-plural syncretisms in this system are tricky for our purposes. However, there is in addition the demonstrative, which distinguishes singular and plural, and Comrie (2001) shows that there are therefore clear diagnostics for singular versus plural.

  (6) The Tsez demonstrative howdu ‘this’ (Comrie 2001:380)

    Gender                  singular    plural
    I (male humans)         howda       howziri
    II–IV (all others)      howdu       howziri

Examples (7)–(9) include the forms for singular nouns (gender values III and I) and for the plural of gender value I.

Tsez (Comrie 2001:381–383)

  (7) [figure a]

  (8) [figure b]

  (9) [figure c]

While the verb forms are the same in (7) and (9), the demonstrative distinguishes the two situations. Given these diagnostic environments we turn to xexbi ‘child(ren)’. Example (10) is exactly as we might expect:

  (10) [figure d]

Here more than one child is referred to and the agreements are plural (specifically gender I plural, which includes groups consisting only of males or of a mixture of males and females). Now consider what happens for one child. There are two possibilities. Traditional usage is as follows:

  (11) [figure e]

The combination of agreements shows that we are dealing with gender III singular: howdu ‘this’ is singular, the b- marker on the verb could indicate gender III singular or gender I plural, so the only consistent specification is gender III singular. [Footnote 5] This is understandable when we note that cross-linguistically, ‘child’ is often treated as not quite male human or female human. The noun xexbi ‘child’ itself is unchanged. Younger speakers, on the other hand, have gender I: [Footnote 6]

  (12) [figure f]

In (12) there is again one child in question, and the agreements are singular. More generally, the agreements are singular or plural as appropriate, but the forms of the noun stay the same (the noun takes plural inflectional forms and the appropriate case value). Given the substantial inventory of case values in Tsez, xexbi ‘child(ren)’ has an appropriate inventory of case forms, but with just one number form for each of them; the inflectional forms it has are recognizably plural. This is important: the absolutive has the unmistakable plural marker -bi, and all the oblique forms have the plural augment -za-. This noun has therefore more evidence for its plurality than, say, scissors or oats. However, unlike nouns like English scissors, it takes singular and plural agreements according to the meaning.
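The diagnostic reasoning applied to (10)–(12) can be made explicit as a small sketch in Python. It is offered only as an illustration: the cell sets are my reading of (5) and (6), the treatment of plural r- and of howziri as shared across genders II–IV is an assumption about the table layout, and the form combinations attributed to (10) and (12) are inferred from the prose, since the examples themselves are given as figures.

```python
# Paradigm cells (gender, number) compatible with each agreement form.
# Verb prefixes follow (5); demonstrative forms follow (6).
VERB_PREFIX = {
    "Ø": {("I", "sg")},
    "y-": {("II", "sg")},
    "b-": {("III", "sg"), ("I", "pl")},
    # Assumption: plural r- is read as shared by genders II-IV.
    "r-": {("IV", "sg"), ("II", "pl"), ("III", "pl"), ("IV", "pl")},
}

DEMONSTRATIVE = {
    "howda": {("I", "sg")},
    "howdu": {("II", "sg"), ("III", "sg"), ("IV", "sg")},
    # Assumption: howziri is read as the plural form for all genders.
    "howziri": {("I", "pl"), ("II", "pl"), ("III", "pl"), ("IV", "pl")},
}


def consistent_cells(demonstrative, verb_prefix):
    """Return the (gender, number) cells compatible with both agreement forms."""
    return DEMONSTRATIVE[demonstrative] & VERB_PREFIX[verb_prefix]


# Traditional usage for one child, as described for (11).
print(consistent_cells("howdu", "b-"))
# More than one child, as described for (10).
print(consistent_cells("howziri", "b-"))
# Younger speakers' usage for one child, as described for (12).
print(consistent_cells("howda", "Ø"))
```

Neither agreement target is decisive on its own: b- is ambiguous between two cells, and howdu between three. Intersecting the two sets leaves a single consistent specification, which is the argument made in the text.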

As pointed out earlier, ɣˤanabi ‘woman/women’ has the same behaviour in terms of number; but importantly its gender agreements are different. For single referents it takes gender II (like all female human nouns), and when used as a plural it takes non-male-human agreement (again like all female human nouns). Thus its gender agreements are as expected. This demonstrates, therefore, that the number problem which this noun shares with xexbi ‘child(ren)’ cross-cuts the gender issue. [Footnote 7]

The Tsez data are significant, because in addition to their relevance for number, they are hard to classify in terms of their morphological behaviour. In Corbett (2007b:31–38) it is argued that xexbi ‘child(ren)’ falls mid-way between canonical syncretism and canonical deponency, and hence there is no established term for such examples.

Definitions: The Tsez examples fit Matthews’ definition well (once it is extended from ‘plural form’ to ‘plural forms’), in that they have only plural forms, in the morphological sense. Yet this is only half the story, since the forms we are focussing on (those denoting a single entity) fail to control plural agreement.

We return in Sect. 2.4 to the way in which the paradigms are presented. A related phenomenon, but more complex than the Tsez type, is found in Archi (analysed in Sect. 4.5).

2.3 Russian galife ‘riding breeches’: semantics vs syntax vs morphology

Russian nouns distinguish six indisputable case values, found in both singular and plural. (Further, more contentious values are discussed in Corbett 2012:200–222.) Nouns fall into four main inflection classes; I illustrate with nouns of inflection class II, since this class most readily demonstrates the need for six case values.

  (13) A normal and a plurale tantum noun in Russian

                      bol’nica ‘hospital’           nožnicy ‘scissors’
                      singular      plural          singular    plural
    nominative        bol’nic-a     bol’nic-y       –           nožnic-y
    accusative        bol’nic-u     bol’nic-y       –           nožnic-y
    genitive          bol’nic-y     bol’nic         –           nožnic
    dative            bol’nic-e     bol’nic-am      –           nožnic-am
    instrumental      bol’nic-ej    bol’nic-ami     –           nožnic-ami
    locative          bol’nic-e     bol’nic-ax      –           nožnic-ax

The noun bol’nica ‘hospital’ has the expected singular-plural opposition, with the plural forms used to refer to more than one entity. There are various syncretisms, but it is clear which is the set of plural forms. Note that for our definitions we need to refer to ‘plural forms’ here, unlike definitions framed for English. Clearly nožnicy ‘scissors’, has plural forms (comparable to those of bol’nica ‘hospital’) but no singular forms. It is similar to English scissors, in that it can be used of one entity or more than one, and it takes plural agreements.

Russian also offers a less common type of plurale tantum. Consider these two nouns:

  (14) Uninflecting nouns in Russian

                      pal’to ‘coat’             galife ‘riding breeches’
                      singular    plural        singular    plural
    nominative        pal’to      pal’to        –           galife
    accusative        pal’to      pal’to        –           galife
    genitive          pal’to      pal’to        –           galife
    dative            pal’to      pal’to        –           galife
    instrumental      pal’to      pal’to        –           galife
    locative          pal’to      pal’to        –           galife

These are unusual paradigms. Pal’to ‘coat’, like many similar nouns, does not inflect. Yet it is not defective. It can stand in all syntactic environments appropriate for nouns. And where agreement is required, the agreement targets all agree appropriately. We may treat nouns like pal’to ‘coat’ as constituting a separate inflection class (as in Corbett and Fraser 2000:308), or we may treat them as different from inflecting nouns. The essential point is that their inflectional behaviour (rather their lack of it) is a morphology-internal matter. It has no repercussions in syntax.

We can now see the significance of galife ‘riding breeches’. Like pal’to ‘coat’, galife ‘riding breeches’ can appear in syntactic environments requiring each of the case values. Yet it is also a plurale tantum noun, in the sense that it takes only plural agreements (Isačenko 1962:77), as here:

  (15) [figure g]

This fits a pattern: several other nouns in Russian, denoting bipartites, are pluralia tantum nouns of the scissors type (having plural morphology and syntax, as in Sect. 2.1). The effect of the pattern can be seen in the treatment of the borrowing džins-y ‘jeans’, where Russian plural morphology is added to the original plural form (-y is the nominative/accusative plural, and there is a full plural paradigm), making it a plurale tantum noun. [Footnote 8] This means that galife has the common semantics-syntax mismatch (it can denote a single object, but takes plural agreement) and in addition its morphology does not match the syntax (it does not show plural inflection, or indeed any inflection).

Definitions: The key point is that galife ‘riding breeches’ indeed has the plural only, but this relates to its agreement requirement and not to morphological form.

This restricted application of the definition is reasonable, since we typically understand that definitions apply to the extent that is possible. Given that galife ‘riding breeches’ does not inflect, it is a plurale tantum noun to the extent that this is possible (in terms of agreement). [Footnote 9] While in Russian it is exceptional to be uninflecting, it can be regular to be uninflecting (as we shall see in Sect. 4.5), and then there can still be pluralia tantum nouns.

2.4 Types of feature: morphosemantic and morphosyntactic

Let us take stock. We have encountered three types of pluralia tantum nouns, summarized in Table 1.

Table 1 Summary of the types of pluralia tantum nouns in Sect. 2
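The three patterns of Sects. 2.1–2.3 can also be stated as a small sketch in Python. This is an illustrative encoding only, not a reproduction of Table 1: the class name `Profile`, the function `alignment` and the labels "sg", "pl" and `None` (for an uninflecting noun) are hypothetical, and each profile describes the reading on which a single entity is denoted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    """Number profile of a noun on its single-entity reading (hypothetical encoding)."""
    noun: str
    semantics: str             # number of entities denoted
    syntax: str                # number value of the agreement the noun controls
    morphology: Optional[str]  # number value shown by the form; None if uninflecting


def alignment(profile):
    """Describe which of semantics, syntax and morphology share a number value."""
    groups = {}
    for component in ("semantics", "syntax", "morphology"):
        groups.setdefault(getattr(profile, component), []).append(component)
    if len(groups) == 1:
        return "aligned"
    parts = []
    for members in groups.values():
        joined = " and ".join(members)
        parts.append(f"({joined})" if len(members) > 1 else joined)
    return " vs ".join(parts)


PROFILES = [
    Profile("English scissors", "sg", "pl", "pl"),
    Profile("Tsez xexbi", "sg", "sg", "pl"),
    Profile("Russian galife", "sg", "pl", None),
]

for p in PROFILES:
    print(p.noun, "->", alignment(p))
```

The strings produced correspond to the section headings above: semantics against syntax and morphology for scissors, semantics and syntax against morphology for xexbi, and a three-way split for galife.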

These examples have already presented difficulties of definition. While it seemed straightforward to say that pluralia tantum nouns have only the plural form, with that raising questions for expected count nouns like scissors, less so for non-count nouns like oats, it is not self-evident how to demonstrate that scissors is indeed the plural. Linguists point to the agreements required, though this does not relate directly to the form. And then, in some of the data presented above, the layout of the paradigms was surprising. It is time to confront these issues.

Let us take up a Russian example, since the existence of several case forms makes it easy to recognize singular and plural forms. [Footnote 10] The Russian sani ‘sledge’ has only forms which are recognizably plural in their morphology. However, it is not a problem to refer to a single sledge, as with the phrase odn-i san-i one-pl sledge-pl ‘one sledge’ (well attested in the Russian National Corpus). Note that the numeral stands in the plural (see Sect. 4.6). There are two ways we might represent this noun:

  (16) Representations of Russian sani ‘sledge’

                      morphosemantic            morphosyntactic
                      singular    plural        singular    plural
    nominative        san-i       san-i         –           san-i
    accusative        san-i       san-i         –           san-i
    genitive          san-ej      san-ej        –           san-ej
    dative            sanj-am     sanj-am       –           sanj-am
    instrumental      sanj-ami    sanj-ami      –           sanj-ami
    locative          sanj-ax     sanj-ax       –           sanj-ax

If we ask how this noun realizes the grammatical meanings singular and plural, that is, how the semantics and morphology are linked, the answer is the morphosemantic representation on the left of (16). Each possibility can be realized, but the outcome is the same for singular and plural. So long as we consider only the “internal” picture, this is an appropriate representation. The phonological shape of the inflections, compared with regular nouns, makes it clear that the forms are what would elsewhere be used only for the plural. If we turn to the external requirements of sani ‘sledge’, then the morphosemantic representation is insufficient. Even when denoting a single object, this noun requires plural agreement, as mentioned above: odn-i san-i one-pl sledge-pl ‘one sledge’. For the external or morphosyntactic nature of sani ‘sledge’, the representation on the right is appropriate.
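The difference between the two representations in (16) can be sketched as two lookup tables in Python. This is a minimal illustration under my own assumptions: the names `MORPHOSEMANTIC`, `MORPHOSYNTACTIC`, `realize` and `agreement_values` are hypothetical, and the forms are those given in (16).

```python
# Forms of sani 'sledge', as given in (16).
FORMS = {
    "nominative": "san-i",
    "accusative": "san-i",
    "genitive": "san-ej",
    "dative": "sanj-am",
    "instrumental": "sanj-ami",
    "locative": "sanj-ax",
}

# Morphosemantic view: both meanings are realized, by identical forms.
MORPHOSEMANTIC = {
    (case, number): form
    for case, form in FORMS.items()
    for number in ("singular", "plural")
}

# Morphosyntactic view: only plural cells are filled.
MORPHOSYNTACTIC = {(case, "plural"): form for case, form in FORMS.items()}


def realize(case, meaning):
    """Form used to express a given number meaning in a given case."""
    return MORPHOSEMANTIC[(case, meaning)]


def agreement_values(case, form):
    """Number values the form can impose on agreement targets in a given case."""
    return {n for (c, n), f in MORPHOSYNTACTIC.items() if c == case and f == form}


one_sledge = realize("nominative", "singular")
print(one_sledge)                                  # form used for a single sledge
print(agreement_values("nominative", one_sledge))  # agreement it requires
```

Looking up the singular meaning succeeds in the first table, but the form returned licenses only plural agreement in the second, which is the situation in odn-i san-i ‘one sledge’.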

We tend to assume that the morphosemantic and morphosyntactic feature values associated with any paradigm cell are the same. In the canonical world they are. In many actual instances they are too, but there are also numerous examples where they are not. This possible discrepancy is what leads to some of the confusion with pluralia tantum nouns, and we shall see other examples where the distinction is needed below (Miya in Sect. 5 and Bayso in Sect. 6.4). For previous discussion of morphosemantic versus morphosyntactic features, see Corbett (2012:49–50) and Spencer (2013:219–232), and on their relation to the contextual-inherent distinction, see Corbett (2012:67). Note that in terms of Stump’s paradigm types, the difference between morphosemantic and morphosyntactic features would require splitting his content paradigm, which is the ‘interface with syntax and semantics’ (Stump 2016:104; see also Stump 2007).

In the earlier examples, I gave morphosyntactic representations. These are striking both in the case of Tsez xexbi ‘child(ren)’, where the difference compared with Russian sani ‘sledge’ in (16) was shown clearly, and in the case of Russian galife ‘riding breeches’ (14), where an uninflecting noun has specific morphosyntactic requirements. I continue to give morphosyntactic representations below, except where specified otherwise.

2.5 Prerequisites for a fuller typology

The examples so far demonstrate that the typology of pluralia tantum nouns is more extensive than most researchers have allowed for. Here I preview what will be required for the fuller account below.

2.5.1 Further criteria required

In brief, we need to lay out the criteria which allow us to calibrate the examples already discussed, and see whether these criteria allow for further types. To keep the scope manageable, I will concentrate on instances which are clear in terms of the lexical entries involved. There are monographs to be written on the lexicographical problems of related lexical entries, where an arguable instance of a plurale tantum noun may be related to a noun with a full paradigm. For instance, in Russian we find vybory ‘election’ (Soboleva 1984:67), which has only the plural in this sense. It is related to the noun vybor ‘choice’, which has a full paradigm. And there are many others, some harder to recognize, and the choices are not clear-cut. The issue was recognized by Wackernagel (1920:86–88), and numerous following researchers; [Footnote 11] thus Payne and Huddleston (2002:334–338) include relevant examples from English where polysemy is the appropriate analysis. Where such an analysis is implausible, my approach would be to say that in a default inheritance lexicon there can be two entries, vybor₁ and vybory₂ in the case above. These inherit some of the same information (including the stem and the inflectional type) but they have different lexical indices. [Footnote 12] In this paper, I concentrate on setting out the extremes, particularly the clear pluralia tantum, and so will not go further on this point here.

2.5.2 The typology of number

There has been progress on the typology of number, and some of this research will be cited as we work through the different aspects of pluralia tantum. For a general survey of number see Corbett (2000), with an extensive bibliography. For a helpful annotated bibliography see Acquaviva (2014), and for fine surveys see Acquaviva (2017) and Moravcsik (2017).

There are two key components of the typology. First there is the featural part, the values of number. Here we must remember that there are more number values than just singular and plural, and that these other values (of which the most common is the dual) can illuminate what is going on with pluralia tantum. And second, we must be clear about the part of the noun inventory involved in any generalization. There is typically a distinction between nouns for which the number system is clearly relevant, the count nouns, and those which do not make number distinctions, the non-count or mass nouns. Cross-linguistically, they are distributed according to the Animacy Hierarchy, which we examine in Sect. 5. As we noted in Sect. 1, nouns of both types can be pluralia tantum, count nouns like trousers and non-count like oats.

2.5.3 The notions “canonical number” and “canonical noun”

From what we have already seen, it is evident that different criteria are in play. We need a typology of features and of nouns (both individual nouns and groups of nouns), and even beyond nouns (Sect. 9). It will be helpful to have baselines from which to calibrate the variation, and here the notions “canonical number” and “canonical noun” will prove invaluable (Sect. 3).

3 A canonical approach to pluralia tantum

The central strategy of Canonical Typology has been summarized by Stump (2016:31):

  (a) to identify the dimensions of possible cross-linguistic variation in a given phenomenon and the logical endpoints of these dimensions; and

  (b) to situate canonical types at the extremes of these dimensions, calibrating attested phenomena according to the degree and direction of their deviation from these canonical types.

A valuable effect of our strategy is that: ‘the canonical approach breaks down complex concepts in a way that clarifies where disagreements may lie between different linguists and theoretical frameworks.’ (Nikolaeva 2013:100). Canonical instances, those that are clear and indisputable, must match a full set of criteria. It follows that such instances may well be infrequent, hence canonical is to be clearly distinguished from prototypical. Canonical instances can even be non-existent; this is not a problem: we are adopting an axiomatic approach, designed to ensure that we recognize and investigate the full range of the phenomena we wish to account for, and that we have a metalanguage for describing them. This canonical approach is justified by utility and results. Recent research within this framework includes Kwon and Round (2015), Forker (2016a), Corbett and Fedden (2016), Audring (2017), Corbett et al. (2017), Fedden and Corbett (2017, 2018), Round and Corbett (2017), Stump (2016, 2017), Kwon (2017), Fedden et al. (2018) and Evans et al. (2018); a working bibliography can be found on the Surrey Morphology Group website. [Footnote 13] A good deal has already been done within this approach that we can use directly. We take the research on canonical features that is relevant to number (Sect. 3.1), the work on canonical parts of speech, as relevant to nouns (Sect. 3.2), and integrate them (Sect. 3.3).

3.1 Canonical number

First consider canonical morphosyntactic features in general. These have been described in terms of two overarching principles, and these principles in turn cover ten converging criteria (Corbett 2012:156–199):

Canonical morphosyntactic features—Principle I (evidence from form)

  • Features and their values are clearly distinguished by formal means (and the clearer the formal means by which a feature or value is distinguished, the more canonical that feature or value).

This principle is obviously relevant to our needs; a feature which is realized through various formal means (such as number in Russian) is more canonical than one where it is limited to agreement, as we shall see in Sect. 4.5.1.

Canonical morphosyntactic features—Principle II (syntax)

  • The use of canonical morphosyntactic features and their values is determined by simple syntactic rules.

This principle is less central for our concerns, but it will still prove useful; simple syntax includes consistent agreement, for which see Sect. 4.4 and Sect. 8.4.

Besides these two, for the realization of morphosyntactic features there is an additional general principle of inflection (Corbett 2012:197–198):

Canonical morphosyntactic features—Principle III (morphological realization)

  • Canonical morphosyntactic features and their values are expressed by canonical inflectional morphology.

This principle too is of use, in distinguishing some less obvious examples; we noted the non-canonical morphological behaviour of Russian galife ‘riding breeches’, which fails to inflect (Sect. 2.3).

It has been argued that number is the morphosyntactic feature that comes closest to the canonical idealization of a morphosyntactic feature (Corbett 2013:57). The argument depends, however, on the interaction with parts of speech, to which we now turn.

3.2 Canonical noun

We need the notion of a canonical part of speech (or ‘word class’ or ‘lexical category’). We may think of a part of speech primarily in terms of its syntax, and also in terms of its semantics and its morphology. In the canonical situation, equivalent to what Spencer calls the ‘morpholexically coherent lexicon’, these three specifications are fully aligned (Spencer 2005:102). Specifically for nouns, this means that canonical nouns denote referential entities, head an appropriate syntactic phrase and inflect for the relevant features.

3.3 Interaction: canonical number and canonical noun

Intuitively, the greater the degree of orthogonality shown by a feature, with parts of speech and with other features, the clearer the argument for recognizing a feature, and the more canonical it is. The arguments are laid out in Corbett (2013:52–57) and discussed in Corbett and Fedden (2016:499–503); they are reported briefly here.

Canonical morphosyntactic features—Principle IV (interaction with parts of speech)

  • Canonical morphosyntactic features and canonical parts of speech are fully orthogonal.

This is a general principle, whose implications can be spelled out in four criteria (Corbett 2013:52–57):

Canonical morphosyntactic features—Principle IV: Criterion 1: EXCLUSIVENESS

  • A lexical item belongs to just one part of speech;

    a value belongs to just one feature.

This criterion will be useful; in particular, it leads us to be wary of analyses which would treat pluralia tantum nouns as being specified for gender rather than number (discussed in Sect. 4.6).

Canonical morphosyntactic features—Principle IV: Criterion 2: EXHAUSTIVENESS

  • Every lexical item of every part of speech has available to it all values of all features. (Alternatively: Every feature value applies to all lexical items.)

In the canonical instance, a clear motivation for postulating a morphosyntactic feature is that it generalizes across a large number of items. If this is the case, without the feature we would need to multiply the number of lexical items according to the values of the feature, and would miss generalizations. This criterion is vital for calibrating potential pluralia tantum nouns. A canonical noun “should” have all values of number available to it; as we shall see, there are different ways of being non-canonical here (see particularly Sect. 6.3 on Yup’ik).

Canonical morphosyntactic features—Principle IV: Criterion 3: OPEN AND CLOSED CLASSES

  • All classes are closed, except the class of lexical items.

This idealization is necessary for the logic of the system, but it will not figure large in our discussion. It underlies the intuitive distinction between part of speech and morphosyntactic feature. In the canonical instance, features, values and parts of speech are all closed classes, and only lexical items constitute an open class.

Canonical morphosyntactic features—Principle IV: Criterion 4: COMPOSITIONALITY

  • Given the lexical semantics of a lexical item and a specification of its feature values, the meaning of the whole is fully predictable.

We separate out features and their values because of the regularity involved. In the canonical instance, if we can specify what a noun means, and what plural means, we know what the plural of that noun means. This principle is vital for delineating our problem. As well as being relevant for pluralia tantum nouns, it distinguishes various types of lexical plurals from semantically regular plurals.

4 The number behaviour of a canonical noun

We can put together two previously established notions: that of canonical number (as an instance of a canonical morphosyntactic feature) and that of canonical noun (as an instance of a canonical part of speech). This will provide what we require for a typology of pluralia tantum nouns. Given different deviations from canonicity, we can situate pluralia tantum nouns (of different types) within an overall scheme. We discuss the five criteria (Sects. 4.1–4.5) and then contrast canonical number with canonical gender (Sect. 4.6).

4.1 Exhaustiveness

Every lexical item of every part of speech has available to it all values of all features. (alternatively: every feature value applies to all lexical items)

(Criterion IV.2 above, Corbett 2013:54)

This gives a canonical point from which to calibrate. Items can be non-canonical by lacking one or more number values, and this in two main ways. First, the missing values may be predictable or not. A predictable gap (for instance, one consonant with the Animacy Hierarchy, Sect. 5) is closer to canonical than an idiosyncratic gap. And second, we should ask how many values are missing. In a two-value system (singular and plural), one missing is equivalent to one present, but in larger systems (for instance, including a dual), there can be different degrees of deviation from exhaustiveness.
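The two questions just raised (which values are missing, and how many) can be illustrated with a minimal sketch. The function names, the example entries and the idea of expressing deviation as a simple proportion are assumptions made for illustration; they are not part of the canonical framework itself.

```python
# Hypothetical illustration: the names and the proportional measure are
# assumptions, not a definition taken from the canonical typology literature.

def missing_values(system, available):
    """Return the number values of the system that a noun lacks."""
    return [value for value in system if value not in available]

def deviation(system, available):
    """Share of the system's number values that are unavailable (0.0 = canonical)."""
    return len(missing_values(system, available)) / len(system)

two_value = ("singular", "plural")
three_value = ("singular", "dual", "plural")

# English binoculars: only the plural is available in a two-value system.
print(missing_values(two_value, {"plural"}), deviation(two_value, {"plural"}))

# A hypothetical noun lacking only the dual in a three-value system.
print(missing_values(three_value, {"singular", "plural"}),
      deviation(three_value, {"singular", "plural"}))
```

In the two-value system any gap leaves exactly half the values unavailable, whereas the three-value system allows gaps of different sizes, which is the sense in which larger systems permit different degrees of deviation.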

4.2 Semantic-syntactic-morphological alignment

In the canonical world, an item’s semantic, syntactic and morphological properties all line up (Corbett 2013:52, Sect. 3.2 above). This alignment holds also at the level of their feature values. The importance of this criterion to our typology should already be clear, from our initial examples in Sects. 2.1–2.3. The matching of features is often assumed, but we noted significant mismatches between morphosemantic and morphosyntactic feature values in Sect. 2.4.

4.3 Semantic compositionality

Given the lexical semantics of a lexical item and a specification of its feature values, the meaning of the whole is fully predictable.

(Criterion IV.4 above, Corbett 2013:56–57)

Since the canonical use of number is compositional, as in this criterion, the various types of lexical plurals are all non-canonical; see Acquaviva (2008, 2016a), Gardelle (2016), Lammert (2016) and Lauwers and Lammert (2016), among others. This criterion gives no reason to expect that the values should be anything other than “equal”, and hence equally distributed. Of course, we know that this is not what we find. We need to look at the various situations where there is an imbalance, to see whether there is less than totally free choice of number, and hence the combination of number value and lexical meaning is not fully compositional.Footnote 14

4.3.1 Dominance

The notion of dominance, and its importance for psycholinguistic research, was introduced in Baayen et al. (1997:97). They refer to contrasting distributions where ‘Either the singulars were much more frequent than their plurals (singular-dominant) or the plurals had a much higher surface frequency than their singulars (plural-dominant pairs).’Footnote 15 An illustration of such distributions is given in (17). In the CELEX database (Baayen et al. 1995), the lexemes neck and lip are approximately matched: each occurs around 80 times per million words. Yet their distribution with respect to number is dramatically different:

  1. (17)

    English examples per million words (Davis et al. 2003:430)

            singular   plural
    neck    72         7
    lip     17         61

Like most nouns, neck occurs more frequently in the singular. The frequencies for lip are different: it occurs much more frequently in the plural; it is said to be ‘plural dominant’. Note the wording ‘much more frequent’ in the paper cited. We must bear in mind that overall, the singular occurs more frequently than the plural, in language after language. Thus for a noun to appear in the plural more often than in the singular, by whatever degree, is worthy of note, while for one to appear more often in the singular is of interest only when it is in excess of the general preference for singular. Evidence for this general pattern is found in Greenberg (1966:31–32), who gives data on French, Russian, Latin and Sanskrit, and the list is expanded in Corbett (2000:281–282) to include Slovene and Upper Sorbian. For those (Indo-European) languages the singular is used typically in over 70% of the instances in running text (somewhat lower in Upper Sorbian); 70% is also the approximate figure reported for English (Haskell et al. 2003:125), while 79% is reported for Finnish texts (Räsänen 1979:24), and 91% for the Dagestanian language Hinuq (Forker 2016b:93).
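The comparison with the general preference for the singular can be made concrete with a small calculation over the figures in (17). This is only a sketch: the baseline of 0.70 is the approximate singular share reported for English above, and the three labels are illustrative assumptions rather than established terms.

```python
# Counts per million words, taken from (17).
counts = {"neck": (72, 7), "lip": (17, 61)}

# Approximate singular share reported for English running text (see above).
BASELINE_SINGULAR = 0.70

def singular_share(singular, plural):
    """Proportion of a noun's tokens that are singular."""
    return singular / (singular + plural)

def label(singular, plural):
    """Illustrative three-way labelling; the labels are assumptions."""
    if plural > singular:
        return "plural more frequent than singular"
    if singular_share(singular, plural) > BASELINE_SINGULAR:
        return "singular share exceeds baseline"
    return "singular share at or below baseline"

for noun, (sg, pl) in counts.items():
    print(noun, round(singular_share(sg, pl), 2), label(sg, pl))
```

On these figures neck is singular in roughly nine tokens out of ten, well above the baseline, while lip is singular in only about two tokens out of ten.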

We should look beyond basic systems, and think of dominance with regard to values besides the plural. Here we consider dual dominance. The status of the dual varies considerably cross-linguistically; in languages with a facultative dual, like Slovene, the number of instances in running text is very low (Corbett 2000:281–282).Footnote 16 A language with an excellent claim to having an obligatory dual is Sanskrit.Footnote 17 The statistics in (18), derived from Lanman (1872:583), give a clear picture of the distribution of the number values (note that Lanman includes nouns, adjectives, participles and nominal forms from pronominal stems):Footnote 18

  1. (18)

    Distribution of number on nominals in Sanskrit (based on Lanman 1872:583)

                 N        singular (%)   dual (%)   plural (%)
    masculine    57,950   72.9           5.7        21.5
    feminine     15,909   55.2           6.0        38.8
    neuter       19,418   75.2           0.6        24.3
    total        93,277   70.3           4.7        25.0

    Note: percentage figures are rounded, and may not sum to 100
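As a consistency check on (18), the total row can be recomputed as an average of the three gender rows weighted by N. The sketch below uses only the figures in the table; since the input percentages are themselves rounded, the recomputed values can be expected to agree with the reported totals only to within about a tenth of a percentage point.

```python
# Rows of (18): N, singular %, dual %, plural %.
rows = {
    "masculine": (57950, 72.9, 5.7, 21.5),
    "feminine": (15909, 55.2, 6.0, 38.8),
    "neuter": (19418, 75.2, 0.6, 24.3),
}
reported_total = (93277, 70.3, 4.7, 25.0)

total_n = sum(row[0] for row in rows.values())

def weighted(index):
    """N-weighted mean of one percentage column (1 = singular, 2 = dual, 3 = plural)."""
    return sum(row[0] * row[index] for row in rows.values()) / total_n

print(total_n)
for name, index in (("singular", 1), ("dual", 2), ("plural", 3)):
    print(name, round(weighted(index), 2), "reported:", reported_total[index])
```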

The total number of examples is substantial. Yet even though the dual is obligatory, it is relatively infrequent. Lanman’s remarkable corpus work sets a baseline against which we can measure the anomalous behaviour of particular nouns. Two intriguing examples are given in (19), this time with data from the Digital Corpus of Sanskrit (comprising some four million words, tagged manually).Footnote 19

  1. (19)

    Two Sanskrit nouns with interesting frequency distributions for number

                         N        singular %   dual %   plural %
    baseline (Lanman)    93,277   70.3         4.7      25.0
    karna ‘ear’          2,173    89.2         7.9      2.9
    upānah ‘sandal’      50       12.0         86.0     2.0

Compared with the baseline, karna ‘ear’ has a surprising distribution, but not what we might have expected. It does indeed have a raised frequency of dual use, but this is overshadowed by its very low plural frequency. On the other hand, upānah ‘sandal’ is clearly dual dominant. These examples demonstrate that there is still much to be learned about individual distributions, particularly in systems with more than just a singular-plural opposition.
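The comparison with the baseline in (19) can be sketched as a ratio of each noun's percentage to the baseline percentage for the same value. This is an illustrative calculation only: the baseline and the two nouns come from different corpora, and the ratio is an assumed measure, not one used in the sources cited.

```python
# Percentages from (19).
baseline = {"singular": 70.3, "dual": 4.7, "plural": 25.0}
nouns = {
    "karna 'ear'": {"singular": 89.2, "dual": 7.9, "plural": 2.9},
    "upānah 'sandal'": {"singular": 12.0, "dual": 86.0, "plural": 2.0},
}

def dominant_value(distribution):
    """The number value with the highest share for this noun."""
    return max(distribution, key=distribution.get)

def ratio_to_baseline(distribution):
    """Each value's share divided by the baseline share (1.0 = as expected)."""
    return {value: distribution[value] / baseline[value] for value in baseline}

for noun, distribution in nouns.items():
    ratios = {v: round(r, 2) for v, r in ratio_to_baseline(distribution).items()}
    print(noun, dominant_value(distribution), ratios)
```

For karna ‘ear’ the dual is raised relative to the baseline but the singular remains the most frequent value, while the plural falls far below expectation; for upānah ‘sandal’ the dual is the most frequent value outright.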

Besides its importance for psycholinguistic research, value dominance has more obvious effects, of which we discuss three.

The first is resistance to change. Following Greenberg (1966:29), Tiersma (1982:841) showed that plural dominant nouns can maintain irregularities in the plural, when these are regularized with other nouns. He discussed the phenomenon in terms of ‘local unmarkedness’. Consider the native irregular plurals of English. Here we find men, women, children, oxen, geese, mice, lice, feet and teeth. It is well known that frequent items preserve irregularity; here, however, it is not the frequency of the lexeme, rather the high frequency of the plural as opposed to the singular.Footnote 20

It is not just individual exceptions that are involved in resistance to change. In Serbo-Croat,Footnote 21 there is an ongoing change in one inflection class affecting many hundreds of nouns. Nouns which otherwise inflect like prozor ‘window’ (plural prozori) have an augment in the plural, according to various conditions, a major one being that monosyllabic nouns favour the augment: grad ‘city’, plural: grad-ov-i, film ‘film’, plural: film-ov-i. There are various complications, but one set of exceptions to the generalization is relevant here. Ethnonyms retain the short plural (without augment), for instance Kurd ‘Kurd’, plural Kurdi. Units of measurement behave similarly, as in volt ‘volt’, plural volti, as do a few isolated nouns like zub ‘tooth’ plural zubi. All of these share the property that they are more likely to appear in the plural. For details and sources see Baerman et al. (2017:93–98).

The second effect is the tendency for the dominant form to be borrowed into other languages, also pointed out by Tiersma (1982). Thus Russian has borrowed English rail (for railways) in the plural as rel’s, but treats this as the singular and gives it the regular plural rel’s-y. The role of dominance, among other factors, is demonstrated in code mixing by Hakimov (2016); his case study is the plural marking of German nouns inserted into Russian speech.

And third, there is the pattern of marking. Consider these examples from the Cushitic language, Arbore:

  1. (20)

    Arbore (Hayward 1984:159–183, Corbett 2000:17)

    singulative                   general                    plural
    –                             keléh ‘gelded goat(s)’     keleh-mé ‘gelded goats’
    –                             garlá ‘needle(s)’          garlá-n ‘needles’
    tiis-in ‘a maize cob’         tíise ‘maize cob(s)’       –
    lassa-n ‘a loaf’              lássa ‘bread’              –
    nebel-inté ‘a hen ostrich’    nebel ‘ostrich(es)’        –

For some nouns, Arbore has general number forms (the number value is unspecified, see Corbett 2000:9–19), and this may be contrasted with a specified number value, singular or plural, depending on the noun. And even in more straightforward singular-plural systems, we may find examples where there is a base form and then one of the number values has a marker. Haspelmath and Karjus (2017) argue that there is a crosslinguistic tendency for such asymmetries to follow usage frequency. In our terms, plural dominant nouns are more likely to have a singulative (a singular with overt morphological marking), while singular dominant nouns (the usual type) are more likely to have the normal plural marking. For statistical data on Middle Welsh, consistent with this view, see Nurmio (2017:70–71), and for Lopit, an Eastern Nilotic language, see Moodie (2016).

Given the variation in the proportions of number values we have seen, some quite extreme, we should ask whether full compositionality is maintained. Within the variation, it is important to retain the distinction between pluralia tantum, for which the singular is not available, and plural dominant nouns, for which the singular is rare, but fully available. For instance, for real world reasons Italian spaghetti appears frequently in the plural. In appropriate circumstances, say discussing how one strand of spaghetti breaks, the singular spaghetto is current and normal.Footnote 22

4.3.2 Value orientedness

Value orientedness is a semantic restriction, pointed out for Russian by Polivanova (1983) and Kulikov (2004:127), following work by Igor’ Mel’čuk. There are nouns which have all the appropriate number forms, but do not use them in an unconstrained way. Rather, one value is used by default, while the other is found only ‘under pressure’, so to speak. Note that the default value, the one to which the particular noun is oriented, varies from noun to noun:

  1. (21)
    figure h
  1. (22)
    figure i

Each of these nouns has forms for both values; however, unless there is specific contextual pressure (what Polivanova terms an ‘arithmetical context’), repa ‘turnip’ appears in the singular, and ogurcy ‘cucumbers’ in the plural, as in the same neutral context in (21) and (22). It is not just that one value is more likely (as with dominance), but rather that there are contextual limitations on one value (which is not the case with the items in (17) showing number dominance). For further discussion and additional data see Mel’čuk (1979) and Ljaševskaja (2004:28–30, 50, 227–236).

To return to the general criterion of compositionality, these examples are some way from canonical, while pluralia tantum represent greater non-canonicity. Examples of value orientedness are not fully compositional; rather it is as though ‘general number’ (see discussion of (20)) intrudes for some senses of these nouns. Note again that these nouns argue for a featural account (not a morphemic one) since the same considerations apply across the different morphological realizations (Corbett 2015a:148). These nouns are on the border of number differentiability, showing that this count-mass border can be gradient.

4.4 Syntactic consistency

Syntactic consistency falls under the general scope of Principle II (see Sect. 3.1), namely that the use of canonical morphosyntactic features and their values is determined by simple syntactic rules. But there is a more specific criterion that is particularly relevant, namely that in agreement it is canonical for features to have matching values rather than non-matching values (Corbett 2006:24). Given the external requirements of the noun, in the canonical world that is all that is required by the syntax: a feminine plural noun is feminine plural “whoever is asking”. However, there are various types of noun for which this is not the case: their agreement specification varies according to the agreement target (which is not simple syntax). Such nouns are termed hybrids. Familiar examples are English committee and family, which take only singular attributive modifiers but may take singular or plural for other agreement targets (Corbett 2012:94–105, 2015b; Smith 2017).

We might expect that pluralia tantum would be consistently plural. But there are instances which are hybrids. Thus in Finnish, according to Karlsson (1968) a singular predicate is normal with plural proper nouns like Yhdysvallat ‘the United States’, Filippiinit ‘the Philippines’. But in attributive position, he reported both singular and plural adjectives, with singular on the increase. Hannu Tommola (personal communications, 5 June 2017) states that the singular is now normal, though occasional plurals are found in attributive position.Footnote 23 This shows the path, the Agreement Hierarchy (Corbett 2006:206–237), along which a hybrid plurale tantum can be regularized in terms of its agreements, and become singular in all but morphological form. We expect first the anaphoric pronoun to take the singular, then the relative pronoun, next the verbal predicate, and only finally the attributive modifier (the stage being reached by Finnish).Footnote 24

A truly remarkable example of an item with inconsistent agreements is Modern Hebrew be’alim ‘owner(s)’, as analysed by Landau (2016). The singular ba’al is ambiguous between ‘husband’ and ‘owner’. In the plural, if we focus on the ‘owner’ meaning, the form be’alim ‘owner(s)’ can have a singular or plural referent.

Modern Hebrew (Landau 2016:984)

  1. (23)
    figure j
  1. (24)
    figure k

The plural agreements in (24) are as expected. Yet in (23) we find singular agreement, for a single referent, even though the noun is plural in form. So far, this is similar to the Tsez situation, except that the lexeme in question is split: there is a singular form ba’al, but with rather different properties.Footnote 25 In addition, however, be’alim ‘owner’ is a hybrid: it does not take consistent agreements, rather its agreements vary according to the agreement target. In what follows, for simplicity, I present only those of Landau’s examples which are relevant for a single referent. The first examples show agreement of different targets, namely attributive modifier and verb:

Modern Hebrew (Landau 2016:985)

  1. (25)
    figure l

Example (25), with syntactic agreement of the attributive modifier and semantic agreement of the verb, was considered ‘slightly off’ by some speakers. Yet it is much better than (26), with the reverse pattern of agreement:

Modern Hebrew (Landau 2016:985)

  1. (26)
    figure m

This relation between examples like (25) and (26) was confirmed in a web search of examples. The pattern is consistent with the Agreement Hierarchy, since we find syntactic agreement within the noun phrase and semantic agreement of the predicate. We should check the personal pronoun:

Modern Hebrew (Landau 2016:987)

  1. (27)
    figure n

Example (27) shows that the personal pronoun must show semantically justified agreement, and this too is in accord with the Agreement Hierarchy. Finally we find stacking of modifiers:

Modern Hebrew (Landau 2016:1005)

  1. (28)
    figure o
  1. (29)
    figure p

Example (28) is a textual example, with syntactic agreement inside semantic agreement. The reverse, as in (29), is not acceptable.

Be’alim ‘owner’ shows a combination of non-canonical behaviours. I have stressed its status as a hybrid; it does not take consistent agreements, rather the agreement specification depends in part on the target, and the pattern of possible agreements is in accord with the Agreement Hierarchy. It is not a straightforward plurale tantum noun, in that the singular exists, albeit with rather different properties.
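The way the Agreement Hierarchy constrains hybrids can be sketched as a simple monotonicity check. This is a deliberately categorical simplification, adopted here as an assumption: the hierarchy itself is stated in terms of the likelihood of semantic agreement, whereas the sketch treats each target as showing either syntactic or semantic agreement and requires that, once a target shows semantic agreement, no target further along the hierarchy shows syntactic agreement. The function name and the encoding of patterns are hypothetical.

```python
# Targets ordered as in the Agreement Hierarchy (Corbett 2006:206-237).
HIERARCHY = ("attributive", "predicate", "relative pronoun", "personal pronoun")

def consistent_with_hierarchy(pattern):
    """Check a pattern mapping targets to 'syntactic' or 'semantic' agreement.

    Categorical simplification (an assumption): after the first target with
    semantic agreement, no later target may show syntactic agreement.
    Targets absent from the pattern are skipped.
    """
    seen_semantic = False
    for target in HIERARCHY:
        kind = pattern.get(target)
        if kind is None:
            continue
        if kind == "semantic":
            seen_semantic = True
        elif seen_semantic:
            return False
    return True

# Pattern of (25): syntactic agreement of the attributive modifier, semantic agreement of the verb.
print(consistent_with_hierarchy({"attributive": "syntactic", "predicate": "semantic"}))

# Pattern of (26): the reverse.
print(consistent_with_hierarchy({"attributive": "semantic", "predicate": "syntactic"}))
```

Under this simplification the pattern of (25) passes and that of (26) fails, matching the relative acceptability reported for the two examples.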

4.5 Morphological distinctiveness

Recall from Sect. 3.1 ‘Canonical morphosyntactic features—Principle I’, according to which morphosyntactic features and their values are clearly distinguished by formal means. They have dedicated forms, they are distinguished across other features and their values, furthermore they are distinguished consistently across parts of speech and across lexemes within the parts of speech (Corbett 2012:155–167). This is clearly relevant to some of the examples we have discussed. It allows us to differentiate typologically within the pluralia tantum, and to situate other non-canonical behaviours in respect of number. We separate out those items that simply fail to inflect (Sect. 4.5.1), those which do inflect but in a non-canonical (non-distinctive) way (Sect. 4.5.2), and those with split paradigms (Sect. 4.5.3).

4.5.1 Uninflecting nouns

We have already seen uninflecting nouns as pluralia tantum, for instance, Russian galife ‘riding breeches’ (Sect. 2.3). This noun lacks any sign of number morphology, but is morphosyntactically plural, requiring plural agreement. This is a lexical exception, in a language where nouns typically inflect for number. We will see similar examples in Kiowa (Sect. 6.2). We now examine the numerous pluralia tantum nouns in Walman, a Torricelli language of Papua New Guinea. In Walman few nouns mark number, and there are many nouns which behave like Russian galife ‘riding breeches’. The data are from Dryer and Brown (2015, and personal communications). Consider first what we find with ordinary count nouns:

Walman (Dryer and Brown 2015)

  1. (30)
    figure q
  1. (31)
    figure r
  1. (32)
    figure s
  1. (33)
    figure t

There are four clearly distinct agreement markers; these appear on various targets, including personal pronouns, sometimes as prefixes, sometimes as suffixes, even as infixes. Note that in these examples there is no inflection on the noun; this is fully expected in Walman, since only a minority of nouns (mainly denoting humans) have plural forms. However, the agreement markers distinguish singular and plural. Given this background, the following are the significant examples:

  1. (34)
    figure u
  1. (35)
    figure v

In (34) and (35) we see no noun inflection (as expected) and we find plural agreement on two agreement targets, including subject agreement on the verb. The nouns apar ‘bed’ and tim ‘dew’ are pluralia tantum, just in terms of agreement. The following examples demonstrate object agreement on the verb:

  1. (36)
    figure w
  1. (37)
    figure x

Again the agreements are all plural, including those of the numeral (see Sect. 7.1 on plural agreeing numerals). Note that wi ‘hand’ (like similar body parts) is a plurale tantum noun even when denoting one item. Dryer and Brown (2015) have identified more than 80 pluralia tantum nouns, more indeed than they have found nouns of masculine gender.

Definitions:

The situation in Walman and similar languages is important for rebalancing the definition of pluralia tantum. The Walman examples are pluralia tantum to the extent that is possible, that is, morphosyntactically (for agreement). They are similar to Russian galife ‘riding breeches’, but while that is exceptional within Russian morphology, the Walman examples are morphologically regular, since Walman nouns typically do not inflect for number.

4.5.2 Non-distinctive inflection

An extreme type here is Tsez xexbi ‘child(ren)’. We noted in Sect. 2.2 that each of its inflectional forms is uniquely identifiable as a plural form, within the morphological system of Tsez. And yet each of its forms can be used to denote a single entity, and take singular agreement. Thus for this noun, singular and plural are not distinguished inflectionally.

In a canonical system, then, plural inflection is uniquely for plural. Given an inflected form, it can be identified as plural, or as not plural. In a less canonical situation, given a form, it can be identified as plural, but only within the set of forms. Thus in a language with inflection classes, like Russian, a single inflected form may not be diagnostic, but the set of forms is uniquely identifiable (in (16) the nominative plural form sani ‘sledge’ could in principle be genitive singular, but the full set of forms is clearly plural, and makes that interpretation impossible). In a language with limited inflection, there are examples which are even less canonical. In English, given a noun ending in -s, there is a probability that it is plural.Footnote 26 But there are nouns like economics, linguistics (Juul 1975:18–24; Payne and Huddleston 2002:347), for which the model could be sticks vs stick or else mix vs mixes. Typically these nouns are treated as singular (despite the existence of the adjectives economic and linguistic which suggest that the noun has an -s affix). The system would be more canonical if either there were no nouns like mix or linguistics, or else there were larger paradigms, so that other forms of the paradigm were sufficient to make the distinction clear, and establish whether a given item has a stem-ending divide.Footnote 27

4.5.3 Split paradigms

Let us turn to further examples which fill out the typology of pluralia tantum, and then briefly consider other number effects which are non-canonical according to this criterion. In the canonical world, paradigms are consistent, but of course we find many instances of paradigms that are split (Corbett 2015a). In a sense, all pluralia tantum nouns have a split paradigm, in that a part of it is lacking. But here we look at the intersection of pluralia tantum with splits. That is, there are instances of split paradigms, where a noun is partially but not fully a plurale tantum noun. Note first that place names are a rich source of pluralia tantum nouns, often with a relatively clear origin. Thus in English we have the impressive Himalayas and the rather smaller Cotswolds. However, certain Finnish place names like Parainen add a new twist. They were pointed out by Karlsson (2000); the information here is from Tommola (2018 and personal communications). There are several dozen such place names, including Jokioinen, Kaipiainen and Oravainen. The clear facts are that the nominative is singular in form (the hypothetical plural *Paraiset is not accepted), while most of the remaining cases have plural markers:

Finnish (Karlsson 2000)

  1. (38)
    figure y

Karlsson states that the partitive is singular (as in kohti Parais-ta towards Parainen-partitive ‘towards Parainen’), which for Tommola sounds strange, and he points to examples on the web with the plural: kohti Parais-i-a towards Parainen-pl-partitive ‘towards Parainen’. In the accusative we find the singular (Paraisen). Unfortunately, agreeing modifiers appear to be avoided so we cannot investigate the morphosyntax of these split paradigms.

The split can be more complex, leading to a remarkable combination of non-canonical behaviours in respect of distinctiveness, as in the Archi noun χáli ‘family’. The data come originally from Kibrik (1977:28–29n18).Footnote 28 Archi nouns have a singular-plural distinction, indicated by different stems. There are many case values, with a clear divide between the absolutive and the oblique case values. By default, the absolutive singular consists of the bare stem, and the plural is marked with -mul (consonant final stems) and -tːu (vowel final stems). The ergative serves as the stem for the remaining oblique case values (for all but certain kinship terms with a different oblique stem, Kibrik 1977:21). While there are several possible oblique markers, the defaults are: singular: -li (or, for substantivized adjectives, -mu or -mi); plural: -čaj (consonant-final bases), -tːaj (vowel-final bases), -maj (substantivized adjectives). The important point is that, given a form, we have a clear prediction as to its number. Consider then the key cells of the paradigm of χáli ‘family’.

  1. (39)
    figure z

Given the information about defaults in the Archi noun paradigm, we have expectations about the forms here. Xáli is the morphologically unmarked form, and so should be the absolutive singular (as it is). Xáli-tːu is a regular absolutive plural. The form χáli-tːaj is the ergative plural (and the base for the oblique plural case values) which we would expect for a noun with a vowel-final stem. But what of χálə-maj (the shaded cell)? This has an ergative plural marker (albeit one expected of a substantivized adjective). Note that it is unproblematic to add the default ergative marker -li to a stem in -li with non-final stress; for instance, dáli ‘stick’ with ergative dálli, so χáli would be expected to have the ergative *χálli, rather than χálə-maj. When we turn to agreement, we find that the first three cells discussed control the expected agreement, singular or plural, as shown by modifiers within the noun phrase. What then of the ergative singular χálə-maj? This takes plural agreement, even though it denotes a single family. Here are Kibrik’s examples:

  1. (40)
    figure aa
  1. (41)
    figure ab

This pattern recalls that discussed in Tsez (Sect. 2.2), but there are two differences to note. First that there is a split in the singular, with the absolutive behaving normally and the ergative (and the remaining oblique cases) showing unexpected behaviour. And second, the form in the ergative singular cell, while plural in form, is not the same as that in the ergative plural (as was the case in Tsez). Still, this plural form (although singular in meaning) controls plural agreement. Thus in terms of agreement, this noun is normal in the absolutive, but it takes only plural agreement in the oblique cases. It has a split paradigm, and is a plurale tantum noun in the oblique cases (in terms of agreement). The oblique singular cells (see the shaded cell in (42)) are morphosemantically singular, but morphosyntactically plural (for this reason, exceptionally, (39) above is a morphosemantic paradigm, while generally I give morphosyntactic paradigms). The morphosyntactic requirements are as in (42):

  1. (42)

    Archi χáli ‘family’ morphosyntactic requirements (partial)

As (42) makes clear, χáli ‘family’ is a plurale tantum noun, in its forms and in its agreements, but only in the numerous oblique cases; its paradigm is split, and the absolutive is normal.

To fill out the typology, I mention briefly nouns like Serbo-Croat d(ij)ete ‘child’, which has the plural d(j)eca (and partly comparable are brat ‘brother’, plural braća, and nouns like tele ‘calf’ plural telad). These are the opposite of pluralia tantum nouns, in that their morphosemantic plural is singular in form (Corbett 2007b:39); hence they are non-canonical in respect of morphological distinctiveness. The agreements are complex (Corbett 1983:76–93; Wechsler and Zlatić 2003:50–60, 206–219, 2012, and references there).

4.6 Canonical number and canonical gender

In the canonical world, nouns have all possible number values and just one gender value (Corbett 2013:52, 58; Corbett and Fedden 2016:503–504). And in this instance the idealization is in harmony with the accepted terminology. There is no special term for a noun with a single gender value, but there is ‘common gender’ to indicate nouns with more than one gender available. Equally, pluralia tantum is an indication of nouns with one number value when they “should” have more than one. It is a common observation that pluralia tantum nouns need to be specified as plural in the lexicon (many have said so, including Pesetsky and Torrego 2007:263–264; for further instances see Klockmann 2017:144, who treats pluralia tantum nouns as ‘semi-lexical’). As a result, pluralia tantum nouns, like grammatical gender, are problematic for strong versions of Full Interpretation (as noted, for instance, by Bjorkman 2016:69). Sometimes a false connection is made, from pluralia tantum nouns being lexically specified, and therefore similar to gender, to them therefore being a gender value. Treating pluralia tantum nouns as a gender is particularly common in Cushitic studies, a position argued against in detail (Corbett 2012:224–234).Footnote 29

We need to be clear that ‘pluralia tantum’ is neither a gender value nor a separate inflection class. This is evident from these data:

Serbo-Croat (Leko 2009:25):

  1. (43)
    figure ac
  1. (44)
    figure ad
  1. (45)
    figure ae

Serbo-Croat has three gender values, with distinct agreements in the singular and the plural. Examples (43)–(45) show a plurale tantum noun of each gender, with the appropriate gender agreement in the plural (including on the numeral jedan ‘one’, compare Sect. 7.1). Thus the issue of pluralia tantum nouns cross-cuts gender. Furthermore, each noun illustrated belongs to an inflection class containing many other nouns (which have singular and plural); there is nothing inflectionally different about the example nouns, except their lack of morphologically singular forms, of course.Footnote 30

5 Count–mass and the issue of motivation

A key point for discussing whether pluralia tantum nouns are motivated or not is the expected distribution of number marking in a given language. There is considerable cross-linguistic variation in the balance between count and non-count nouns. This variation is constrained by the Animacy Hierarchy (Smith-Stark 1974, following earlier proposals). The modified version here is from Corbett (2000:54–88) where there is a good deal of relevant data, extended in Daniel (2005) and Haspelmath (2005); for values apart from the plural see Corbett (2000:89–132).

  1. (46)

    The Animacy Hierarchy

    speaker (1st person pronouns) > addressee (2nd person pronouns) > 3rd person > kin > human > animate > inanimate

Smith-Stark proposed that plurality can be ‘a significant opposition for certain categories but irrelevant for others’ (1974:657). Where this occurs, it will affect some top segment of the hierarchy. He adduced two types of evidence: marking of the noun phrase for number (on the noun itself) and agreement in number.
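The 'top segment' constraint can be made concrete with a minimal sketch. The function names and the idea of encoding a language's threshold as a single hierarchy position are illustrative assumptions, not part of Smith-Stark's proposal; the hierarchy itself is that given in (46).

```python
# Positions on the Animacy Hierarchy in (46), highest first.
ANIMACY_HIERARCHY = [
    "speaker", "addressee", "3rd person", "kin",
    "human", "animate", "inanimate",
]


def number_differentiable(lowest_counted):
    """Return the top segment of the hierarchy, down to and including
    `lowest_counted` (a hypothetical per-language threshold)."""
    cutoff = ANIMACY_HIERARCHY.index(lowest_counted)
    return ANIMACY_HIERARCHY[: cutoff + 1]


def is_top_segment(categories):
    """True if `categories` forms an unbroken top segment of the hierarchy,
    which is what the constraint requires of the number-marking categories."""
    wanted = set(categories)
    return wanted == set(ANIMACY_HIERARCHY[: len(wanted)])
```

On this encoding, a language that marks plurality for speaker and human but not for the positions in between would fail `is_top_segment`, which is the kind of system the constraint excludes.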

There are languages where the threshold for number-differentiability is much higher than in languages like English. In others, countability can extend right down the scale, well below the English system, as for instance in Yudja (Juruna family), a Tupi language of Brazil (Lima 2014). There can be intra-language variation too, as shown by the fact that the cut-off point for countability varies across the different varieties of English (Ziegeler 2010; Schmidtke and Kuperman 2017), and that there are degrees of countability (Allan 1980:562–563).Footnote 31 While the Animacy Hierarchy constrains the possibility of number differentiability, there are also differences in the frequency of number marking according to the hierarchy (Brown et al. 2013). And see Farmer (2015:80–109) for discussion of its relevance, for number marking and other phenomena, especially with respect to the Western Tukanoan language .

For pluralia tantum nouns, the issue of the two types of evidence is particularly relevant. On the evidence of morphological marking, English sheep does not fit the constraint of the Animacy Hierarchy; on the other hand, in terms of agreement it is in accord with the Hierarchy. And this is a general picture. Hence the following claim (Corbett 2000:67):

Lexical items may be irregular in terms of number marking with respect to the Animacy Hierarchy and regular in terms of agreement, but not vice versa.

Consider again Tsez xexbi ‘child(ren)’ from Sect. 2.2. In terms of number marking this noun does not conform, since singular and plural are identical. However, in terms of agreement it is fully regular, following the claim above.Footnote 32

Besides lexical examples, like sheep and xexbi ‘child(ren)’, there are principled sets of examples where the split on the Animacy Hierarchy differs, according to marking vs agreement. These are highly relevant to the point that internal (morphosemantic) and external (morphosyntactic) feature values can differ (as discussed in Sect. 2.4). A telling example is provided by the West Chadic language Miya. Outline data are given here, from Schuh (1989, 1998), discussed in Corbett (2006:177–179). Miya has two gender values, masculine and feminine, and two number values, singular and plural. In terms of number marking, there is an animacy distinction: animate nouns are those which denote ‘all humans, most, if not all, domestic animals and fowl, and some large wild animals.’ There is a grey area consisting of nouns denoting large wild animals, while the remaining nouns are inanimate (Schuh 1989:175). For animate nouns marking of plurality is obligatory.

There are several different agreement targets, and these have three agreement forms: masculine singular, feminine singular and plural. We can see this with a demonstrative:

  1. (47)

    The demonstrative ‘this’ in Miya (Schuh 1998:243)Footnote 33

     

                 singular    plural
    masculine    nákən       níykin
    feminine     tákən       níykin

With animate nouns, agreement is as we would expect; when plural they take plural targets, like níykin ‘these’ in (47). With inanimate nouns, however, the situation is rather different. First, marking is optional (Schuh 1989:175), which is most evident according to Schuh in numeral phrases. Here singular and plural are both possible, but examples like (48b) are more common in texts (Schuh 1998:258):

Miya (Schuh 1998:198)

  1. (48)
    figure af

Second, and more important for us, even when marked as plural, inanimate nouns do not take plural agreement. But agreement is not blocked. The target shows different agreement forms, tracking the noun’s gender in the singular:

Miya (Schuh 1998:193n6, 197)

  1. (49)
    figure ag
  1. (50)
    figure ah

Thus, agreement in number with animates is obligatory, whereas plural agreement with inanimates is impossible, whether or not they are morphologically marked as plural. Agreement with inanimate plurals does occur, but in gender, not number. This means that in examples (49) and (50) the noun is clearly plural for internal (morphosemantic) purposes, while for agreement purposes (its morphosyntax) it requires singular agreement, and hence the target can show agreement in gender.
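The Miya pattern just described can be summarized in a short sketch. The demonstrative forms are those in (47); the function names and the labels `m.sg`, `f.sg` and `pl` are illustrative assumptions rather than Schuh's notation.

```python
# Forms of the demonstrative 'this' from (47).
DEMONSTRATIVE = {"m.sg": "nákən", "f.sg": "tákən", "pl": "níykin"}


def miya_agreement(gender, animate, plural):
    """Return the agreement form a Miya noun controls.

    gender: "m" or "f"; animate: bool; plural: whether the noun is plural.
    Only animate plurals control plural agreement. Inanimates, whether or
    not they are marked as plural, control singular agreement, so the
    target still tracks their gender.
    """
    if gender not in ("m", "f"):
        raise ValueError("gender must be 'm' or 'f'")
    if animate and plural:
        return "pl"
    return f"{gender}.sg"


def miya_demonstrative(gender, animate, plural):
    """Pick the form of 'this' that matches the controlled agreement."""
    return DEMONSTRATIVE[miya_agreement(gender, animate, plural)]
```

The design point is that the noun's morphological plural marking is not an input to the agreement rule for inanimates, which is precisely the mismatch between internal and external values.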

Returning to the central point of the section, we note that there is a substantial semantics literature on the count–mass distinction including: Link (1983), Jackendoff (1991), Borer (2005:86–135), Chierchia (2010), Landman (2011, 2016), Massam (2012), Nicolas (2016), Rothstein (2017:82–116) and Grimm (2018); Bale and Barner (2011) is a helpful bibliography, and Lasersohn (2011) includes clear discussion of the varying use of terms (including the issue of whether mass nouns are necessarily singular, as held by some). A valuable resource for English is described in Kiss et al. (2016). In a recent contribution, Sutton and Filip (2017) account for the variability in the treatment of particular nouns in terms of the competing pressures of individuation and reliability. While the discussion has been primarily in terms of familiar languages, the substantial cross-linguistic variation is beginning to be addressed: Doetjes (2012) extends the discussion, Pelletier (2011, 2012) emphasises the language-specific nature of the count-mass distinction, and Lin et al. (2018), and the papers in their collection, target typologically diverse languages.Footnote 34

For nouns which are below the threshold for number-differentiability on the Animacy Hierarchy there are different possibilities (we should not assume that they will be singular). Some languages do treat them as singular; thus the Finno-Ugric language Mansi, which has a singular-dual-plural system, makes all its non-count nouns singular (Rombandeeva 1973:40). Others, like Manam, have them all as plural (Lichtenberk 1983:269). And yet others have some singular and some plural, as in Turkana (Dimmendaal 1983:224). There may be regularities in such systems; for instance, Andersen (2014:253–254) points out that in Nilotic languages it is common for at least some nouns denoting liquids to be plural (he gives examples from Dinka). If all non-count nouns are plural in a given language, there is no more to be said, but if there is an option, then those non-count nouns which are plural are of interest. And while plural mass nouns are not our primary concern, they fall under Matthews’ definition, and indeed he includes oats as an example. Ojeda (2005:392) includes a list of English mass plurals, following Jespersen (1954:122–124), and Lauwers (2014) and Lammert (2015) discuss the situation in French; Smith (2016) analyses plural mass nouns in Telugu. The question of the motivation of singular vs plural, within the non-count nouns, has been the subject of lively discussion, starting from the opposition of oats and wheat (Wierzbicka 1988:499–560, and 1991a, 1991b; Palmer 1990:226–229; Moravcsik 1991:136–139), taken up again in Ljaševskaja (2004:147–150), Goddard (2009) and Wisniewski (2009:181–184).

From mass nouns, let us return to those pluralia tantum which would be expected to be count nouns. The question of motivation here, that is, the issue of which expected count nouns are pluralia tantum, is truly fascinating. For a given language, we expect countability to depend on the Animacy Hierarchy, as just discussed, and the threshold for number-differentiability varies from language to language. Once that threshold is established, the existence of pluralia tantum nouns may be surprising. To be specific, take English trousers. We might expect it to be a normal count noun, like shirt, jacket or hat. Or indeed like contemporary French pantalon ‘trousers’. Yet trousers is not a random exception. Williams (1994:13) points out that its behaviour is shared by all nouns denoting items of clothing ‘worn on the legs in such and such a way’. This exemplifies what Koenig (1999:1–2) calls a ‘medium size generalization’. The generalization is specific to English and yet, of course, equivalents of trousers recur cross-linguistically in the sets of pluralia tantum nouns.

Related languages may differ dramatically; thus English has many pluralia tantum nouns (Payne and Huddleston 2002:340–348) while there are few in German. It has been suggested that the prevalence of pluralia tantum nouns is an areal phenomenon, with Circum-Baltic languages having substantial inventories (Vraciu 1976; Koptjevskaja-Tamm and Wälchli 2001:629–637). That research on Circum-Baltic languages produced a useful table of the “usual suspects”. The method was first to draw up two lists of 30 items which were pluralia tantum in Baltic languages or in Russian; consolidating those lists gave a list of 56 items. These were translated into 41 languages of Europe. A fraction indicates the proportion of pluralia tantum when the dictionary consulted gave more than one translation. In addition the data are represented as a neighbour net in Wälchli (2011:327). Sadly the raw data are no longer available for further analysis. However, there is new work with the same areal perspective (Tommola 2018).

Table 2 shows clearly that there are items which are frequently pluralia tantum; at the same time, seeing them as naturally pluralia tantum is unwarranted because they are often not in this category. It is helpful to think of such phenomena in terms of two dimensions (cf. Bye 2015:107): first they can be predictable/rule governed vs unpredictable; and second, they can be functionally motivated/natural vs arbitrary.

Table 2 Frequency count of pluralia tantum in 41 languages of Europe (Koptjevskaja-Tamm and Wälchli 2001:631)

Consider first the opposition predictable or rule governed versus unpredictable. There are instances where there appears to be an exceptionless or almost exceptionless generalization, as in the claim cited earlier, that items of clothing worn like trousers will be pluralia tantum in English (Williams 1994:13). There are many areas, however, where predictability is less clear, and the degree of predictability is rather in the eye of the linguist. This leads to the question of whether pluralia tantum nouns are defective. Let us start from Matthews’ (1997:89) definition of a defective lexical item as one ‘whose paradigm is incomplete in comparison with others of the major class that it belongs to.’ Baerman and Corbett (2010:2) start from this definition, and point out that ‘… the more idiosyncratic and lexically restricted the gap, the more canonically defective it is, and the more canonically defective the gap is, the greater the analytical challenge.’ Thus the less predictable a plurale tantum noun is, the closer it comes to being defective. In the canonical instances of defectivity there is a resulting gap (as with the lack of the first singular of the verb pobedit’ ‘conquer’ in Russian). However, pluralia tantum nouns tend to be less than canonically defective, since there may well be a standard ‘patch’ for what would otherwise be missing. For instance, there are different possibilities for combining pluralia tantum nouns with numerals to obviate constructions that would otherwise be impossible (Sects. 7.1, 7.2).

Turning now to the opposition between functionally motivated or natural versus arbitrary, the first observation is that pluralia tantum nouns often seem not to fit the Animacy Hierarchy, which is another possible motivation for linguists to look for semantic explanations. The fit depends on the version of the Animacy Hierarchy used; thus the version proposed by Sasse (1993:659) leads to more problem cases than that given as (46). Table 2 suggests possible arguments for motivation. There can be greater or smaller sub-regularities: so for instance, various languages of the Baltic region have large numbers of pluralia tantum, falling into semantic groups like names of meetings and festivals, and paired cutting tools. There are various lists of semantic groupings including Braun (1930:1–15), Karlsson (2000:648–649), Koptjevskaja-Tamm and Wälchli (2001:630), Kibrik (2003:92–97), Ljaševskaja (2004:92–108) and Acquaviva (2008:19–21). The difficulty is that one can point to the “usual suspects” that appear in several different languages, and equally one can stress the differences.Footnote 35

While the degree of synchronic motivation for particular instances of pluralia tantum needs careful, dispassionate study, their earlier motivation is often clearer. Take the Serbo-Croat plurale tantum noun kola ‘carriage, car’ (as in (45) above), compared with the singular kolo ‘wheel’. Similarly, Lezgian surar ‘cemetery’ is originally the plural of sur ‘grave’. Surar ‘cemetery’ is on a trajectory towards losing its plurale tantum status. It can take a numeral (sa surar ‘one cemetery’) and it has alternative agreement possibilities (Haspelmath 1993:81–82):

  1. (51)
    figure ai

Pluralia tantum place names often betray their origin as denoting a significant group of landmarks. (For more on the origins of pluralia tantum, see Degtjarev 1982.)

Motivated exceptions are a lesser departure from canonicity than lexical exceptions. At the most general level we have motivated exceptions to systems of number, namely the items which fall in the area of the Animacy Hierarchy below the threshold for number differentiability in a given language. Then we have Koenig’s medium size generalizations, where all items in a restricted class follow a (perhaps surprising) generalization.Footnote 36 And least canonical are individual lexical stipulations. For each deviation from canonicity, we can ask whether it is predictable or not, and motivated or not. And specifically, since bipartites have figured large in discussions of motivation, we could ask how they fare in larger systems, including those with a dedicated number value for two referents, the dual. We turn to these larger systems.

6 Reduced expression of number in larger systems

We look at larger number systems, to see how the reduction of possibilities for particular nouns plays out here. The immediate link is that bipartites are often cited as being motivated pluralia tantum. This would lead to expectations about how they will behave in systems with a dual (Sect. 6.1). And more generally, larger systems help clarify the issues discussed earlier. The definitions given in Sect. 1 take the perspective that the items under consideration have only the plural. And this is what pluralia tantum means: having only the plural. But what if they lack the singular? In languages like English the point is moot. However, in languages with more than two values for number there is a difference: one value or more might be lacking. The distinctions we have discussed can now be applied to values other than singular and plural. We look at morphological reduction in a singular-dual-plural system in Sect. 6.2, and at morphological and syntactic reduction in Sect. 6.3. Finally we tackle a four-value system, one with a clear mismatch between morphosemantic and morphosyntactic values (Sect. 6.4).

6.1 The issue of bipartites and the dual

Linguists who stress the motivation of apparent irregularities sometimes point to bipartites, like trousers, as being motivated pluralia tantum. We might ask, then, what might happen to such nouns in a language with a dual. If ‘two-ness’ is the motivating factor, we might expect such nouns to occur as dualia tantum. Yet what we often find (in the relatively few instances where there are data) is that the usual suspects turn up, as pluralia tantum. Thus in Slovene, which has a singular-dual-plural system, nouns like hlače ‘trousers’ are pluralia tantum and not dualia tantum. Priestly (2006) investigates the issue specifically and states that there are no dualia tantum nouns in Slovene. In the other contemporary Slavonic languages with a dual, Upper and Lower Sorbian, there are numerous pluralia tantum nouns, including bipartites. Thus for Lower Sorbian, Janaš (1976/1984:72) gives examples of different types of pluralia tantum, pointing out that they can be used for one, two, or more than two referents. For Upper Sorbian, which also has numerous pluralia tantum, Faßke (1981: 417) suggests two dualia tantum, staršej ‘parents’ and dwójnikaj ‘twins’. However, the dictionary by Völkel (1981:87, 405) has the first as a duale tantum, but gives dwójnik (singular) ‘twin’. Gerald Stone (personal communication, 16 April 2018) states that the plural starši ‘parents’ is in regular use, but that this noun has no singular (like German Eltern ‘parents’, Sect. 8.7), while dwójnik ‘twin’ has a full number paradigm (see also Menzel 2018). And, examining the oldest Slavonic texts, Moszyński (1985) concludes that there were no dualia tantum nouns in Old Church Slavonic.

Now Slovene and Sorbian have ‘weak’ duals. What of languages with robust duals? These are more interesting, but for other reasons, as we shall see in Sect. 6.2 and Sect. 6.3. The Papuan language Lavukaleve has pluralia tantum nouns and no dualia tantum (Terrill 2003:130). Yet there are tantalizing notes in the literature, suggesting the existence of dualia tantum, but little detail; see, for example, Boas (1893:59) on Chinook.

6.2 Missing dual forms: Lavukaleve

Lavukaleve is a Papuan language spoken in the Solomons; the data are all due to Terrill (2003). It has a singular-dual-plural system, with obligatory marking of number. There are some pluralia tantum nouns (under 5% in a count of over 800 nouns), both count nouns such as neo ‘teeth’ and mass nouns like ura ‘steam’ (Terrill 2003:130–132). Such nouns are pluralia tantum in terms of their agreements; they do not necessarily have any morphological marking of plurality. As we noted in Sect. 6.1, Lavukaleve has no dualia tantum. Lavukaleve is of greater importance for us, however. It shows the coverage of our typology, demonstrating how in larger systems there are types of non-canonical behaviour which often slip under the radar. This can be seen in the representative nouns in Table 3.

Table 3 Number marking in Lavukaleve nouns (Terrill 2003:105–130), discussed in Baerman et al. (2017:35)

Various plural markers are illustrated in Table 3, and for each of them two nouns are given: one with a distinct dual marker and one without (this may have to be specified lexically, as shown by the two nouns with the stem kua which behave differently). Each of the nouns without a unique dual is non-canonical (Principle I, see Sect. 4.5). Where there is no unique dual form, the form matches the singular (it is the bare stem).Footnote 37

We see again that in larger systems, there may be one form lacking (here the dual) rather than only one being available. And in Lavukaleve only the morphological form is lacking: in terms of agreement, the noun with the dual form matching the singular still takes the appropriate agreements.

The non-canonical behaviour of Lavukaleve is taken a stage further in languages with ‘inverse’ systems, as shown for instance by Kiowa (a member of the Kiowa-Tanoan family, spoken in south-western Oklahoma). Its number system has rightly attracted considerable research: early work was done by Wonderly et al. (1954), who introduced the term ‘inverse’, and Merrifield (1959). There is a valuable section in the grammar of Kiowa (Watkins 1984:78–100), taken up in several sources including Noyer (1992:228–236), Watkins (1995), Corbett (2000:159–162), and most recently in Harbour (2007, 2011) and Sutton (2010). Of all the interesting facets of the system, let us concentrate on the inverse marker -gɔ́ (and its variants, including -dɔ́). The following two nouns demonstrate its use:

  1. (52)

    Noun number marking in Kiowa (Watkins 1984:84, 86–87)

     

               singular    dual    plural
    ‘horse’    cê̜ː         cê̜ː     cê̜ː-gɔ́
    ‘pole’     áː-dɔ́       áː      áː

Nouns like cê̜ː ‘horse’ are comparable to those Lavukaleve nouns which do not have a unique dual. And, as in Lavukaleve, the Kiowa verb distinguishes singular, dual and plural. Nouns like áː ‘pole’ are the opposite, in the sense that there is again no unique dual, but now the dual is syncretic with the plural. And what is striking is that it is the same marker, namely -gɔ́/-dɔ́, which marks the distinct value (hence the term ‘inverse’). This inverse marker makes Kiowa even less canonical in terms of number marking, since the marker -gɔ́/-dɔ́ does not uniquely mark any number value.
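The logic of inverse marking for the two nouns in (52) can be sketched as follows. The representation is an illustrative assumption: each noun is specified lexically for its stem, its variant of the inverse marker, and the set of number values expressed by the bare stem. Nothing here is a claim about how the variant of the marker is chosen.

```python
NUMBERS = ("sg", "du", "pl")


def kiowa_form(stem, inverse_suffix, basic_numbers, number):
    """Return the noun form for `number`.

    `basic_numbers` is the set of values expressed by the bare stem;
    any other value receives the inverse marker.
    """
    if number not in NUMBERS:
        raise ValueError(f"unknown number value: {number}")
    return stem if number in basic_numbers else stem + inverse_suffix


# Lexical specifications matching (52).
HORSE = dict(stem="cê̜ː", inverse_suffix="-gɔ́", basic_numbers={"sg", "du"})
POLE = dict(stem="áː", inverse_suffix="-dɔ́", basic_numbers={"du", "pl"})

if __name__ == "__main__":
    for label, noun in (("horse", HORSE), ("pole", POLE)):
        print(label, [kiowa_form(number=n, **noun) for n in NUMBERS])
```

The same marking function yields a marked plural for ‘horse’ and a marked singular for ‘pole’, which illustrates why the marker cannot be identified with any one number value.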

The nouns in (52) are representative in that animates behave broadly like cê̜ː ‘horse’, in marking the plural. (The noun t’áp ‘deer’ does not inflect, but its agreements are like those of cê̜ː ‘horse’, which makes it comparable to English sheep in a three-value system.) Most nouns denoting inanimates behave like áː ‘pole, stick’. We noted in Sect. 5 that there can be a mismatch between the evidence of marking on the noun and the evidence of agreement: the latter is the more regular. We have a similar situation here, in that the marking of number on the animates gives an unexpected plural versus singular / dual system; however, the verb is quite regular in showing singular-dual-plural. Having seen the importance of Kiowa for the issue of limited number marking, we should also note that Kiowa is noteworthy for its pluralia tantum nouns. It has uninflecting nouns which always take plural agreement (irrespective of the number of referents): these nouns include hóldà ‘dress, shirt’, kút ‘book, letter, school’, kʰɔ́ːdé ‘trousers’ and tóː ‘tepee’ (Watkins 1984:90–91).

6.3 Missing singulars: Central Alaskan Yup’ik

Central Alaskan Yup’ik has a more robust dualFootnote 38 than that of the languages analysed in Sect. 6.1, and it proves particularly significant (for earlier discussion of nouns with reduced number possibilities see Corbett (2000:174) and Mithun (2010:132–135)). Recall our question about what it is that pluralia tantum lack, which is more pressing when there are three number values (singular, dual and plural). Central Alaskan Yup’ik has pluralia tantum nouns like niicugnissuute-t ‘radio(s)’ (Miyaoka 2012:716). Such nouns have a clear inflectional marker of plurality (the -t inflection here), they can refer to one item or more, and they take plural agreement. These are of the type discussed in Sect. 2.1, since their morphology and syntax line up, but their semantics can fail to match (when they are used of a single object).

There are other nouns, however, which have no singular, but do have a dual (and at least some are arguably bipartites (Jacobson 1995:434); thus ‘sled’ as in (53) has two runners):

  1. (53)
    figure aj

The key point is that the boy is using one sled, but the noun stands in the dual and the verb agrees with it in the dual. ‘Box’ is a similar noun:

  1. (54)
    figure ak

Miyaoka is quite explicit that the dual form can denote one or two boxes. If there are more, the plural is used:

  1. (55)
    figure al

This matches the view (using a different noun) of Elizabeth Ali and Marianne Mithun, reported in Corbett (2000:174). This means that these nouns are not dualia tantum. Rather they lack the singular. There is extensive data on the related Central Siberian Yupik in de Reuse (2000). He is primarily concerned with the question of motivation. Willem de Reuse confirms (personal communication, May 2012) that he believes the items he cites in the dual can also appear in the plural for denoting more than two referents.Footnote 39
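The difference between a noun that lacks the singular and a true duale tantum can be stated as a mapping from the number of referents to the number value used. The sketch below is a simplification under stated assumptions: the function and parameter names are hypothetical, and only the regular type and the ‘sled’/‘box’ type described above are covered.

```python
def yupik_number(count, lacks_singular=False):
    """Return the number value used for `count` referents.

    Regular nouns use singular, dual and plural for one, two and more
    than two referents. Nouns of the 'sled'/'box' type lack the singular:
    the dual covers one or two referents, and the plural is still used
    for more than two.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return "dual" if lacks_singular else "singular"
    if count == 2:
        return "dual"
    return "plural"
```

A duale tantum noun, by contrast, would return the dual for every count; the availability of the plural for more than two referents is what rules out that analysis.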

6.4 A larger system: Bayso

The Cushitic language Bayso is significant for our typology, since it has a four-value number system, and this larger system gives novel possibilities for missing forms. Moreover there is a dramatic mismatch between the morphosemantic and morphosyntactic systems of number, which raises the issue of definition sharply. Bayso is a Cushitic outlier of Ethiopia, spoken on Gidicco island in Lake Abaya, and in certain villages on the shores of the lake (see Savà 2011 for information on its status). The data are originally from Hayward (1979), discussed in Corbett and Hayward (1987) and in Corbett (2012:224–233), with personal communications from Dick Hayward. The account here develops that work.

Bayso pronouns have a simple system:

  1. (56)

    Third person pronouns in Bayso (Corbett and Hayward 1987:12)

     

                 singular    plural
    masculine    úsu         íso
    feminine     ése         íso

This three-way opposition maps straightforwardly onto all agreement targets, namely the verbal predicate, the demonstrative and the associative particle.Footnote 40 Since personal pronouns are at the top of the Animacy Hierarchy (Sect. 5) we use them on principle to determine the appropriate glossing: thus our agreement targets have three forms, masculine singular, feminine singular and plural. Bayso nouns, however, distinguish four values, general, singular, paucal and plural. General number allows expression of the lexical meaning of a noun without reference to number (it may refer to one or more than one); that is, the speaker need not specify the actual number of referents. The singular is used of a particular individual only. The paucal is for a few individuals, from two to about six, and plural is for larger numbers. We now ask how the four values available for nouns map onto agreements. Here are relevant examples for a regular masculine noun (lúban ‘lion’) and a regular feminine noun (kimbír ‘bird’), each representing many nouns:

Bayso regular nouns (Hayward 1979 and personal communications, Corbett and Hayward 1987)

  1. (57)
    figure am
  1. (58)
    figure an
  1. (59)
    figure ao
  1. (60)
    figure ap

The agreements for general number and singular are the same, for masculines and feminines, and this is a general regularity. Then, nouns of both genders take the masculine singular form when plural. This type of pattern is widespread in Cushitic languages: many have the feminine singular here, and a minority like Bayso use the masculine singular. Bayso is unusual in innovating a paucal number, for which plural agreement is used.

Examples (57)–(60) demonstrate again that the morphosemantic specification of a paradigm cell may not match its morphosyntactic requirement. Thus the morphosemantic plural takes a singular agreement target for all regular nouns. The usual patterns of correspondence between the morphosemantics of the noun paradigm and its external requirements are given in (61):

  1. (61)

    The system of Bayso regular nouns

The shading shows the regular patterns, for masculine and feminine nouns. The left column gives the noun’s number value and the columns specify the agreement it takes; thus lúban ‘lion(s)’ is general number, and it takes masculine singular agreements.
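The regular correspondences can also be written out as a small function, as a minimal sketch of the pattern described in the text. The labels `m.sg`, `f.sg` and `pl` for the three agreement forms, and the function name, are illustrative assumptions.

```python
def bayso_regular_agreement(gender, number):
    """Agreement form controlled by a regular Bayso noun.

    gender: "m" or "f".
    number: the noun's morphosemantic value, one of
            "general", "singular", "paucal", "plural".
    """
    if gender not in ("m", "f"):
        raise ValueError("gender must be 'm' or 'f'")
    if number in ("general", "singular"):
        return f"{gender}.sg"   # agreement tracks the noun's gender
    if number == "paucal":
        return "pl"             # the only value that controls plural agreement
    if number == "plural":
        return "m.sg"           # masculine singular for nouns of both genders
    raise ValueError(f"unknown number value: {number}")
```

For regular nouns, then, a morphosemantic plural never controls plural agreement, and plural agreement always signals a paucal.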

Bayso has some non-count nouns, of two types. Some are like ees ‘grass’, which takes masculine singular agreement, and its form is best analysed as general number. A small group of others, namely eenoo ‘milk’, ogorroo ‘hair’, soo ‘meat’ and udú ‘faeces’ take plural agreement. These are in a sense like English oats. But this being Bayso things are not that simple. Taking plural agreement is not the same as being morphosemantically plural. Three of the nouns end in -oo, which reconstructs as a plural marker in Proto-Omo-Tana, from which Bayso descends (Corbett and Hayward 1987:19–26); nouns in -oo took plural agreement. In the modern language, however, being morphosemantically plural and taking plural agreement are different things. In order to specify the irregularity of eenoo ‘milk’, ogorroo ‘hair’, soo ‘meat’ and udú ‘faeces’, the logical way is to mark them as paucal. The regular rules of agreement will ensure that they then take only plural agreement. They are, then, paucal tantum.

A few nouns take irregular agreement patterns, which we mention briefly. There are four nouns which, when plural, take feminine singular agreement instead of the expected masculine singular (of them, three are masculine, like aar ‘ox/bull’, and one is feminine, abba ‘sister’). Then six nouns take plural agreement when plural (which is aberrant in Bayso); these include baal ‘feather/leaf’, which is masculine, with the plural baalallo, and nébe ‘ear’, feminine, with the plural nebebboo.

We now turn to the key examples: those count nouns which are missing a form, within this four-value system (we have discussed the non-count nouns already). There are two types.

  1. (62)

    Bayso nouns with a missing value—type 1

Type 1 nouns lack the singular. Their general number form takes plural agreement. The nouns of this type again contain some usual suspects: kalaljaa ‘kidney(s)’ has the shape of a paucal (though it has not been recorded without the paucal ending), presumably because kidneys typically occur in pairs; similarly for feet (lukkaa), sandals (keferoo), eyes (iḷoo) and hips (moo). These nouns, then, lack a singular, and require exceptional marking of their general number form as taking plural agreement. Arguably the second fact explains the first: since singulars are formed from general number forms, and retain the same gender, there is no regular general number form from which a singular could be derived. Thus specifying the irregular agreement of the general form is sufficient. Finally, then, our most intriguing example (another usual suspect):

  1. (63)

    Bayso noun with a missing value—type 2

Ilkoo ‘tooth/teeth’ is remarkable. Like type 1 nouns it lacks a singular, and its general form takes plural agreement. Its paucal form, ilkoojaa, takes plural agreement as expected. And then its plural form, irregularly, takes plural agreement. This is a noun which lacks only one form, the singular. But because of its irregularities, the effect is that it takes only plural agreements (defined earlier as those which agree with the plural pronoun). In a sense, this is indeed a plurale tantum noun, in terms of agreement rather than form.
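To see why ilkoo ends up taking only plural agreement, it helps to combine the regular rules with lexical stipulations. The sketch below is one possible encoding, not the analysis itself: the class, its field names and the agreement labels are assumptions, and the gender of ilkoo is left unspecified because none of its cells falls back on a gender-dependent rule.

```python
from dataclasses import dataclass, field
from typing import Optional

NUMBER_VALUES = ("general", "singular", "paucal", "plural")


@dataclass
class BaysoNoun:
    lemma: str
    gender: Optional[str] = None              # "m" or "f"
    missing: frozenset = frozenset()          # number values with no form
    overrides: dict = field(default_factory=dict)  # lexically stipulated agreement

    def agreement(self, number):
        """Return the agreement form for `number`, or None if the cell is missing."""
        if number not in NUMBER_VALUES:
            raise ValueError(f"unknown number value: {number}")
        if number in self.missing:
            return None
        if number in self.overrides:
            return self.overrides[number]
        # Regular rules: paucal takes plural agreement, plural takes
        # masculine singular, general and singular follow the noun's gender.
        if number == "paucal":
            return "pl"
        if number == "plural":
            return "m.sg"
        if self.gender is None:
            raise ValueError("gender needed for general/singular agreement")
        return f"{self.gender}.sg"

    def agreement_set(self):
        """All agreement forms the noun can control across its available cells."""
        forms = (self.agreement(n) for n in NUMBER_VALUES)
        return {f for f in forms if f is not None}


LION = BaysoNoun("lúban", gender="m")
TOOTH = BaysoNoun(
    "ilkoo",
    missing=frozenset({"singular"}),
    overrides={"general": "pl", "plural": "pl"},
)
```

With these specifications the regular noun controls both masculine singular and plural agreement, whereas ilkoo controls plural agreement in each of its three available values.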

Definitions: the definitions given earlier do not specify what is intended by ‘form’. Naturally we expect that (i) cells labelled “plural” in a paradigm (the internal specification) are indeed plural in terms of their semantics (grammatical meaning), and (ii) control plural agreement (their external specification). But these specifications, internal and external, need not line up. The regular nouns of Bayso demonstrate this clearly. (If there was a total mismatch for all instances, we would assume that the forms were simply mislabelled, but this is not the case for Bayso.) And as we can have pluralia tantum which have only plural forms (but more than one type of number agreement), Bayso ilkoo ‘tooth/teeth’ is a plurale tantum noun in the sense that it takes only plural agreement, for all three of its morphosemantic number values.

7 Repercussions of pluralia tantum

The existence of pluralia tantum nouns, of different types, has repercussions through the system. The most striking arise when there is a mismatch of semantics versus syntax and morphology, in particular where a noun which is semantically a count noun gives syntactic problems when it is to be counted. This may involve special numerals or forms of numerals (Sect. 7.1) or it may require a classifier-like construction (Sect. 7.2). In the opposite situation, where the plurale tantum noun is a mass noun, there may be problems induced by semantic recategorization (Sect. 7.3). Finally, with respect to morphology, compound formation may be difficult if a plurale tantum form is not appropriate within a compound (Sect. 7.4).

7.1 Numerals

Pluralia tantum nouns, when they would be expected to be count nouns, can create problems for numeral constructions, notably in instances where a singular would be expected. Let us start with the numeral ‘one’. In the Slavonic languages we find that this numeral agrees in number with its head, and hence is plural (as noted in Sect. 2.4 above):

Russian

  (64) [example figure aq not reproduced]

There is a trickier problem, however. As a result of the loss of the dual in most of the Slavonic languages, some very specific forms are found in numeral phrases. In particular, forms may be used which, for the majority of nouns, are synchronically equivalent to a singular form. In Russian, for instance, we find the noun in the genitive singular with the numerals dva/dve ‘two’, tri ‘three’, četyre ‘four’ (and with some other items including oba/obe ‘both’). Besides the restriction to these numerals, the phrase must be in the nominative (or the accusative syncretic with the nominative); see Corbett (2012:209–210) for details and sources. The problem is now clear: numerals may require a form (genitive singular) that a plurale tantum noun cannot supply; specifically, there is no grammatical combination of dva/dve ‘two’ and sani ‘sledge’ for a phrase in the nominative. However, there is another set of numerals, the collective numerals (footnote 41). These have distinct morphology and take the genitive plural; hence they can be used with pluralia tantum nouns:

  (65) [example figure ar not reproduced]

This usage is largely restricted to the situation where it is needed (the lower numerals in a direct case); elsewhere the normal numerals are much more frequent. This has been confirmed with corpus and questionnaire research by Nikunlassi (2000:235–239) (footnote 42).

There are further complications, which I will mention briefly (see also Pesetsky 2013:55–56 for discussion). First, not all pluralia tantum nouns behave alike. Mel’čuk (1985:379) and Nikunlassi (2006) show that for bipartites, like brjuki ‘trousers’, the classifier construction (Sect. 7.2) is preferred to the collective numeral. A part of the reason for this distinction may date from the time when collective numerals were used with paired objects, like boots, to enumerate the pairs, a usage which has now largely disappeared (Mel’čuk 1985:385). And second, in complex numeral phrases, it is the last element of the complex numeral which determines the featural specification of the noun. Thus dvadcat’ odin ‘twenty-one’ takes the nominative singular, and dvadcat’ dva/dve ‘twenty-two’ the genitive singular. But collective numerals cannot appear in complex numerals. As Mel’čuk (1985:378) points out, this means there are written forms, involving complex numerals and pluralia tantum nouns, which cannot be pronounced (also discussed in Pesetsky 2013:141n5). The classifier construction again saves the day, because the classifier can appear in the singular, and then take the plurale tantum noun in the plural.
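The interaction of numeral type and plurale tantum noun described above can be summarized as a small decision procedure. The following Python is a sketch under stated assumptions, not a grammar of Russian: the function names and the labels it returns are hypothetical, it covers only the numerals 1 to 99 in a nominative phrase, and it lists only the options that the text explicitly mentions, so the absence of an option is not a claim of ungrammaticality. That the teens and the round tens count as single-element numerals is background knowledge about Russian rather than something stated above.

```python
from typing import List


def last_element(n: int) -> int:
    """Value of the final word of the numeral (simplified model of 1-99)."""
    if n <= 20 or n % 10 == 0:
        return n       # single-word numerals such as 2, 12, 20, 30
    return n % 10      # 21 -> 1, 22 -> 2, ...


def is_complex(n: int) -> bool:
    """True for multi-word numerals such as 21 or 22."""
    return last_element(n) != n


def strategies(n: int, bipartite: bool = False) -> List[str]:
    """Options for counting a plurale tantum noun, in order of preference."""
    if not 1 <= n <= 99:
        raise ValueError("sketch covers 1-99 only")
    last = last_element(n)
    if n == 1:
        # 'one' agrees in number with its head, hence appears in the plural
        return ["plural form of 'one'"]
    if last in (2, 3, 4):
        # these numerals call for a genitive singular the noun cannot supply
        if is_complex(n):
            # collective numerals cannot appear in complex numerals
            return ["classifier"]
        options = ["collective numeral", "classifier"]
        # for bipartites the classifier construction is reported as preferred
        return options[::-1] if bipartite else options
    if last == 1:
        # complex numeral ending in 'one' calls for a nominative singular
        return ["classifier"]
    return ["ordinary numeral"]
```

For instance, `strategies(2)` offers the collective numeral first, `strategies(2, bipartite=True)` reverses the preference, and `strategies(22)` leaves only the classifier construction, which mirrors the written forms that cannot be pronounced without it.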

While I have concentrated on the complexities of Slavonic, Ojeda (1997) gives detailed attention to the collective numerals of Latin, including their use with pluralia tantum (compare also Löfstedt 1958:100–109, 218); Ojeda also considers Icelandic collective numerals in this function. And Miyaoka (2012:716–717) illustrates the use of group numerals in such instances in Central Alaskan Yup’ik.

7.2 The ‘classifier’ solution

The problem of combining a plurale tantum noun with a numeral is solved in many languages by the use of a classifier, as in this example:

Russian

  (66) [example figure as not reproduced]

The lower numerals of Russian require a form that is for almost all nouns identical to the genitive singular, and so the construction with the classifier element para ‘pair’ (here in the singular) solves the incompatibility of numeral with plurale tantum noun. However, as the English translation shows, a classifier may be used where there is no obvious syntactic problem. In general, not just in English, there may be restrictions on how readily the classifier construction is accepted, according to how much lexical meaning is retained by the classifier and hence how compatible this is with the particular noun.

7.3 The problem of recategorization

English permits recategorization of mass nouns like coffee to count nouns (coffee with plural coffees) with a portion reading (a coffee and two teas please) and a sort reading (the coffees in this shop are really something). English is particularly permissive in this regard; the possibilities and the range of meanings of such recategorizations vary from language to language. For pluralia tantum mass nouns, it appears that recategorization involving the use of the singular is less readily available (footnote 43).

7.4 Derivational morphology

The canonical plurale tantum noun, having plural inflectional morphology only, would also have only the plural form available for derivation. And indeed, we often find this situation: derivational morphology in English uses the plural form for compounding, as with almsgiving, clothesbrush and so on (Kiparsky 1982:137–138). However, the singular is found in compound formation with bipartites, as in trouser pockets, spectacle case and others (Kiparsky 1982:174n3; Arregi and Nevins 2014:316–318). Haskell et al. (2003:132–134) investigated modifiers in compounds, obtaining acceptability ratings for singulars, regular plurals, and pluralia tantum nouns as modifiers (hammer rack / hammers rack / pliers rack). The pluralia tantum nouns fell between the singulars (most acceptable) and regular plurals (least acceptable), and were significantly less acceptable than the singulars. This, they claim, is not what would be predicted from a level-ordering account. Note that the potential to use apparent singular forms (bare stems) of pluralia tantum nouns for derivational purposes is a particular fact about English: it is not general (footnote 44). Thus in Russian the singular *nožnica from nožnicy ‘scissors’ is impossible for any purpose (Zaliznjak 1967/2002:58).

8 The range of possibilities: defining pluralia tantum

Since different criteria apply to particular examples, it is now worth drawing out the range of variation we have identified.

8.1 Exhaustiveness

A canonical noun “should” have available all values of the number feature. Pluralia tantum nouns are non-canonical against this criterion. But in larger systems we see the range of contrast. In Central Alaskan Yup’ik (Sect. 6.3), there are full pluralia tantum nouns like niicugnissuutet ‘radio(s)’, which have only the plural, as opposed to nouns like ikamrak ‘sled(s)’, which lack the singular but have dual and plural.

8.2 Semantic-syntactic-morphological alignment

In the canonical world, a noun’s semantic, syntactic and morphological properties are aligned. We have documented pluralia tantum nouns for which this does not hold: scissors has the semantics of a count noun, which is out of alignment with its syntax and morphology (Sect. 2.1); Tsez xexbi ‘child(ren)’ has its (plural-only) morphology out of line with its semantics and syntax (Sect. 2.2); Russian galife ‘riding breeches’ has semantics, syntax and morphology out of alignment (Sect. 2.3). We return to a further instance in Sect. 8.7. These alignment issues can be seen in the light of the more regular misalignments we saw in Miya (Sect. 5) and Bayso (Sect. 6.4).

8.3 Semantic compositionality

In the canonical world, lexical and grammatical meaning (here the meaning of the number value) can be combined in a fully compositional way. We saw potential issues with full compositionality, from number dominance and orientedness through to pluralia tantum. It is not self-evident that pluralia tantum nouns are necessarily non-compositional in terms of number; see, for instance, Acquaviva (2016b) for instances where plural items which appear as non-compositional may yield to more nuanced interpretations.

8.4 Syntactic consistency

Many examples of pluralia tantum nouns are canonical in respect of consistency: they require the same (plural) agreement on all targets. However, pluralia tantum may be inconsistent, taking some singular agreements (as we noted for the Hebrew be’alim ‘owner(s)’, and Finnish Filippiinit ‘the Philippines’ Sect. 4.4). As singular agreement becomes available for agreement targets at successive positions on the Agreement Hierarchy, this is the route whereby pluralia tantum nouns can transition into being ordinary in respect of number. (Recall also the Lezgian example (51) in this respect.)

8.5 Morphological distinctiveness

In the canonical world, features and their values are fully distinguished by formal means. Again we draw a distinction between all values and particular values. Thus just in terms of number marking we can separate out English sheep (no distinction in form between singular and plural) and items like Lavukaleve buti ‘shoe’ (Table 3), where the plural is distinct, but dual is not distinguished from the singular. Relating this to pluralia tantum nouns, the range runs from those like Tsez xexbi ‘child(ren)’ (Sect. 2.2), where we have robust plural marking, to those like Russian galife ‘riding breeches’ (Sect. 2.3), with no morphological distinctions. The latter occurs in a language where number is normally marked formally, though not on a minority of nouns (including this one). At the other extreme there is Walman (Sect. 4.5.1), where nouns in the main do not mark number, and so pluralia tantum nouns are pluralia tantum only in respect of agreement.

8.6 Defining pluralia tantum nouns

On the basis of the observations through the paper, and the criteria in Sects. 8.1–8.5, we can now offer a more robust definition.

Pluralia tantum nouns have only the plural.

  • inflectional criterion: In inflectional terms, having only the plural means that only the plural sub-paradigm is available. This plural sub-paradigm may consist of one cell or more than one.

    • For count nouns, the mismatch between semantic countability and the lack of inflectional form(s) for non-plural values may lead to defectiveness (Sect. 5); or there may be special strategies for using pluralia tantum nouns in environments where the number of items is distinguished (notably in constructions with numerals).

    • For non-count nouns, the defectiveness issue does not arise except in instances of recategorization.

    • Establishing that the form(s) available to the noun are indeed plural forms involves comparing with normal count nouns, and considering agreement evidence.

  • syntactic criterion: In syntactic terms, having only the plural implies that only plural agreements are possible.

  • relation of the criteria:

    • the inflectional and the syntactic criteria may be aligned (scissors Sect. 2.1)

    • a noun may be a plurale tantum just in terms of inflection, but not in terms of agreement (Tsez xexbi ‘child(ren)’ Sect. 2.2)

    • a noun may be a plurale tantum just in terms of agreement, but not in terms of inflection. This may apply to nouns which do not inflect for number (whether this is regular or exceptional), but where other count nouns control different number values in agreement (Russian galife ‘riding breeches’ Sect. 2.3, Walman apar ‘bed’ Sect. 4.5.1, Bayso ilkoo ‘tooth/teeth’ Sect. 6.4).

Thus what it means to be a plurale tantum noun is seen most clearly by calibrating from the baseline of a canonical noun with canonical number properties.
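The relation between the two criteria can be sketched as a simple classification. The Python below is a minimal illustration, assuming a deliberately coarse encoding: `NounProfile`, `classify` and the value sets assigned to each noun are hypothetical simplifications of the descriptions in Sects. 2.1–2.3, with an empty inflection set standing for a noun that does not inflect for number. In particular, the agreement values given for Tsez xexbi are an assumption made for the purposes of the example.

```python
from dataclasses import dataclass
from typing import FrozenSet

PLURAL_ONLY = frozenset({"plural"})


@dataclass(frozen=True)
class NounProfile:
    name: str
    inflection: FrozenSet[str]  # number values with inflectional forms available
    agreement: FrozenSet[str]   # number values the noun can control in agreement


def classify(noun: NounProfile) -> str:
    """Apply the inflectional and syntactic criteria and relate them."""
    inflectional = noun.inflection == PLURAL_ONLY
    syntactic = noun.agreement == PLURAL_ONLY
    if inflectional and syntactic:
        return "plurale tantum (criteria aligned)"
    if inflectional:
        return "plurale tantum in inflection only"
    if syntactic:
        return "plurale tantum in agreement only"
    return "not a plurale tantum"


BOTH = frozenset({"singular", "plural"})

# Illustrative encodings (assumptions, see lead-in)
book = NounProfile("book", BOTH, BOTH)
scissors = NounProfile("scissors", PLURAL_ONLY, PLURAL_ONLY)
xexbi = NounProfile("Tsez xexbi", PLURAL_ONLY, BOTH)
galife = NounProfile("Russian galife", frozenset(), PLURAL_ONLY)
```

Calling `classify` on these four profiles yields one instance of each outcome, with book as the canonical baseline from which the other three are calibrated.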

8.7 The canonical plurale tantum noun

In most work within Canonical Typology, it proves logical to anchor one end of a criterion but not the other (rather like the temperature scale anchored to zero Kelvin but without an upper limit). However, there are phenomena where both ends of a scale have natural end points. Suppletion is one such phenomenon: according to the criteria of maximally regular semantic correlation and maximally irregular formal correlation, we can calibrate from absolutely regular inflection at one end to full suppletion at the other (Corbett 2007a). We should consider pluralia tantum nouns in this way. At one end of the scale we have nouns that are fully canonical for number, and at the other end—arguably—we could situate canonically pluralia tantum nouns.

Intuitively a canonical plurale tantum noun needs to be lexically specified as such (it would not be part of a medium size generalization for instance). Moreover, in the canonical world, criteria line up, so we would set the limit at nouns whose semantics, syntax and morphology are aligned; that is to say, a noun which has only the plural in semantic, syntactic and morphological terms. (This is a “worse” scenario than those discussed in Sect. 2.) Such a noun would be defective: it would be specified as semantically plural, leaving a gap where the singular would be expected.

Such an item has been proposed (Baerman 2009). Surprisingly it is from German, a language with few pluralia tantum nouns (examples include Masern ‘measles’ and Spesen ‘expenses’), despite its areal position near the pluralia tantum hotspot of the Circum-Baltic region (Sect. 5). Yet German has the remarkable Eltern ‘parents’. This noun has only plural forms, it takes only plural agreements, and it must denote more than one parent. It can be used whenever more than one parent is intended and where there is no specific reference to sex. It does not have a corresponding singular (footnote 45).

8.8 Summary of noun types

We now survey the main noun types discussed, particularly from the perspectives of exhaustiveness (Sect. 8.1) and alignment (Sect. 8.2); see Table 4:

Table 4 Summary: number possibilities of the main types

Book has all number possibilities, for semantics, syntax and morphology. There are increasing restrictions through to the ‘worst’ case, German Eltern ‘parents’, which would be expected to be a count noun but which fully lacks the singular. Bold marks items that can be labelled pluralia tantum. Reduced indicates that not all the potential values are available, but still there is more than one. In the diversity, we note that there is a general trend for greater restriction in syntax (agreement) than in semantics, and greater in morphology than in syntax.

9 Pluralia tantum constructions

The criteria that prove useful for defining pluralia tantum nouns have extensions beyond the simple word. There are set expressions involving pluralia tantum: give someone the creeps. Then there are phrases which appear only in the plural, even though the head noun is not a plurale tantum: United States (but not state), Olympic Games (but not game). Spanish has an interesting set of these:

  (67) [example figure at not reproduced]

While the greeting appears to be pragmatically comparable to that in other languages, including other Romance languages, Spanish uses the plural in this set of expressions (footnote 46).

Moving to pluralia tantum compounds, Russian has figli-migli ‘tricks’, in which both parts inflect, but only in the plural (Zaliznjak 1977:283); there are several examples in the Russian National Corpus, with plural attributive modifiers. Recall too the Tsez compounds (plural because they consist of two items) in footnote 7.

We now consider a surprising plurale tantum construction in Russian (Sect. 9.1), and some general uses which are restricted to the plural (affective uses Sect. 9.2).

9.1 The Russian ‘including’ construction idti v letčiki

Russian has a construction which is noteworthy for several reasons, including the fact that the nominal element must be plural, even when this lacks motivation. The expression idti v letčiki means ‘become a pilot’ (literally ‘go into the pilots’). There are various verbs which can be the first element, and even some nouns. The last slot can be occupied by any animate noun, but typically it is one denoting a profession or social grouping. What is constant is the preposition v ‘into’ and the fact that the final noun is plural:

  (68) [example figure au not reproduced]

The point is that the election is for one president, and the noun prezident ‘president’ has a perfectly good, frequent singular, but in this construction the plural is required (its case form is problematic, see Corbett 2012:210–213). Thus we have a productive, common construction, which involves a plurale tantum slot (footnote 47). Similarly in French we find expressions like jouer les héros ‘play the hero’ (literally ‘the heroes’).

9.2 Affective uses

Number is used in ways beyond its basic function. It is commonly used for honorific purposes, and here all values may be involved, showing different degrees of respect (Corbett 2000:220–228). More relevant for us are various affective uses, such as the exaggerative and the intensificative (Corbett 2000:234–239 provides examples from a range of languages). Here is one from Russian:

  (69) [example figure av not reproduced]

Here the affective use is clear, since there is no straightforward interpretation of the plural prepositional phrase. Here is an intensificative use from Slovene:

  (70) [example figure aw not reproduced]

Recall that Slovene has a dual, so we can ask whether the dual is possible here (perhaps to indicate less intensity than the plural). In fact denarici ‘(two) purses’ can only have the literal meaning.

We can therefore see the same patterning for these uses of number values as we saw for nouns in larger systems in Sect. 6. For honorific use, all values may in principle be available, while for affective use, on the evidence to date, only the plural can be used.

10 Conclusion

The increased interest in pluralia tantum nouns is welcome. However, the topic is broader than most linguists have realized. Scissors, Tsez xexbi ‘child(ren)’, Russian galife ‘riding breeches’, Walman apar ‘bed’ and Bayso ilkoo ‘tooth/teeth’ are all pluralia tantum, but they are different. We have analysed their differences within a canonical typology, taking in the whole range of limited number, including restrictions within larger systems (even going beyond nouns).

We have seen that confusion over definitions has arisen in part because of the tacit assumption that a noun’s paradigm cells are consistent, in the sense that their “internal” morphosemantic specification and their “external” morphosyntactic specification are identical. While this identity holds in the default case, and indeed in the canonical world, there are important instances involving pluralia tantum nouns where it does not. These demonstrate that we must expand Stump’s content paradigm, in order to meet both sets of requirements. The concern to be fully explicit about the different types of pluralia tantum nouns has led us to this theoretical advance. We also saw that what is required for certain pluralia tantum nouns also holds for the more general, systematic instances in Miya and Bayso. Thus careful typology leads to theoretical advance, and equally formal theory makes explicit the ramifications of our typology.

The value of a specifically canonical approach to typology has also been demonstrated. The paper contributes to that body of research in analysing another phenomenon where we can calibrate from a fully canonical instance of a noun with all number possibilities, through to the “worst” case of a fully canonical plurale tantum noun.