1 Introduction

Throughout the development of generative grammar, there has been a recurring tendency to associate interrogative wh-phrases with focal elements. The exact nature of this connection has changed according to the view of focus, and of the syntax-semantics interface, that has prevailed at each stage of investigation.

In this paper we contribute to this issue by investigating the prosodic properties of root wh-questions in Italian. We argue that the placement of the Nuclear Pitch Accent (henceforth NPA) and main stress in root wh-questions speaks against a direct association between prosodic prominence and focal interpretation; instead, it offers support for the hypothesis that prosodic structure is sensitive to a syntactically active [focus] feature, which triggers a successive cyclic derivation through every phase edge intervening between the first-merge position of the wh-phrase and its final landing site.

1.1 The assimilation of focus and wh-phrases: A short history of the problem

The issue of the relationship between interrogative wh-phrases and focus originated from Chomsky’s (1976) observation that a focussed phrase, just like an interrogative wh-phrase, gives rise to the weak crossover effect, i.e. it cannot bind a pronoun that is to the left of its base position, as shown in (1)–(2).Footnote 1 (The focussed phrase is conventionally notated in bold.)

  (1)
    figure b
  (2)
    figure c

Chomsky explained this parallelism by assuming, with Jackendoff (1972), that a focussed phrase covertly moves to a scope position to the left of the pronoun; thus, the LF of (1) would be (3), which is fully parallel to (2):

  (3)
    figure d

The covert focus movement postulated in (3) was argued to have an overt counterpart in Hungarian. É. Kiss (1987, 1998) pointed out that in Hungarian, the preverbal position that a wh-phrase obligatorily moves to in a question (cf. (4a)) is typically filled by the narrowly focussed phrase in the answer, provided that the focus is interpreted as exhaustive (cf. (4b)) (see also Brody 1990).Footnote 2

  (4)
    figure e

Later on, within the early Minimalist framework (Chomsky 1993), it was assumed that all syntactic movement is triggered by the need to check a syntactically active feature. This feature-driven view was generalized in Rizzi (1997), who proposed that both wh-movement and focus movement—in fact, all types of overt A' movement—create a Specifier-Head relation between the moved phrase, endowed with a triggering feature, and a functional head endowed with the same feature (also known as “criterial configuration”). In Rizzi’s view, then, focus movement targets the Specifier of a Focus head above IP, as shown in (5).

  (5)
    figure f

Rizzi’s proposal prompted an intensive cross-linguistic investigation of focus movement within the theoretical framework that came to be known as “syntactic cartography” (cf. Puskás 2000; Frascarelli 2000; Alboiu 2002; Aboh 2004; Cruschina 2012; Bocci 2013, among others; see also Bianchi et al. 2016). This approach also strengthened the initial parallelism between wh-movement and focus movement: on the basis of distributional evidence, Rizzi (1997, 2001a) argued that in direct wh-questions the wh-phrase targets the same Spec,FocP position as focus fronting. At the semantic level, the criterial configuration is mapped at the interface into a structured meaning, consisting of a focus and a background or presupposition (see Jackendoff 1972; Krifka 1995, 2001, 2006, among others).Footnote 3

On the other hand, the status of focus as a syntactically active feature was criticized on the grounds that it is not inherently specified on certain lexical heads, but must be assigned to a syntactic element from outside the lexicon (e.g. to Olaszországban in (4b)): this violates the Inclusiveness Condition, whereby syntax cannot add any featural information to that which is carried by the lexical items (see in particular Szendrői 2001 and Horvath 2010). As an alternative, Neeleman and van de Koot (2008) propose that movement is not feature-driven but interface-driven, i.e. it is triggered by the need to create a syntactic configuration that can be properly interpreted.

A different perspective on focus originated from the work of Reinhart (2006: Chap. 3, first circulated around 1995), in which the interface with prosody took centre stage. It is generally acknowledged that a focussed element that takes the sentence as its scope must be maximally prominent in the prosodic structure and must associate with the main stress of the sentence (Truckenbrodt 1995; Selkirk 2008, among others). Reinhart argued that the focal interpretation is directly read off this prosodic marking, with no mediating role of a syntactic focus feature. In particular, the Nuclear Stress Assignment rule assigns the main stress to the most embedded element in the sentence (Cinque 1993; Zubizarreta 1998), e.g. the direct object in (6). The location of the main stress then determines the focus set of the sentence/derivation, i.e. the set of its possible foci, according to principle (7):

  (6)
    figure j
  (7)
    figure k

By principle (7), (6) has the focus set in (8), as exemplified in (8a-c): in each case, the size of the focus constituent can be determined by its congruence to the current question.

  (8)
    figure l
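The way a focus set is read off the position of the main stress can be sketched in a few lines of Python. This is only an illustration under stated assumptions: we take (6) to be the sentence "My neighbour is building a desk" (as the paraphrases in the text suggest), we read principle (7) as saying that the focus set contains the constituents that include the stressed element, and the tree labels and the function name `focus_set` are hypothetical.

```python
# Hypothetical constituent structure: (label, children) for phrases,
# (label, string) for leaves. The labels are illustrative only.
sentence = (
    "IP",
    [
        ("DP-subject", "my neighbour"),
        ("VP", [("V", "is building"), ("DP-object", "a desk")]),
    ],
)


def focus_set(tree, stressed):
    """Return the labels of all constituents that contain the stressed leaf.

    Assumed reading of principle (7): every constituent dominating the
    element that bears main stress is a possible focus.
    """
    label, content = tree
    if isinstance(content, str):
        # Leaf: it belongs to the focus set only if it bears the stress.
        return [label] if label == stressed else []
    for child in content:
        below = focus_set(child, stressed)
        if below:
            # This phrase dominates the stressed leaf, so it is included.
            return [label] + below
    return []


# Default case: main stress on the most embedded element, the direct object.
default_foci = focus_set(sentence, "DP-object")
print(default_foci)
```

With stress on the object, the sketch returns the whole clause, the verb phrase and the object itself, which corresponds to the three options described for (8).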

In addition to the default Nuclear Stress Assignment rule, the rule of Stress Shift applies whenever the Nuclear Stress Assignment rule yields a focus set none of whose members is appropriate in the context. For instance, none of the focus options in (8) is appropriate as an answer to the question Who is building the desk?: hence, the rule of Stress Shift moves the main stress to the subject in the answer, so that the latter is narrowly focussed.

Importantly, no focus feature is assumed in the syntactic representation, nor does the interface principle (7) require any syntactic movement.Footnote 4 As Reinhart herself stressed, (7) constitutes a departure from the ‘T-model’ of grammar, in that it allows direct communication between the prosodic structure and the inferential and pragmatic components.

While this line of analysis abandons the initial parallelism between wh-movement and (overt or covert) focus movement, a different association between focus and wh-phrases was introduced. In Rooth’s (1985, 1992) alternative semantics, a focus contained in a constituent α yields, in addition to the ordinary denotation of α, a focus semantic value, namely a set of alternative denotations which differ from the ordinary denotation in the value of the focussed position. For instance, in (8a) we get a set of alternative propositions of the form ‘my neighbour is building x’; in (8b), a set of propositions of the form ‘my neighbour is X-ing’ (with X a one-place predicate). Similarly, Kratzer and Shimoyama (2002), building on Hamblin (1973), proposed that wh-phrases—and indefinites in general—introduce a set of alternatives into the semantic computation. This parallelism was made fully explicit by Beck (2006), who argued that wh-phrases contribute only a focus semantic value to the interpretation process.Footnote 5 From this perspective, a wh-phrase or a focussed constituent need not move in order to be properly interpreted.Footnote 6

Summing up, two main views of focus emerged in the literature. The first approach considers focus to be a syntactically active property (or feature) that drives movement of the focussed phrase; at the interfaces, this gives an instruction both to the prosodic component (e.g. driving stress assignment) and to the semantic component (yielding a structured meaning). This view complies with the T-model, but it requires the Inclusiveness Condition to be weakened.

In the second approach, focus is directly marked by prosody and, semantically, it introduces a set of alternatives; this does not require syntactic movement, meaning that the connection between prosodic marking and focal interpretation is not mediated by syntax.

In turn, the two approaches define the parallelism between foci and wh-phrases differently: in the first approach, both are endowed with a syntactically active focus feature; in the second approach, both introduce alternatives, and they either bear no focus feature at all or bear a focus feature that is not syntactically active (i.e. it is not specified on an attracting head).

1.2 Structure of the argument

In this paper we address the relation of focus to interrogative wh-phrases by examining the prosody of wh-questions in Italian, and in particular the distribution of the Nuclear Pitch Accents.

In the Autosegmental-Metrical model of intonation (see Pierrehumbert 1980; Ladd 1983; Pierrehumbert and Beckman 1988, a.o.), pitch accents (PAs) are analysed as tonal specifications that associate with strong, prominent elements in the metrical representation. In this respect, pitch accents contrast with edge-tones (i.e. phrase accents and boundary tones), since the latter correspond to tonal specifications that associate with the edges of the prosodic constituents (i.e. intermediate phrases and intonational phrases, respectively). Within a prosodic constituent, the most prominent PA is the Nuclear Pitch Accent (NPA). As we will discuss in Sect. 3.3, the wh-questions analysed are systematically phrased into a single intermediate phrase and into a single intonational phrase, which coincide. As a consequence, each wh-question contains a single NPA, which corresponds to the most prominent PA of the sentence.

In this paper, we predominantly discuss the location of the NPA, and we only consider the nuclear stress (i.e. main phrasal stress or sentential stress) when the discussion is directly supported by phonetic data (cf. Sect. 3.4). The phonological notion of nuclear stress refers to the element (i.e. the metrical head) that bears the highest level of prominence in the metrical structure of the sentence.Footnote 7 In principle, we should expect intonational prominence and metrical prominence to go hand in hand. In this sense, the NPA (the most prominent PA in the intonational phrase) should also coincide with the metrical head of the intonational phrase, i.e. nuclear stress. We empirically assess this correspondence between NPA and nuclear stress in the first experiment.

In particular, we concentrate on the position of the NPA in Italian direct wh-questions featuring a bare wh-element. Calabrese (1982), Ladd (1996) and Marotta (2001) observed that, in this type of question, the NPA falls on the lexical verb, even when this is not in final position, as exemplified in (9), where boldface indicates main prominence. This finding is validated by two prosodic experiments that we report on in Sects. 3 and 4 below.

  (9)
    figure q

The non-final placement shows that the NPA here is not assigned by the default syntax-prosody mapping rules, which in Italian target the rightmost position within an intonational phrase (Rightmostness; see Nespor and Vogel 1986; Avesani 1990, among others). It must also be stressed that the systematic assignment of the NPA to the lexical verb observed in direct wh-questions like (9) does not seem to emerge in other clause types (Bocci and Cruschina 2018; Gili Fivela et al. 2015, cf. Sect. 8). If we consider the interpretation of these sentences, the NPA placement exemplified in (9) seems to have nothing to do with focus: in fact, the lexical verb is not interpreted as focussed ((9) could be uttered in an out-of-the-blue context). Moreover, in wh-questions focus is commonly associated with the wh-phrase, as discussed in the preceding section; yet the NPA is not assigned to the wh-phrase here.

If we leave focus aside, the exceptional prosodic pattern could be approached in two alternative ways. A first possibility is to assume that the NPA placement is a scope-marking mechanism, which marks the extension of the wh-chain: in fact, the lexical verb in (9) is adjacent to the first-merge position of the wh-phrase. Indeed, in the first prosodic experiment that we report on in Sect. 3 below, we observe that when a wh-phrase undergoes long-distance extraction from an embedded clause, the NPA predominantly falls on the lexical verb of the embedded clause, as exemplified in (10):

  (10)
    figure r

However, the scope-marking view predicts that the same marked NPA placement should be observed in indirect wh-questions, but this is not the case, as will be discussed in detail in Sect. 7.1. Furthermore, if the NPA placement played the role of a scope marker in wh-questions by signalling the extraction site of the wh-element, we would expect that in a direct wh-question in which the wh-phrase is a nominal complement, like in (11a), the noun associates with the NPA. The results of our second experiment disconfirm this prediction. In direct wh-questions, NPA is systematically assigned to the lexical verb when the wh-element is a nominal complement (11a), as well as a verbal complement (11b).

  (11)
    figure s

Thus, the results of the second experiment show that NPA distribution does not signal the first-merge position of the wh-element and does not mark the extension of the wh-dependency (cf. Sect. 4).

A second possibility would be to assume that in (9) the NPA is shifted from the default rightmost position by destressing/deaccenting of discourse-given material. It has been argued in the experimental literature that given information in Italian does not get destressed and deaccented in situ (Swerts et al. 2002; Avesani and Vayra 2005, a.o.).Footnote 8 Nevertheless, let us consider the hypothesis that a constraint/rule like “destress given” that Féry and Samek-Lodovici (2006) propose for English is operative in Italian: elements that qualify as discourse-given must be prosodically non-prominent (which can be defined as the incapability of being assigned phrase-level metrical heads, see Selkirk 2008). This approach implies that the post-verbal PP of (9) and (10) fails to bear NPA because it is discourse-given. While this is certainly possible, it is by no means necessary; in fact, the constituent following the NPA-marked verb can be a novel, non-specific indefinite within an out-of-the-blue question like (12). See Sect. 7.2 for a more detailed discussion and for empirical evidence corroborating the intuitive judgment of (12).

  (12)
    figure t

If we instead assume that the [focus] feature is involved in the marked NPA placement of direct wh-questions, the following possibilities arise.

The first possibility is that the [focus] feature is specified on the wh-phrase, but its prosodic realization is shifted to the right-adjacent finite verb. Marotta (2001) develops an account along these lines, based on the idea that bare wh-elements in Italian are weak (in the sense of Cardinaletti and Starke 1999), and cannot be assigned the NPA (cf. Sect. 2). In order to assess this hypothesis, in our first prosodic experiment we tested NPA placement under long-distance movement, as exemplified in (10) above. Marotta’s approach predicts that the NPA should be consistently shifted to the lexical verb that is closest to the final landing site of the wh-phrase, i.e. on the matrix clause verb in (10); but as already anticipated, this prediction is not borne out in our experimental results (cf. Sect. 3).

A second possibility is that the [focus] feature is assigned to the wh-phrase in its first-merge position, and that from this position it is transferred to the left-adjacent finite verb. This type of account is subsumed by the analysis proposed in Calabrese (1982), to be discussed in Sect. 2. This approach makes the same prediction as Marotta’s with respect to (9), but in contrast to the latter, it correctly predicts the NPA placement on the embedded clause verb in (10). In order to test the predictions of this approach, we ran a second prosodic experiment testing the extraction of a wh-phrase from within a noun phrase, as exemplified in (11a) above. More specifically, this approach—similarly to the scope marking view discussed above—predicts that the NPA should be assigned to the lexical noun that is left-adjacent to the first-merge position of the wh-phrase. However, this prediction is not borne out by the results of the second experiment: in producing sentences like (11a), our experimental subjects consistently placed the NPA on the lexical verb, which is not adjacent to the lowest wh-trace (cf. Sect. 4).

Having excluded these two possibilities, in Sect. 5 we advance our analysis, according to which the NPA placement in direct wh-questions is an effect of successive-cyclic movement of the wh-phrase. In a nutshell, we propose that the wh-phrase is endowed with a {wh, focus} feature bundle and shares it with every phase head that structurally intervenes along its movement path. The v phase head thus acquires the feature bundle and, since the traces of the wh-phrase undergo phonological deletion, it qualifies as the rightmost element in the syntactic structure that is endowed with the [focus] feature; accordingly, it is selected for realization of the NPA at the interface with prosody. Since v is incorporated to the lexical verb, the NPA is realized on the latter.

Crucially, in the case of long-distance movement, as in (10) above, the wh-phrase moves through the edge of the embedded clause vP, and the embedded clause v becomes eligible for NPA assignment. In the results of our first experiment, we observed that in the majority of cases, the NPA associates with the lexical verb in the embedded clause, as predicted by our analysis. Still, alongside this prevailing pattern, with long-distance wh-movement we observed a secondary prosodic pattern in which the NPA falls on the matrix clause verb. In Sect. 6 we discuss this unexpected result.
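The selection logic described in the last two paragraphs can be made concrete with a small Python sketch. It is a simplification under stated assumptions, not the formal analysis of Sect. 5: the flat word lists, the `Element` class and the function `npa_target` are hypothetical, and the fallback to the rightmost overt element stands in for the default Rightmostness rule.

```python
from dataclasses import dataclass


@dataclass
class Element:
    form: str
    focus: bool = False  # bears [focus]: the wh-phrase, its copies, phase heads on its path
    overt: bool = True   # False for phonologically deleted copies (traces)


def npa_target(elements):
    """Pick the rightmost overt element bearing [focus].

    If no overt element bears [focus], fall back to the rightmost overt
    element (assumed default placement).
    """
    overt = [e for e in elements if e.overt]
    if not overt:
        raise ValueError("no overt element to bear the NPA")
    focused = [e for e in overt if e.focus]
    return (focused or overt)[-1]


# Short-distance movement: wh ... V(+v) ... <wh> ... PP
short_q = [
    Element("wh", focus=True),
    Element("verb", focus=True),           # v shares the bundle and incorporates into V
    Element("wh-copy", focus=True, overt=False),
    Element("PP"),
]

# Long-distance movement: the path crosses the matrix and the embedded vP edges.
long_q = [
    Element("wh", focus=True),
    Element("matrix-verb", focus=True),
    Element("embedded-verb", focus=True),
    Element("wh-copy", focus=True, overt=False),
    Element("PP"),
]

print(npa_target(short_q).form, npa_target(long_q).form)
```

In this toy representation the deleted copy is skipped, so the lexical verb is selected under short-distance movement and the embedded verb under long-distance movement, while a string with no [focus]-bearing element falls back to its rightmost overt element.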

In Sect. 7 we turn to the motivation for the [focus] feature in the derivation of the wh-chain of direct questions. The received view in the literature is that interrogative wh-phrases are inherently focal, but this view raises the question of why the marked NPA placement is not found in indirect wh-questions, as already noted above: the latter show default placement of the NPA on the rightmost element of the clause (Bocci and Cruschina 2018). For this reason, we explore an alternative solution, based on the idea that interrogative wh-phrases are not inherently focal. Adopting the framework of inquisitive semantics, we suggest that in direct wh-questions, [focus] on the wh-chain is needed in order to generate an existential presupposition that makes the wh-clause contextually uninformative, hence qualifying it as a proper question. As for indirect wh-questions, we suggest that uninformativeness is achieved by an extra operator, selected by the matrix clause verb on top of the wh-clause. While this second hypothesis is admittedly stipulative, we believe that it is potentially interesting at the cross-linguistic level, since it opens the possibility of two alternative routes to derive a wh-question, one involving focus and the other not.

In the final section (Sect. 8), we discuss some theoretical consequences of our proposal, concerning the status of focus at the interfaces and the issue of cyclic spell-out at the syntax-phonology interface. In particular, we believe that the data from Italian wh-questions cast doubt on a direct mapping of prosodic prominence into a focal interpretation, and support instead the view that focus is encoded by a syntactic feature that is involved in the successive-cyclic derivation of the wh-chain.

2 NPA assignment in Italian wh-questions: Previous observations and analyses

The distribution of prosodic prominence in root wh-questions poses formal and conceptual problems for all theories of the relationship between the interpretation of focus and prosodic prominence (see Culicover and Rochemont 1983; Erteschik-Shir 1986; Lambrecht and Michaelis 1998 on English). These problems are described by Ladd (1996:170–174) from a cross-linguistic perspective:

Various recent work on focus and accent […] deal uneasily with the prominence of the WH-words in WHQs. Logic seems to suggest that the WH-word is the focus of the question, and yet, in English at least, the WH-word does not normally bear the most prominent accent (Ladd 1996:170)

Rather than on the wh-phrase itself, which under several accounts should qualify as the focus of the wh-question (cf. Sect. 1), in English the NPA falls by default on the stressed syllable of the last constituent of the sentence. Compare (13a), corresponding to the neutral prosodic pattern, with (13b), which is only possible under highly marked pragmatic conditions (where boldface indicates main prosodic prominence):

  (13)
    figure u

Ladd also points out that a different pattern is observed in other languages: the wh-phrase does bear the NPA not only in wh-in-situ languages, but also in some languages with wh-movement such as Romanian, Greek and Hungarian. This is illustrated in (14) for Romanian (from Ladd 1996:227; see also Jitcă et al. 2015; see Alexopoulou and Baltazani 2012 on Greek, and Ishihara 2003 on Japanese wh-in situ):

  (14)
    figure v

Two basic typological patterns can thus be identified: in the first pattern, wh-questions follow the same prosodic principles as declaratives, as in English (cf. (13)), that is, the NPA is assigned by default to the rightmost constituent. In the second pattern, by contrast, wh-questions are treated differently from declaratives, in that the NPA associates with the wh-word, as in Romanian (cf. (14)).

Ladd directly compares Romanian and Italian, arguing that while the NPA associates with the wh-word in the Romanian wh-question (15a), it associates with the verb in the Italian equivalent (15b):

  (15)
    figure w

From his discussion, however, it is not entirely clear whether Ladd considers Italian as a language that follows the first pattern, like Spanish for instance (see Hualde and Prieto 2015), or if it follows a distinct third pattern: in fact, in his examples the lexical verb coincides with the last element of the clause.

The assignment of the NPA in Italian wh-questions had been independently addressed in Calabrese (1982). Calabrese (1982) pointed out that in Italian root wh-questions with a bare wh-element, the NPA falls on the lexical verb irrespective of its position within the sentence. He accounted for this prosodic pattern by proposing that it results from a direct interaction between the syntactic and the phonological component. More specifically, Calabrese claims that the special NPA assignment of Italian is a consequence of a phonological requirement whereby the wh-phrase and the verb must form a single intonational phrase. In his analysis, wh-phrases are considered to be focal elements and, as such, they receive a focus feature [F] from the verb and must be adjacent to it (see Bianchi et al. 2017, 2018). The phonological group consisting of the verb and the [F]-marked elements forms the main intonational phrase of the sentence.Footnote 9 The rightmost element within this intonational phrase is then assigned the NPA: this is the verb adjacent to the foot (i.e. the lowest trace) of the wh-chain.

In wh-questions, both the head and the foot of the wh-chain are [F]-marked, under the assumption that the wh-phrase in the CP inherits [F] from its trace in the base-generation position and must be adjacent to the verb. Calabrese further assumes that any potential intervener (i.e. a non-[F]-marked element intervening between the verb and the wh-phrase) must be syntactically displaced. When the wh-phrase is extracted from an embedded clause (16a), this intonational phrase (ι) must include the head and the foot of the wh-chain, but also the embedded and the matrix verb, as shown in (16b). Thus, all these elements must be adjacent to one another:

  (16)
    figure x

Calabrese’s approach accounts for NPA assignment both in cases of short-distance movement like (15b) and in cases of long-distance movement like (16) (see also the discussion around (10) above).

Marotta (2001) experimentally investigated the placement of NPA in Italian root wh-questions. Her results independently confirm Calabrese’s observation that under short-distance movement, the NPA is assigned to the lexical verb adjacent to the wh-word. This is particularly evident in an example like (17), where the verb is not the rightmost element of the clause:

  (17)
    figure y

Marotta recognizes that this unexpected pattern is highly problematic for an isomorphic relationship between prosodic marking and focal interpretation, and proposes an explanation that relies on the morpho-phonological status of the wh-phrase: she argues that bare wh-words, being weak or clitic in the sense of Cardinaletti and Starke (1999), cannot bear the NPA, so that the latter is ‘passed’ on to the closest non-weak element, namely, the following lexical verb.Footnote 10

Although Marotta’s experimental data validate the existence of a third prosodic pattern in Ladd’s prosodic typology for wh-questions, they are not sufficient to definitively determine the relevant mechanism of NPA assignment in Italian. In fact, NPA assignment to the verb in (17) could be due to a shift from the preceding weak wh-element (Marotta’s hypothesis) or to a shift from the phonologically deleted copy in the first-merge position (cf. Sect. 4). Moreover, Marotta did not experimentally test sentences with long-distance extraction, but her hypothesis predicts that in this case as well, the NPA should still fall on the matrix verb, which is adjacent to the wh-phrase.

In the light of these considerations, we decided to investigate with a production experiment the NPA assignment in pairs of direct wh-questions which minimally differ in that the first sentence in each pair involved short-distance movement, and the second one, long-distance movement. This is intended to test the prediction of Marotta’s approach, and to ascertain whether the prosodic interface is sensitive to the derivational path of the wh-phrase.

3 The first production experiment

In order to examine the placement of the NPA in Italian direct wh-questions, we carried out a production experiment. Ten native speakers of Tuscan Italian took part in this experiment (2 men and 8 women, ranging from 22 to 56 years of age).Footnote 11 The experiment consisted of a reading task and was specifically designed to investigate the effects of movement type (short-distance and long-distance wh-movement) on the distribution of the NPA and main stress.

3.1 Materials

The experimental design included only one experimental factor, ‘movement type,’ with two possible levels, short-distance and long-distance movement. This factor was manipulated within participants and within items. The experimental material consisted of 12 items that we manipulated for the factor ‘movement type,’ so as to obtain 12 target sentences with short-distance wh-movement and 12 target sentences with long-distance wh-movement, for a total of 24 stimuli. All items consisted of a direct wh-question with an embedded complement clause: in the case of short-distance movement, the wh-phrase is an argument of the matrix verb and is therefore extracted from the matrix clause; in the case of long-distance movement, by contrast, the wh-phrase is an argument of the embedded verb and is extracted from the subordinate clause.

The morphosyntactic properties of the sentences were manipulated to unambiguously mark the extraction site of the wh-element. This was achieved by varying the number and person features expressed on the auxiliaries between the two conditions, and by placing disambiguating clitic pronouns. To control for information structure effects, the short- and long-distance movement versions of each item were presented right after the same introductory context. Furthermore, the last constituent of the target sentences was never mentioned in the introductory context, in order to prevent a possible interpretation of the final constituent as right dislocated. Finally, no potential prosodic intervener in Calabrese’s (1982) sense occurred between the verb and the wh-phrase (e.g. a preverbal subject in the embedded clause under long distance extraction).

More specifically, in 6 items, the wh-element corresponds to the matrix subject in the short-distance condition (cf. (19a)), and to the object of the embedded clause in the long-distance condition (cf. (19b)).

  (18)
    figure z
  (19)
    figure aa

In the second set of 6 items, the wh-element always corresponds to the indirect argument, which is extracted from the matrix clause in the short-distance condition (cf. (21a)) and from the embedded clause in the long-distance condition (cf. (21b)).

  (20)
    figure ab
  (21)
    figure ac

Thus, for each item, the two target sentences were phonological near-minimal pairs, although not exactly minimal pairs. Still, the pairs of stimuli were characterized by a similar number of syllables; the main regions of interest for the prosodic analyses (i.e. the wh-element, the lexical verb of the matrix clause, the lexical verb in the embedded clause, and the sentence-final element) perfectly matched across conditions.

3.2 Procedure

24 fillers were added to the 24 experimental stimuli. The fillers were bi-clausal declarative sentences preceded by a context analogous to those used in the experimental stimuli.

The 48 stimuli were presented twice, for a total of 96 trials. The order of the trials was pseudo-randomized, so that fillers and experimental stimuli rigidly alternated. Moreover, to prevent possible carry-over effects, the experiment was divided into 4 blocks. Each block included 12 experimental stimuli: 6 stimuli under the long-distance condition and 6 stimuli under the short-distance condition, along with 12 fillers. Within a single block, each item was presented only once, under a single condition. The procedure guaranteed that the two versions of the same item were not presented within the same block, and that the two occurrences of the same stimulus were assigned to two non-consecutive blocks. For instance, if (21a) was presented in the first block, it was presented again in the third block, while (21b) appeared in the second and the fourth block. The blocks were separated by pauses of 5 to 15 minutes, depending on the participant. The entire experiment lasted on average between 70 and 90 minutes. The recording sessions took place individually in a quiet room in Siena (Italy).Footnote 12
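The block constraints just described can be checked with a short Python sketch. It is one possible way to satisfy them, not the actual randomization script: the rule that half of the items start in the short-distance condition, the split of the fillers across odd and even blocks, the seed and the function name `build_blocks` are all assumptions made for illustration.

```python
import random

N_ITEMS, N_FILLERS, N_BLOCKS = 12, 24, 4


def build_blocks(seed=0):
    """Build 4 blocks of 24 trials in which fillers and targets alternate."""
    rng = random.Random(seed)
    blocks = []
    for b in range(N_BLOCKS):
        targets = []
        for item in range(N_ITEMS):
            # Assumed counterbalancing: items 0-5 are short-distance in
            # blocks 1 and 3, items 6-11 in blocks 2 and 4.
            starts_short = item < N_ITEMS // 2
            short_here = starts_short == (b % 2 == 0)
            targets.append((item, "short" if short_here else "long"))
        # Assumed filler split: each filler occurs in two non-consecutive blocks.
        half = N_FILLERS // 2
        fillers = list(range(half)) if b % 2 == 0 else list(range(half, N_FILLERS))
        rng.shuffle(targets)
        rng.shuffle(fillers)
        trials = []
        for filler, target in zip(fillers, targets):
            trials.append(("filler", filler))
            trials.append(("target", target))
        blocks.append(trials)
    return blocks


blocks = build_blocks()
total_trials = sum(len(block) for block in blocks)
print(total_trials)
```

Because a stimulus keeps the same condition in blocks of the same parity, its two occurrences always land in the first and third or in the second and fourth block, and the two versions of an item never share a block.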

Participants were asked to produce each stimulus twice. Since each stimulus was presented in two distinct blocks, we thus obtained a total of 4 repetitions per stimulus. Speakers did not receive any kind of feedback concerning the sentences produced. In a few cases, however, the speakers spontaneously asked to repeat a sentence because they judged it unnatural or because it was marked by clear segmental disfluencies. In these cases, we allowed the speaker to repeat the sentence and we discarded the first production.

From the sentences collected, we analysed a total of 480 target sentences, i.e. 10 speakers * 12 items * 2 conditions (short vs. long) * 2 disfluency-free repetitions.Footnote 13 The phonetic analyses were carried out using Praat (Boersma and Weenink 2018). The sentences were entirely segmented into phonemes by the first author, and intonationally transcribed independently by the first and third author, adopting a ToBI-like transcription system, within the theoretical framework of the Autosegmental-Metrical Theory of intonation (Beckman and Pierrehumbert 1986; Ladd 1996).

In order to transcribe the NPA, we followed the definition proposed in Gili Fivela et al. (2015:156), according to which the NPA in Italian should be identified as the “rightmost fully-fledged pitch accent within an intermediate or intonational phrase” (see also Grice et al. 2005). While in English the NPA is generally defined as the rightmost PA in the prosodic phrase after which only edge tones can occur, such a purely positional definition proved to be inadequate for Italian, since in several Italian varieties the NPA may be followed by subordinate (postnuclear) PAs, which are realized with a very compressed pitch range (see D’Imperio 2002; Grice et al. 2005). In our data, however, in line with other findings on Tuscan Italian, no postnuclear compressed PAs were observed (Bocci 2013), so that the NPA we identified corresponded to the rightmost PA.
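The difference between the purely positional definition and the "rightmost fully-fledged pitch accent" definition can be illustrated with a minimal Python sketch. The representation is hypothetical: the `TonalEvent` class, its `compressed` flag and the toy phrase are assumptions made for illustration and do not reproduce our transcription files.

```python
from dataclasses import dataclass


@dataclass
class TonalEvent:
    word: str
    kind: str                 # "pitch_accent" or "edge_tone"
    compressed: bool = False  # True for a subordinate (postnuclear) pitch accent


def find_npa(events):
    """Return the rightmost fully-fledged pitch accent of a phrase, or None.

    Edge tones and compressed postnuclear pitch accents are skipped.
    """
    for event in reversed(events):
        if event.kind == "pitch_accent" and not event.compressed:
            return event
    return None


# Toy phrase with a compressed pitch accent after the nucleus.
phrase = [
    TonalEvent("wh", "pitch_accent"),
    TonalEvent("verb", "pitch_accent"),
    TonalEvent("PP", "pitch_accent", compressed=True),
    TonalEvent("PP", "edge_tone"),
]

npa = find_npa(phrase)
print(npa.word)
```

When no pitch accent is marked as compressed, as in the data analysed here, the function simply returns the rightmost pitch accent of the phrase.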

Comparing the individual prosodic transcriptions revealed that the agreement between the two annotators was almost perfect for identifying the location of the NPA. We obtained 97.7% of raw agreement (Cohen’s Kappa=.95 for nominal values).Footnote 14 As for the few cases of disagreement, the first author’s transcriptions were retained for the analyses.
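For readers unfamiliar with the two agreement measures, the following Python sketch shows how raw agreement and Cohen's Kappa are computed for two annotators assigning nominal labels. The function name `cohen_kappa` and the four toy labels are hypothetical; they are not our annotation data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators assigning nominal labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Raw (observed) agreement: proportion of identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)

# Toy example (hypothetical labels for the NPA location).
annotator_1 = ["matrix", "matrix", "embedded", "embedded"]
annotator_2 = ["matrix", "matrix", "embedded", "matrix"]
print(cohen_kappa(annotator_1, annotator_2))
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it is lower than the raw percentage.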

From the annotated data, we automatically extracted for the stressed vowel of the lexical verb in the matrix and the embedded clause the values of duration and F0 standard deviation.

3.3 Results

All the wh-questions we analysed are phrased into a single intermediate phrase and a single intonational phrase that overlap. The perceptual analysis and the instrumental analysis of F0 did not reveal any presence of edge tones separating the sentences into distinct intermediate phrases. Given this phrasing, the NPA we identified in our data corresponds at once to the most prominent PA in the intermediate phrase and the intonational phrase.

The distribution of the NPA observed in our data is illustrated in Fig. 1.

Fig. 1
figure 1

First prosodic experiment: NPA distribution in root wh-questions with short- and long-distance wh-movement

In both conditions, the NPA is never (0%) assigned to the rightmost element of the sentence, which is the default position for NPA assignment in Italian declaratives. This confirms that Italian does not follow Ladd’s first prosodic pattern. The association of the NPA with the wh-phrase is very marginal (less than 2%, independently of the type of movement), showing that Italian does not follow Ladd’s second pattern either.

As expected, the NPA virtually always falls on a lexical verb. However, there is a clear asymmetry between the long-distance and the short-distance condition. The NPA associates with the matrix verb in 96.6% of the cases under short-distance movement, and in 37.3% of the cases under long-distance movement. The NPA associates with the embedded lexical verb in less than 2% of the cases under short-distance movement, but in 61% of the cases under long-distance movement.

These experimental findings show that when the wh-element is extracted from the matrix clause via short-distance movement, as in (19a) and (21a), the NPA is virtually never assigned to the lexical verb of the embedded clause (only 1.7%), and is consistently assigned to the lexical verb of the matrix clause. Figure 2 illustrates a pitch contour of a sentence produced with this pattern.

Fig. 2
figure 2

First prosodic experiment. Pitch contour of an utterance produced after (19a): wh-question with short-distance movement. NPA associated with the lexical verb in the matrix clause

By contrast, with long-distance movement, as in (19b) and (21b), the NPA is much more likely to be associated with the lexical verb of the embedded clause (cf. Fig. 3.1), although in a minority of cases it is assigned to the matrix lexical verb (cf. Fig. 3.2).

Fig. 3.1
figure 3

First prosodic experiment. Pitch contour of an utterance produced after (19b): wh-question with long-distance movement. NPA associated with the lexical verb in the embedded clause

Fig. 3.2
figure 4

First prosodic experiment. Pitch contour of an utterance produced after (19b): wh-question with long-distance movement. NPA associated with the lexical verb in the matrix clause

We statistically tested the NPA distribution observed in our transcriptions. For all the statistical analyses, we only took into consideration the cases in which the NPA is assigned to a lexical verb, discarding the residual cases in which the NPA was assigned to the wh-phrase (8 datapoints, corresponding to 1.7% of the total observations). This allowed us to reduce the association site of the NPA to a binary variable: NPA on the lexical verb in the matrix or in the embedded clause. We then built a multi-level mixed effects regression with the log odds of NPA on the embedded lexical verb as the dependent variable, using the package lme4 in R (Bates et al. 2014). We specified ‘movement type’ (short- vs. long-distance movement) as a fixed factor. The error structure included crossed by-participant and by-item random intercepts and slopes.

The model showed that the NPA is significantly more likely to fall on the embedded verb in the long-distance condition than in the short-distance condition: Estimate = 6.2768, Std. Error = 1.8642, z-value = 3.367, p<.001.Footnote 15 Thus, the analysis of the intonational transcriptions clearly shows that NPA distribution differs between the two experimental conditions.
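As a quick consistency check on the reported coefficients, the z-value of a Wald test is the estimate divided by its standard error, and the two-sided p-value follows from the standard normal distribution. The Python sketch below recomputes both from the figures above; the helper name `wald_z_test` is an assumption of the example, and the snippet does not refit the mixed model.

```python
import math

def wald_z_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided p-value."""
    z = estimate / std_error
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z, p = wald_z_test(6.2768, 1.8642)
print(round(z, 3), p < 0.001)
```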

3.4 Phonetic analyses

In order to support the NPA-distribution analysis based on the phonological transcriptions, we carried out quantitative phonetic analyses of F0 and of vowel duration. We will go back to the main argument in Sect. 3.5.

3.4.1 Phonetic analyses of F0

Assuming that the NPA corresponds to the rightmost pitch accent after which the pitch contour is compressed (Gili Fivela et al. 2015), we reasoned as follows:

  (a) If the NPA is virtually always associated with the lexical verb of the matrix clause in the short-distance condition, then in this condition, the stressed vowel of the embedded verb should be characterized by a post-focal F0 contour, that is a very compressed (i.e. nearly flat) contour in which no fully-fledged pitch accent occurs.

  (b) In the long-distance condition, by contrast, we observed two distinct patterns: a prevailing pattern in which the NPA is assigned to the lexical verb of the embedded clause, and a secondary pattern in which the NPA is assigned to the lexical verb of the matrix clause (cf. Fig. 1). We therefore expect that the amount of F0 movement on the stressed vowel of the embedded lexical verb should be overall greater in the long-distance condition than in the short-distance condition. To give an example, the F0 excursion realized on the stressed vowel of present[a]re ‘introduce’ (cf. ‘V2’) should be greater in Fig. 3.1 than in Fig. 2.

To test this hypothesis, we computed the values of standard deviation for F0 (in semitones, normalized over duration) for the stressed vowel of the embedded lexical verb. F0 standard deviation directly quantifies the amount of F0 protrusion: a higher F0 standard deviation corresponds to a higher degree of F0 movement.
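The measure can be illustrated with a minimal Python sketch. It converts F0 samples from Hz to semitones and returns their standard deviation. The function name `f0_sd_semitones`, the 100 Hz reference, the use of the population standard deviation, and the treatment of non-positive values as unvoiced frames are assumptions of the example; the normalization over duration is not reproduced here.

```python
import math
import statistics

def f0_sd_semitones(f0_hz, ref_hz=100.0):
    """Standard deviation of an F0 track, expressed in semitones."""
    # Assumption: non-positive values mark unvoiced frames and are skipped.
    semitones = [12 * math.log2(f / ref_hz) for f in f0_hz if f > 0]
    if len(semitones) < 2:
        raise ValueError("need at least two voiced F0 samples")
    # The reference frequency only shifts the values, so it does not affect the SD.
    return statistics.pstdev(semitones)

flat = [180.0, 180.0, 180.0]    # compressed, post-focal-like contour
moving = [100.0, 200.0]         # one-octave excursion
print(f0_sd_semitones(flat), f0_sd_semitones(moving))
```

A flat contour yields a value of zero, while any F0 excursion on the vowel raises the value.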

We took F0 standard deviation values as a rough estimate of the presence of a pitch accent, although micro-prosodic consonantal effects may introduce a certain amount of noise in the values, and the presence of a tonal specification does not necessarily imply F0 protrusion. We then entered these values as the dependent variable in a linear mixed effects model with crossed by-item and by-participant random intercepts and slopes, and ‘movement type’ as a predictor. P-values were obtained via the package lmerTest (Kuznetsova et al. 2017) with Satterthwaite’s approximation.

The model revealed that the F0 standard deviation values on the stressed vowel of the embedded lexical verb, corresponding to the amount of F0 movement, are significantly higher in the long-distance condition than in the short-distance condition: Estimate = .313, Std. Error = .106, t value = −2.939, p = .014. To put it differently, the pitch contour on the stressed vowel of the embedded verb is significantly “flatter” when the wh-element is extracted from the matrix clause than when it is extracted from the embedded clause.

The estimated values and their confidence intervals were extracted from the model via the effects package (Fox and Weisberg 2019) and plotted in Fig. 4. This finding, directly based on a quantitative analysis of F0, independently confirms the conclusion that the intonational contours differ across the two syntactic conditions, thus supporting the results from the analysis of the phonological transcriptions.

Fig. 4
figure 5

First prosodic experiment. Estimated F0 standard deviation values in semitones (st.) for the stressed vowel of the lexical verb in the embedded clause across type of syntactic movement (long-distance vs. short-distance movement)

The previous analysis of the F0 standard deviation values on the stressed vowel of the embedded verb used the factor ‘movement type’ as a predictor. Considering the analysis of NPA distribution based on the annotations, however, we should be able to specify a better model to predict the amount of F0 movement. In fact, while we observed that the NPA is virtually always assigned to the matrix verb in the short-distance condition, two patterns occur in the long-distance condition. For the prevailing pattern, in which the NPA is assigned to the embedded verb, we expect higher F0 standard deviation values—corresponding to the F0 movement realizing the NPA—than in the sentences realized with the secondary pattern, in which the NPA is assigned to the matrix verb; in this subset of long-distance extractions, the embedded verb should be characterized by a post-focal contour, analogously to what we observe in the short distance condition. Therefore, for this subset of sentences with long-distance movement, the F0 standard deviation values should be as low as in the short-distance condition, since in both cases the NPA is assigned to the matrix lexical verb.

In this sense, modelling F0 movement as a function of the wh-extraction site implies collapsing into the same ‘long-distance movement’ condition sentences that are characterized by distinct phonological properties according to our transcriptions. If our phonological interpretation of the pitch contours is valid, we then expect to obtain a better model for F0 movement if we use as a predictor the location of NPA identified in our transcription, rather than the factor ‘movement type.’

Model comparison via likelihood ratio test (see Pinheiro and Bates 2000) showed that this hypothesis is correct. First, we built a complex model (via maximum likelihood) for the F0 standard deviation values with crossed by-participant and by-item intercepts. This complex model included two predictors: ‘movement type’ and ‘NPA location’ (with two levels: NPA on the matrix vs. the embedded verb). We then compared the complex model with two nested models that included only one fixed factor each. While the complex model explains the data better than the nested model with the factor ‘movement type’ only (\(\chi^{2} (1)=15.469\), p<.001), the complex model is equivalent to the nested model with the factor ‘NPA location’ (\(\chi^{2} (1)=1.3803\), p>.05). In other terms, adding the predictor ‘movement type’ to a model that already includes ‘NPA location’ does not help in accounting for the dependent variable, since the explicative power of the former is entirely encompassed by the latter.
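The p-values of the two likelihood ratio tests can be recovered from the reported statistics, since with one degree of freedom the chi-square tail probability has a closed form in terms of the complementary error function. The Python sketch below is only a check on the reported figures, not a refit of the models, and the helper name `lrt_p_value_df1` is an assumption of the example.

```python
import math

def lrt_p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))

p_vs_movement_only = lrt_p_value_df1(15.469)  # complex model vs. 'movement type' only
p_vs_npa_only = lrt_p_value_df1(1.3803)       # complex model vs. 'NPA location' only
print(p_vs_movement_only < 0.001, p_vs_npa_only > 0.05)
```

The first comparison is significant, whereas the second is not significant at the .05 level, which is what the equivalence of the complex model and the ‘NPA location’ model amounts to.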

We report the results of the model with ‘NPA location’ as unique fixed factor: Estimate = .4790, Std. Error = .120, t value = −3.98, p<.003. For this model, the best error structure justified by the data included crossed by-participant and by-item intercepts and by-participant slopes. The estimated values and their confidence intervals are plotted in Fig. 5.Footnote 16

Fig. 5
figure 6

First prosodic experiment. Estimated F0 standard deviation values in semitones for the stressed vowel of the lexical verb in the embedded clause as a function of NPA placement (lexical verb in the matrix clause vs. in the embedded clause)

In conclusion, the first phonetic analysis of F0 standard deviation values as a function of ‘type of movement’ provided independent evidence in favour of the conclusion that the pitch contours overall differ between the two syntactic conditions. This finding—obtained with no reference to our phonological interpretation of the pitch contour—is consistent with the results obtained from the analysis of NPA distribution based on our annotations. Furthermore, we showed that F0 contours can be better understood by taking into consideration the transcriptions. In fact, the NPA location turned out to be a better predictor for the amount of F0 movement on the stressed vowel of the embedded verb. This quantitative result supports the validity of our annotations and corroborates our analysis of NPA placement.

3.4.2 Phonetic analyses of vowel duration

Having discussed the intonational aspects of the data, let us now briefly consider the metrical dimension. In our transcriptions, the element associated with the NPA is always the most prominent at the metrical level: its stressed vowel and syllable are characterized by perceivable lengthening and hyper-articulation. In other words, if our interpretation is correct, the location of NPA and main sentence stress always coincide in our data: main stress is systematically assigned to the matrix lexical verb in the short-distance condition, while in the long-distance condition it is mostly assigned to the embedded verb, although it may occur also on the matrix verb.

In order to provide quantitative evidence in support of our metrical annotations, we extracted and analysed the duration values of the lexically stressed vowels for the lexical verb in the matrix clause and in the embedded clause. Recall that the sentences are near-minimal pairs: we can thus legitimately compare the duration values within items. The very same line of reasoning we have discussed for the F0 standard deviation values can be applied to the duration analyses.

Let us first consider the length of the stressed vowel of the matrix lexical verb, that is the stressed vowel indicated as ‘V1’ in Fig. 2, Fig. 3.1 and Fig. 3.2. We first investigated whether the length of this vowel differs as a function of ‘movement type’. We thus constructed a mixed regression model with the duration values of this stressed vowel (in ms.) as the dependent variable and ‘movement type’ as a predictor. The error structure included crossed by-participant and by-item intercepts and slopes. The model showed that the stressed vowel of the matrix verb in the short-distance condition is significantly longer (around 17 ms. on average) than in the long-distance condition: Estimate = 16.519, Std. Error = 1.871, t value = 8.831, p<.001. This result, which is completely independent from our phonological annotations, shows that the type of syntactic movement has a significant impact on the distribution of the metrical heads.

It is worth pointing out that the verb in the matrix clause always bears a pitch accent in our data: even when the NPA is assigned within the embedded clause, the matrix verb associates with a prenuclear pitch accent (cf. Fig. 3.1). Consequently, this duration analysis does not compare unaccented stressed vowels vs. stressed vowels bearing the NPA, but two groups of stressed vowels that, by hypothesis, both associate with pitch accents, although of a different hierarchical level: prenuclear vs. nuclear. This suggests that the vowel lengthening effect observed in the short-distance condition does not merely result from the need to accommodate the F0 trajectory that implements an intonational specification, since the vowel is accented in both conditions. In our opinion, the observed difference in vowel duration is a very plausible reflex of main sentence stress (cf. Bocci and Avesani 2011, 2015).

However, if our phonological interpretation is correct, the factor ‘movement type’ is not the best predictor for the duration values. As we have mentioned, in our transcription the NPA and sentence stress always coincide: this means that while main stress is nearly always assigned to the matrix clause in the case of short-distance movement, in the case of long-distance movement, sentence stress may appear either on the embedded verb (the prevailing pattern) or on the matrix verb (the secondary pattern): therefore, as already discussed for the F0 standard deviation values, the factor ‘movement type’ collapses two distinct patterns into the ‘long-distance movement’ condition. If this interpretation is correct, we should be able to obtain a better model for the duration of the stressed vowel by using our phonological transcriptions as a predictor. Model comparison showed that this hypothesis is correct: the explicative power of the factor ‘movement type’ is completely subsumed by the factor ‘NPA location’.Footnote 17

The best model justified by the data for the duration values (in ms.) of the stressed vowel in the matrix verb with ‘NPA location’ as a fixed factor included crossed by-participant and by-item intercepts and slopes. It revealed that the stressed vowel of the lexical verb in the matrix clause is significantly longer (26 ms., on average) when the NPA is transcribed as associated with the matrix rather than with the embedded verb: Estimate = 25.573, Std. Error = 3.153, t value = 8.111, p<.001. The estimated values are plotted in Fig. 6.

Fig. 6
figure 7

First prosodic experiment. Estimated duration values (ms.) for the stressed vowel of the lexical verb in the matrix clause as a function of NPA placement (lexical verb in the matrix clause vs. lexical verb in the embedded clause)

For the duration values of the stressed vowel of the embedded lexical verb, we obtained the very same results, but with the lengthening effect in the opposite direction. Independently of any phonological consideration, the factor ‘movement type,’ taken as the only predictor, has a significant impact on the length of the stressed vowel of the embedded verb: the vowel in the long-distance condition is significantly longer (20 ms. on average) than in the short-distance condition (Estimate = 20.413, Std. Error = 7.592, t value = −2.689, p<.024). However, model comparison indicates once more that the factor ‘NPA location’ is more explicative than ‘movement type’, since the predictive power of the latter is subsumed by the former.Footnote 18

The model fitted with ‘NPA location’ as a fixed factor showed that the embedded lexical verb is characterized by a stressed vowel that is significantly longer (38 ms., on average) when this element is annotated as associated with NPA (and sentential stress): Estimate = 37.931, Std. Error = 8.369, t value = 4.533, p = .001. The estimated values are plotted in Fig. 7.

Fig. 7
figure 8

First prosodic experiment. Estimated duration values (ms.) for the stressed vowel of the lexical verb in the embedded clause as a function of NPA placement (lexical verb in the matrix clause vs. lexical verb in the embedded clause)

Like the phonetic analysis of the pitch movement, the phonetic analysis of duration values shows that the extraction site of the wh-element, from the matrix vs. the embedded clause, significantly correlates with lengthening effects observed on the stressed vowel of the matrix vs. the embedded verb. Notably, this result is completely independent of our phonological interpretation of the metrical structure. Furthermore, we have provided evidence that the duration values can be better modelled on the basis of our phonological interpretation.

3.5 Assessment of the results

In conclusion, our evidence shows that the placement of the NPA and of main stress is sensitive to the derivational history of the wh-element: the NPA can be assigned to the embedded verb only if the wh-element has been extracted from the embedded clause via long-distance movement. In the same condition, however, NPA and main stress may be assigned to the matrix lexical verb in a non-marginal number of cases (37.3%); we will return to this secondary pattern in Sect. 6.

We close this section with two considerations on the theoretical implications of these experimental findings. First, the distribution of the NPA in the two experimental conditions confirms Ladd’s intuition that in wh-questions there is no direct correlation between NPA and focal interpretation: the NPA systematically falls on the verb (either matrix or embedded), but this is by no means interpreted as focal. As mentioned above (cf. Sect. 1.2), the non-default assignment of NPA and main stress to the lexical verb in non-final position cannot be imputed to a constraint that prevents given constituents from being accented: the constituent following the verb is not informationally given. The non-default prominence assignment must be attributed to some type of Stress Shift mechanism. This predicts a narrow focus interpretation of the lexical verb. However, the contexts that introduced the target sentences did not support this interpretation; moreover, the same contexts were used for the short-distance and the long-distance conditions: the different prominence patterns observed in the two conditions cannot be ascribed to the context, since the context was identical in the two conditions. This point will be corroborated by the results of the second experiment (cf. Sect. 4.1).

Second, the predominant assignment of the NPA to the embedded verb in the long-distance movement condition (61.07%) is inconsistent with Marotta’s hypothesis that the NPA falls on the lexical verb that is immediately adjacent to (the final landing site of) the wh-phrase.

Recall from Sect. 1.1 that a main issue in the literature is whether focus prominence and focal interpretation are mediated by a syntactic focus feature. Our findings support a positive answer, since in Italian direct wh-questions, there is no direct association between prosodic prominence on the verb and a focal interpretation of the latter. Moreover, the fact that NPA assignment is sensitive to the extraction site of the wh-phrase suggests that the [focus] feature is initially associated with the wh-phrase itself.

4 The second production experiment

A possible account of the NPA distribution in Italian direct wh-questions relies on the first-merge (i.e. external merge) position of the wh-phrase. Suppose that the wh-phrase bears the [focus] feature in its first-merge position: when it moves to the edge of vP, the lower copy in the first-merge position undergoes phonological deletion: [ whP ... [ V <whPFocus > PP]]. We can then hypothesize that since a phonologically deleted copy cannot bear the NPA, the latter is shifted to the closest phonologically realized element, that is, the lexical verb adjacent to the deleted copy. At the conceptual level, this hypothesis implies that a phonologically null element such as a deleted copy (a trace, in older terminology) is somehow visible to the phonological component for the shaping of the prosodic structure, and in particular, for prominence assignment, in contrast to what is commonly assumed (see, e.g. Nespor and Vogel 1986).

In order to test the first-merge hypothesis, we conducted a second production experiment in which we systematically compared two configurations: one in which the first-merge position of the wh-phrase is adjacent to the verb, and one where it is not. This second configuration was obtained by extracting the wh-phrase from within the direct object of the verb: in this way, the closest element to the deleted copy was the selecting noun head, as schematically represented in (22). For the reasons discussed above, a sentence-final PP was inserted so that the deleted copy did not coincide with the rightmost element of the clause:

  1. (22)
    figure al

The first-merge hypothesis predicts that in (22) the NPA should be assigned to the nominal head N, that is, the phonologically non-null element adjacent to the first-merge position of the wh-phrase.

4.1 Procedure and materials

Eight native speakers of Tuscan Italian,Footnote 19 6 women and 2 men aged 21 to 37, took part in this second production experiment, which involved a reading task. We tested two experimental conditions, corresponding to the extraction site of the wh-element:

  (i) wh-extraction of a nominal complement, corresponding to configuration (22) (cf. (24a));

  (ii) wh-extraction of a verbal complement (cf. (24b)).Footnote 20

  1. (23)
    figure am
  1. (24)
    figure an

The factor ‘wh-extraction site’ was manipulated within participants and within items. The experimental stimuli consisted of 7 items that we manipulated for the binary factor ‘wh-extraction site.’ We thus obtained 14 experimental stimuli. A pair of stimuli is exemplified in (24). The sentences were near-minimal phonological pairs, since they only differed with respect to the monosyllabic preposition of the wh-element. In 6 items out of 7, the noun phrase following the verb was indefinite (as this favours extraction of the noun complement).

The experimental stimuli were presented within short dialogues between two fictional characters (A and B). Participants were asked to read the entire dialogue, taking the role of both characters alternately. In order to control for information structure, the two experimental versions of each item were presented in the same dialogue, e.g. both (24a) and (24b) were introduced by the dialogue in (23).

As in the first prosodic experiment, the 14 experimental trials, along with 14 filler trials, were presented twice to the participants (for a total of 56 trials). To prevent carry-over effects, the total number of 28 experimental trials was divided into 4 blocks and pseudo-randomized following the same procedure described in Sect. 3.2 for the first experiment. More specifically, each block included 3 or 4 experimental trials with wh-extraction of a nominal complement and 3 or 4 experimental trials with wh-extraction of a verbal complement, alongside 7 fillers, for a total of 14 trials per block. Within a block, each experimental item was presented only once, under a single condition. The condition under which an item was presented alternated across subsequent blocks.Footnote 21 The order of the trials was shuffled within each block. Filler and experimental trials rigidly alternated throughout the entire experiment. The blocks were separated by short pauses of 5 to 10 minutes. On average, the experiment lasted between 45 and 60 minutes. The recording sessions took place individually in a quiet room in Siena (Italy).Footnote 22 As in the first experiment, speakers were asked to produce each stimulus twice. Since each stimulus was presented twice, we obtained a total of 4 repetitions for each stimulus. Of the sentences recorded, we analysed 321 target sentences: 7 items ∗ 8 speakers ∗ 2 conditions ∗ 3 disfluency-free repetitions.Footnote 23

The sentences were segmented into phonemes and intonationally transcribed. In particular, we annotated the location of the NPA. As in the first experiment, we labelled as NPA the rightmost pitch accent after which the pitch contour is completely compressed and no subsequent fully-fledged pitch accent is observable (Gili Fivela et al. 2015).

4.2 Results: NPA distribution

In our transcriptions, the NPA is distributed as summarized in Fig. 8.

Fig. 8
figure 9

Second prosodic experiment. Distribution of NPA in wh-questions with extraction of noun complement vs. verb complement

These results show that in the case of extraction of a noun complement, NPA assignment never targets the noun head which selects the wh-phrase (0%): again, the NPA falls on the lexical verb, even though this is not adjacent to the first-merge position of the wh-phrase. Crucially, in most of the items (6 out of 7), the lexical verb was followed by a non-presuppositional indefinite object and an additional prepositional phrase: this rules out an alternative interpretation of our results according to which the systematic assignment of the NPA to the verb would result from right dislocation of the postverbal constituents coupled with the default prominence assignment to the rightmost non-right dislocated element (cf. the discussion of (12)).

The pitch contour of a sentence produced under the first condition (24a) is illustrated in Fig. 9, where an H+L* NPA aligns with the stressed syllable of the verb and the pitch contour of what follows, including the selecting noun head, is characterized by post-nuclear compression.

Fig. 9
figure 10

Second prosodic experiment. Pitch contour of an utterance produced after (24a): with extraction of a nominal complement

A very similar intonational contour is realized under the second syntactic condition, that is, when the wh-phrase is the complement of a verb (24b), cf. Fig. 10.

Fig. 10
figure 11

Second prosodic experiment. Pitch contour of an utterance produced after (24b): with extraction of a verbal complement

This similarity shows that NPA assignment is not sensitive to the first-merge position of the wh-phrase: in both conditions, the NPA predominantly—if not exclusively—falls on the lexical verb (96% with extraction of a noun complement, and 97% with extraction of a verbal complement). This allows us to reject the first-merge hypothesis.

Note that the first-merge account presupposes a syntactic [focus] feature, but is consistent with the assumption that it is not involved in a syntactic cyclic derivation. In the next section, we propose an analysis that substantially relies on the role of the [focus] feature in the successive-cyclic syntactic derivation.

5 At the syntax-prosody interface: A successive cyclicity account

Ever since Chomsky (1973, 1977), it has been assumed that long-distance movement crossing clause boundaries does not take place in one fell swoop, but consists of a sequence of smaller movement steps. In more recent years, this constraint has been defined in terms of the notion of phase. A phase is a minimal chunk of syntactic computation consisting of:

  (i) an internal domain, comprising a lexical head and possibly other syntactic objects first merged with it;

  (ii) a phase edge, consisting of a head H that selects for the internal domain, and one or more elements either first merged or attracted from within the internal domain by the movement-attracting features of H.

The locality of movement is captured by the Phase Impenetrability Condition (25), stating that only elements merged into (or attracted to) the edge of a phase are available for further computation in the next phase up:

  1. (25)
    figure ar

The phase heads are, by hypothesis, v (defining a domain of predication) and C (defining a clausal domain). Following this approach, we assume that wh-movement proceeds through the edge of every vP and CP between the first-merge position and the final landing site of the wh-phrase. This is illustrated in (26) with an example of long-distance movement:

  1. (26)
    figure as

More specifically, we assume that in a direct question, the wh-phrase bears a {wh, focus} feature bundle (we will return to this feature combination in Sect. 7, where we will also provide a motivation for it). Furthermore, we assume that the same bundle is borne by every probing phase head (v° or C°) that the wh-phrase crosses on its way to the final landing site. It is immaterial for our purposes whether the whole feature bundle constitutes the probe, or rather the [focus] feature is transmitted by the wh-phrase to the phase head via dynamic agreement in the sense of Rizzi (1996).

In addition to feature-based cyclic movement through the phase edges, we also need a set of syntax-prosody interface principles that determine how a syntactic structure is mapped onto a prosodic structure. We will specify here only the ingredients that are strictly necessary for our analysis, namely a set of principles and rules that determine an algorithm for NPA assignment:

  (i) The NPA must be assigned to an element that is phonologically overt (and non-clitic; see Calabrese 1982; Nespor and Vogel 1986). Thus, among the wh-copies in a wh-movement chain, only the highest copy is eligible for NPA assignment, the lower ones being subject to phonological deletion.

  (ii) When the syntactic structure contains one or more occurrences of the [focus] feature, the NPA must be assigned to a syntactic element that is marked with this feature (irrespective of whether the feature is interpretable or not on that element).

  (iii) The NPA is assigned to the rightmost element that satisfies (i) and (ii). If the sentence does not contain any occurrence of the [focus] feature, the NPA is assigned to the rightmost element by default (see Katz and Selkirk 2011).

In other words, the algorithm that we propose assumes that, at the interface with prosody, the NPA is assigned to the rightmost occurrence of the {wh, focus} feature bundle on a phonologically visible element, if there is one. Importantly, the prosodic computation does not differentiate between interpretable and uninterpretable instances of the [focus] feature for the purposes of NPA assignment.
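
Principles (i)–(iii) can be read as a small procedure over the linearised string. The following Python sketch is only an illustration under simplifying assumptions: the `Element` record, its field names and the schematic labels (`WH`, `V_matrix`, `XP`, etc.) are ours and do not reproduce any of the stimuli. The fallback to default assignment when no overt element bears [focus] is likewise an assumption of the sketch, since the principles do not address that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """One position in the linearised structure (hypothetical record)."""
    label: str
    overt: bool = True    # False for deleted copies and null heads
    clitic: bool = False
    focus: bool = False   # bears [focus], interpretable or not


def assign_npa(elements: List[Element]) -> Optional[Element]:
    """Pick the NPA bearer from elements listed in left-to-right order."""
    # (i) only phonologically overt, non-clitic elements are eligible
    eligible = [e for e in elements if e.overt and not e.clitic]
    if not eligible:
        return None
    # (ii) if [focus] occurs on eligible elements, restrict to those
    focused = [e for e in eligible if e.focus]
    # (iii) rightmost focus-marked element, else rightmost element by default
    # (the default is also used here if [focus] sits only on silent elements;
    # this fallback is an assumption of the sketch)
    return focused[-1] if focused else eligible[-1]


# Schematic short-distance question: the bundle is shared only in the matrix clause.
short_distance = [
    Element("WH", focus=True),
    Element("V_matrix", focus=True),           # matrix v° incorporating the verb
    Element("<WH>", overt=False, focus=True),  # deleted lower copy
    Element("C_emb"),
    Element("V_emb"),
    Element("XP"),
]

# Schematic long-distance question: the embedded v° also bears the bundle.
long_distance = [
    Element("WH", focus=True),
    Element("V_matrix", focus=True),
    Element("C_emb"),
    Element("V_emb", focus=True),
    Element("<WH>", overt=False, focus=True),
    Element("XP"),
]

# Structure without any [focus] feature: default rightmost assignment.
no_focus = [Element("Subj"), Element("V"), Element("XP")]

for name, structure in [("short", short_distance),
                        ("long", long_distance),
                        ("no focus", no_focus)]:
    print(name, "->", assign_npa(structure).label)
```

In this schematic rendering the procedure selects the matrix verb under short-distance movement, the embedded verb under long-distance movement, and the rightmost constituent when no [focus] feature is present.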

Short-distance wh-movement as in (19a), repeated here as (27), is analysed as in (28):Footnote 24

  1. (27)
    figure at
  1. (28)
    figure au

The wh-phrase starts from within the vP of the matrix clause and shares its [focus] feature with the matrix phase heads v° and C°.Footnote 25 Since traces are phonologically deleted, by (i) they are not possible targets for NPA assignment. The rightmost phonologically realized position that is specified for the [focus] feature is the v° in the matrix clause (even though the feature here is only a reflex of successive cyclic movement through the phase edge). This head incorporates the matrix lexical verb, so the NPA is associated with the latter.Footnote 26 Crucially, the v° and C° heads of the embedded clause do not bear the [focus] feature, and hence do not qualify for NPA assignment.

By contrast, in cases of long-distance movement like (21b), repeated as (29), the wh-element is cyclically extracted from the vP of the embedded clause and, on its way to the CP of the matrix clause, it shares its {wh, focus} bundle with the head of each higher phase, as schematically represented in (30):

  1. (29)
    figure av
  1. (30)
    figure aw

As a result, the embedded clause v° is the rightmost element that is endowed with the [focus] feature and is also phonologically contentful: the NPA is thus associated with the lexical verb of the embedded clause.

This analysis predicts that an embedded verb can be associated with the NPA only in the case of long-distance movement, inasmuch as the wh-phrase must move through the edge of the embedded vP phase. This accounts for the prevailing pattern of NPA assignment in our long-distance stimuli; we will return in Sect. 6 to the less common pattern in which, despite long-distance movement, the NPA is assigned to the matrix verb.Footnote 27

Note that it is crucial for this analysis that the same {wh, focus} bundle be involved in all the movement steps. If intermediate steps only involved a general edge feature on the phase heads, we could not account for the fact that v° qualifies for NPA assignment (since the same edge feature would presumably be involved in other movement dependencies which do not affect NPA assignment, e.g. relativization).Footnote 28

6 The optionality problem

The principles for NPA assignment that we have proposed predict that in case of long-distance movement, the lexical verb of the embedded clause invariably qualifies for NPA assignment, since it incorporates the rightmost pronounced syntactic element endowed with the {wh, focus} bundle (i.e. the embedded v°). However, the data reported in Fig. 1 show that in about 37% of instances of long-distance movement, the NPA falls on the verb of the matrix clause, a possibility that is exemplified in (31) (cf. also the corresponding pitch contour in Fig. 3.2 above):

  1. (31)
    figure bf

One possibility that we cannot totally exclude at the present stage is that in reading the long-distance items, the participants may have experienced some sort of parsing difficulties analogous to a garden-path effect. As is well known, in parsing a wh-dependency subjects postulate the gap as early as possible, i.e. as soon as a gap position is licensed by subcategorization (Clifton and Frazier 1989). Although in our long-distance items all the argument positions subcategorized in the matrix clause were saturated, and hence no gap for the wh-dependency could be grammatically postulated, it may be the case that at the stage when they planned the prosodic pattern of the sentence they were reading, the participants were expecting an early resolution of the dependency within the matrix clause; this may have led them to anticipate an NPA placement on the matrix verb. If this were the case, the productions in which the NPA falls on the verb of the matrix clause under long-distance extraction should be interpreted as an instance of systematic observational error. In order to verify this possibility, in future work we intend to test the production of long-distance wh-dependencies featuring only matrix verbs that disallow a matrix clause interpretation for the wh-phrase: if the less frequent pattern is due to a garden-path effect, we predict that in such items the NPA should be consistently realized on the embedded verb. Pending further investigation, however, we tentatively assume for the time being that the less frequent pattern constitutes a grammatical option requiring explanation at the grammatical level.

One relevant consideration is that in well-studied cases of successive cyclicity effects under long-distance movement, the effect can be suspended in the embedded clause(s). For instance, Torrego (1984) reported that in Spanish, wh-movement triggers subject inversion not only in the clause containing the final landing site of the wh-phrase, but also in the lower clauses from which the wh-phrase is extracted, as exemplified in (32) (the inverted subjects are in italics). However, subject inversion is mandatory only in the highest clause, and optional in the lower clauses, as exemplified in (33).

  1. (32)
    figure bg
  1. (33)
    figure bh

Torrego argued that subject inversion in a clause is triggered by wh-movement through the local CP, and assumed that in (33), long-distance movement could skip the CP of the embedded clause. This view, however, is not consistent with the Phase Impenetrability Condition (25), whereby no phase edge can be skipped along the movement path.

Concerning successive cyclic movement, three different views have emerged in the minimalist literature. According to the first view, intermediate movement steps are triggered by an (uninterpretable) instance of the very same “substantive” feature (e.g. [wh]) that attracts the moved element to the final landing site (McCloskey 2002). According to a second view, intermediate movement steps are instead triggered by a general edge feature (Chomsky 2008) distinct from the “substantive” active feature on the moved phrase (see in particular Georgi 2017). Finally, in the most recent approach based on the Labeling Algorithm of Chomsky (2013), intermediate steps of movement are not feature-driven at all, but they are triggered by the need to get rid of a symmetric configuration: in the intermediate links, the moved phrase and its phrasal sister do not share any relevant feature that can label the mother node, and neither of the two can be selected as closer to the mother node, resulting in an unlabelled configuration. Movement solves the labelling problem by making the copy of the moved phrase invisible to the Labeling Algorithm.

In all of these approaches, a basic distinction is drawn between intermediate movement steps and the last step targeting the final landing site. This distinction is not helpful for our data, because according to our analysis, in a sentence such as (31) the [focus] feature seems not to be visible on the v head of the embedded clause, whose edge hosts an intermediate chain link; yet it is visible on the v head of the matrix clause, whose edge hosts another intermediate link.

The only solution that we can envisage is in terms of partial deletion of the copies of the wh-phrase within the movement chain. As a starting assumption, we decompose a bare wh-phrase like chi ‘who’ into a wh-determiner and a silent restriction:

  1. (34)
    figure bi

The wh-chain for our example (31) can be schematically represented as follows (with the lower copies indicated between angled brackets):

  1. (35)
    figure bj

Building on Reinhart (1997:377-379), we assume that the wh-determiner in the highest chain link is interpreted as an existential quantifier binding a choice function (CF) variable, whereas its lowest copy is interpreted as the bound CF variable.Footnote 29 The CF variable must compose with a set-denoting expression; therefore, the lowest copy of the wh-determiner must have as sister the NP restriction, which denotes a set. On the other hand, the higher occurrences of the NP constituent undergo selective deletion (here and below, strikethrough indicates syntactic deletion of parts of a chain link):

  1. (36)
    figure bk

Note now that the lowest copy of the <D NP> complex need not be in the first-merge position of the wh-chain: in fact, the <D NP> complex has an entity-type denotation, and it can bind an entity-type variable. Following Heim (1987) and Frampton (1991), we assume that an entity-type variable corresponds to a coindexed empty category from which the content of both D and the NP restriction has been syntactically deleted:

  1. (37)
    figure bl

Specifically, we assume that in a phase-by-phase derivation, when the wh-phrase is internally merged in a phase edge, the copy in the edge of the immediately lower phase can undergo deletion of its internal content.

With these mapping principles in place, it is possible to delete the content of all the copies from the first-merge position and other intermediate chain links. Crucially for our purposes, at least the copy immediately below the final landing site (i.e. the one in Spec,vP1) must be preserved, in order to provide the CF variable to be bound from CP (and the accompanying NP restriction). Thus, the following wh-chains are all licit at the C-I interface:

  1. (38)
    figure bm

We assume also that when the copy in a phase edge undergoes deletion, the agreeing {wh, focus} bundle is also deleted on the corresponding phase head. It follows that in (38d), the C₂° and v₂° heads of the embedded clause will have the {wh, focus} bundle deleted before being transferred to the PF interface. Consequently, the rightmost element that bears the {wh, focus} feature bundle is the v₁° head of the matrix clause, and the latter is selected for NPA assignment. In (38c), the feature bundle is deleted from the embedded v°, but since the embedded C° and the trace in its Spec are phonologically null, the rightmost phonologically contentful element that bears the feature bundle is again the matrix v°. This accounts for the minority pattern of NPA assignment in case of long-distance movement. On the other hand, in (38a) the feature bundle is preserved on the embedded v° and this is selected for NPA assignment, accounting for the prevailing pattern.
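
The interaction between copy deletion and NPA placement can be sketched schematically. The Python fragment below is a minimal illustration, not a full implementation of the analysis: the list `LINEAR_ORDER`, the labels `v1`, `C2` and `v2` (for the matrix v°, the embedded C° and the embedded v°) and the function name `npa_target` are hypothetical. It assumes that the positions are listed in linear order, that the embedded C° is phonologically null, and that the copy in Spec,vP1 can never be deleted.

```python
# Positions that may bear the {wh, focus} bundle in a long-distance question,
# in assumed linear order; the second member says whether the position is
# phonologically overt (v heads incorporate a lexical verb, embedded C is null).
LINEAR_ORDER = [
    ("wh-phrase", True),
    ("v1", True),
    ("C2", False),
    ("v2", True),
]


def npa_target(deleted_edges):
    """Return the position selected for the NPA.

    `deleted_edges` names the phase heads whose edge copy has had its content
    deleted; by assumption, the bundle on those heads is deleted as well.
    """
    if "v1" in deleted_edges:
        # The copy in Spec,vP1 must survive to supply the CF variable.
        raise ValueError("the copy in Spec,vP1 cannot be deleted")
    bearers = [
        name
        for name, overt in LINEAR_ORDER
        if overt and name not in deleted_edges
    ]
    # Rightmost overt position that still bears the bundle.
    return bearers[-1]


print(npa_target(set()))          # bundle preserved on the embedded v
print(npa_target({"v2"}))         # bundle deleted on the embedded v only
print(npa_target({"C2", "v2"}))   # bundle deleted on both embedded heads
```

With no deletion the embedded v° is selected (the prevailing pattern), whereas deleting the bundle on the embedded v°, with or without the embedded C°, leaves the matrix v° as the rightmost overt bearer (the minority pattern).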

Although admittedly rather baroque, this system does make one testable empirical prediction. Following Frampton (1991) and Rizzi (2001b), only a chain involving an entity-type variable can cross a weak island: we therefore predict that extraction from a weak island should block NPA assignment to the lexical verb that is adjacent to the first-merge position of the wh-phrase. This prediction remains to be verified in future research.

7 The {wh, focus} feature bundle

In the preceding discussion, we have been assuming a syntactically active {wh, focus} feature bundle in Italian direct wh-questions. In this section we review some syntactic evidence suggesting that the [focus] feature is bundled with the [wh] feature in a single probe, given the impossibility of a narrow focus co-occurring with the wh-phrase. We then discuss a possible semantic motivation for such bundling which, contrary to much previous literature, does not assume that wh-phrases are inherently focal. In fact, such bundling does not obtain in Italian indirect wh-questions, which can unproblematically host a narrow focus distinct from the wh-phrase.

7.1 Syntactic and prosodic evidence

The initial empirical motivation for associating wh-phrases with the [focus] feature comes from an observation by Rizzi (1997, 2001a): in Italian direct wh-questions, a fronted focus cannot co-occur with the wh-phrase in either order, as exemplified in (39). (See Bocci et al. 2018 for experimental evidence on this point.)

  1. (39)
    figure bn

Rizzi proposes that in direct questions, the fronted focus and the wh-phrase compete for the same landing site, the left-peripheral focus position. Abstracting from his “cartographic” approach, this idea can be rephrased in the following terms: the focussed phrase and the wh-phrase compete to check the same [focus] feature against the C phase head.

This raises the question of why the wh-phrase should check the [focus] feature. One analytical possibility would be to assume that wh-phrases are inherently focal (see Beck 2006; Eckardt 2007; and Cable 2010; see also Haida 2007 and Kotek 2014, among others), and that the [focus] feature can probe at most one element (at least in Italian: see Rizzi 1997; Bocci 2013). It follows that in (39), either the wh-phrase or the narrowly focussed constituent matches the [focus] probe in C, but not both.

But this solution goes too far because, as noted in Rizzi (2001a), Italian indirect wh-questions do not display the same restriction: here a narrow focus can appear, and it can be fronted to the left of the wh-phrase, as exemplified in (40) (we refer to Bocci et al. 2018 for experimental evidence on this point).

  1. (40)
    figure bo

In principle, we could stipulate that in indirect wh-questions the [focus] feature can probe twice; but clearly, this stipulation would be at best a restatement of the problem. In addition, empirical evidence on the prosodic side tells a completely different story.

A production experiment reported in Bocci and Cruschina (2018) investigated the distribution of the NPA in Italian indirect wh-questions featuring a bare wh-phrase. The procedure was analogous to the one described in Sect. 3 for our first experiment. Interestingly, the NPA distribution turned out to be completely different from direct wh-questions: while in the latter the NPA associates with a lexical verb in the majority of cases (either the matrix or the embedded verb depending on the type of movement; cf. Sect. 3.3), in indirect wh-questions the NPA falls on a non-final lexical verb in a very marginal number of cases, namely in 3% of cases on the matrix verb and 1.3% of cases on the embedded verb. By contrast, the NPA associated with the rightmost constituent in 95.7% of the cases. An example of this pattern is given in Fig. 11, where it is evident that the H+L* NPA is associated (and aligned) with (the stressed syllable of) the rightmost constituent of the sentence prepensionamento ‘early-retirement’.

Fig. 11
figure 12

Pitch contour of an utterance produced after the indirect wh-question in (41) (from Bocci and Cruschina 2018)

  1. (41)
    figure bq

On the basis of these experimental results, Bocci and Cruschina (2018) conclude that in neutral indirect wh-questions (i.e. not hosting a narrow focus), the NPA is assigned to the rightmost constituent of the sentence. This corresponds to the default prominence placement for Italian broad focus declaratives (see Nespor and Vogel 1986; Avesani 1990; Gili Fivela et al. 2015): in other terms, the prosody of indirect wh-questions patterns with that of declarative sentences rather than with the prosody of direct wh-questions.

From the perspective of our approach, these findings imply that in indirect wh-questions the wh-chain does not carry the [focus] feature, and hence it is invisible to the algorithm of NPA assignment discussed in Sect. 5. This conclusion is incompatible with the widespread assumption that wh-phrases are inherently specified for the [focus] feature.Footnote 30

The contrast between (39) and (40) is also relevant to an alternative interpretation of our experimental data suggested to us by an anonymous reviewer. It is possible to assume that the marked NPA placement in Italian direct wh-questions is not triggered by the [focus] feature, but rather, it is a scope-marking device, which makes the extension of the wh-chain “perceptible.” Such scope-marking has been described by Ishihara (2004) for Tokyo Japanese wh-questions: a wh-phrase in situ is marked by a higher F0 peak than a corresponding non-interrogative phrase, and the material following the wh-phrase undergoes peak reduction until the boundary of the clause over which the wh-phrase takes scope.

As for Italian, this view does not explain the NPA pattern in indirect questions: there too the wh-phrase is assigned scope by overt wh-movement, yet its scope would not be marked by NPA shift. It is not clear to us how the scope-marking mechanism could be defined so as to apply in direct questions, but not in indirect ones: indeed, in Tokyo Japanese the same prosodic pattern is observed in both (Ishihara 2004).

In the next subsections, we provide an account of the contrast between (39) and (40) in terms of our focus-based approach.

7.2 The contribution of [focus]

The preceding discussion raises three related issues. First, what is the semantic role of the [focus] feature in direct wh-questions? Secondly, why is such a role not deployed in indirect wh-questions? And finally, why do the latter require a narrow focus to scope above the wh-phrase, as exemplified in (40)?Footnote 31 Starting from the first question, we cannot adopt the hypothesis that wh-phrases contribute to interpretation just a focus semantic value (i.e. a set of alternatives), as proposed by Beck (2006), Eckardt (2007), Cable (2010) among others, in the framework of a Hamblin semantics à la Kratzer and Shimoyama (2002): in fact, such an approach requires that wh-phrases be endowed with the [focus] feature both in direct and in indirect questions, contrary to the evidence discussed in Sect. 7.1.

On the other hand, the main competitor analysis, the partition semantics approach, does not leave room for any possible role of the [focus] feature. In this approach, the wh-phrase simply induces functional abstraction over its trace, and the abstract thus obtained is used to partition the set of accessible possible worlds into disjoint subsets, corresponding to the potential complete answers to the question. From this perspective, it is not clear what could possibly be contributed by the [focus] feature on the wh-phrase.

One possible solution emerges in the approach to wh-questions elaborated by AnderBois (2012) in the framework of inquisitive semantics.Footnote 32 In a nutshell, AnderBois proposes that the role of focus in wh-questions is to introduce an existential presupposition, which is necessary in order to yield a proper question denotation. We summarize his approach first, and then we turn to the remaining issues related to indirect wh-questions.

In inquisitive semantics (Groenendijk and Roelofsen 2009), a proposition is defined as a non-empty set of possibilities, where each possibility is a set of indices/possible worlds; the maximal possibilities are called the alternatives in the proposition. An existentially quantified proposition like (42) denotes a set of alternatives—intuitively, one for each possible value of the variable x in the subformula [x drank vodka]:

  1. (42)
    figure br
  1. (43)
    figure bs

In a domain with more than one individual that x can range over, this will give rise to more than one alternative. A proposition containing more than one alternative is said to be inquisitive. On the other hand, the proposition denoted by (42) does not contain the possibility that (i.e. the set of indices at which) nobody drank vodka. Since it excludes at least one possibility, the proposition is also informative, in addition to being inquisitive.

To illustrate with a toy example, assume a domain with two individuals, Al and Ben: sentence (42) will denote the two possibilities corresponding to the shaded areas in Fig. 12 (where a indicates indices at which Al drank vodka, b indicates indices at which Ben did, –a indicates indices at which Al did not drink vodka, –b indices at which Ben did not drink vodka); the non-shaded area corresponds to the excluded possibility that nobody drank vodka.

Fig. 12
figure 13

Denotation of (42)

Crucially, a sentence expresses a proper question only if the proposition it denotes is inquisitive but not informative. AnderBois’s insight is that a wh-question is just an existentially quantified sentence that is interpreted in a context in which its informativeness is wiped out, namely, in a context that does not contain the “nobody possibility” to begin with. This can be obtained by combining the denotation of the existentially quantified sentence with an existential presupposition: the latter ensures that the context of interpretation does not contain any nobody-index—i.e. it entails the union of the possibilities denoted by (42)Footnote 33—so that the existentially quantified sentence is uninformative relative to it.Footnote 34

The context reduced by the existential presupposition corresponds to (a subset of) the shaded area in Fig. 13: it only contains indices at which Al or Ben or both drank vodka. It is easy to see that with respect to such a reduced context, the proposition represented in Fig. 12 is no longer informative, i.e. it does not exclude any possibility. The proposition, however, is still inquisitive, hence it conveys a felicitous question in this context.

Fig. 13
figure 14

Effect of the existential presupposition !(∃x)(x drank vodka)
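
The toy model with Al and Ben can be made concrete in a few lines of Python. This is a minimal sketch under our own encoding assumptions: an index is represented as the set of individuals who drank vodka at it, a possibility as a set of indices, and a proposition as a set of possibilities; all function and variable names are hypothetical, and informativeness and inquisitiveness are computed relative to a context given as a set of indices.

```python
from itertools import combinations

PEOPLE = ("Al", "Ben")

# An index is encoded as the set of people who drank vodka at that index.
WORLDS = frozenset(
    frozenset(c) for r in range(len(PEOPLE) + 1) for c in combinations(PEOPLE, r)
)


def drank(person):
    """Possibility: all indices at which `person` drank vodka."""
    return frozenset(w for w in WORLDS if person in w)


# Denotation of the existential sentence: one possibility per individual.
exists_x = {drank(p) for p in PEOPLE}


def restrict(prop, context):
    """Restrict every possibility to the context, dropping empty results."""
    return {p & context for p in prop if p & context}


def alternatives(prop):
    """Maximal possibilities of a proposition."""
    return {p for p in prop if not any(p < q for q in prop)}


def is_informative(prop, context):
    """True if some index of the context lies in no possibility."""
    covered = frozenset().union(*prop)
    return not context <= covered


def is_inquisitive(prop, context):
    """True if more than one alternative survives in the context."""
    return len(alternatives(restrict(prop, context))) > 1


# Context reduced by the existential presupposition: no nobody-index.
presupposed_context = frozenset().union(*exists_x)

print(is_informative(exists_x, WORLDS), is_inquisitive(exists_x, WORLDS))
print(is_informative(exists_x, presupposed_context),
      is_inquisitive(exists_x, presupposed_context))
```

Relative to the unrestricted context the existential proposition comes out as both informative and inquisitive; relative to the context reduced by the presupposition it remains inquisitive but is no longer informative, which is the profile required of a proper question.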

AnderBois proposes that the role of focus on the wh-phrase is precisely to introduce the required existential presuppositionFootnote 35 (see Abusch 2010 on the default presupposition introduced by focus). We adopt this proposal for Italian direct wh-questions, so that the [focus] feature on the wh-chain is not semantically idle, but it plays a role in the question interpretation. Footnote 36

Notice that the adoption of an existential presupposition does not support an alternative interpretation of our data, according to which the NPA shift observed in direct wh-questions would be the result of deaccenting of presupposed (and hence discourse-given) material. The reason is that the existentially presupposed part of the sentence corresponds to the scope of the wh-phrase, while in our findings only the part of it that follows the lexical verb undergoes deaccenting. The mismatch is particularly evident in a long-distance extraction case like (19b) with NPA placement on the lexical verb of the embedded clause: here both the main clause and part of the embedded clause, though by assumption part of the presupposition, do not undergo deaccenting.Footnote 37

The next issue to be addressed is the interpretation of indirect wh-questions, for which we conclude, based on the above evidence, that the wh-chain does not carry a [focus] feature. From the present perspective, the problem is how the potential informativeness of the wh-clause can be eliminated without having recourse to the focus-induced existential presupposition.

In fact, in the inquisitive semantics framework there is exactly one other way in which the potential informativeness of a clause can be eliminated: this is the so-called non-informative closure operator (notated as ?), defined in (45). Intuitively, this operator takes as input a nonempty proposition (inquisitive or not), and adds to it the possibility consisting of all the indices that are not included in any possibility of the original proposition.

  1. (44)
    figure by
  1. (45)
    figure bz

The resulting proposition denoted by ?φ is automatically inquisitive (containing at least two non-overlapping possibilities) and non-informative, since the added possibility will cover whichever indices were not covered by the original proposition. When applied to a non-inquisitive proposition (e.g. the one denoted by Al drank vodka), the ?-operator returns the polar question consisting of the possibility denoted by the original proposition and its complement possibility (corresponding to the polar question: Did Al drink vodka?). When applied to the proposition denoted by (42), whose denotation is represented by the shaded area in Fig. 12, the ?-operator will add to it the nobody-possibility, as depicted in Fig. 14, yielding a non-informative and inquisitive proposition.

Fig. 14
figure 15

Denotation of ?(∃x)(x drank vodka)
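
The effect of the ?-operator on the same toy domain can be sketched as follows. The encoding (indices as sets of drinkers, possibilities as sets of indices) and all names are assumptions of this illustration; in particular, the sketch adds the complement possibility only when it is non-empty, which is a simplification on our part.

```python
from itertools import combinations

PEOPLE = ("Al", "Ben")

# An index is encoded as the set of people who drank vodka at that index.
WORLDS = frozenset(
    frozenset(c) for r in range(len(PEOPLE) + 1) for c in combinations(PEOPLE, r)
)


def drank(person):
    """Possibility: all indices at which `person` drank vodka."""
    return frozenset(w for w in WORLDS if person in w)


def closure(prop, worlds=WORLDS):
    """Non-informative closure: add the possibility made of uncovered indices."""
    uncovered = worlds - frozenset().union(*prop)
    # Simplifying assumption: nothing is added if every index is covered.
    return set(prop) | ({uncovered} if uncovered else set())


# Applied to a non-inquisitive proposition: a polar question.
polar = closure({drank("Al")})

# Applied to the existential proposition: the nobody-possibility is added.
wh_like = closure({drank(p) for p in PEOPLE})

NOBODY = frozenset({frozenset()})  # the single index at which nobody drank

print(len(polar), len(wh_like), NOBODY in wh_like)
```

In both cases the possibilities of the output jointly cover every index, so the result is non-informative; the polar case yields two possibilities and the existential case three, the third being the nobody-possibility.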

We propose that in Italian, a verb selecting for an indirect wh-question licenses an ?-operator on top of the wh-clause, which eliminates any potential informativeness, without having recourse to the focus-related existential presupposition.

One drawback of this move is that in Italian there is no morphosyntactic evidence for this extra compositional layer in indirect wh-questions. We speculate that the ?-operator can have zero exponence because its presence is predictable from the embedding context. As anecdotal support for our assumption, we mention that in some Romance varieties, indirect wh-questions are in fact introduced by the same interrogative particle that introduces polar questions. We may take this particle to be a morphologically overt ?-operator: as expected in our approach, the particle is higher than the landing site of the wh-phrase.

  1. (46)
    figure cb

The last issue to be considered is the obligatory wider scope of a narrowly focussed phrase co-occurring with a wh-phrase in indirect questions, as exemplified in (40) above. In this regard, note that if the focus took scope below the wh-phrase, no well-formed presupposition could be generated, since the presupposed clause would contain an unbound variable, corresponding to the wh-trace. If, on the other hand, the focus takes scope above the wh-phrase, we obtain a well-formed presupposition, in which the focus trace is existentially quantified on top of the wh-phrase (another existential quantifier).Footnote 38

In sum, adopting the perspective of inquisitive semantics, we have proposed that there are two possible routes to get to a proper question denotation for a wh-clause (where the wh-phrase is, by assumption, an existential quantifier). The first possibility is to endow the wh-phrase with the [focus] feature: this introduces an existential presupposition that makes the wh-clause contextually non-informative (following AnderBois 2012). The second possibility is to stick an ?-closure operator on top of the wh-clause, which makes the latter semantically non-informative. Italian exploits the first route for direct wh-questions, and the second route for indirect ones. The reason for this asymmetry remains an open issue.

8 Theoretical consequences and future directions

In this paper, we have addressed the issue of the relationship between wh and focus in Italian wh-questions featuring bare wh-elements. In future research we intend to address aspects of language-internal variation in Italian, with respect to other types of interrogatives, as well as comparative work on the typology of NPA assignments outlined in Ladd (1996).

As for the first line of inquiry, the Italian wh-element perché ‘why’ constitutes an interesting case. Rizzi (2001a) argues, on the basis of syntactic evidence, that perché is directly merged into the CP layer. We therefore expect that the prosodic pattern exhibited by why-questions should differ from that of other wh-questions. Indeed, Marotta (2001) reports that perché systematically associates with the NPA. The fact that the lexical verb does not attract the NPA here is compatible with our analysis, under the assumption that perché is not extracted from within the vP phase, hence there is no occurrence of the {wh, focus} feature bundle on the v phase head.

Another possible extension concerns D-linked wh-elements. In Italian, these differ syntactically from bare wh-elements in that they need not be adjacent to the finite verb and they can be extracted from weak islands (see Pesetsky 1987; Rizzi 2001c); we should therefore investigate whether they pattern with bare wh-elements with respect to NPA assignment.

Finally, the question arises of whether yes/no questions display a similar or different pattern compared to wh-questions: indeed, the data reported in Savino (2012) and Gili Fivela et al. (2015) suggest that yes/no questions in Italian do not display the same prosodic pattern as wh-questions with respect to the placement of the NPA.

With regard to Ladd’s typology of NPA assignments (cf. Sect. 2), interesting questions arise at the theoretical level. Recall that in languages like English, direct wh-questions follow the same prosodic pattern as statements, i.e. the default prosodic principle assigns the NPA to the stressed syllable of the rightmost constituent. Prima facie, this suggests that the prosodic component does not “see” a [focus] feature in wh-questions in these languages. This might be related to a different encoding of focus (see e.g. Kratzer and Selkirk 2007, according to whom only contrastive focus is syntactically encoded in English). Symmetrically, in languages like Romanian the NPA is systematically associated with the wh-element in (direct) questions: from our perspective, it seems that here only the interpretable occurrence of the [focus] feature in the scope position is visible to the prosodic component, whereas the intermediate occurrences are not visible (perhaps because of a lack of agreement in the phase edge, along the lines suggested by Georgi 2017). We leave these cross-linguistic issues for future research.

In conclusion, we wish to highlight some theoretical consequences of the proposed analysis. In spite of the vast literature on the reflexes of focus in various languages and at different levels of the grammar, its role as an active feature in the syntactic computation is still controversial and has been disputed both in the Minimalist framework and in interface-driven approaches. Recently, for example, Chomsky et al. (2019) state that information-structure notions and discourse-related features such as topic and focus do not play any active role in the syntactic derivation: the phenomena that are commonly associated with focus should instead be regarded as post-syntactic operations. Similarly, interface-driven approaches reject the postulation of a syntactically encoded focus feature, as well as the existence of the corresponding projection in the syntactic structure, attributing the syntactic and prosodic phenomena that are generally associated with focus to independent pragmatic or prosodic requirements (see, e.g. Szendrői 2001, 2017; Horvath 2007, 2010).

Against this view, and with the support of experimental data, we have argued that the [focus] feature plays an active role both in the syntactic derivation and at the interface with the prosodic component. It is in fact this feature that, bundling with the [wh] feature in Italian direct wh-questions, triggers a successive cyclic derivation through the edge of every phase intervening between the first-merge position of the wh-phrase and its final landing site. Successive cyclic movement and a mechanism of feature sharing with the intervening phase heads determine a configuration in which the NPA is assigned to the rightmost element marked with the {wh, focus} feature bundle. Phonological evidence for successive cyclic movement has been provided from tonal alternations in tonal languages such as Kikuyu (Clements 1984) and Asante Twi/Akan (Korsah and Murphy 2016); our experimental data show that phonology, in particular prosody, can reflect the derivational history of a wh-phrase also in an intonational language such as Italian, where the assignment of the NPA appears to track the intermediate steps of wh-movement.

The placement of the NPA in Italian direct wh-questions also casts doubt on the direct association between prosodic prominence and focal interpretation. We have seen that the phonological correlates of the {wh, focus} bundle can be dissociated from the position where the feature is interpretable at the C-I interface. Hence, focus is not directly encoded at PF (pace Reinhart 2006), and the prosodic computation—at least in Italian—is not sensitive to the distinction between interpretable and uninterpretable instances of the [focus] feature, in the final landing site and in intermediate positions respectively.

One further theoretical consequence of the analysis presented above concerns cyclic spell-out (Chomsky 2000). Note that the distribution of the NPA in Italian wh-questions cannot be accounted for within a cyclic spell-out, bottom-up approach to the syntactic derivation: in order to capture the observed distribution, prosody must rather operate on a global representation (cf. Cheng and Downing 2016).Footnote 39 Incontrovertible evidence in this regard comes from Italian wh-questions with short-distance movement, like (19a) above and (47):

  1. (47)
    figure cc

If we adopt a cyclic spell-out approach to these structures, nothing could prevent the default assignment of the NPA in the vP phase of the embedded clause, because this vP does not contain any occurrence of the {wh, focus} feature that could block the default NPA assignment to the rightmost element of the sentence. However, this type of default assignment makes incorrect predictions. As we saw in Sect. 2, in wh-questions with short-distance movement the NPA is systematically assigned to a higher phase, i.e. to the lexical verb in the matrix clause. In other words, only when all phases are merged in the syntactic structure can the prosodic component detect the possible occurrences of the {wh, focus} features and thus determine, operating on the whole structure, where the NPA must be assigned.

Finally, our analysis is consistent with the traditional view that traces are not visible for the prosodic computation (Nespor and Vogel 1986), and is also compatible with the indirect reference approach to the syntax-phonology interface, according to which mapping rules mediate between the syntactic and phonological structure (see Elordieta 2008 for discussion), inasmuch as the specification of the {wh, focus} features may be read by mapping rules and then affect the prosodic structure, i.e. the assignment of main prominence.Footnote 40