1 Introduction

There is considerable variation in the extent to which managers provide voluntary disclosure. Perhaps for this reason, a significant portion of the academic literature explores the determinants of managers’ disclosure decisions. Extant theories in this literature tend to link managers’ disclosure decisions to two independent economic forces: the existence of an adverse selection problem (e.g., the manager’s private information) and the cost of ameliorating the problem (e.g., the proprietary costs associated with disclosure). These theories, in turn, provide the basis for a burgeoning number of empirical studies. The conventional approach in the empirical literature is to study a single theoretical determinant of the manager’s disclosure decision in isolation, to develop an empirical measure of the unobserved theoretical construct, and to estimate the linear relation between the empirical proxy of interest and voluntary disclosure. However, despite compelling intuition and unambiguous theoretical predictions, the literature commonly reports mixed results (Berger 2011).Footnote 1 In this paper, we conjecture that much of the prior empirical literature reports mixed results for two reasons: (i) the assumption that disclosure costs are independent of the manager’s private information, and (ii) the focus on estimating linear relations.

Following Einhorn and Ziv (2008) and Cheynel and Liu-Watts (2020), we develop a parsimonious model that combines elements of Dye (1985), Jung and Kwon (1988), and Verrecchia (1983). Our model links the probability that the manager has some private information with the cost of disclosing that information. We show that allowing for joint determination of private information and disclosure costs leads to a set of empirical predictions that are richer and more nuanced than the standard linear predictions in the empirical literature. For example, when private information and disclosure costs share a common empirical determinant, our model predicts that the relation between the empirical determinant and voluntary disclosure will take a specific non-linear shape. While prior work often frames empirical predictions in terms of linear relations, our empirical predictions focus on the non-linear shape of the relation. We find that the non-linearities predicted by the model are empirically descriptive of multiple measures of voluntary disclosure in multiple empirical settings that feature both a greater likelihood that the manager has some private information and greater proprietary costs of disclosure.

Classical models of voluntary disclosure feature two independent economic forces: adverse selection and disclosure costs. For example, adverse selection plays a prominent role in the models of Dye (1985) and Jung and Kwon (1988), hereafter DJK. In these models and their successors (e.g., Bertomeu et al. 2019, 2020b), the manager is privately informed with some positive probability and disclosure is costless. The potential existence of private information creates an adverse selection problem. Shareholders seek to address this problem by discounting the firm’s shares, which in turn motivates a price-maximizing manager to disclose. In extant theoretical models, the notion that shareholders are uncertain whether the manager has private information is known as the “uncertain information endowment” friction. This friction prevents unravelling (i.e., full disclosure) because it allows managers who are informed with negative information to pool with managers who are legitimately uninformed (i.e., neither type will disclose).

Adverse selection is also present in Verrecchia (1983), hereafter V83. In V83, managers are always informed, and the qualities of public and private information prior to the voluntary disclosure decision are treated as separate parameters. An increase in the quality of private (public) information exacerbates (ameliorates) the adverse selection problem. Similar to DJK, adverse selection increases the likelihood that the manager discloses. However, V83 posits that disclosure is costly. A common interpretation of these costs is that they represent “proprietary costs,” or a loss in competitive position as a result of “giving away company secrets or otherwise harming the firm’s competitive position” (Graham et al. 2005). Disclosure costs also prevent unravelling because they allow managers with negative information to pool with managers whose information is positive but not sufficiently positive to offset the costs of disclosure (i.e., neither type will disclose).

Subsequent literature has extended these classical models to a variety of settings, including multiple periods (Guttman et al. 2014), endogenous proprietary costs (Wagenhofer 1990; Bertomeu and Liang 2015; Cheynel and Ziv 2019), interactions with other forms of disclosure (Frenkel et al. 2020; Michaeli and Wiedman 2020; Friedman et al. 2020, 2021), and additional frictions (e.g., Einhorn 2007; Marinovic and Varas 2016). Nonetheless, throughout all of this literature, there are two pervasive economic forces––adverse selection and disclosure costs.

We develop a simple model of voluntary disclosure where adverse selection and disclosure costs are unobserved and jointly determined, and use the insights from this model to motivate our empirical tests. We consider a setting where a manager is privately informed with some positive probability (from DJK) and where disclosure entails a cost (from V83). We link these two unobserved constructs by assuming they both share a common observable determinant, denoted ‘x’. Here, one can think of x as an empirical proxy (or setting) that is positively related to the two unobserved theoretical constructs: the possibility that the manager possesses a piece of private information, and the cost of revealing that information. For example, prior literature suggests that large capital projects entail both greater private information and greater proprietary costs of disclosure (e.g., Graham and Harvey 2001; Boone et al. 2016). Figure 1 illustrates the economic forces present in our model. In such a setting, we show that there are two countervailing effects on disclosure: greater private information (disclosure costs) implies a higher (lower) probability of disclosure. In the presence of these countervailing effects, we show that the probability of voluntary disclosure is unimodal: it initially increases, reaches a unique peak, and then decreases (see e.g., Fig. 2).Footnote 2 This insight informs the design of our empirical tests.

Fig. 1 Theoretical relations. This figure illustrates how adverse selection and disclosure costs affect the probability of voluntary disclosure. Adverse selection arises as a consequence of the manager possessing private information unavailable to the market. In our model, q(x) is the probability that the manager is privately informed, and c(x) represents disclosure costs; these two forces are jointly determined through x. Higher values of x correspond to greater adverse selection and higher disclosure costs.

In principle, the underlying economic theory that motivates our analysis is very broad, and potentially applies to a variety of empirical proxies and/or settings that feature both greater adverse selection and greater costs of disclosure. To illustrate the breadth of the economic forces we study, and to minimize concern that any of our empirical findings are setting-specific, we test for unimodal voluntary disclosure in two distinct empirical settings: capital investments and major customers. Importantly, our prediction of a non-linear relation that flips sign may explain why some studies of voluntary disclosure in these settings find evidence of a positive linear relation (e.g., Cao et al. 2013), while others find evidence of a negative linear relation (e.g., Crawford et al. 2020).

Our first setting examines the relation between capital investments and voluntary disclosure. Managers undertaking large, capital-intensive projects are more likely to be privately informed, especially concerning forecasted benefits and costs of the project (e.g., Graham and Harvey 2001). However, at the same time, disclosure surrounding large, capital-intensive projects could reveal proprietary information to competitors and be detrimental to project success (e.g., Boone et al. 2016). Indeed, the popular press is replete with cases where large capital projects are kept under wraps for this reason.Footnote 3 Consistent with this, Boone et al. (2016) find that firms with large capital-intensive projects are more likely to redact their material contracts.Footnote 4 Thus, because capital investments are a setting that features both greater adverse selection and greater proprietary costs of disclosure, we predict a unimodal relation between capital investment and voluntary disclosure.

Our second setting examines the relation between major customers and voluntary disclosure. One stream of literature suggests that managers are more likely to possess private information about future performance if the firm’s sales are dominated by a few major customers, either as a result of better internal information (e.g., Samuels 2020) or close supplier-customer relationships (e.g., Crawford et al. 2020). This reasoning suggests a positive association between major customers and disclosure. However, another stream posits that the dependency on a few large customers imposes proprietary costs on the firm as a result of competitors using sales and product market disclosures to identify and expropriate a firm’s major customers (e.g., Verrecchia and Weber 2006; Ellis et al. 2012).Footnote 5 Thus, because sales to major customers are another setting that features both greater adverse selection and greater proprietary costs of disclosure, we predict a unimodal relation between major customers and voluntary disclosure.

In each of the two empirical settings, we estimate the shape of the relation using three measures of voluntary disclosure and three distinct sets of tests. Following a large body of prior literature, we measure voluntary disclosure using the probability of a management forecast. Additionally, in our analysis of capital investment (major customers), we measure voluntary disclosure using forecasts of capital expenditures (sales). The latter measures focus on specific elements of the settings we study and capture a particular type of voluntary disclosure that is more germane to each setting. Finally, we supplement our forecast-based measures of disclosure using a text-based measure of voluntary product-market disclosures (e.g., Kepler 2021).

For each empirical setting, we follow Samuels et al. (2020) and estimate the shape of the relation using three distinct sets of tests. First, we present the shape graphically. In particular, we sort firms into quintiles based on capital expenditure (major customers) and present average values of each of our measures of voluntary disclosure across the five quintiles. Consistent with unimodality, across all measures of voluntary disclosure, we find that voluntary disclosure monotonically increases in the first two quintiles of capital expenditure, plateaus in the third quintile, and then declines sharply in the fifth quintile. We find similar results for major customers. In both settings, the non-linearities are highly statistically and economically significant, representing increases (decreases) well in excess of 50% (−50%).

Second, we estimate polynomial regression models that express voluntary disclosure as a function of linear and 2nd-order polynomial terms. If the relation is unimodal, we expect the 2nd-order polynomial terms to load incremental to the linear term, and the coefficient to be negative. In estimating these specifications, we allow for ad hoc non-linear relations between voluntary disclosure and our control variables. Consistent with our predictions, we find robust evidence of a negative coefficient on the 2nd-order polynomial term for capital expenditures and major customers.

Third, we estimate spline regression models that treat the shape of the relation between capital investment (major customers) and voluntary disclosure as piecewise linear. Specifically, we estimate linear regressions on either side of a threshold level of capital investment (major customers) and test whether the slope coefficient is positive for observations below the threshold and negative for observations above the threshold. Consistent with a unimodal relation, we find a positive relation for firms with below-threshold levels of capital investment (major customers), and a negative relation for firms with above-threshold levels of capital investment (major customers).

Collectively, our analysis provides robust empirical evidence that the probability of voluntary disclosure is unimodal, and shows that straightforward modifications to classical models of voluntary disclosure can accommodate this feature of the data. The empirical evidence is consistent with the notion that private information and disclosure costs are jointly determined. Allowing for such joint determination and the associated non-linear relations sheds light on why prior work finds mixed results.

The remainder of our paper proceeds as follows. Section 2 formally develops our empirical predictions in the context of a parsimonious model of voluntary disclosure. Section 3 describes our empirical tests related to capital investment. Section 4 describes our empirical tests related to major customers. Section 5 provides concluding remarks.

2 Model

2.1 Overview

In this section, we use a simple model of voluntary disclosure based on DJK and V83 to motivate our empirical predictions. Our model links the probability that the manager is privately informed (from DJK) with the costs associated with disclosing that information (from V83). We consider a setting where both the probability that the manager has some private information and the cost of disclosure are unobserved by the empirical researcher but share a common observable determinant, denoted ‘x’. In this regard, one can think of x as an empirical construct (or setting) that is positively associated with both the possibility that the manager has some piece of private information and the cost of revealing that information. While the model does not speak to the specific origin of disclosure costs––a broad concept that encompasses information production costs, audit costs, and proprietary costs––our two empirical settings focus on proprietary costs.

2.2 Setup

We assume that firm value, \( \overset{\sim }{v} \), is uniformly distributed between 0 and 1: \( \overset{\sim }{v}\sim Unif\left[0,1\right] \).Footnote 6 We assume that x ranges from \( {x}_L \) to \( {x}_H \); the manager privately observes firm value with probability q(x), where q(x) is increasing as a function of x; and disclosure entails a disclosure cost of c(x) that reduces firm value, where c(x) is continuous and increasing in x. Assuming the manager knows the value of the firm, which occurs with probability q(x), the manager will disclose the value if the value net of the cost of c(x) exceeds the market’s valuation in the absence of disclosure. Finally, let t(x) describe the disclosure threshold. Above this threshold, the manager elects to disclose the value of the firm; below it, the manager elects to withhold this information.

As is standard in the literature where investors hypothesize whether a manager is informed, the derivation of t(x) is endogenous and results from considering three mutually exclusive events. We denote these events A, B, and C. Let A describe the event where the manager knows v and, in addition, v is above the disclosure threshold t(x); here the manager ends up disclosing. Let B describe the event where the manager knows v but v is below t(x); here the manager ends up not disclosing. Finally, let C describe the event where the manager has no knowledge of v and thus does not disclose irrespective of the actual realization of v. One can express the probability of each event occurring as follows (note that the three probabilities sum up to one).

$$ {\displaystyle \begin{array}{l}\Pr (A)=q(x)\left(1-t(x)\right)\\ {}\Pr (B)=q(x)t(x)\\ {}\Pr (C)=1-q(x)\end{array}} $$
(1)

Given these probabilities, one can calculate the expected firm value conditional on withholding information as follows.

$$ E\left(v| withholding\right)=\frac{E\left(v|B\right)\Pr (B)+E\left(v|C\right)\Pr (C)}{\Pr (B)+\Pr (C)}=\frac{q(x)t{(x)}^2-q(x)+1}{2q(x)t(x)+2-2q(x)} $$
(2)
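As a brief check on the algebra in Eq. (2), the two conditional expectations can be written out explicitly. This sketch assumes, as the setup suggests, that whether the manager is informed is independent of the realization of v, so that v is uniform on [0, t(x)] under event B and uniform on [0, 1] under event C:

$$ E\left(v|B\right)=\frac{t(x)}{2},\quad E\left(v|C\right)=\frac{1}{2}. $$

Substituting these, together with Pr(B) = q(x)t(x) and Pr(C) = 1 − q(x) from Eq. (1), gives

$$ E\left(v| withholding\right)=\frac{\frac{t(x)}{2}q(x)t(x)+\frac{1}{2}\left(1-q(x)\right)}{q(x)t(x)+1-q(x)}=\frac{q(x)t{(x)}^2-q(x)+1}{2q(x)t(x)+2-2q(x)}, $$

which matches the right-hand side of Eq. (2).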

2.3 Equilibrium

Next, we derive the equilibrium disclosure threshold, t(x); the equilibrium likelihood of disclosure, Pr(A); and the change in the likelihood of disclosure as a function of x, \( \frac{d\Pr (A)}{dx} \). To simplify the exposition, we provide the formal derivations and proofs in Appendix 1.

The disclosure threshold t(x) represents the point at which the manager is indifferent between disclosing and absorbing a cost of c(x), versus withholding and eschewing the cost. Equation (3) describes that point:

$$ t(x)=\frac{c(x)q(x)+q(x)-1+\sqrt{c{(x)}^2q{(x)}^2-q(x)+1}}{q(x)}. $$
(3)
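The formal derivation is in Appendix 1; what follows is only a sketch of how Eq. (3) can be recovered from Eq. (2), suppressing the argument x for brevity. It assumes the manager maximizes price, so that indifference at v = t requires the disclosed value net of the cost to equal the price conditional on withholding:

$$ t-c=\frac{q{t}^2-q+1}{2 qt+2-2q}. $$

Cross-multiplying and collecting terms yields a quadratic in t:

$$ q{t}^2+2\left(1-q- cq\right)t-\left(1-q\right)\left(1+2c\right)=0. $$

The product of the two roots, \( -\left(1-q\right)\left(1+2c\right)/q \), is non-positive, so the larger root is the only candidate for a non-negative threshold. Taking that root and simplifying the discriminant,

$$ {\left(1-q- cq\right)}^2+q\left(1-q\right)\left(1+2c\right)={c}^2{q}^2-q+1, $$

gives the expression for t(x) in Eq. (3).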

This allows us to express the likelihood of disclosure as followsFootnote 7:

$$ \Pr (A)=q(x)\left(1-t(x)\right)=1-\sqrt{c{(x)}^2q{(x)}^2-q(x)+1}-c(x)q(x). $$
(4)
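Equations (2) through (4) can be checked numerically at a single point. The sketch below is illustrative only: the function names are hypothetical, and the values q = 0.5 and c = 0.25 are arbitrary choices rather than estimates. It confirms that the threshold in Eq. (3) satisfies the indifference condition and that q(x)(1 − t(x)) coincides with the closed form in Eq. (4).

```python
import math

def threshold(q, c):
    """Disclosure threshold t from Eq. (3), for given q = q(x) and c = c(x)."""
    k = math.sqrt(c * c * q * q - q + 1.0)
    return (c * q + q - 1.0 + k) / q

def value_if_withholding(q, t):
    """E(v | withholding) from Eq. (2)."""
    return (q * t * t - q + 1.0) / (2.0 * q * t + 2.0 - 2.0 * q)

def prob_disclosure(q, c):
    """Closed-form Pr(A) from Eq. (4)."""
    return 1.0 - math.sqrt(c * c * q * q - q + 1.0) - c * q

# Arbitrary illustrative values (assumption), e.g. q(x) = x and c(x) = x / 2 at x = 0.5.
q, c = 0.5, 0.25
t = threshold(q, c)

# At the threshold, value net of cost should equal the withholding price.
indifference_gap = (t - c) - value_if_withholding(q, t)

# Pr(A) computed two ways: from Eq. (1) and from Eq. (4).
pr_a_direct = q * (1.0 - t)
pr_a_closed = prob_disclosure(q, c)

print(f"t = {t:.5f}, gap = {indifference_gap:.2e}, Pr(A) = {pr_a_closed:.5f}")
```

Both checks should agree up to floating-point error for any 0 < q ≤ 1 and c ≥ 0 that keep the threshold inside the unit interval.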

An increase in x increases simultaneously both the probability that the manager is privately informed, q(x), and the cost of disclosure, c(x). As such, our chief interest is in examining how Pr(A) changes as x increases. Mathematically, we study this relation by examining the sign of \( \frac{d\Pr (A)}{dx} \). As we illustrate in detail in Appendix 1, it is straightforward to show that the sign of \( \frac{d\Pr (A)}{dx} \) is given by the sign of the expression:

$$ {q}^{\prime }(x)-2\left({c}^{\prime }(x)q(x)+c(x){q}^{\prime }(x)\right)\left(c(x)q(x)+K(x)\right), $$
(5)

where \( K(x)=\sqrt{c{(x)}^2q{(x)}^2-q(x)+1} \).Footnote 8 Broadly speaking, one can interpret the first term in Eq. (5) as the effect arising from an increase in the probability that the manager is privately informed, while the second term is the effect arising from an increase in disclosure costs. When the first term exceeds the second term, we will observe an increase in the likelihood of disclosure as x increases: \( \frac{d\Pr (A)}{dx}>0 \). However, when the second term is greater than the first term, we will observe a decrease in the likelihood of disclosure as x increases: \( \frac{d\Pr (A)}{dx}<0 \).
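The sign condition in Eq. (5) can be compared against a finite-difference derivative of Eq. (4). The sketch below assumes the linear forms q(x) = x and c(x) = x/2 purely for illustration; all function names are hypothetical. Differentiating Eq. (4) directly gives a derivative equal to expression (5) divided by 2K(x), which is what the check uses.

```python
import math

# Illustrative functional forms (assumption): q(x) = x, c(x) = x / 2.
def q(x): return x
def c(x): return 0.5 * x
def dq(x): return 1.0
def dc(x): return 0.5

def K(x):
    """K(x) as defined below Eq. (5)."""
    return math.sqrt(c(x) ** 2 * q(x) ** 2 - q(x) + 1.0)

def prob_disclosure(x):
    """Pr(A) from Eq. (4)."""
    return 1.0 - K(x) - c(x) * q(x)

def sign_expression(x):
    """Expression (5): the information effect minus the cost effect."""
    return dq(x) - 2.0 * (dc(x) * q(x) + c(x) * dq(x)) * (c(x) * q(x) + K(x))

def numerical_derivative(x, h=1e-6):
    """Central finite-difference approximation to dPr(A)/dx."""
    return (prob_disclosure(x + h) - prob_disclosure(x - h)) / (2.0 * h)

for x in (0.2, 0.5, 0.8):
    analytic = sign_expression(x) / (2.0 * K(x))
    print(f"x = {x}: expression (5) = {sign_expression(x):+.4f}, "
          f"dPr(A)/dx = {analytic:+.4f}, finite difference = {numerical_derivative(x):+.4f}")
```

Under these assumed forms, the expression is positive at low x, where the information effect dominates, and negative at high x, where the cost effect dominates.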

We show that the relation between x and Pr(A) is unimodal for a wide variety of probability functions and cost functions. In other words, there exists a unique peak below which the likelihood of disclosure is increasing in x (i.e., \( \frac{d\Pr (A)}{dx}>0 \)) and above which the likelihood of disclosure is decreasing in x (i.e., \( \frac{d\Pr (A)}{dx}<0 \)).Footnote 9 In Appendix 1, we derive two sufficient conditions for q(x) and c(x) that guarantee unimodality, and show that a wide variety of functional form assumptions for q(x) and c(x), including linear functions, satisfy these conditions. We emphasize that these are sufficient (not necessary) conditions. Undoubtedly, unimodality can also be attained under alternative information structures. Indeed, Richardson (2001) is perhaps the first to suggest the possibility of a unimodal relation. Richardson (2001) considers a version of V83 where the disclosure cost is increasing in the precision of the manager’s private information. In such a circumstance, he suggests that linking private information and disclosure costs introduces a “countervailing force” such that greater private information does not necessarily lead to more disclosure.

Nonetheless, our approach of solving the model for general forms for q(x) and c(x) makes it clear that our results do not obtain for all functional forms. Figure 2 plots the derived relation between the empirical construct x and Pr(A) for a variety of probability functions and cost functions. In each case, the relation is unimodal; the likelihood of disclosure first increases, reaches a peak, and then decreases.Footnote 10 Our subsequent empirical tests validate this approach and suggest that a unimodal relation is empirically descriptive of the relation between voluntary disclosure and two empirical candidates for x––capital investments and major customers.

Fig. 2 Unimodal relation. This figure plots the theoretical relation between x and the likelihood of voluntary disclosure, Pr(A), for a variety of probability functions and cost functions. The solid line illustrates the shape of the relation with a linear probability that the manager is informed and a linear disclosure cost: \( q(x)=x;\mathrm{and}\ c(x)=\frac{1}{2}x \). The dashed line illustrates the shape of the relation with a linear probability that the manager is informed and a convex disclosure cost: \( q(x)=x;\mathrm{and}\ c(x)=\frac{1}{2}{x}^2 \). The dotted line illustrates the shape of the relation with a concave probability that the manager is informed and a convex disclosure cost: \( q(x)=\sqrt{x};\mathrm{and}\ c(x)=\frac{1}{2}{x}^2 \).
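The three curves described in the Fig. 2 caption can be reproduced approximately by evaluating Eq. (4) on a grid. The sketch below is not the procedure used to draw the figure. It assumes x ranges over [0, 1] (so that q(x) is a valid probability), uses a grid of 201 points, and relies on a hypothetical helper that checks whether a sequence rises to an interior peak and then falls.

```python
import math

# Functional forms from the Fig. 2 caption: label -> (q(x), c(x)).
SPECS = {
    "linear q, linear c": (lambda x: x, lambda x: 0.5 * x),
    "linear q, convex c": (lambda x: x, lambda x: 0.5 * x ** 2),
    "concave q, convex c": (lambda x: math.sqrt(x), lambda x: 0.5 * x ** 2),
}

def prob_disclosure(q, c):
    """Pr(A) from Eq. (4), given q = q(x) and c = c(x)."""
    return 1.0 - math.sqrt(c * c * q * q - q + 1.0) - c * q

def peak_index_if_unimodal(values):
    """Return the index of an interior peak if values rise and then fall, else None."""
    peak = values.index(max(values))
    rises = all(a <= b for a, b in zip(values[:peak], values[1:peak + 1]))
    falls = all(a >= b for a, b in zip(values[peak:], values[peak + 1:]))
    interior = 0 < peak < len(values) - 1
    return peak if (rises and falls and interior) else None

# Assumed domain: x on [0, 1].
grid = [i / 200 for i in range(201)]
peaks = {}
for name, (q_fn, c_fn) in SPECS.items():
    curve = [prob_disclosure(q_fn(x), c_fn(x)) for x in grid]
    idx = peak_index_if_unimodal(curve)
    peaks[name] = None if idx is None else grid[idx]

for name, x_peak in peaks.items():
    print(f"{name}: peak near x = {x_peak}")
```

A grid check of this kind only illustrates the shape; the sufficient conditions in Appendix 1 are what establish unimodality.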

2.4 Discussion

Generally speaking, there are three categories of joint theory-empirical papers: (i) a structural paper that seeks to estimate the underlying model parameters (Bertomeu et al. 2020a, b; Zhou 2020); (ii) a theoretical paper that includes a simple empirical analysis to illustrate the empirical implications (e.g., Smith 2020); and (iii) an empirical paper that might offer a simple, short model as a “thought experiment” to illustrate the economic intuition (e.g., Chen et al. 2005; Ferri et al. 2018). Each type of paper plays a different role in the discovery and dissemination of knowledge.Footnote 11 One can generally assess the type of paper by examining the complexity and length of the formal theory being invoked. In the case of this paper, our theory is a straightforward modification of the existing models, and our primary purpose is motivational. We do not claim to offer a theoretical contribution.

Our contribution is instead to apply existing theory (and slight changes to these models) to explain observed patterns in the data. In our paper, the role of theory is to offer a simple “thought experiment.” Without such a thought experiment, it would be difficult to intuit why the observed empirical relations might flip sign, that is, why we would observe a unimodal relation. Perhaps for this reason, prior empirical work focuses on monotone comparative statics that yield unambiguous sign predictions. We show that straightforward changes to existing theory models can explain an otherwise puzzling and anomalous empirical finding.Footnote 12 Indeed, the fact that the data bear out our predictions suggests that our assumptions are descriptive. We prefer assumptions that are transparent and that produce empirically descriptive results over assumptions (i) that are not transparent (some reduced-form empirical papers) or (ii) that are transparent but do not produce descriptive results (some theory models). In this sense, falsification is straightforward: if we have sufficient variation in the empirical proxies and voluntary disclosure is linear (or not unimodal), then the theory is falsified.

When combining theory and empirical work, one necessarily has to make tradeoffs in the complexity of the underlying theory model. This tradeoff should be informed by judgments about the value added from additional complexity and the objectives of the paper. For example, in formulating our model, we could have endogenized the disclosure cost in a competitive industry entry and exit game (similar to Wagenhofer 1990); added noise to the manager’s private information (similar to Verrecchia 1990); added an endogenous investment decision (similar to Heinle et al. 2020); or assumed a normal rather than uniform distribution of firm value. Such features are not necessary to motivate a unimodal relation. The purpose of our model is not to offer the most realistic model possible, but rather to parsimoniously illustrate the intuition (in abstract form) for how economic forces might interact. Consequently, our subsequent empirical tests should not be viewed as tests of a literal interpretation of the model.

3 Empirical setting: capital investments

3.1 Sample and variable measurement

An important distinguishing feature of our analysis is that it concerns the shape of the relation between capital investment (major customers) and voluntary disclosure. Our focus on the shape of the relation should mitigate concerns about alternative explanations (e.g., omitted variables and reverse causality). For example, an alternative explanation for a unimodal relation would need to posit an alternative economic theory for why the relation flips sign at a similar point in the distribution of capital investment (major customers). In the economics literature, this approach is known as “identification by functional form” (e.g., Lewbel 2019). However, this approach is not without limitations. Indeed, one concern with it is that the sampling variation in capital investments may not be sufficiently large to replicate the theoretical shape of the joint distribution of capital investments and voluntary disclosure.

Consider how this concern might affect our empirical analysis. Figure 2 shows the shape of the relation predicted by theory. However, ex ante, we do not know where firms in our sample will fall on the x-axis. For example, it could be that all firms fall to the right (left) of the “peak.” In such a circumstance, we would only observe a negative (positive) relation. Thus, for our tests of the shape of the functional form to be meaningful, we need to maximize sampling variation in capital investment, so that we can observe firms on both sides of the peak. This requires having as broad a sample as possible. In this regard, our empirical tests are joint tests of a unimodal shape and of a sample that is sufficiently broad for us to observe firms on both sides of the peak.

We construct our sample using data from Compustat, I/B/E/S, and SEC filings from 2004 to 2016 (2014). Our sample begins in 2004, when data on management capital expenditure forecasts become widely populated in the I/B/E/S database. Our sample ends in 2016, except for the analyses of voluntary product market disclosure, the data for which end in 2014. We require firms to have positive total assets, sales, and book value of equity on Compustat, and we exclude utilities (SIC codes 4000–4999) and financial services (SIC codes 6000–6999). The resulting sample consists of 45,401 (39,184) firm-year observations from 2004 to 2016 (2004 to 2014).

We employ three different measures of voluntary disclosure. Our first measure of voluntary disclosure, Pr(Forecast), is an indicator variable that equals one if a firm issues a management forecast in a given fiscal year (including forecasts of EPS, sales, capital expenditures, etc.) and zero otherwise. It is well established that management forecasts represent an important source of firm disclosure (Beyer et al. 2010; Hirst et al. 2008). As such, we expect Pr(Forecast) to effectively capture voluntary disclosure and ensure consistency and comparability with a large body of prior work.Footnote 13

Our second measure of voluntary disclosure, Pr(Forecast_cpx), is an indicator variable that equals one if a firm issues a capital expenditure forecast in a given fiscal year, and zero otherwise. This measure focuses on the capital investment aspect of our setting and captures a specific type of voluntary disclosure that closely relates to the underlying capital investment decision that motivates such disclosure.

Our third measure of voluntary disclosure is based on the textual analysis of corporate 8-K filings related to the firm’s product market environment. Specifically, Pr(ProdMktDisc) is an indicator variable that equals one if, in a voluntary 8-K filed in a given fiscal year, a firm mentions “product,” “quantity,” “quantities,” “pricing,” “strategy,” “capital expenditure,” “demand,” “business conditions,” or “market conditions,” and zero otherwise (Kepler 2021).Footnote 14 Using this measure of voluntary disclosure allows us to closely capture the proprietary aspect of the information being disclosed, because detailed information about the firm’s product market will be directly useful to the firm’s competitors and other external parties.
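To make the construction of the text-based indicator concrete, the sketch below flags a filing that mentions any of the listed terms. The matching rule (case-insensitive, whole-word) and the function name are assumptions made for illustration; the procedure in Kepler (2021), including how voluntary 8-K filings are identified, may differ.

```python
import re

# Keyword list as given in the text.
KEYWORDS = (
    "product", "quantity", "quantities", "pricing", "strategy",
    "capital expenditure", "demand", "business conditions", "market conditions",
)

# Assumption: case-insensitive, whole-word (or whole-phrase) matching.
_PATTERNS = [re.compile(r"\b" + re.escape(k) + r"\b") for k in KEYWORDS]

def mentions_product_market(filing_text):
    """Return 1 if the filing text mentions any keyword, and 0 otherwise."""
    lowered = filing_text.lower()
    return int(any(p.search(lowered) for p in _PATTERNS))

examples = [
    "We expect pricing pressure to persist through the second half.",
    "The board appointed a new independent director.",
    "Capital Expenditure plans for the new facility remain unchanged.",
]
flags = [mentions_product_market(text) for text in examples]
print(flags)
```

A firm-year indicator would then equal one if any voluntary 8-K filed during the fiscal year is flagged.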

Table 1 presents descriptive statistics for the variables used in our analysis. Detailed variable definitions can be found in Table 1, and all continuous variables are winsorized at the 1st and 99th percentiles each year. On average, 53% of firm-years in our sample provide management forecasts (mean Pr(Forecast) = 0.53), 29% provide capital expenditure forecasts (mean Pr(Forecast_cpx) = 0.29), and 35% provide voluntary disclosure that discusses important aspects of the firm’s product market (mean Pr(ProdMktDisc) = 0.35). Average (median) value of capital investment in our sample is 28% (22%) of net property, plant, and equipment (mean Capex = 0.28, median Capex = 0.22).

Table 1 Descriptive statistics: capital expenditure

3.2 Results

3.2.1 Univariate sorts

We begin our analysis by graphically presenting the shape of the relation between capital investment and voluntary disclosure in our sample. In Fig. 3, we plot average voluntary disclosure (in excess of the industry-year mean) for each Capex quintile. Consistent with a unimodal relation, we find that average voluntary disclosure is monotonically increasing in the first three quintiles of Capex, then decreases in the 4th and 5th quintiles. This relation is evident across all three measures of voluntary disclosure.
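The univariate sort can be sketched as follows. The data here are simulated and entirely hypothetical (a disclosure probability that peaks at mid-range Capex, ten industries, thirteen years); the sketch only illustrates the mechanics of demeaning by industry-year and averaging within Capex quintiles, not the actual sample or results.

```python
import random
from collections import defaultdict
from statistics import mean

def demean_by_group(values, groups):
    """Subtract each observation's group mean (here, the industry-year mean)."""
    members = defaultdict(list)
    for v, g in zip(values, groups):
        members[g].append(v)
    group_mean = {g: mean(vs) for g, vs in members.items()}
    return [v - group_mean[g] for v, g in zip(values, groups)]

def quintile_of(values):
    """Assign quintiles 1..5 by rank (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    quintile = [0] * len(values)
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // len(values) + 1
    return quintile

def mean_by_quintile(outcome, quintile):
    """Average outcome within each of the five quintiles."""
    return [mean(o for o, k in zip(outcome, quintile) if k == j) for j in range(1, 6)]

# Simulated firm-years (hypothetical data-generating process).
random.seed(0)
n = 5000
capex = [random.random() for _ in range(n)]
industry_year = [(random.randrange(10), 2004 + random.randrange(13)) for _ in range(n)]
disclose = [1 if random.random() < 0.2 + 1.6 * x * (1 - x) else 0 for x in capex]

excess = demean_by_group(disclose, industry_year)
profile = mean_by_quintile(excess, quintile_of(capex))
print([round(p, 3) for p in profile])
```

With a hump-shaped data-generating process, the middle quintile's average excess disclosure should sit above both extreme quintiles.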

Fig. 3 Voluntary disclosure by capital expenditure quintile. This figure presents average voluntary disclosure, in excess of the industry-year mean, by capital expenditure (Capex) quintile. See Table 1 for variable definitions.

Table 2 reports the corresponding averages for each quintile. Consistent with a unimodal relation, we find that voluntary disclosure is monotonically increasing in the 1st, 2nd, and 3rd quintiles, then declines sharply in the 4th and 5th quintiles. Moreover, the increase from the 1st to the 2nd quintile is economically and statistically significant (p value <0.01), as is the decrease from the 4th to the 5th quintile (p value <0.01); each exceeds 50%.

Table 2 Voluntary disclosure by capital expenditure quintile

Our univariate analyses consistently find a unimodal relation in the data. This suggests that evidence of a unimodal relation is not an artefact of our subsequent regression tests but rather a pattern that can be clearly observed in the raw data. We next conduct formal regression tests.

3.2.2 Polynomial regression

We formally test for a unimodal relation between capital investment and voluntary disclosure by estimating polynomial regression models that include both linear and 2nd-order polynomial terms (e.g., Capex and \( {Capex}^2 \)). Specifically, we estimate the following polynomial regression specification:

$$ VolDisc={\beta}_1{Capex}^2+{\beta}_2 Capex+\gamma Controls+\varepsilon, $$
(6)

where VolDisc and Capex are as previously defined and Controls is a vector of control variables. If the relation is unimodal, we expect the coefficient on the 2nd-order polynomial term to be statistically significant and negative (\( {\beta}_1<0 \)).
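A minimal sketch of estimating Eq. (6) by ordinary least squares on simulated data is shown below. Everything about the data is an assumption for illustration (the hump-shaped disclosure probability, the single stand-in control named `size`, the sample size), and the sketch omits the industry and year fixed effects and the clustered standard errors used in the reported analysis. It also computes the implied turning point of the fitted quadratic, \( -{\beta}_2/\left(2{\beta}_1\right) \), which should lie inside the range of Capex if the relation is unimodal within the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data (hypothetical): disclosure probability peaks at Capex = 0.5.
capex = rng.uniform(0.0, 1.0, n)
size = rng.normal(0.0, 1.0, n)  # stand-in for the control vector
prob = np.clip(0.2 + 1.6 * capex * (1.0 - capex) + 0.02 * size, 0.0, 1.0)
vol_disc = (rng.uniform(size=n) < prob).astype(float)

# Linear probability model: VolDisc on Capex^2, Capex, a control, and a constant.
X = np.column_stack([capex ** 2, capex, size, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, vol_disc, rcond=None)
beta1, beta2 = coef[0], coef[1]

# Turning point of the fitted quadratic in Capex.
turning_point = -beta2 / (2.0 * beta1)
print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}, turning point = {turning_point:.3f}")
```

A negative coefficient on the squared term together with an interior turning point is what distinguishes a unimodal shape from a merely concave, monotone one.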

Following prior literature (e.g., Lang and Lundholm 1993; Bamber and Cheon 1998; Guay et al. 2016), we include the following control variables: firm size (Size), leverage (Leverage), growth opportunities (Mtb), firm performance (Roa and Loss), special items (SpecItems), earnings volatility (Earnvol), and industry and year fixed effects.Footnote 15 All variables are as defined in Table 1. Throughout our regression analyses, we estimate linear probability models and base inferences on standard errors clustered by firm and year.Footnote 16

Table 3 presents results from estimating Eq. (6). Columns (1), (3), and (5) present results with linear controls, and columns (2), (4), and (6) present results with both linear and polynomial control variables (i.e., including Controls²). The latter specification controls for the possibility of heretofore undocumented non-linear relations between voluntary disclosure and our control variables. While we are not aware of any theory that would predict a unimodal relation between voluntary disclosure and our control variables, we nonetheless control for this possibility. Consistent with our predictions, we find negative and significant coefficient estimates on Capex² across all six specifications (t-stats range between −2.73 and −9.16). In addition, the signs and significance of our controls (e.g., positive coefficients on Size, Roa, and Leverage and negative coefficients on Mtb and Earnvol) are generally consistent with those in prior research.

Table 3 Polynomial regression results: capital expenditure

3.2.3 Spline regression

We provide further evidence on the shape of the relation by estimating spline regressions that treat the relation as piecewise linear. Specifically, we estimate the relation on either side of a threshold level of capital investment, τ, and test whether the relation below (above) the threshold is positive (negative):

$$ VolDisc={\beta}_1\left( Capex-\tau <0\right)+{\beta}_2\left( Capex-\tau \ge 0\right)+\gamma Controls+\varepsilon, $$
(7)

where all variables are as previously defined and the set of control variables mirrors that of Eq. (6). We predict β1 > 0 and β2 < 0. As before, we include year fixed effects and industry fixed effects, and cluster standard errors by firm and year.
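The spline terms in Eq. (7) can be sketched as follows. The sketch assumes one common reading of the notation, namely that the first regressor equals Capex − τ below the threshold and zero otherwise, and the second equals Capex − τ at or above the threshold and zero otherwise. The synthetic data, the helper `ols`, and the threshold value are hypothetical, and controls, fixed effects, and clustering are omitted.

```python
import random

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(1)
tau = 0.3  # hypothetical threshold for the synthetic data
capex = [0.6 * random.random() for _ in range(5000)]

# Piecewise-linear terms: each is zero on the "wrong" side of the threshold.
below = [min(c - tau, 0.0) for c in capex]
above = [max(c - tau, 0.0) for c in capex]

# Synthetic tent-shaped disclosure probability (slope +1 below, -1 above).
voldisc = [1.0 if random.random() < 0.5 + lo - hi else 0.0
           for lo, hi in zip(below, above)]

X = [[lo, hi, 1.0] for lo, hi in zip(below, above)]
beta1, beta2, const = ols(X, voldisc)
print(f"slope below tau = {beta1:.2f}, slope above tau = {beta2:.2f}")
```

Because the two terms sum to Capex − τ, the fitted line is continuous at the threshold, and β1 and β2 are the slopes on either side of it.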

Table 4 presents results from estimating Eq. (7) using the mean value of Capex as the threshold (τ = 0.276). Consistent with a unimodal relation, across all three measures of voluntary disclosure, we find positive and significant slope estimates below the threshold (t-stats 8.44, 1.88, and 2.51), and negative and significant slope estimates above the threshold (t-stats −2.97, −6.81, and −3.87). Collectively, the results in Tables 2, 3, and 4 provide robust empirical evidence that the shape of the relation is consistent with our theoretical predictions.

Table 4 Spline regression: capital expenditure

4 Empirical setting: major customers

4.1 Sample and variable measurement

We construct our sample using data from Compustat, Compustat Customer Segments, I/B/E/S, and SEC filings from 2004 to 2016 (2014). As before, our sample begins in 2004 and ends in 2016, except for the analyses that use product market disclosures as a measure of voluntary disclosure, the data for which ends in 2014. We require firms to have positive total assets, sales, and book-value-of-equity on Compustat, and we exclude utilities (SIC codes 4000–4999) and financial services (SIC codes 6000–6999). The resulting sample consists of 44,235 (38,189) firm-year observations from 2004 to 2016 (2004 to 2014).

We use three different measures of voluntary disclosure: Pr(Forecast), Pr(Forecast_sale), and Pr(ProdMktDisc). Pr(Forecast) and Pr(ProdMktDisc) are as previously defined. Pr(Forecast_sale) is an indicator variable that equals one if a firm issues a sales forecast in a given fiscal year and zero otherwise. This measure focuses on the customer sales aspect of our setting and captures a specific type of voluntary disclosure that closely reflects the underlying relation between major customers and voluntary disclosure.

We next construct our measure of major customers. Publicly traded firms are mandated by SFAS No. 131 to report their sales amount to any “major customer,” defined as a customer that accounts for 10% or more of their consolidated sales revenue. We obtain data on major customers from the Compustat Customer Segments dataset, which provides annual information on the dollar amount of sales generated from each major customer. Using this information, we calculate MajorSale as the percentage of total sales made to major customers (Banerjee et al. 2008; Gosman and Kohlbeck 2009; Huang et al. 2016). If a firm does not have a major customer, we treat it as having MajorSale = 0. Given the prevalent use of this measure in the customer relationship literature, we expect MajorSale to effectively capture the supplier firm’s dependence on major customers.Footnote 17
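The construction of MajorSale described above might be sketched as follows. The function name and inputs are hypothetical, the measure is expressed as a fraction rather than in percentage points, and the sketch assumes that every customer passed in has already been identified as a major customer in the source data.

```python
def major_sale(total_sales, major_customer_sales):
    """Fraction of a firm-year's total sales made to its reported major customers.

    `major_customer_sales` is a list with one dollar amount per major customer;
    an empty list (no major customer) yields MajorSale = 0.
    """
    if total_sales <= 0:
        raise ValueError("total_sales must be positive")
    return sum(major_customer_sales) / total_sales

# Hypothetical firm-years.
print(major_sale(200.0, [50.0, 30.0]))  # two major customers
print(major_sale(100.0, []))            # no major customer
```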

Table 5 presents descriptive statistics for the variables used in the analysis. Detailed variable definitions can be found in Table 5, and all continuous variables are winsorized at the 1st and 99th percentiles each year. On average, 53% of firm-years in our sample provide management forecasts (mean Pr(Forecast) = 0.53), 36% provide sales forecasts (mean Pr(Forecast_sale) = 0.36), and 35% provide voluntary disclosure that discusses important aspects of the firm’s product market (mean Pr(ProdMktDisc) = 0.35). Average MajorSale in our sample is 0.25, and the median is 0.12.
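For readers who wish to replicate the winsorization step, one possible implementation is sketched below. The function name is hypothetical, and the percentile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; the paper does not state which convention it uses.

```python
from collections import defaultdict
from statistics import quantiles

def winsorize_by_year(values, years, lower=1, upper=99):
    """Clamp each value to its year's [lower, upper] percentile range."""
    by_year = defaultdict(list)
    for v, yr in zip(values, years):
        by_year[yr].append(v)
    bounds = {}
    for yr, vs in by_year.items():
        # quantiles(n=100) returns the 99 cut points between percentiles.
        cuts = quantiles(vs, n=100, method="inclusive")
        bounds[yr] = (cuts[lower - 1], cuts[upper - 1])
    return [min(max(v, bounds[yr][0]), bounds[yr][1])
            for v, yr in zip(values, years)]

# Hypothetical data: two years with different scales.
values = [float(v) for v in range(101)] + [10.0 * v for v in range(101)]
years = [2004] * 101 + [2005] * 101
winsorized = winsorize_by_year(values, years)
print(winsorized[0], winsorized[100], winsorized[101], winsorized[201])
```

Computing the bounds separately for each year means that an extreme value in one year does not affect the cutoffs applied in another.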

Table 5 Descriptive statistics: major customers

4.2 Results

4.2.1 Univariate sorts

We begin our analysis by graphically presenting the shape of the relation between major customers and voluntary disclosure. In Fig. 4, we plot average voluntary disclosure (in excess of the industry-year mean) for each MajorSale quintile. Consistent with a unimodal relation, we find that average voluntary disclosure is monotonically increasing in the first two quintiles of MajorSale, then decreases in the 3rd, 4th, and 5th quintiles. This relation is evident across all three measures of voluntary disclosure.

Fig. 4 Voluntary disclosure by major customer quintile. This figure presents average voluntary disclosure, in excess of the industry-year mean, by major customer (MajorSale) quintile. See Table 5 for variable definitions

Table 6 reports the corresponding averages for each quintile. Consistent with a unimodal relation, we find that voluntary disclosure is monotonically increasing in the 1st and 2nd quintiles and declines sharply in the 4th and 5th quintiles. Moreover, the increase from the 1st to the 2nd quintile is economically and statistically significant (p value < 0.01), as is the decrease from the 4th to the 5th quintile (p value < 0.10).
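The univariate sort can be sketched as follows: disclosure is first measured in excess of its group mean (standing in for the industry-year mean), and the result is then averaged within quintiles of the sorting variable. The function names and the synthetic data are hypothetical. The sketch forms equal-sized bins by rank, which is an assumption; when many observations share the same value (e.g., MajorSale = 0), ties are split across bins arbitrarily.

```python
from collections import defaultdict

def demean_by_group(y, groups):
    """Subtract each group's mean (e.g., the industry-year mean) from y."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(y, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(y, groups)]

def quintile_means(x, y, n_bins=5):
    """Sort observations on x, split them into equal-sized bins, average y per bin."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    means = []
    for q in range(n_bins):
        lo = q * len(order) // n_bins
        hi = (q + 1) * len(order) // n_bins
        idx = order[lo:hi]
        means.append(sum(y[i] for i in idx) / len(idx))
    return means

# Synthetic, deterministic data (assumption): disclosure peaks near x = 0.3.
x = [i / 100 for i in range(100)]
groups = [i % 2 for i in range(100)]  # stand-in for industry-year cells
y = [1.0 - abs(xi - 0.3) + 0.1 * g for xi, g in zip(x, groups)]

means = quintile_means(x, demean_by_group(y, groups))
print([round(m, 3) for m in means])
```

In this synthetic example the quintile averages rise from the 1st to the 2nd quintile and fall thereafter, which is the pattern a unimodal relation produces.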

Table 6 Voluntary disclosure by major customer quintile

Our univariate analyses consistently find a unimodal relation in the data. This suggests that evidence of a unimodal relation is not an artefact of a specific measurement choice or the specification of our subsequent regression tests, but rather a pattern that can be clearly observed in the raw data. Next, we conduct formal regression tests.

4.2.2 Polynomial regression

We formally test for a unimodal relation between major customers and voluntary disclosure by estimating polynomial regression models that include both linear and 2nd-order polynomial terms (e.g., MajorSale and MajorSale²). Specifically, we estimate the following polynomial regression specification:

$$ VolDisc={\beta}_1{MajorSale}^2+{\beta}_2 MajorSale+\gamma Controls+\varepsilon, $$
(8)

where all variables are as previously defined and the regression specification mirrors that in Eq. (6). If the relation is unimodal, we expect the coefficient on the 2nd-order polynomial term to be statistically significant and negative (β1 < 0).
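As a supplementary worked step (not part of the reported tests), the peak implied by the quadratic specification in Eq. (8) follows from setting its first derivative with respect to MajorSale to zero. The starred symbol below is introduced here for illustration only:

$$ \frac{\partial VolDisc}{\partial MajorSale}=2{\beta}_1 MajorSale+{\beta}_2=0\kern1em \Rightarrow \kern1em {MajorSale}^{\ast }=-\frac{\beta_2}{2{\beta}_1}. $$

With β1 < 0, this stationary point is a maximum. For the fitted relation to rise and then fall within the sample, the implied peak would also need to lie inside the observed range of MajorSale, which in turn requires β2 > 0 because MajorSale is non-negative.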

Table 7 presents results from estimating Eq. (8). Columns (1), (3), and (5) present results with linear controls, and columns (2), (4), and (6) present results with both linear and polynomial control variables (i.e., including Controls²). Consistent with our predictions, we find negative and significant coefficient estimates on MajorSale² across all six specifications (t-stats range between −3.06 and −6.63). In addition, the signs and significance of our controls (e.g., positive coefficients on Size, Roa, and Leverage and negative coefficients on Mtb and Earnvol) are generally consistent with those in prior research.

Table 7 Polynomial regression results: major customers

4.2.3 Spline regression

We provide further evidence on the shape of the relation by estimating spline regressions that treat the relation as piecewise linear. Specifically, we estimate the relation on either side of a threshold level of major customers, τ, and test whether the relation below (above) the threshold is positive (negative):

$$ VolDisc={\beta}_1\left( MajorSale-\tau <0\right)+{\beta}_2\left( MajorSale-\tau \ge 0\right)+\gamma Controls+\varepsilon, $$
(9)

where all variables are as previously defined and the regression specification mirrors that in Eq. (7). We predict β1 > 0 and β2 < 0.

Table 8 presents results from estimating Eq. (9) using the mean value of MajorSale as the threshold (τ = 0.25). Consistent with a unimodal relation, across all three measures of voluntary disclosure, we find positive and significant slope estimates below the threshold (t-stats 7.02, 7.19, and 3.69) and negative and significant slope estimates above the threshold (t-stats −4.16, −2.29, and −2.26). Collectively, the results in Tables 6, 7, and 8 provide robust empirical evidence that the shape of the relation is consistent with our theoretical predictions.

Table 8 Spline regression: major customers

5 Conclusion

Classical models of voluntary disclosure feature two independent economic forces: the existence of an adverse selection problem (e.g., a manager possesses some private information) and the cost of ameliorating the problem (e.g., costs associated with disclosure). In this paper, we develop a parsimonious model where these forces are jointly determined: the greater the adverse selection problem, the greater the cost of ameliorating the problem. We show that allowing for joint determination of adverse selection and disclosure costs leads to a richer and more nuanced set of empirical predictions than the standard linear predictions that have come to dominate the empirical literature.

We show that the predictions from the model empirically describe multiple measures of voluntary disclosure in two empirical settings that feature both greater adverse selection and greater proprietary costs: capital investment and major customers. In each setting we measure voluntary disclosure three ways and employ three sets of tests. Consistent with our theoretical predictions and the pervasiveness of the economic forces we study, across all measures and test specifications, we find a unimodal relation between capital investment (major customers) and voluntary disclosure. The probability of voluntary disclosure initially increases, reaches a peak, and then decreases. Taken together, our analysis provides robust empirical evidence that the probability of voluntary disclosure is unimodal in both capital investment and sales to major customers. It also shows that modifying classical models of voluntary disclosure to allow for joint determination of private information and disclosure costs can accommodate this non-linear feature of the data.

Much of the empirical literature models the world with linear functions, but real-world phenomena are rarely linear. We encourage future work to explore the non-linear relations present in the data and to develop hypotheses and predictions that push beyond the simple linear relations of past work. Descriptive evidence on the shape of the data-generating processes can help us understand the fundamental (unobserved) forces that are present in the data.