Biases from Non-simultaneous Regression with Correlated Covariates: A Case Study from Supernova Cosmology

Samantha Dixon

doi:10.1088/1538-3873/abef78

1. Introduction

Properties of Type Ia supernovae (SNe Ia) have been observed to be correlated with their absolute luminosities. Before accounting for these properties, the absolute brightnesses of typical SNe Ia vary by ∼0.4 mag. After accounting for correlations with the decay time of the light curve and the color of the object, their corrected absolute brightnesses are consistent to within ∼0.14 mag (Phillips 1993; Hamuy et al. 1996; Riess et al. 1996; Perlmutter et al. 1997; Tripp 1998). These calibrated brightness estimates make SNe Ia powerful cosmological distance indicators, and when combined with redshift measurements, allow us to map out the expansion history of the universe. This technique was instrumental in the discovery of the accelerating expansion of the universe (Riess et al. 1998; Perlmutter et al. 1999), and continues to serve as a powerful probe of the nature of the dark energy driving this acceleration.

A common analysis method for standardizing supernova brightnesses uses the SALT2 spectral model (Guy et al. 2007; Betoule et al. 2014; Mosher et al. 2014) to parameterize SN Ia light curves. The model parameters represent an individual supernova's peak apparent brightness in the Bessell B-band ( ${m}_{B}^{* }$ ), temporal width (x₁), and observed color (c). The distance modulus, μ, to each object i at redshift z_i is then modeled as a linear combination of these parameters:

$\begin{eqnarray}&&{\mu }_{i}({z}_{i})={m}_{B\ i}^{* }({z}_{i})-M+\alpha {x}_{1i}-\beta {c}_{i}.\end{eqnarray} \tag{ 1 }$

Typically, we would find the values of M, α, and β by minimizing the following quantity with respect to these parameters as well as the cosmological parameters of interest:

$\begin{eqnarray}&&{\chi }^{2}=\displaystyle \sum _{i}\displaystyle \frac{{\left[{\mu }_{i}({z}_{i};{m}_{B,i}^{* },{x}_{1,i},{c}_{i})-{\mu }_{\mathrm{cosmo}}({z}_{i};{\boldsymbol{\Theta }})\right]}^{2}}{{\sigma }_{\mathrm{obs},i}^{2}+{\sigma }_{\mathrm{int}}^{2}},\end{eqnarray} \tag{ 2 }$

where ${\mu }_{\mathrm{cosmo}}({z}_{i};{\boldsymbol{\Theta }})$ is the distance modulus-redshift relation determined by the cosmological parameters ${\boldsymbol{\Theta }}$ , and ${\sigma }_{\mathrm{obs},i}$ is the observational uncertainty of the measurements. ${\sigma }_{\mathrm{int}}$ is the intrinsic dispersion of standardized magnitudes, usually found by iteratively calculating the value of ${\sigma }_{\mathrm{int}}$ that ensures the minimum value of ${\chi }^{2}$ per degree of freedom is equal to 1. This process is effectively a familiar linear regression.

The need to add an additional uncertainty term in the form of ${\sigma }_{\mathrm{int}}$ in Equation (2) suggests that the linear relationship between SALT2 parameters and absolute magnitude may not capture all of the variation in supernova magnitudes, or that the parameterization provided with SALT2 may not capture all of the information that is needed to fully standardize supernova magnitudes (Saunders et al. 2018). This motivates the search for other observable properties of SNe Ia that might explain this remaining variation, as well as the use of these other properties for standardization. One way to search for such properties is to measure correlations between these properties and the Hubble residuals ${\mu }_{i}({z}_{i};{m}_{B}^{* },{x}_{1},c)-{\mu }_{\mathrm{cosmo}}({z}_{i};{\boldsymbol{\Theta }})$ . A number of studies (Kelly et al. 2010; Lampeitl et al. 2010; Sullivan et al. 2010; Childress et al. 2013) have observed such a correlation with the host galaxy stellar mass: supernovae in galaxies with $\mathrm{log}({ \mathcal M }/{{ \mathcal M }}_{\odot })\gt 10$ are ∼0.1 mag brighter after standardization than supernovae in galaxies with $\mathrm{log}({ \mathcal M }/{{ \mathcal M }}_{\odot })\lt 10$ . Rigault et al. (2013, 2015) and Childress et al. (2014) show that this effect could be due to similar correlations with host galaxy age. However, the significance of some of these correlations has been debated (e.g., Jones et al. 2015, 2018), indicating that care must be taken in making these measurements.

Reporting the size of correlations with the linear regression residuals is mathematically well-motivated if the covariate used to predict these residuals is not itself correlated with those used in the original regression (if, for example, host mass were not correlated with light curve parameters). However, if this key assumption is violated, we find ourselves in a situation referred to in the statistics and econometrics literature as multicollinearity (e.g., Farrar & Glauber 1967). Multicollinearity results in unreliable and biased estimates of effect sizes. A related concern discussed frequently in these fields is omitted variable bias, in which misspecification of the regression problem results in biased estimates of the true regression parameters (Clarke 2005; Wooldridge 2013). We can see an example of this in Rigault et al. (2018), where two different values of the size of the luminosity difference between supernovae in environments with differing star formation rates are found when measuring the step with a sequential regression versus a simultaneous regression.

This effect can also be seen in studies that use more sophisticated analysis techniques, like the BEAMS with Bias Corrections (BBC) technique of Kessler & Scolnic (2017). These methodologies begin by simulating the effects of changing populations of light curve parameters with redshift, selection effects, and contamination by core-collapse supernovae that were incorrectly typed by photometric classification algorithms. The results of these simulations are used to generate additional redshift-dependent corrections to the Hubble diagram and this corrected Hubble diagram is used to constrain cosmological parameters. The regression bias we discuss here would not be present in the results of these studies as long as the covariances between all parameters are included in the analysis. Indeed, Smith et al. (2020) found that the underestimate of the host galaxy mass step presented in Brout et al. (2019) could be rectified by including the correlation between host galaxy mass and SALT2 x₁ in the bias correction framework.

This work aims to explore and quantify the general impact of the non-simultaneous regression methodology used in some SN Ia analyses on reported effect sizes for both linear and step-function residual trends when multicollinearity exists. In Section 2, we work through an example using a generalized two-dimensional linear regression problem with correlated covariates. In Section 3, we analyze a similar model that includes a step function and compare the results to those obtained in the linear case. We then calculate the effect that this general mathematical model has in the particular case of measuring the host galaxy mass step using literature data of SALT2 parameters and host galaxy masses in Section 4 and conclude in Section 5 by identifying previous results that have overlooked this effect and recommending that future analyses use fully simultaneous regression techniques.

2. Toy Model: Two-dimensional Linear Regression with Correlated Covariates

We consider the following toy model: A series of n observations³ $\{({x}_{1}^{(1)},{x}_{2}^{(1)}),\cdots ,\,({x}_{1}^{(n)},{x}_{2}^{(n)})\}$ is drawn from a two-dimensional Gaussian distribution with $\mu =(0,0)$ and a covariance matrix given by

$\begin{eqnarray}{\rm{\Sigma }}=\left(\begin{array}{cc}{\sigma }_{1}^{2} & \rho {\sigma }_{1}{\sigma }_{2}\\ \rho {\sigma }_{1}{\sigma }_{2} & {\sigma }_{2}^{2}\end{array}\right)\end{eqnarray} \tag{ 3 }$

${\sigma }_{1}$ and ${\sigma }_{2}$ are the standard deviations of the observations in the x₁ and x₂ dimensions, respectively, and ρ is the Pearson correlation coefficient between them. Note that ${\sigma }_{1}$ and ${\sigma }_{2}$ are not measurement errors, but measures of the natural spread in the distributions. We then define

$\begin{eqnarray}&&{y}_{i}={\beta }_{1}{x}_{1}^{(i)}+{\beta }_{2}{x}_{2}^{(i)}+{\epsilon }^{(i)}\end{eqnarray} \tag{ 4 }$

where ${\beta }_{1}$ and ${\beta }_{2}$ are the regression coefficients, and ${\boldsymbol{\epsilon }}$ is a noise vector whose elements are drawn from a univariate normal distribution ${ \mathcal N }(0,{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2})$ . This noise vector represents a combination of the intrinsic scatter in the model and the observational measurement error. We can reformulate this as a matrix equation by denoting the data matrix as ${\boldsymbol{X}}=({{\boldsymbol{x}}}_{1},{{\boldsymbol{x}}}_{2})$ and the coefficient vector as ${\boldsymbol{\beta }}=({\beta }_{1},{\beta }_{2})$ , giving ${\boldsymbol{Y}}={\boldsymbol{X}}{\boldsymbol{\beta }}+{\boldsymbol{\epsilon }}.$

Standard simultaneous two-dimensional least-squares regression gives us the estimated coefficient vector $\hat{{\boldsymbol{\beta }}}$ which minimizes the square of the residuals between the values predicted by the model ( $\hat{{\boldsymbol{Y}}}\equiv {\hat{\beta }}_{1}{{\boldsymbol{x}}}_{1}+{\hat{\beta }}_{2}{{\boldsymbol{x}}}_{2}\equiv {\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}$ ) and the data. As we show in Appendix A, these estimated values are

$\begin{eqnarray}&&\hat{{\boldsymbol{\beta }}}={\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\right)}^{-1}{{\boldsymbol{X}}}^{T}({\boldsymbol{X}}\beta +{\boldsymbol{\epsilon }}).\end{eqnarray} \tag{ 5 }$

Since the expectation value of ${\boldsymbol{\epsilon }}$ is 0 by definition, the expectation value of the recovered coefficients from simultaneous regression is identical to the coefficients ( $\langle \hat{{\boldsymbol{\beta }}}\rangle ={\boldsymbol{\beta }}$ ) regardless of the values of the regression coefficients, the covariance matrix components, or the size of the intrinsic scatter. We also show in Appendix A that the standard deviation of the residuals ( ${\boldsymbol{r}}\equiv \hat{{\boldsymbol{Y}}}-{\boldsymbol{Y}})$ is simply $\sqrt{{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}}$ .

In summary, when treating this data set with a simultaneous linear regression, we can reliably recover both the true regression coefficients and intrinsic dispersion. Though there is some uncertainty on the values of the regression coefficients that does depend on the correlation between the covariates, the corresponding variance is inversely proportional to the number of samples N fit in the regression and can therefore be controlled when N is sufficiently large (see Equation (A2) in Appendix A).
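A minimal numerical sketch of this statement, assuming NumPy and arbitrary illustrative parameter values (the helper names `simulate_toy` and `fit_simultaneous`, and the single noise level `sigma_noise` standing in for $\sqrt{{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}}$ , are choices made here for illustration), draws data from the toy model and checks that a simultaneous fit returns coefficients and residual scatter close to the inputs:

```python
import numpy as np

def simulate_toy(n, beta, sigma1, sigma2, rho, sigma_noise, rng):
    """Draw (x1, x2) from the bivariate Gaussian of Eq. (3) and y from Eq. (4)."""
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    eps = rng.normal(0.0, sigma_noise, size=n)  # combined intrinsic + observational noise
    return X, X @ beta + eps

def fit_simultaneous(X, y):
    """Least-squares solution of Eq. (5), using lstsq instead of an explicit inverse."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

rng = np.random.default_rng(0)
beta_true = np.array([1.0, 0.5])   # illustrative values, not taken from data
X, y = simulate_toy(200_000, beta_true, sigma1=1.0, sigma2=2.0,
                    rho=0.6, sigma_noise=0.3, rng=rng)

beta_hat = fit_simultaneous(X, y)
resid_std = np.std(X @ beta_hat - y)  # residuals r = Y_hat - Y

print("recovered coefficients:", beta_hat)
print("residual scatter:", resid_std)
```

Changing `rho` in this sketch should leave the recovered values essentially unchanged, which is the sense in which the simultaneous fit is insensitive to correlation between the covariates.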

However, oftentimes in SN Ia studies, we do not perform a full simultaneous fit of all of our regression parameters when including additional corrections. Instead, we fit the distance modulus as a linear function of SALT2 parameters and then add a correction to these distance moduli by fitting the distance modulus residuals as a function of some other parameter. This is conceptually analogous to performing this multivariate linear regression one covariate at a time.

We will show that in this case, no biases are introduced if there is no correlation between the parameters used in the first and second regressions (i.e., $\rho =0$ ). However, if there is some correlation, we find that both the regression coefficients and the estimated scatter on the residuals are biased.

We introduce the notation we will use to treat this situation in our toy example. Without loss of generality, we can first fit ${\boldsymbol{Y}}$ as a function of ${{\boldsymbol{x}}}_{1}$ . The estimate of the slope will be denoted $\hat{{\beta }_{1}}^{\prime}$ (the prime serves to differentiate this value from the coefficient estimated from the full two-dimensional regression). The residuals of this regression will be denoted ${{\boldsymbol{r}}}_{1}$ . We then perform a second regression, predicting the residuals of the first regression ${{\boldsymbol{r}}}_{1}$ as a function of ${{\boldsymbol{x}}}_{2}$ . The slope in this case will similarly be denoted $\hat{{\beta }_{2}}^{\prime}$ , and the residuals will be denoted by ${{\boldsymbol{r}}}_{2}$ .

In Appendix B, we obtain the forms of the expectation values for the regression coefficients resulting from this process, finding that

$\begin{eqnarray*}&&\langle {\hat{\beta }}_{1}^{\prime} \rangle ={\beta }_{1}+\displaystyle \frac{{\beta }_{2}\rho {\sigma }_{2}}{{\sigma }_{1}}\quad \mathrm{and}\quad \langle {\hat{\beta }}_{2}^{\prime} \rangle ={\beta }_{2}-{\beta }_{2}{\rho }^{2}.\end{eqnarray*}$

As we can see, both slopes are biased if $\rho \ne 0$ . The size of the bias on both parameters is proportional to the size of the effect and the correlation between the covariates. Additionally, we can recognize that the bias on the first slope is identical to the omitted variable bias. This is expected, as performing this first part of the non-simultaneous regression perfectly simulates the textbook situation presented to describe the omitted variable bias.

We also calculate the spread of the final residuals in Appendix B, finding

$\begin{eqnarray}&&{\sigma }_{{{\boldsymbol{r}}}_{2}}^{2}={\beta }_{2}^{2}{\rho }^{2}{\sigma }_{2}^{2}(1-{\rho }^{2})+{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}.\end{eqnarray} \tag{ 6 }$

The variance of the residuals from this analysis (the square of the often-reported root-mean-squared residual) is in fact inflated by a term that scales quadratically with the size of the secondary effect ${\beta }_{2}$ and depends on the correlation between the parameters through ${\rho }^{2}(1-{\rho }^{2})$ . For a given slope, this inflation is maximized when $| \rho | =\sqrt{1/2}\approx 0.707$ .
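These closed-form results can be checked numerically. The following sketch, which assumes NumPy and arbitrary illustrative parameter values, performs the two-step fit on simulated toy data and compares the recovered slopes with the expectation values above. The residual variance is compared against ${\beta }_{2}^{2}{\rho }^{2}{\sigma }_{2}^{2}(1-{\rho }^{2})$ plus the noise variance, with the single `sigma_noise` standing in for $\sqrt{{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}}$ .

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
beta1, beta2 = 1.0, 0.5                      # illustrative true slopes
sigma1, sigma2, rho, sigma_noise = 1.0, 2.0, 0.6, 0.3

cov = [[sigma1**2, rho * sigma1 * sigma2],
       [rho * sigma1 * sigma2, sigma2**2]]
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
y = beta1 * x1 + beta2 * x2 + rng.normal(0.0, sigma_noise, size=n)

# Step 1: fit y against x1 only (no intercept, since the toy model has zero mean).
b1_seq = (x1 @ y) / (x1 @ x1)
r1 = y - b1_seq * x1

# Step 2: fit the first-step residuals against x2.
b2_seq = (x2 @ r1) / (x2 @ x2)
r2 = r1 - b2_seq * x2

# Closed-form expectations for the non-simultaneous fit.
b1_expected = beta1 + beta2 * rho * sigma2 / sigma1
b2_expected = beta2 * (1 - rho**2)
var_expected = beta2**2 * rho**2 * sigma2**2 * (1 - rho**2) + sigma_noise**2

print("first slope :", b1_seq, "expected", b1_expected)
print("second slope:", b2_seq, "expected", b2_expected)
print("var(r2)     :", np.var(r2), "expected", var_expected)
```

The sign convention of the residuals differs from ${\boldsymbol{r}}\equiv \hat{{\boldsymbol{Y}}}-{\boldsymbol{Y}}$ used in the text, which does not affect their variance.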

3. Step Function Corrections

Many common analyses used in supernova cosmology do not use a linear model to correct the Hubble diagram residuals for host mass; they use a step function, motivated by the evolution of host galaxy stellar populations with redshift.⁴ We will modify the toy model presented in Section 2, and consider instead

$\begin{eqnarray}&&{y}_{i}=\alpha {x}_{1}^{(i)}+\displaystyle \frac{\gamma }{2}\mathrm{sgn}({x}_{2}^{(i)})+{\epsilon }^{(i)}.\end{eqnarray} \tag{ 7 }$

In the simultaneous case, the expected values of the best-fit regression coefficients $\hat{\alpha }$ and $\hat{\gamma }$ are equivalent to the true values. The proof of this is very similar to the proof for the bilinear toy model presented in Appendix A, so we do not present any further details here.

In Appendix C, we have worked through the non-simultaneous case where we fit the linear relationship first, followed by the step function correction to the resulting residuals. The expectation value of the best-fit linear slope ( $\hat{\alpha }^{\prime}$ ) is

$\begin{eqnarray}&&\langle \hat{\alpha }^{\prime} \rangle =\alpha +\displaystyle \frac{\gamma \rho }{{\sigma }_{1}\sqrt{2\pi }}.\end{eqnarray} \tag{ 8 }$

The expected step size obtained from the residuals after correcting for the linear relationship is

$\begin{eqnarray}&&\langle \hat{\gamma }^{\prime} \rangle =\gamma \left(1-\displaystyle \frac{2{\rho }^{2}}{\pi }\right),\end{eqnarray} \tag{ 9 }$

and the spread of the remaining residuals is

$\begin{eqnarray}&&{\sigma }_{{{\boldsymbol{r}}}_{\beta }}^{2}=\displaystyle \frac{{\gamma }^{2}{\rho }^{2}}{2\pi }\left(1-\displaystyle \frac{2{\rho }^{2}}{\pi }\right)+{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}.\end{eqnarray} \tag{ 10 }$

So, using a step-function secondary correction gives us similar biases to the linear secondary correction. The size of the step is underestimated by a factor that scales quadratically with the correlation coefficient between covariates and linearly with the true step size. Additionally, the size of the linear correction term is overestimated by a factor that scales linearly with the step size and the correlation coefficient. Finally, the variance of the residuals after correction is inflated by a factor that scales similarly. The bias on the variance of the residuals is maximal when $\rho =\sqrt{\pi /4}\approx 0.886$ .
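The step-function results can be checked in the same way. This sketch assumes NumPy, arbitrary illustrative parameter values, and a Gaussian noise term of standard deviation `sigma_noise` (standing in for $\sqrt{{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}}$ ) added to the step model. It fits the slope first and the step second, then compares the results with Equations (8), (9), and (10).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
alpha, gamma = 1.0, 0.5                      # illustrative true slope and step size
sigma1, sigma2, rho, sigma_noise = 1.0, 2.0, 0.6, 0.1

cov = [[sigma1**2, rho * sigma1 * sigma2],
       [rho * sigma1 * sigma2, sigma2**2]]
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
s = np.sign(x2)
y = alpha * x1 + 0.5 * gamma * s + rng.normal(0.0, sigma_noise, size=n)

# Step 1: linear fit of y against x1 alone.
a_seq = (x1 @ y) / (x1 @ x1)
r1 = y - a_seq * x1

# Step 2: fit r1 = (g / 2) * sgn(x2) by least squares.
g_seq = 2.0 * (s @ r1) / (s @ s)
r2 = r1 - 0.5 * g_seq * s

# Closed-form expectations from Eqs. (8), (9), and (10).
a_expected = alpha + gamma * rho / (sigma1 * np.sqrt(2 * np.pi))
g_expected = gamma * (1 - 2 * rho**2 / np.pi)
var_expected = (gamma**2 * rho**2 / (2 * np.pi)) * (1 - 2 * rho**2 / np.pi) + sigma_noise**2

print("slope  :", a_seq, "expected", a_expected)
print("step   :", g_seq, "expected", g_expected)
print("var(r2):", np.var(r2), "expected", var_expected)
```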

4. Comparison to Data

The actual distributions of x₁, c, and ${{ \mathcal M }}_{\mathrm{host}}$ found in the data are not purely Gaussian, as they are in our toy models. This means that we can no longer derive closed-form relations describing the impact of non-simultaneous fitting. We can, however, simulate the effects. We do so using the published values of x₁, c, and $\mathrm{log}({{ \mathcal M }}_{\mathrm{host}}/{{ \mathcal M }}_{\odot })$ from the low- and mid-redshift samples of supernovae from the first three years of the Dark Energy Survey (Abbott et al. 2019, hereafter referred to as the Low-z and DES subsamples), along with the Pantheon data set (Scolnic et al. 2018), which combines spectroscopically classified supernovae from PanSTARRS (PS1; Rest et al. 2014; Scolnic et al. 2014b) with supernovae from the SuperNova Legacy Survey (SNLS; Conley et al. 2011; Sullivan et al. 2011) and the Sloan Digital Sky Survey (SDSS; Frieman et al. 2008; Kessler et al. 2009; Sako et al. 2018).⁵ Each of these data sets shows a moderate anticorrelation between x₁ and host mass (ρ between −0.25 and −0.37; see Table 1), so we can expect to find non-simultaneous regression biases.

Table 1. Pearson Correlation Coefficients between SALT2 Parameters and Host Galaxy Masses (Measured as $\mathrm{log}({{ \mathcal M }}_{\mathrm{host}}/{{ \mathcal M }}_{\odot })$ )

Data Set	${\rho }_{{x}_{1},c}$	${\rho }_{{x}_{1},\mathrm{mass}}$	${\rho }_{c,\mathrm{mass}}$
DES	−0.087	−0.371	0.1811
PS1	−0.041	−0.248	0.0610
SDSS	−0.035	−0.297	0.0002
SNLS	0.016	−0.304	0.0629
Low-z	0.130	−0.347	−0.1052

Note. Each data set shows a relatively strong correlation between x₁ and mass, indicating that biases can be introduced from non-simultaneous regression.

To simulate the magnitude of these effects with the non-Gaussian distributions found in the data, we begin by modeling δ, a quantity akin to the Hubble residuals without any corrections for the light curve shape or color parameters and assuming a fixed cosmology:

$\begin{eqnarray}&&\delta =\alpha {x}_{1}-\beta c+\displaystyle \frac{\gamma }{2}\mathrm{sgn}\left[\mathrm{log}\left(\displaystyle \frac{{{ \mathcal M }}_{\mathrm{host}}}{{{ \mathcal M }}_{\odot }}\right)-10\right]+\epsilon \end{eqnarray} \tag{ 11 }$

where $\epsilon $ is a Gaussian distributed noise vector with variance ${\sigma }_{\mathrm{noise}}^{2}$ . We use each data set's values of x₁, c, and ${{ \mathcal M }}_{\mathrm{host}}$ to calculate 50 instances of δ for each of approximately 12,000 different combinations of the standardization parameters α, β, γ, and ${\sigma }_{\mathrm{noise}}$ in the ranges listed in Table 2. We are motivated to simulate various combinations of the regression coefficients and noise levels by our toy model, which showed that each of these values is intrinsically linked to the others. With each of these data sets, we performed a fully simultaneous regression of all parameters, as well as a non-simultaneous regression of the host galaxy mass correction parameter γ after a simultaneous regression of the light curve parameters α and β, in order to infer the size of the biases.
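The procedure described above can be sketched as follows. This is a hedged illustration rather than the analysis code: the covariates are zero-mean synthetic stand-ins drawn from a Gaussian with an assumed x₁–mass correlation of −0.3 (not the published DES or Pantheon values), only a single parameter combination is simulated instead of the full grid, and the helper names `simulate_delta` and `fit_both_ways` are hypothetical.

```python
import numpy as np

def simulate_delta(x1, c, logmass, alpha, beta, gamma, sigma_noise, rng):
    """One realization of delta (Eq. 11) for the supplied covariates."""
    step = 0.5 * gamma * np.sign(logmass - 10.0)
    return alpha * x1 - beta * c + step + rng.normal(0.0, sigma_noise, size=x1.size)

def fit_both_ways(x1, c, logmass, delta):
    """Return (simultaneous, non-simultaneous) estimates of (alpha, beta, gamma)."""
    half_step = 0.5 * np.sign(logmass - 10.0)
    # Simultaneous: stretch, color, and mass step in a single design matrix.
    A = np.column_stack([x1, -c, half_step])
    simultaneous, *_ = np.linalg.lstsq(A, delta, rcond=None)
    # Non-simultaneous: light-curve terms first, then the step on the residuals.
    B = np.column_stack([x1, -c])
    ab, *_ = np.linalg.lstsq(B, delta, rcond=None)
    resid = delta - B @ ab
    gamma_seq = (half_step @ resid) / (half_step @ half_step)
    return simultaneous, np.append(ab, gamma_seq)

rng = np.random.default_rng(3)
# Synthetic stand-in covariates (assumption): columns are x1, c, log-mass offset.
cov = np.array([[1.0, 0.0, -0.3],
                [0.0, 0.01, 0.0],
                [-0.3, 0.0, 1.0]])
x1, c, dm = rng.multivariate_normal(np.zeros(3), cov, size=2000).T
logmass = 10.0 + dm

truth = np.array([0.14, 3.0, 0.07])   # alpha, beta, gamma within the Table 2 ranges
fits = [fit_both_ways(x1, c, logmass,
                      simulate_delta(x1, c, logmass, *truth, 0.1, rng))
        for _ in range(50)]            # 50 noise realizations, as in the text
sim_mean = np.mean([f[0] for f in fits], axis=0)
seq_mean = np.mean([f[1] for f in fits], axis=0)

print("truth           :", truth)
print("simultaneous    :", sim_mean)
print("non-simultaneous:", seq_mean)
```

With real survey data the covariates are not zero-mean, so an offset term would also need to be fit; it is omitted here only because the stand-in covariates are centered by construction.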

Table 2. Ranges for the Standardization Hyperparameters Used in the Simulation Analysis

Parameter	Range
α	$(0.05,0.25)$
β	$(2.5,3.5)$
γ	$(-0.1,0.1)$
${\sigma }_{\mathrm{noise}}$	$(0,0.2)$

Each of these simulations gives us a table of true values of α, β, and γ (used in the calculation of δ), the simultaneous best-fit values $\hat{\alpha }$ , $\hat{\beta }$ , and $\hat{\gamma }$ , as well as the non-simultaneous best-fit values $\hat{\alpha }^{\prime}$ , $\hat{\beta }^{\prime}$ , and $\hat{\gamma }^{\prime}$ . Regardless of true parameter values, the simultaneous fit parameters all match the true parameters. The magnitude of the error on the non-simultaneous best-fit parameters depends on the data subset in question, because of the differing distributions of the light curve parameters and host galaxy masses, as well as on the true values of the regression parameters α, β, and γ. We note that the relationships between the true standardization parameters and the non-simultaneously determined standardization parameters are all linear, i.e.,

$\begin{eqnarray}&&\gamma ={c}_{\gamma ,0}+\displaystyle \sum _{i\in \{\hat{\alpha }^{\prime} ,\hat{\beta }^{\prime} ,\hat{\gamma }^{\prime} \}}{c}_{\gamma ,i}i,\end{eqnarray} \tag{ 12 }$

where the c values are linear coefficients that can be measured from the simulation data sets.⁶ Similar relationships exist for α and β as well. This is not unexpected; we can find this linear relationship by rearranging the toy model results of Equations (8) and (9):

$\begin{eqnarray}&&\alpha =\hat{\alpha }^{\prime} -\sqrt{\displaystyle \frac{\pi }{2{\sigma }_{1}^{2}}}\displaystyle \frac{\rho }{\pi -2{\rho }^{2}}\hat{\gamma }^{\prime} \quad \mathrm{and}\quad \gamma =\displaystyle \frac{\pi }{\pi -2{\rho }^{2}}\hat{\gamma }^{\prime} .\end{eqnarray} \tag{ 13 }$

Non-zero values of coefficients other than ${c}_{x,0}$ indicate that there is "leakage" from one standardization parameter to the other; for example, if ${c}_{\gamma ,\hat{\alpha }^{\prime} }\ne 0$ , then the size of the α correction impacts the reported size of the γ corrections. Moreover, these coefficients define a linear transformation between the true regression parameters and those coming from a non-simultaneous fit, so the inverse of these transformations can be used to correct previous non-simultaneous regressions. The transformations we obtained from our simulations are presented in Table 3, where we have omitted columns that are expected to be 0 (e.g., ${c}_{\alpha ,\hat{\beta }^{\prime} }$ ) or 1 (e.g., ${c}_{\alpha ,\hat{\alpha }^{\prime} }$ ).

Table 3. Linear Transformation Coefficients (see Equation (12)) between the Standardization Hyperparameter α, Representing the Light Curve Shape-Luminosity Correction, obtained with a Non-simultaneous Fit and the True Values

Data Set	${c}_{\alpha ,\hat{\gamma }^{\prime} }$	${c}_{\beta ,\hat{\gamma }^{\prime} }$	${c}_{\gamma ,\hat{\gamma }^{\prime} }$
DES	0.335	−0.702	1.302
PS1	0.135	−0.607	1.111
SDSS	0.125	−0.134	1.237
SNLS	0.203	−0.565	1.14
Low-z	0.194	$+1.258$	2.072

These values that are found from our simulations are roughly consistent with the closed-form solutions (Equation (13)). Data sets with the largest correlations between light curve parameters and host galaxy mass also exhibit the most leakage between the corresponding standardization parameters and host mass step size. The remaining differences between the values predicted by Equation (13) and the values in Table 3 are due to the non-Gaussianity of the x₁, c, and ${{ \mathcal M }}_{\mathrm{host}}$ distributions.

We can see that there is significant leakage between the size of the host mass step and the stretch and color standardization parameters α and β. Multiplying the coefficients relating the non-simultaneously obtained step-size by the typical size of the measured step (0.07 mag.), we can see that this leakage results in a 5%–10% error on the typical size (0.14) of the stretch parameter α and a ∼1% error on the typical size (3.0) of the color parameter β.

More importantly, the coefficients relating the non-simultaneous step size to the true step size are greater than one for each data set. This means that by fitting the step function separately from other corrections, the true size of the step is underestimated by 10%–30%, and by a factor of two for the Low-z subsample.
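As an illustration of how the ${c}_{\gamma ,\hat{\gamma }^{\prime} }$ column of Table 3 translates into these percentages, the sketch below scales a non-simultaneous step size by the tabulated coefficient. It assumes that the intercept and the omitted cross terms of Equation (12) are negligible, and the function names `corrected_step` and `fractional_underestimate` are hypothetical.

```python
# Coefficients c_{gamma, gamma'} copied from Table 3.
c_gamma = {"DES": 1.302, "PS1": 1.111, "SDSS": 1.237, "SNLS": 1.14, "Low-z": 2.072}

def corrected_step(gamma_seq, dataset):
    """Scale a non-simultaneous step size by the Table 3 coefficient.

    Assumes the intercept and omitted cross terms of Eq. (12) are negligible.
    """
    return c_gamma[dataset] * gamma_seq

def fractional_underestimate(dataset):
    """Fraction by which the non-simultaneous step falls short of the true step."""
    return 1.0 - 1.0 / c_gamma[dataset]

for name in c_gamma:
    print(f"{name}: 0.07 mag step corrects to {corrected_step(0.07, name):.3f} mag; "
          f"underestimate {fractional_underestimate(name):.0%}")
```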

The linear transformations presented here can be used directly to correct step sizes obtained from simple non-simultaneous regressions. These simulations do not, however, account for selection effects, redshift-dependent drifts in the populations of the light curve parameters, or models of intrinsic scatter. A number of simulation studies (e.g., Marriner et al. 2011; Scolnic et al. 2014a, 2014b; Scolnic & Kessler 2016) have shown that measurements of the standardization parameters (i.e., α and β) can be biased by inaccurate specifications of the distributions of the supernova parameters (x₁ and c). These biases can be corrected using simulations that include these selection effects or avoided using a full Bayesian methodology (as is done in Rubin et al. 2015, for example). A complete treatment of all of these effects is outside the scope of this work, but the results we have shown here, along with the discussion of the relationship between the host galaxy mass step and bias corrections in Smith et al. (2020), emphasize that a complete supernova cosmology analysis must specify all covariances completely and determine the luminosity correction parameters simultaneously to avoid bias.

5. Conclusions

We have worked through a pedagogical example to show that performing linear regression one covariate at a time produces biased estimates of both the regression coefficients and spread of residuals when the covariates are correlated. The sizes of these biases depend directly on the magnitude of the correlation, and there are linear relationships between the error on the estimated slopes and the size of the factor that inflates the estimate of the spread of the remaining scatter. We have proven that similar relationships also hold when fitting step functions to the residuals of a linear regression (as is sometimes done in supernova cosmology) if there are correlations between the parameters being fit in each step.

We have also presented numerical simulations based on observed data to find corrections to the biases that are introduced from non-simultaneous regression methods. Each data set studied shows the possibility of a large underestimate of the size of the host mass step regardless of values of other nuisance parameters. There are also minor biases in the model parameters governing the relationship between luminosity and light curve width (SALT2 α) and luminosity and color (SALT2 β).

Biases are introduced when the assumptions underlying an analysis method are overlooked. In this particular case, there is an implicit assumption that all covariates must be uncorrelated in order to prevent biases from performing a two-step regression. A number of studies (e.g., Kelly et al. 2010; Sullivan et al. 2010; Childress et al. 2013; Jones et al. 2015, 2018; Rose et al. 2019; Kelsey et al. 2020) have neglected this effect, leading to underestimates of both the sizes and the significances of the effects they report. Both Rigault et al. (2018) and Jones et al. (2018) find that simultaneous fits of the light curve and host galaxy standardization parameters result in a larger step size and reduced Hubble residual dispersion, though the latter analysis reports the non-simultaneous fits as its main result. This difference can be entirely explained by the bias we have discussed in this work.

For the most part, recent cosmology analyses (e.g., Betoule et al. 2014; Scolnic et al. 2018) do properly account for this effect by fitting for the host mass step size simultaneously with the other standardization parameters. It is additionally worth noting that correcting for this bias in reported step sizes is unlikely to resolve the current tension in measurements of the Hubble constant: a 20% underestimate of the size of the mass step propagates to a $\lt 1$ % change in the value of H₀. However, it is not yet clear if the host mass correlations are properly accounted for in the bias corrections, as suggested in Smith et al. (2020). Overall, care must be taken in presenting the size and significance of these relationships and propagating these correlations throughout the analysis. The biases presented here can be easily avoided by fitting all nuisance parameters simultaneously and fully specifying all covariances when presenting measurements of luminosity corrections beyond light curve parameters.

The author would like to thank Saul Perlmutter, Greg Aldering, and Ben Rose for comments on this manuscript and helpful discussions. We also thank the anonymous referee for their time and attention. S.D. acknowledges support, in part, from NASA through grant NNG16PJ311I.

Software: Matplotlib (Hunter 2007), Numpy (Oliphant 2006), Pandas (McKinney 2010), Python, scikit-learn (Pedregosa et al. 2011), SciPy (Virtanen et al. 2020).

Appendix A: Derivation of Regression Parameters in the Simultaneous Case

In ordinary least-squares regression, our goal is to minimize the loss function, L, defined by the square of the residuals between the y values predicted by our model ( $\hat{{\boldsymbol{Y}}}\equiv {\hat{\beta }}_{1}{{\boldsymbol{x}}}_{1}+{\hat{\beta }}_{2}{{\boldsymbol{x}}}_{2}\equiv {\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}$ ) and the data. This is equivalent to maximizing the likelihood under the assumption that the residuals are Gaussian:

$\begin{eqnarray*}\begin{array}{rcl}L & = & | | \hat{{\boldsymbol{Y}}}-{\boldsymbol{Y}}| {| }^{2}\\ & = & {\left({\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}-{\boldsymbol{Y}}\right)}^{T}({\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}-{\boldsymbol{Y}})\\ & = & {\hat{{\boldsymbol{\beta }}}}^{T}{{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}-{\hat{{\boldsymbol{\beta }}}}^{T}{{\boldsymbol{X}}}^{T}{\boldsymbol{Y}}-{{\boldsymbol{Y}}}^{T}{\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}+{{\boldsymbol{Y}}}^{T}{\boldsymbol{Y}}.\end{array}\end{eqnarray*}$

We can minimize this by taking the gradient as a function of $\hat{{\boldsymbol{\beta }}}$ and setting it equal to zero

$\begin{eqnarray*}\begin{array}{rcl}\displaystyle \frac{\partial L}{\partial \hat{{\boldsymbol{\beta }}}} & = & {\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}\right)}^{T}+{\hat{{\boldsymbol{\beta }}}}^{T}{{\boldsymbol{X}}}^{T}{\boldsymbol{X}}-{\left({{\boldsymbol{X}}}^{T}{\boldsymbol{Y}}\right)}^{T}-{{\boldsymbol{Y}}}^{T}{\boldsymbol{X}}\\ & = & 2{\hat{{\boldsymbol{\beta }}}}^{T}{{\boldsymbol{X}}}^{T}{\boldsymbol{X}}-2{{\boldsymbol{Y}}}^{T}{\boldsymbol{X}}\end{array}\end{eqnarray*}$

$\begin{eqnarray*}&&\displaystyle \frac{\partial L}{\partial \hat{{\boldsymbol{\beta }}}}=0\Rightarrow \hat{{\boldsymbol{\beta }}}={\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\right)}^{-1}{{\boldsymbol{X}}}^{T}{\boldsymbol{Y}}.\end{eqnarray*}$

Plugging in our definition of ${\boldsymbol{Y}}$ , we get

$\begin{eqnarray}&&\hat{{\boldsymbol{\beta }}}={\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\right)}^{-1}{{\boldsymbol{X}}}^{T}({\boldsymbol{X}}\beta +{\boldsymbol{\epsilon }}).\end{eqnarray} \tag{ A1 }$

Since, by definition, $\langle {\boldsymbol{\epsilon }}\rangle =0$ , it follows that $\langle \hat{{\boldsymbol{\beta }}}\rangle ={\boldsymbol{\beta }}$ . We can then show that the spread of the residuals ( ${\boldsymbol{r}}\equiv \hat{{\boldsymbol{Y}}}-{\boldsymbol{Y}}$ ) is simply $\sqrt{{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}}$ :

$\begin{eqnarray*}\begin{array}{rcl}\mathrm{var}({\boldsymbol{r}}) & = & \langle {{\boldsymbol{r}}}^{2}\rangle -\langle {\boldsymbol{r}}{\rangle }^{2}\\ & = & \langle ({\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}-{\boldsymbol{X}}{\boldsymbol{\beta }}-{\boldsymbol{\epsilon }}){\left({\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}-{\boldsymbol{X}}{\boldsymbol{\beta }}-{\boldsymbol{\epsilon }}\right)}^{T}\rangle \\ & & -\langle ({\boldsymbol{X}}\hat{{\boldsymbol{\beta }}}-{\boldsymbol{X}}{\boldsymbol{\beta }}-{\boldsymbol{\epsilon }}){\rangle }^{2}\\ & = & \langle {\boldsymbol{\epsilon }}{{\boldsymbol{\epsilon }}}^{T}\rangle -\langle {\boldsymbol{\epsilon }}{\rangle }^{2}\\ & = & \mathrm{var}({\boldsymbol{\epsilon }})={\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}.\end{array}\end{eqnarray*}$

The variance on these regression coefficients can also be calculated. First, we calculate $\langle {\hat{{\boldsymbol{\beta }}}}^{2}\rangle$ :

$\begin{eqnarray*}\begin{array}{rcl}\langle {\hat{{\boldsymbol{\beta }}}}^{2}\rangle & = & \langle \hat{{\boldsymbol{\beta }}}{\hat{{\boldsymbol{\beta }}}}^{T}\rangle \\ & = & \langle {\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\right)}^{-1}{{\boldsymbol{X}}}^{T}{\boldsymbol{Y}}{{\boldsymbol{Y}}}^{T}{\boldsymbol{X}}{\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\right)}^{-1}\rangle \\ & = & \langle {\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\right)}^{-1}{{\boldsymbol{X}}}^{T}({\boldsymbol{X}}{\boldsymbol{\beta }}+{\boldsymbol{\epsilon }})({{\boldsymbol{\beta }}}^{T}{{\boldsymbol{X}}}^{T}+{{\boldsymbol{\epsilon }}}^{T}){\boldsymbol{X}}{\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\right)}^{-1}\rangle \\ & = & {\boldsymbol{\beta }}{{\boldsymbol{\beta }}}^{T}+({\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}){\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\right)}^{-1}.\end{array}\end{eqnarray*}$

Then, since $\langle \hat{{\boldsymbol{\beta }}}\rangle ={\boldsymbol{\beta }}$ gives $\langle \hat{{\boldsymbol{\beta }}}{\rangle }^{2}={\boldsymbol{\beta }}{{\boldsymbol{\beta }}}^{T}$ , we have

$\begin{eqnarray*}&&\mathrm{var}(\hat{{\boldsymbol{\beta }}})=\langle {\hat{{\boldsymbol{\beta }}}}^{2}\rangle -\langle \hat{{\boldsymbol{\beta }}}{\rangle }^{2}=({\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}){\left({{\boldsymbol{X}}}^{T}{\boldsymbol{X}}\right)}^{-1}.\end{eqnarray*}$

Calculating the individual components of this variance matrix in our two-dimensional case gives

$\begin{eqnarray}\begin{array}{rcl}\mathrm{var}(\hat{{\beta }_{1}}) & = & \displaystyle \frac{{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}}{N{\sigma }_{1}^{2}\left(1-{\rho }^{2}\right)}\quad \mathrm{and}\\ \mathrm{var}(\hat{{\beta }_{2}}) & = & \displaystyle \frac{{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}}{N{\sigma }_{2}^{2}\left(1-{\rho }^{2}\right)}.\end{array}\end{eqnarray} \tag{ A2 }$
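Equation (A2) can be checked numerically with a small Monte Carlo experiment. The following is a minimal sketch, not the analysis code used in this work: the function names, the parameter values, and the single noise scale `sigma_eps` (standing in for $\sqrt{{\sigma }_{\mathrm{int}}^{2}+{\sigma }_{\mathrm{obs}}^{2}}$ ) are all assumptions made for illustration. It repeatedly draws correlated covariates, fits both slopes simultaneously, and compares the scatter of the fitted slopes to the prediction.

```python
import random
import statistics


def draw_pair(rng, sigma1, sigma2, rho):
    """Draw (x1, x2) from a zero-mean bivariate Gaussian with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return sigma1 * z1, sigma2 * (rho * z1 + (1.0 - rho**2) ** 0.5 * z2)


def fit_simultaneous(x1, x2, y):
    """Solve the 2x2 normal equations (X^T X) beta = X^T Y (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def compare_slope_variances(n=200, trials=1000, beta1=1.0, beta2=0.5,
                            sigma1=1.0, sigma2=2.0, rho=0.6,
                            sigma_eps=0.3, seed=0):
    """Return (empirical, predicted) variances of the two fitted slopes."""
    rng = random.Random(seed)
    fits1, fits2 = [], []
    for _trial in range(trials):
        x1, x2, y = [], [], []
        for _obj in range(n):
            a, b = draw_pair(rng, sigma1, sigma2, rho)
            x1.append(a)
            x2.append(b)
            # sigma_eps**2 plays the role of sigma_int^2 + sigma_obs^2 (assumed).
            y.append(beta1 * a + beta2 * b + rng.gauss(0.0, sigma_eps))
        b1, b2 = fit_simultaneous(x1, x2, y)
        fits1.append(b1)
        fits2.append(b2)
    empirical = (statistics.pvariance(fits1), statistics.pvariance(fits2))
    denom = n * (1.0 - rho**2)
    predicted = (sigma_eps**2 / (denom * sigma1**2),
                 sigma_eps**2 / (denom * sigma2**2))
    return empirical, predicted


empirical, predicted = compare_slope_variances()
print("empirical:", empirical)
print("predicted:", predicted)
```

Because Equation (A2) replaces ${{\boldsymbol{X}}}^{T}{\boldsymbol{X}}$ by its large-$N$ expectation, small differences between the two sets of numbers are expected at finite $N$, in addition to the sampling noise from the limited number of trials.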

Appendix B: Derivation of Biases on Regression Parameters in the Non-simultaneous Case

We can modify Equation (5) to obtain the predicted value of the slope in the first fit:

$\begin{eqnarray*}\begin{array}{rcl}\langle {\hat{\beta }}_{1}^{\prime} \rangle & = & \langle {\left({{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\right)}^{-1}{{\boldsymbol{x}}}_{1}^{T}{\boldsymbol{Y}}\rangle \\ & = & \langle {\left({{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\right)}^{-1}{{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}{\beta }_{1}+{\left({{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\right)}^{-1}{{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{2}{\beta }_{2}+{\left({{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\right)}^{-1}{{\boldsymbol{x}}}_{1}^{T}{\boldsymbol{\epsilon }}\rangle \\ & = & {\beta }_{1}+{\beta }_{2}\langle {\left({{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\right)}^{-1}{{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{2}\rangle \\ & = & {\beta }_{1}+\displaystyle \frac{{\beta }_{2}\rho {\sigma }_{2}}{{\sigma }_{1}}.\end{array}\end{eqnarray*}$

The residuals from this first regression are

$\begin{eqnarray*}\begin{array}{rcl}{{\boldsymbol{r}}}_{1} & = & {\boldsymbol{Y}}-{\hat{{\boldsymbol{Y}}}}_{1}\\ & = & {\beta }_{1}{{\boldsymbol{x}}}_{1}+{\beta }_{2}{{\boldsymbol{x}}}_{2}+{\boldsymbol{\epsilon }}-{\hat{\beta }}_{1}^{{\prime} }{{\boldsymbol{x}}}_{1}\\ & = & {\beta }_{2}{{\boldsymbol{x}}}_{2}-\displaystyle \frac{{\beta }_{2}\rho {\sigma }_{2}}{{\sigma }_{1}}{{\boldsymbol{x}}}_{1}+{\boldsymbol{\epsilon }}.\end{array}\end{eqnarray*}$

We can go through a similar analysis to find the predicted secondary effect from fitting the residuals of the first regression ${{\boldsymbol{r}}}_{1}$ as a function of ${{\boldsymbol{x}}}_{2}$ . This gives

$\begin{eqnarray*}\begin{array}{rcl}\langle {\hat{\beta }}_{2}^{\prime} \rangle & = & \langle {\left({{\boldsymbol{x}}}_{2}^{T}{{\boldsymbol{x}}}_{2}\right)}^{-1}{{\boldsymbol{x}}}_{2}^{T}{{\boldsymbol{r}}}_{1}\rangle \\ & = & \left\langle {\left({{\boldsymbol{x}}}_{2}^{T}{{\boldsymbol{x}}}_{2}\right)}^{-1}{{\boldsymbol{x}}}_{2}^{T}\left({\beta }_{2}{{\boldsymbol{x}}}_{2}-\displaystyle \frac{{\beta }_{2}\rho {\sigma }_{2}}{{\sigma }_{1}}{{\boldsymbol{x}}}_{1}+{\boldsymbol{\epsilon }}\right)\right\rangle \\ & = & {\beta }_{2}-{\beta }_{2}{\rho }^{2}.\end{array}\end{eqnarray*}$

Calculating the final residuals gives

$\begin{eqnarray*}\begin{array}{rcl}{{\boldsymbol{r}}}_{2} & = & {{\boldsymbol{r}}}_{1}-{\hat{{\boldsymbol{r}}}}_{1}\\ & = & {\beta }_{2}{{\boldsymbol{x}}}_{2}-\displaystyle \frac{{\beta }_{2}\rho {\sigma }_{2}}{{\sigma }_{1}}{{\boldsymbol{x}}}_{1}+{\boldsymbol{\epsilon }}-{\hat{\beta }}_{2}^{\prime} {{\boldsymbol{x}}}_{2}\\ & = & -\displaystyle \frac{{\beta }_{2}\rho {\sigma }_{2}}{{\sigma }_{1}}{{\boldsymbol{x}}}_{1}+{\beta }_{2}{\rho }^{2}{{\boldsymbol{x}}}_{2}+{\boldsymbol{\epsilon }},\end{array}\end{eqnarray*}$

and using the typical propagation of uncertainty formulae to find the variance of these residuals, we obtain

$\begin{eqnarray*}\begin{array}{rcl}{\sigma }_{{{\boldsymbol{r}}}_{2}}^{2} & = & \displaystyle \frac{{\beta }_{2}^{2}{\rho }^{2}{\sigma }_{2}^{2}}{{\sigma }_{1}^{2}}{\sigma }_{1}^{2}+{\beta }_{2}^{2}{\rho }^{4}{\sigma }_{2}^{2}-2\displaystyle \frac{{\beta }_{2}^{2}{\rho }^{3}{\sigma }_{2}}{{\sigma }_{1}}\rho {\sigma }_{1}{\sigma }_{2}+{\sigma }_{\mathrm{int}}^{2}\\ & = & {\beta }_{2}^{2}{\rho }^{2}{\sigma }_{2}^{2}\left(1-{\rho }^{2}\right)+{\sigma }_{\mathrm{int}}^{2}.\end{array}\end{eqnarray*}$
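The three results of this appendix (the biased first slope, the attenuated second slope, and the inflated residual variance) can be reproduced with a single large simulated sample. The sketch below is illustrative only: the function names and parameter values are assumptions, and `sigma_eps` denotes the standard deviation of ${\boldsymbol{\epsilon }}$ , which enters the final expression above as ${\sigma }_{\mathrm{int}}$ .

```python
import random
import statistics


def two_step_fit(n=100_000, beta1=1.0, beta2=0.5, sigma1=1.0, sigma2=2.0,
                 rho=0.6, sigma_eps=0.3, seed=1):
    """Fit Y on x1 alone, then fit the residuals on x2 (no intercepts)."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        a = sigma1 * z1
        b = sigma2 * (rho * z1 + (1.0 - rho**2) ** 0.5 * z2)
        x1.append(a)
        x2.append(b)
        y.append(beta1 * a + beta2 * b + rng.gauss(0.0, sigma_eps))
    # Step 1: one-dimensional least squares of Y against x1.
    slope1 = sum(a * c for a, c in zip(x1, y)) / sum(a * a for a in x1)
    r1 = [c - slope1 * a for a, c in zip(x1, y)]
    # Step 2: one-dimensional least squares of the residuals against x2.
    slope2 = sum(b * r for b, r in zip(x2, r1)) / sum(b * b for b in x2)
    r2 = [r - slope2 * b for b, r in zip(x2, r1)]
    return slope1, slope2, statistics.pvariance(r2)


def two_step_prediction(beta1=1.0, beta2=0.5, sigma1=1.0, sigma2=2.0,
                        rho=0.6, sigma_eps=0.3):
    """Expected slopes and residual variance from the expressions above."""
    slope1 = beta1 + beta2 * rho * sigma2 / sigma1
    slope2 = beta2 * (1.0 - rho**2)
    var_r2 = beta2**2 * rho**2 * sigma2**2 * (1.0 - rho**2) + sigma_eps**2
    return slope1, slope2, var_r2


print("simulated:", two_step_fit())
print("predicted:", two_step_prediction())
```

With these assumed inputs the predicted values are a first slope of 1.6, a second slope of 0.32, and a residual variance of 0.3204, compared with the input slopes of 1.0 and 0.5 and a noise variance of 0.09.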

Appendix C: Step Function Correction Derivations

Again, we start with Equation (5) to obtain the expected value of the best-fit slope from the linear portion of the fit:

$\begin{eqnarray*}\begin{array}{rcl}\langle \hat{\alpha }^{\prime} \rangle & = & \langle {\left({{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\right)}^{-1}{{\boldsymbol{x}}}_{1}^{T}{\boldsymbol{Y}}\rangle \\ & = & \left\langle {\left({{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\right)}^{-1}{{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\alpha +{\left({{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\right)}^{-1}{{\boldsymbol{x}}}_{1}^{T}\mathrm{sgn}({{\boldsymbol{x}}}_{2})\displaystyle \frac{\gamma }{2}+{\left({{\boldsymbol{x}}}_{1}^{T}{{\boldsymbol{x}}}_{1}\right)}^{-1}{{\boldsymbol{x}}}_{1}^{T}{\boldsymbol{\epsilon }}\right\rangle \\ & = & \alpha +\displaystyle \frac{\gamma }{2{\sigma }_{1}^{2}}\langle {{\boldsymbol{x}}}_{1}^{T}\mathrm{sgn}({{\boldsymbol{x}}}_{2})\rangle \\ & = & \alpha +\displaystyle \frac{\gamma \rho }{{\sigma }_{1}\sqrt{2\pi }}.\end{array}\end{eqnarray*}$

The proof of the final step is as follows, where $p({x}_{1},{x}_{2})$ is a bivariate Gaussian distribution with mean $(0,0)$ and covariance matrix like that in Equation (3):

$\begin{eqnarray}\begin{array}{rcl}\langle {{\boldsymbol{x}}}_{1}\mathrm{sgn}({{\boldsymbol{x}}}_{2})\rangle & = & {\int }_{-\infty }^{\infty }{\int }_{-\infty }^{\infty }{x}_{1}\mathrm{sgn}({x}_{2})p({x}_{1},{x}_{2}){{dx}}_{1}{{dx}}_{2}\\ & = & {\int }_{-\infty }^{0}{\int }_{-\infty }^{\infty }-{x}_{1}p({x}_{1},{x}_{2}){{dx}}_{1}{{dx}}_{2}\\ & & +\,{\int }_{0}^{\infty }{\int }_{-\infty }^{\infty }{x}_{1}p({x}_{1},{x}_{2}){{dx}}_{1}{{dx}}_{2}\\ & = & 2{\int }_{0}^{\infty }{\int }_{-\infty }^{\infty }{x}_{1}p({x}_{1},{x}_{2}){{dx}}_{1}{{dx}}_{2}\\ & = & \displaystyle \frac{1}{\pi {\sigma }_{1}{\sigma }_{2}\sqrt{1-{\rho }^{2}}}{\int }_{0}^{\infty }{\int }_{-\infty }^{\infty }{x}_{1}\\ & & \times \exp \left[-\displaystyle \frac{1}{2(1-{\rho }^{2})}\left(\displaystyle \frac{{x}_{1}^{2}}{{\sigma }_{1}^{2}}+\displaystyle \frac{{x}_{2}^{2}}{{\sigma }_{2}^{2}}-\displaystyle \frac{2\rho {x}_{1}{x}_{2}}{{\sigma }_{1}{\sigma }_{2}}\right)\right]{{dx}}_{1}{{dx}}_{2}\\ & = & \sqrt{\displaystyle \frac{2}{\pi }}\rho {\sigma }_{1}.\end{array}\end{eqnarray} \tag{ C1 }$
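The final equality in Equation (C1) can also be reached without evaluating the double integral explicitly, by conditioning on ${x}_{2}$ . The sketch below relies on two standard properties of the zero-mean bivariate Gaussian that are assumed here rather than derived: the conditional mean $\langle {x}_{1}| {x}_{2}\rangle =\rho {\sigma }_{1}{x}_{2}/{\sigma }_{2}$ and the half-normal mean $\langle | {x}_{2}| \rangle ={\sigma }_{2}\sqrt{2/\pi }$ .

$\begin{eqnarray*}\begin{array}{rcl}\langle {x}_{1}\mathrm{sgn}({x}_{2})\rangle & = & \langle \langle {x}_{1}| {x}_{2}\rangle \,\mathrm{sgn}({x}_{2})\rangle \\ & = & \displaystyle \frac{\rho {\sigma }_{1}}{{\sigma }_{2}}\langle {x}_{2}\mathrm{sgn}({x}_{2})\rangle =\displaystyle \frac{\rho {\sigma }_{1}}{{\sigma }_{2}}\langle | {x}_{2}| \rangle \\ & = & \displaystyle \frac{\rho {\sigma }_{1}}{{\sigma }_{2}}{\sigma }_{2}\sqrt{\displaystyle \frac{2}{\pi }}=\sqrt{\displaystyle \frac{2}{\pi }}\rho {\sigma }_{1}.\end{array}\end{eqnarray*}$

The scale ${\sigma }_{2}$ cancels, as expected, since $\mathrm{sgn}({x}_{2})$ does not depend on the magnitude of ${x}_{2}$ .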

The residuals that remain after correcting for the linear slope are

$\begin{eqnarray*}\begin{array}{rcl}{{\boldsymbol{r}}}_{\alpha } & = & {\boldsymbol{Y}}-{\hat{{\boldsymbol{Y}}}}_{\alpha }\\ & = & \alpha {{\boldsymbol{x}}}_{1}+\displaystyle \frac{\gamma }{2}\mathrm{sgn}({{\boldsymbol{x}}}_{2})+{\boldsymbol{\epsilon }}-\hat{\alpha }^{\prime} {{\boldsymbol{x}}}_{1}\\ & = & \displaystyle \frac{\gamma }{2}\mathrm{sgn}({{\boldsymbol{x}}}_{2})-\displaystyle \frac{\gamma \rho }{{\sigma }_{1}\sqrt{2\pi }}{{\boldsymbol{x}}}_{1}+{\boldsymbol{\epsilon }}.\end{array}\end{eqnarray*}$

The step size that would be recovered from a fit to these residuals is the value of $\hat{\gamma }^{\prime}$ that minimizes $L={\parallel {{\boldsymbol{r}}}_{\alpha }-\tfrac{\hat{\gamma }^{\prime} }{2}\mathrm{sgn}({{\boldsymbol{x}}}_{2})\parallel }^{2}$ :

$\begin{eqnarray*}\begin{array}{rcl}L & = & {\parallel {{\boldsymbol{r}}}_{\alpha }-\displaystyle \frac{\hat{\gamma }^{\prime} }{2}\mathrm{sgn}({{\boldsymbol{x}}}_{2})\parallel }^{2}\\ & = & {{\boldsymbol{r}}}_{\alpha }^{2}-\hat{\gamma }^{\prime} {{\boldsymbol{r}}}_{\alpha }\mathrm{sgn}({{\boldsymbol{x}}}_{2})+\displaystyle \frac{{\hat{\gamma }}^{{\prime} 2}}{4}\\ \displaystyle \frac{\partial L}{\partial \hat{\gamma }^{\prime} } & = & -{{\boldsymbol{r}}}_{\alpha }\mathrm{sgn}({{\boldsymbol{x}}}_{2})+\displaystyle \frac{\hat{\gamma }^{\prime} }{2}.\end{array}\end{eqnarray*}$

Setting this derivative to zero, we find

$\begin{eqnarray*}\begin{array}{rcl}\hat{\gamma }^{\prime} & = & 2{{\boldsymbol{r}}}_{\alpha }\mathrm{sgn}({{\boldsymbol{x}}}_{2})\\ & = & \gamma -\displaystyle \frac{2\gamma \rho }{{\sigma }_{1}\sqrt{2\pi }}{{\boldsymbol{x}}}_{1}\mathrm{sgn}({{\boldsymbol{x}}}_{2})+2{\boldsymbol{\epsilon }}\mathrm{sgn}({{\boldsymbol{x}}}_{2}).\end{array}\end{eqnarray*}$

The expectation value is

$\begin{eqnarray*}\begin{array}{rcl}\langle \hat{\gamma }^{\prime} \rangle & = & \gamma -\displaystyle \frac{2\gamma \rho }{{\sigma }_{1}\sqrt{2\pi }}\langle {{\boldsymbol{x}}}_{1}\mathrm{sgn}({{\boldsymbol{x}}}_{2})\rangle +2\langle {\boldsymbol{\epsilon }}\mathrm{sgn}({{\boldsymbol{x}}}_{2})\rangle \\ & = & \gamma -\displaystyle \frac{2\gamma {\rho }^{2}}{\pi }\end{array}\end{eqnarray*}$

where we used the result of Equation (C1) to evaluate $\langle {{\boldsymbol{x}}}_{1}\mathrm{sgn}({{\boldsymbol{x}}}_{2})\rangle$ . Our final residuals after the two-step regression are then

$\begin{eqnarray*}\begin{array}{rcl}{{\boldsymbol{r}}}_{\beta } & = & {{\boldsymbol{r}}}_{\alpha }-{\hat{{\boldsymbol{r}}}}_{\alpha }\\ & = & \displaystyle \frac{\gamma }{2}\mathrm{sgn}({{\boldsymbol{x}}}_{2})-\displaystyle \frac{\gamma \rho }{{\sigma }_{1}\sqrt{2\pi }}{{\boldsymbol{x}}}_{1}+{\boldsymbol{\epsilon }}-\displaystyle \frac{\gamma }{2}\mathrm{sgn}({{\boldsymbol{x}}}_{2})+\displaystyle \frac{\gamma {\rho }^{2}}{\pi }\mathrm{sgn}({{\boldsymbol{x}}}_{2})\\ & = & -\displaystyle \frac{\gamma \rho }{{\sigma }_{1}\sqrt{2\pi }}{{\boldsymbol{x}}}_{1}+\displaystyle \frac{\gamma {\rho }^{2}}{\pi }\mathrm{sgn}({{\boldsymbol{x}}}_{2})+{\boldsymbol{\epsilon }}.\end{array}\end{eqnarray*}$

The variance of these residuals is

$\begin{eqnarray*}\begin{array}{rcl}{\sigma }_{{{\boldsymbol{r}}}_{\beta }}^{2} & = & \displaystyle \frac{{\gamma }^{2}{\rho }^{2}}{2\pi {\sigma }_{1}^{2}}{\sigma }_{1}^{2}+\displaystyle \frac{{\gamma }^{2}{\rho }^{4}}{{\pi }^{2}}-\displaystyle \frac{2{\gamma }^{2}{\rho }^{3}}{{\sigma }_{1}\sqrt{2{\pi }^{3}}}\langle {{\boldsymbol{x}}}_{1}\mathrm{sgn}({{\boldsymbol{x}}}_{2})\rangle +{\sigma }_{\mathrm{int}}^{2}\\ & = & \displaystyle \frac{{\gamma }^{2}{\rho }^{2}}{2\pi }+\displaystyle \frac{{\gamma }^{2}{\rho }^{4}}{{\pi }^{2}}-\displaystyle \frac{2{\gamma }^{2}{\rho }^{4}}{{\pi }^{2}}+{\sigma }_{\mathrm{int}}^{2}\\ & = & \displaystyle \frac{{\gamma }^{2}{\rho }^{2}}{2\pi }\left(1-\displaystyle \frac{2{\rho }^{2}}{\pi }\right)+{\sigma }_{\mathrm{int}}^{2}.\end{array}\end{eqnarray*}$
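As with the linear case, the step-function results can be checked against a simulated sample. The following sketch is an illustration under stated assumptions, not the code used for this work: the function names and parameter values are hypothetical, `sigma_eps` is the standard deviation of ${\boldsymbol{\epsilon }}$ (written ${\sigma }_{\mathrm{int}}$ in the expression above), and the fitted step is taken as twice the sample mean of ${{\boldsymbol{r}}}_{\alpha }\mathrm{sgn}({{\boldsymbol{x}}}_{2})$ , which is the least-squares solution for $\hat{\gamma }^{\prime}$ .

```python
import math
import random
import statistics


def step_two_stage_fit(n=100_000, alpha=1.0, gamma=1.0, sigma1=1.5,
                       rho=0.6, sigma_eps=0.2, seed=2):
    """Fit a slope in x1 first, then a step in sgn(x2) to the residuals."""
    rng = random.Random(seed)
    x1, s2, y = [], [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        a = sigma1 * z1
        # Only the sign of x2 matters, so its scale sigma_2 is set to 1.
        s = 1.0 if rho * z1 + math.sqrt(1.0 - rho**2) * z2 > 0 else -1.0
        x1.append(a)
        s2.append(s)
        y.append(alpha * a + 0.5 * gamma * s + rng.gauss(0.0, sigma_eps))
    # Step 1: least-squares slope of Y against x1 (no intercept).
    alpha_fit = sum(a * c for a, c in zip(x1, y)) / sum(a * a for a in x1)
    r_alpha = [c - alpha_fit * a for a, c in zip(x1, y)]
    # Step 2: least-squares step size fit to the residuals.
    gamma_fit = 2.0 * sum(r * s for r, s in zip(r_alpha, s2)) / n
    r_beta = [r - 0.5 * gamma_fit * s for r, s in zip(r_alpha, s2)]
    return alpha_fit, gamma_fit, statistics.pvariance(r_beta)


def step_two_stage_prediction(alpha=1.0, gamma=1.0, sigma1=1.5,
                              rho=0.6, sigma_eps=0.2):
    """Expected slope, step, and residual variance from Appendix C."""
    shrink = 1.0 - 2.0 * rho**2 / math.pi
    alpha_fit = alpha + gamma * rho / (sigma1 * math.sqrt(2.0 * math.pi))
    gamma_fit = gamma * shrink
    var_r = gamma**2 * rho**2 / (2.0 * math.pi) * shrink + sigma_eps**2
    return alpha_fit, gamma_fit, var_r


print("simulated:", step_two_stage_fit())
print("predicted:", step_two_stage_prediction())
```

For the assumed inputs, the predicted step is about 77% of the input value and the predicted residual variance is roughly twice the input noise variance of 0.04, which illustrates how a modest correlation can both hide part of a step and inflate the apparent scatter.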
