Stellar Population Inference with Prospector

Benjamin D. Johnson, Joel Leja, Charlie Conroy, and Joshua S. Speagle

Published 2021 May 17. © 2021. The American Astronomical Society. All rights reserved.

Citation: Benjamin D. Johnson et al. 2021 ApJS 254 22. DOI: 10.3847/1538-4365/abef67


Abstract

Inference of the physical properties of stellar populations from observed photometry and spectroscopy is a key goal in the study of galaxy evolution. In recent years, the quality and quantity of the available data have increased, and there have been corresponding efforts to increase the realism of the stellar population models used to interpret these observations. Describing the observed galaxy spectral energy distributions in detail now requires physical models with a large number of highly correlated parameters. These models do not fit easily on grids and necessitate a full exploration of the available parameter space. We present Prospector, a flexible code for inferring stellar population parameters from photometry and spectroscopy spanning UV through IR wavelengths. This code is based on forward modeling the data and Monte Carlo sampling the posterior parameter distribution, enabling complex models and exploration of moderate-dimensional parameter spaces. We describe the key ingredients of the code and discuss the general philosophy driving the design of these ingredients. We demonstrate some capabilities of the code on several data sets, including mock and real data.


1. Introduction

Fitting galaxy spectral energy distributions (SEDs) is the primary method for converting galaxy observations into inferred physical properties. These physical properties can in turn be compared to predictions from theories of galaxy evolution. Accordingly, galaxy SED fitting has a rich history as one of the primary methods of hypothesis testing in extragalactic astronomy (see reviews by Faber 1977; Tinsley 1980; Walcher et al. 2011; Conroy 2013).

There are three key components of SED fitting: a physical model, a set of observations such as photometry or spectroscopy, and a statistical inference framework that connects the model to the observations. The physical models are typically chosen to be stellar population synthesis codes, which combine inputs such as isochrones, stellar spectra, an initial mass function (IMF), and a star formation and chemical evolution history to create the SEDs of complex stellar populations. There are many publicly available stellar population synthesis codes that treat these elements with varying degrees of complexity, including GALAXEV (Bruzual & Charlot 2003), PÉGASE (Fioc & Rocca-Volmerange 1997; Le Borgne et al. 2004), Maraston (2005), FSPS (Conroy et al. 2009), and BPASS (Eldridge et al. 2017). Additional nonstellar effects that are important to the SED, such as dust attenuation, active galactic nucleus (AGN) emission, and nebular and dust emission, can be included as part of or separately from the core stellar population synthesis codes.

The statistical inference framework performs the heavy task of matching complex physical models of galaxy emission to observations. This is usually accomplished through a likelihood function. Many techniques for SED fitting in the literature thus far have been based on maximum-likelihood optimization, a process sometimes called inversion (Walcher et al. 2011). The optimization may be through nonlinear or, for restricted models, linear methods, or a mix of both (Heavens et al. 2000; Cid Fernandes et al. 2005; Ocvirk et al. 2006; Tojeiro et al. 2007; Koleva et al. 2008). These techniques are popular in part because they are fast and simple. However, many of the likelihood spaces in SED fitting are both highly non-Gaussian and ill conditioned such that a small change in the input (e.g., noise in galaxy photometry) can lead to a large change in the output (e.g., star formation histories, SFHs; Ocvirk et al. 2006). In this case, the pure maximum-likelihood estimation and associated inversion techniques cause large noise amplification, resulting in unstable maximum-likelihood solutions and causing severe difficulties in accurately assessing the uncertainties. While regularization can mitigate noise amplification, it complicates interpretation. Furthermore, it can be difficult to incorporate important nonlinear model components such as dust attenuation and spectral calibration in a maximum-likelihood framework.

A solution to these problems can be found in Bayesian forward-modeling techniques. These techniques are able to accurately extract complicated, large, and correlated parameter uncertainties from galaxy observations. Early Bayesian codes such as Kauffmann et al. (2003), Salim et al. (2007), CIGALE (Burgarella et al. 2005), and MAGPHYS (da Cunha et al. 2008) were successful in this effort. These codes often worked by comparing precomputed parameter grids or "libraries" of SED models to the observations. While likelihood evaluation over a grid can be computationally very fast, grids grow in size exponentially with the number of parameters. As a result, grids only allow a relatively small number of parameters or coarse spacing. Furthermore, grids and libraries are large on disk and slow to regenerate, making it difficult in practice to test the effect of the grid assumptions on the inferred parameters.

The success of these early Bayesian forward-modeling codes motivated an expansion of the technique into gridless "on-the-fly" modeling coupled with Markov Chain Monte Carlo (MCMC) algorithms to efficiently explore the resulting parameter space. This approach allows priors and fit parameters to be changed quickly on an object-by-object basis, at the cost of longer per-object computational times. This approach was pioneered by GalMC (Acquaviva et al. 2011), followed later by codes such as BEAGLE (Chevallard & Charlot 2016) and BAGPIPES (Carnall et al. 2018). Here we present Prospector, a modular galaxy stellar populations inference code based on Bayesian forward modeling and Monte Carlo sampling of the parameter space using FSPS stellar populations. Prospector has been in development since 2014 and, in addition to extensive testing on nearby galaxies (Leja et al. 2017, 2018), has seen early use in modeling the SEDs of dwarf galaxies (Greco et al. 2018; Pandya et al. 2018), spatially resolved spectroscopy (Patrício et al. 2018; Oyarzún et al. 2019), the host galaxies of supernovae and gravitational wave events (Blanchard et al. 2017a, 2017b; Nicholl et al. 2017), and gravitationally lensed or massive high-redshift quiescent galaxies (Ebeling et al. 2018; Williams et al. 2021).

The paper is structured as follows. In Section 2, we describe the ingredients of the Prospector code, including the physical galaxy model, modeling of observational effects, and the statistical framework. We also discuss some general considerations when building models from these ingredients. In Section 3, we present several demonstrations of Prospector fits to mock data, including both photometry and spectroscopy. In Section 4, we give examples of Prospector operating on real spectroscopic and photometric data. In Section 5, we discuss lessons learned during the development of Prospector and several avenues for future improvement. Finally, the appendices provide implementation details. Prospector is available online at https://github.com/bd-j/prospector and 10.5281/zenodo.4586953, and the scripts to run the fits and make the figures in this paper are available in a code repository at https://github.com/bd-j/exspect.

2. Prospector Ingredients

The Prospector code works by forward modeling the observed spectrum and photometry of a galaxy given parameters describing the different galaxy components (stars, gas, and dust) and instrumental parameters, computing a likelihood and posterior probability for the model based on a noise model and prior distributions, and Monte Carlo sampling the posterior probability distribution. In this section, we describe, in turn, each of the ingredients necessary for this process, ending with some advice for building models from these ingredients. Readers interested in demonstrations of the code applied to mock and real data can skip to the next sections.

2.1. Stellar Population Sources

In Prospector, the generation of a basic spectrum and photometry is handled by stellar population sources. These source objects take a set of parameters and compute a mass-normalized rest-frame spectrum.

In this paper, we will model both simple and composite stellar populations, describing star clusters and galaxies, respectively. The spectra for these sources in Prospector are derived from the Flexible Stellar Population Synthesis (FSPS; Conroy et al. 2009, 2010) code. The FSPS code is accessed through the python-FSPS (Foreman-Mackey et al. 2014) bindings. Given a set of stellar population parameters, python-FSPS is used to calculate the emergent galaxy spectrum by combining simple stellar populations (SSPs) with the SFH and absorption and emission by other galaxy components. If necessary, SSPs may be generated by FSPS during this computation; otherwise, the SSPs are cached by python-FSPS, and only the calculations required to produce the spectrum of the composite stellar population (as well as nebular emission, dust, and other effects) are made, significantly speeding up the calculation. This allows for the spectrum of a stellar population to be constructed efficiently on the fly; no parameter grids are necessary, except those used internally by FSPS.
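For illustration, the following is a minimal sketch of generating a composite stellar population spectrum through python-FSPS; the parameter values are arbitrary, and the keywords follow the python-FSPS documentation.

```python
# A minimal sketch of spectrum generation with python-FSPS; parameter
# values here are illustrative, not recommended defaults.
import fsps

sp = fsps.StellarPopulation(zcontinuous=1,        # interpolate SSPs in metallicity
                            sfh=4,                # delayed-exponential SFH
                            tau=1.0,              # SFH e-folding time [Gyr]
                            logzsol=-0.3,         # log(Z/Z_sun)
                            dust2=0.3,            # V-band optical depth
                            add_neb_emission=True)

# Rest-frame spectrum (L_sun/Hz per M_sun formed) at a population age of 5 Gyr.
wave, spec = sp.get_spectrum(tage=5.0, peraa=False)

# Changing composite-population parameters reuses the cached SSPs;
# changing SSP-level parameters (e.g., the IMF) triggers their
# computationally expensive regeneration.
sp.params['tau'] = 3.0
wave, spec = sp.get_spectrum(tage=5.0, peraa=False)
```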

The FSPS code includes detailed stellar populations with a variety of choices for stellar isochrones and stellar spectral libraries, self-consistent absorption and emission by dust, self-consistent nebular emission, and intergalactic medium (IGM) attenuation. More details about the ingredients of FSPS, especially those that have been added or modified since Conroy et al. (2010), can be found in Appendix A. Every parameter available in the FSPS code is a potential parameter of a Prospector model and can either be free or fixed. This includes parameters that affect the computation of the SSPs, like the slope of the IMF, shifts in the temperatures or luminosities of AGB isochrones, and changes in the blue-straggler fraction, as well as more traditional composite stellar population parameters like stellar metallicity and dust attenuation (including the slope of the attenuation curve and the age dependence of the optical depth). Nebular emission parameters may also be varied. Varying parameters that affect the SSPs will trigger their regeneration, which is computationally expensive. As will be discussed in Section 2.2, for several parameters available in FSPS—including the redshift and spectral smoothing—the computations are handled by Prospector externally to FSPS.

Prospector has access to all of the parametric SFHs built into FSPS. These forms include exponentially declining SFRs, exponentially declining SFRs with a linear rise (delayed-exponential), constant, single-age bursts, and a flexible four-parameter delayed-exponential model that transitions to a linear ramp at late times described in Simha et al. (2014). In addition, Prospector includes the ability to model linear combinations of these SFHs. For example, it is possible to model SFR(t) as the sum of an exponentially declining SFH and a constant SFH, each with separate dust obscuration, to represent the stellar population of the bulge and disk, respectively. It is advised that any user-defined SFH should be tested by generating and fitting mock data to ensure that the information content of the data is well matched to the complexity of the model (Section 2.5).

Through the use of tabular SFHs in FSPS, the Prospector code can also be used to model and fit more flexible or complicated SFHs. As currently implemented in Prospector, these are the so-called "nonparametric" SFHs: high-dimensional piecewise constant (or step-function) SFHs, the fundamental parameters of which are the stellar mass formed in each of N bins of lookback time of user-defined width. Additional forms (e.g., piecewise linear, with parameters giving the SFR at particular times) may be included in the future. Nonparametric SFHs are highly flexible and capable of modeling SFHs with sharp features or unusual shapes that are forbidden by the usual parametric forms. It is difficult to constrain flexible SFHs with typical galaxy observations, so the chosen prior is very important (Sections 2.4.2 and 3.2; Leja et al. 2019). Through careful parameter transformation, a number of different priors on the amplitudes are possible; Prospector is packaged with four families of such priors, but the user may adjust these or explore more complicated priors.

Finally, the modular nature of Prospector allows for sources other than the ones available through FSPS to be used, in principle. All that is required is a Python object that, given a parameter dictionary and a set of wavelengths and filters, can compute and return the mass-normalized rest-frame spectrum.

2.2. Redshift, Filters, Line-spread Functions

When modeling data, the models need to be "projected" into the data space in various ways. These effects are handled by code that operates on the spectrum returned by any of the sources described above. They include redshifting, normalization by the total formed stellar mass and distance dimming, and filter projections. If required, filter projections are handled by the pure Python code sedpy (Johnson 2019), where the addition of user-defined filters is straightforward.

To model observed spectra, the normalized, redshifted spectrum must be broadened to match the observed line-spread function (LSF) including physical and instrumental broadening. Prospector includes code for fast spectral broadening in velocity or wavelength space using fast Fourier transforms (FFTs). The broadening kernels are currently limited to Gaussians, but more complex kernels may be added in the future. Broadening with FFTs uses the analytic Fourier transforms of the broadening kernels (e.g., Cappellari 2017). A detailed discussion of the Prospector treatment of broadening due to the line-of-sight velocity distribution, the instrumental resolution, and considering the library resolution is given in Appendix C. At the time of writing, the highest resolution models available in FSPS and hence in Prospector are based on the empirical library of stellar spectra MILES (Sánchez-Blázquez et al. 2006), with FWHM ≈ 2.5 Å from 3525 to 7400 Å. Modeling even moderate-resolution spectroscopic observations of rest-frame wavelengths outside of this region, or high-resolution rest-frame optical data, will require new stellar libraries. Alternatively, the data may be smoothed to a lower resolution, but this induces correlated noise that would need to be accounted for in the likelihood (see Appendix D). Finally, because the kinematics of the gas and stars may be different, the nebular lines may need a different broadening and redshift from the stellar continuum. Prospector has this functionality: see Appendix E for details.
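To make the smoothing operation concrete, below is a minimal sketch of Gaussian velocity broadening on a uniform grid in ln λ. Prospector performs the equivalent operation with FFTs and analytic kernel transforms, so this direct convolution is for illustration only (edge effects and the library resolution are ignored).

```python
import numpy as np

def smooth_velocity(wave, flux, sigma_v, c_kms=2.998e5):
    """Broaden flux by a Gaussian of width sigma_v (km/s).

    On a uniform grid in ln(wavelength), velocity broadening is a
    convolution with a Gaussian of dispersion sigma_v / c in d(ln lam).
    """
    lnw = np.log(wave)
    dlnw = np.diff(lnw).min()
    lnw_grid = np.arange(lnw[0], lnw[-1], dlnw)   # uniform ln-lambda grid
    f = np.interp(lnw_grid, lnw, flux)

    sigma_pix = (sigma_v / c_kms) / dlnw          # kernel width in pixels
    hw = int(4 * sigma_pix) + 1                   # kernel half-width
    x = np.arange(-hw, hw + 1)
    kernel = np.exp(-0.5 * (x / sigma_pix) ** 2)
    kernel /= kernel.sum()

    smoothed = np.convolve(f, kernel, mode="same")
    return np.interp(lnw, lnw_grid, smoothed)     # back to the input grid
```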

Adjustments to the wavelength calibration can be modeled using polynomials, following Koleva et al. (2009). Additional modeling of sky emission and transmission is under development.

2.3. Spectrophotometric Calibration

The treatment of spectrophotometric calibration and the uncertainty thereof is of critical importance when modeling spectra, and especially when combining spectroscopic and photometric information (Sections 2.4, 3.1.2, Appendix D). It is common when modeling spectroscopy to remove the sensitivity of the inferred physical parameters to the continuum shape, due to the difficulty of accurate spectrophotometry (e.g., Koleva et al. 2009; Cappellari 2017). Another way to think of this is that the information in the continuum shape of the spectroscopic data should not flow to the physical parameters of interest but rather to nuisance parameters describing the uncertain spectral calibration. Here we describe approaches to the modeling of spectrophotometric calibration that are available in Prospector. We assume throughout that the spectra and the photometry probe the same stellar populations, i.e., that there are no substantial color-dependent aperture corrections. The overall scaling of the spectroscopic calibration vector incorporates any gross aperture corrections, such that normalization-sensitive parameters (e.g., stellar mass) correspond to the population probed by the integrated photometry.

The most straightforward approach is to include a parameterized function of wavelength or pixel, typically a polynomial, that multiplies the model spectrum to produce the observed spectrum. One can then fit for the parameters of this calibration function simultaneously with the physical parameters (e.g., Lee et al. 2011; Becker et al. 2015). The calibration parameters can be marginalized over when considering physical parameter estimates. It is straightforward to include priors on these parameters to incorporate external knowledge about the spectrophotometric calibration accuracy. However, it can be technically difficult to make this simultaneous fit when the number of calibration function parameters is very large (e.g., for a very high-order polynomial). In Prospector, this difficulty can be avoided by optimizing the parameters of the calibration function at each model generation. In this case, the computational cost is small, but the resulting posterior distributions do not fully incorporate the uncertainties in the spectrophotometric calibration. It is possible—for certain prior distributions and calibration functions—to analytically marginalize over the parameters at each likelihood call, but this option is not yet available in Prospector. A different approach that avoids an explicit calibration function is to model calibration effects as covariant or correlated noise, which is discussed further in Appendix D.

The parameterized calibration function available by default in Prospector is a Chebyshev polynomial of the wavelength vector of user-defined order $N_p$. The scale of the calibration features that can be captured with this model is $\sim(\lambda_{\max} - \lambda_{\min})/N_p$. Typically, optimization of the polynomial will serve most users' needs. When no photometry is available, the parameters of the calibration vector are unconstrained except by the space of possible continuum shapes allowed by the physical model. This will lead to unconstrained stellar mass and dust attenuation, as these parameters are typically constrained by the shape and normalization of the continuum.
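As a sketch of the optimization approach, the maximum-likelihood Chebyshev calibration vector can be obtained from a weighted least-squares fit of the data-to-model ratio at each likelihood call; the function below is illustrative and is not Prospector's internal implementation.

```python
import numpy as np
from numpy.polynomial import chebyshev

def ml_calibration(wave, data, unc, model_spec, order=12):
    """Maximum-likelihood polynomial calibration vector (data = cal * model)."""
    # map wavelength onto [-1, 1] for a stable Chebyshev basis
    x = 2.0 * (wave - wave.min()) / (wave.max() - wave.min()) - 1.0
    ratio = data / model_spec
    # the uncertainty on the ratio is unc/model, so weight by model/unc
    coeffs = chebyshev.chebfit(x, ratio, deg=order, w=model_spec / unc)
    return chebyshev.chebval(x, coeffs)

# calibrated = model_spec * ml_calibration(wave, data, unc, model_spec)
```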

2.4. Likelihood, Priors, and Sampling

2.4.1. Likelihood

The basic log-likelihood calculation in Prospector is effectively a χ2 calculation for both the spectral and photometric data. The two log-likelihoods are then summed. This gives the following expression for the $\mathrm{ln}$-likelihood:

$$\ln p(d_s, d_p \mid \theta, \phi, \alpha, \beta) = \ln p(d_s \mid \theta, \phi, \alpha) + \ln p(d_p \mid \theta, \beta), \quad (1)$$

where ds is the spectroscopic data, dp is the photometric data, the parameters θ describe the physical model and generally include any of the python-FSPS parameters, the parameters ϕ describe spectroscopic instrumental effects and calibration, the parameters α describe the spectroscopic noise model, and the parameters β describe the photometric noise model.

By default, we assume known and independent Gaussian flux uncertainties. Under this assumption, the $\mathrm{ln}$-likelihood for the spectroscopic and photometric terms is simply χ2. More complex noise models are available in Prospector. These may be necessary to model a variety of instrumental artifacts and are discussed further in Appendix D.
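Under this default assumption, each term in Equation (1) reduces to the familiar Gaussian form, sketched below.

```python
import numpy as np

def lnlike_gaussian(data, model, unc):
    """Independent Gaussian ln-likelihood (chi^2 plus a normalization term)."""
    return -0.5 * np.sum(((data - model) / unc) ** 2
                         + np.log(2.0 * np.pi * unc ** 2))
```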

2.4.2. Priors

Before each likelihood call, the prior probability of every free parameter is computed. A prior probability distribution must be specified for each of the parameters that is allowed to vary during a fit. Several simple distributions are provided including uniform, normal or Gaussian, clipped normal, log-normal, log-uniform, Student's t, and Beta distributions. The parameters describing these distributions—such as the minimum and maximum for the uniform distribution, or the mean and dispersion for the normal distribution—must be specified as well. Users may provide their own prior distributions. It is also possible in principle to specify joint priors on several parameters (e.g., that one parameter is less than another parameter), though in practice this is more easily achieved through parameter transformations (see below).
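As an illustration of the parameter and prior specification, the sketch below follows the model-parameter format in the current prospector documentation; the exact class and keyword names may differ between code versions.

```python
from prospect.models import priors

# a minimal sketch of free parameters with attached priors
model_params = {
    "mass":    {"N": 1, "isfree": True, "init": 1e10,
                "prior": priors.LogUniform(mini=1e7, maxi=1e13)},
    "logzsol": {"N": 1, "isfree": True, "init": -0.3,
                "prior": priors.TopHat(mini=-2.0, maxi=0.2)},
    "dust2":   {"N": 1, "isfree": True, "init": 0.3,
                "prior": priors.ClippedNormal(mean=0.3, sigma=1.0,
                                              mini=0.0, maxi=4.0)},
}
```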

In the often highly degenerate space of SED modeling, priors can play a strong role in determining the shape of the posterior probability distribution (e.g., Carnall et al. 2019a; Leja et al. 2019). We therefore strongly encourage exploration of the sensitivity of the results from Prospector to choices of prior, as discussed further in Section 2.5.

Related to priors, Prospector offers methods for transforming variables. This works by labeling some parameters as dependent on other parameters via a given transformation function. This mechanism allows one to sample in transforms of the native parameters (e.g., in τ−1 instead of τ) that may be more convenient, to tie parameters to other, varying, parameters (e.g., to tie the gas-phase metallicity to the stellar metallicity), and to avoid joint priors such as limits on population age that are a function of the varying redshift of the model.
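Continuing the sketch above, a dependent parameter is declared with a transformation function; the `depends_on` key name follows the prospector documentation.

```python
def tie_gas_logz(logzsol=None, **extras):
    """Set the gas-phase metallicity equal to the stellar metallicity."""
    return logzsol

# gas_logz is fixed (not sampled) but recomputed from logzsol at every call
model_params["gas_logz"] = {"N": 1, "isfree": False, "init": 0.0,
                            "depends_on": tie_gas_logz}
```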

2.4.3. Optimization and Sampling

The posterior probability distribution is sampled using Monte Carlo techniques, and optimization can also be performed (though is not recommended for the reasons described in the Introduction, except as an initial guess for sampling algorithms).

The primary inference technique used by Prospector is nested sampling. Nested sampling (Skilling 2004) is an algorithm based on successive draws from the prior distribution at increasing values of the likelihood. It uses the change in the effective prior volume of isolikelihood contours as a function of likelihood to estimate the Bayesian evidence, and the successive draws—along with their associated weights—can be used to estimate the posterior distribution of the parameters. Because of the way the prior volume is sampled, nested sampling is well suited to posteriors that are multimodal (e.g., for photometric redshifts). Furthermore, no optimization is required before sampling begins, and stopping criteria based on estimates of the remaining evidence can be defined. Nested sampling in Prospector uses dynesty (Speagle 2020), a pure Python package implementing the algorithms described in Feroz et al. (2009) and Higson et al. (2019).
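A minimal sketch of invoking nested sampling through the high-level fitting interface is given below; the function and return conventions follow the prospector documentation at the time of writing and assume that the `obs`, `model`, and `sps` objects have already been built.

```python
from prospect.fitting import fit_model

# obs, model, and sps are assumed to have been built already
output = fit_model(obs, model, sps,
                   optimize=False,  # no pre-optimization required
                   emcee=False,
                   dynesty=True)
result, duration = output["sampling"]   # dynesty results and wall time
```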

Within Prospector, it is also possible to use the ensemble MCMC algorithm of Goodman & Weare (2010), as implemented in the emcee Python package (Foreman-Mackey et al. 2013). This algorithm is based on the use of many "walkers" in parameter space, the distribution of which is used to construct parameter proposals for the next Monte Carlo step in the chain.

The results of the Monte Carlo sampling are a list of parameter values, which represent (weighted) samples from the joint posterior probability distribution for all parameters, given the data. These samples provide a natural and useful tool for quantifying parameter uncertainties (marginalized over other parameters of the model) and the degeneracies between parameters. Given samples from the posterior probability distribution, it is up to the user to decide what to report. This may include a full set of samples or various projections of the posterior PDF. It is also possible to report moments or quantiles of the marginalized posterior PDF for each parameter, though care must be taken in the interpretation when the posteriors are very broad and/or non-Gaussian. For complicated posterior distributions, the set of marginalized one-dimensional posterior medians may describe a model that is itself very unlikely. For these reasons, it is often best to work with the entire posterior distribution (as approximated by the Monte Carlo samples) for subsequent analyses. For visualizing the model in the space of the data, the highest posterior probability sample is also useful.
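For example, marginalized quantiles can be computed directly from the weighted samples, as in this sketch.

```python
import numpy as np

def weighted_quantile(samples, q, weights):
    """Quantiles q (values in [0, 1]) of 1D weighted posterior samples."""
    order = np.argsort(samples)
    s, w = samples[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    return np.interp(q, cdf, s)

# e.g., 16th/50th/84th percentiles for one parameter:
# lo, med, hi = weighted_quantile(chain[:, 0], [0.16, 0.5, 0.84], weights)
```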

2.5. Model Building

Given all these ingredients in Prospector, it is the responsibility of the user to define a model that is appropriate for their data and scientific question. In Prospector this amounts to choosing a set of parameters, deciding which are free and specifying prior distributions for those, and specifying the values of the remaining fixed parameters. It may also require defining any parameter transformations to convert from the sampling parameters to the native parameters used to generate spectra and SEDs. Several examples of this process are presented in Sections 3 and 4. A significant advantage of on-the-fly model generation is that changing the model definition—the free parameters, the priors, and the parameterization—does not require rebuilding large grids of models.

When constructing a model for inference in Prospector, it is often useful to start with a simple model and to add complexity as required by the data and scientific question. A first step is to make predictions with a simple model and some fiducial or very approximate parameter values and to compare this to some representative data. Once the model seems like it has reasonable components, the next step is to perform a fit and then, crucially, to examine residuals. This can reveal missing components in the model that were not immediately apparent, such as unmodeled weak nebular emission lines, resolution mismatch, or the presence of hot dust emission. Examining the posterior PDFs can identify priors that are too restrictive or misspecified. Finally, with a basic model in hand, one can explore letting parameters that were previously fixed be free, though with informative priors.

There is a natural desire to avoid complexity in the model by limiting the number of free parameters. However, one of the principal advantages of Bayesian inference without precomputed grids is the ability to add extra parameters that may affect the SED and marginalize over them if they are not scientifically interesting. Adding such parameters—the values of which are not extremely well known a priori but that might affect the SED—will cause other parameter estimates to become less certain and thus guard against overinterpreting the data. If the parameter is not well constrained by the data—that is, if it does not affect the data in a unique and significant way—then this will be reflected in the posterior PDFs, which will take the form of the prior for that parameter. Furthermore, the constraints on other parameters will only be affected if they are somehow degenerate with the new parameter; in this case, not including the additional parameter would bias the uncertainties low on these other parameters. Thus, one should not shy away from including poorly constrained parameters that are known to be physically important or to affect the SED, though in such cases, it is important to put thought into the adopted prior distribution. The only drawback to adding parameters is that MCMC sampling can become inefficient as the number of parameters increases.

When the data are only weakly informative about the parameters, as can often occur in SED fitting, the form of the priors can play a strong role in the posterior estimation. It is often useful to place strongly informative priors on some parameters, as opposed to weakly informative priors such as uniform distributions. If one has external information about the plausible values that a parameter might take—for example, that the shape of the attenuation curve is most likely to be a certain value but that there is scatter—then it is best to use that in the form of a prior. This is preferred to the typical approach of fixing the parameter to the single most likely value: that effectively constitutes an extremely strong prior and will often result in overly confident posteriors.

However, the importance of priors can also raise concerns about the interpretation of the results—did you actually learn anything from fitting your data, or are the results dominated by the prior assumptions? When this is a concern, we recommend testing the sensitivity of the results to the form of the prior. This can be done most directly by running fits with a different but still plausible prior and seeing if and how the results change. It can also be useful to generate SEDs or other derived quantities by directly sampling the chosen prior distribution and inspecting the distribution of those quantities to see what values would be preferred or allowed by the model.

3. Demonstrations with Mock Data

In this section, we provide demonstrations of some of the capabilities of the Prospector code for fitting galaxy spectra and/or photometry using mock data. These demonstrations include the combination of photometric and spectroscopic information in a single fit, the inference of flexible, many-parameter SFHs, and the constraints provided on a complex model with different combinations of photometric data, including a single optical photometric point.

Throughout these demonstrations with mock data, we use the MIST isochrones (Choi et al. 2016) and the MILES spectral library (Sánchez-Blázquez et al. 2006) within FSPS to construct both the mocks and the models with which they are fitted. Because we are fitting mock data with the same stellar population models used to generate that data, the results of this section allow us to demonstrate and test the Prospector methodology and examine the constraints that can be placed in the limit of perfect stellar population models.

The demonstrations in this section are not intended as exhaustive explorations of the information content of different kinds of data and the suitability of different kinds of models, but rather provide examples of the capabilities of the Prospector code for fitting different kinds or combinations of data with a variety of galaxy SED models.

3.1. A Simple Parametric SFH Model

Since the pioneering work of Tinsley (1980), it has been common to fit observed SEDs to galaxy models where the SFH is represented by a relatively simple functional form characterized by a few parameters, SFR(t) = f(t, θ). The commonly used exponentially declining SFH arises in closed-box evolutionary models with a constant star formation efficiency. In addition to the exponentially declining and the related delayed-exponential SFHs, recent studies have proposed a variety of functional forms thought to be more or less well matched to the SFHs of real galaxies (Gladders et al. 2013; Simha et al. 2014; Diemer et al. 2017; Carnall et al. 2018). Parametric SFH models attempt to describe the potentially complex evolution of galaxies with only a few numbers. This is very desirable for limiting the size of SED grids and making any Bayesian inference with MCMC more tractable.

Here we present a fit to mock data with a parametric SFH model. This provides a comparison point for the extensive literature using parametric SED models, and the simplicity of the model allows us to focus on different aspects of the Prospector fitting. For all of the fits presented in this section, we use a delayed-exponential SFH, where the SFR ψ as a function of lookback time t is given by

$$\psi(t) \propto (t_{\rm age} - t)\, e^{-(t_{\rm age} - t)/\tau}, \qquad 0 \le t \le t_{\rm age}, \quad (2)$$

with parameters tage and τ. While a number of treatments of dust attenuation are available through FSPS (see Appendix A), we model the dust attenuation with a simple dust screen affecting stars of all ages equally and specify a fixed wavelength dependence of the optical depth based on Kriek & Conroy (2013). The normalization of this attenuation curve—the optical depth at 5500 Å—is left as a free parameter. We also include as a free parameter a single metallicity for all stars. For simplicity, we tie the metallicity of the nebular emission model (Appendix A) to this stellar metallicity, but this is not required in general. The ionization parameter U of the nebular emission is left as a free parameter. Finally, the overall normalization of the SFH is left as a free parameter, corresponding to the total mass of stars formed or the integral of the SFH. We include dust emission in the model, based on the three-parameter models of Draine & Li (2007; see Appendix A for details), but we do not fit for these parameters, as we do not include data at infrared wavelengths that are sensitive to these parameters. Unless otherwise stated, values of the fixed parameters used to generate the mock data are the same as the parameters used to fit the mock data.

The modeling of redshift is described and treated separately in the fits below. However, the redshift of the mock is set to 0.1, and the prior distribution on tage does not allow solutions older than the age of the universe at this redshift. The values of the other mock parameters (and the priors used when modeling the SED) are given in Table 1.

Table 1. Summary of Parameters and Priors Used in Mock Demonstrations

Parameter | Mock Value | Prior Function
Stellar mass a [M⊙] | 10^10 | LogUniform(10^7, 10^13)
Age $t_{\rm age}$ [Gyr] | 12 | Uniform(0.1, $t_{\rm univ}$ b)
$\tau_{\rm SF}$ [Gyr] | 4 | LogUniform(0.01, 100)
$z$ | 0.1 | ${\mathcal N}(0.1, 0.001)$
$\log(Z_\star/Z_\odot)$ | −0.3 | Uniform(−2.0, 0.2)
$\tau_{5500}$ | 0.3 | Uniform(0, 4)
$\log U$ | −2 | Uniform(−4, −1)
$\sigma_v$ c [km s−1] | 200 | Uniform(30, 400)

Notes. Uniform(x, y) indicates a uniform prior distribution from x to y, ${\mathcal N}(\mu, \sigma)$ indicates a normal distribution for the prior, and LogUniform(x, y) indicates a distribution that is uniform in the log of the parameter over the range (x, y).

a This is the integral of the SFH.
b The maximum age is set by the age of the universe at the model redshift in a WMAP9 cosmology.
c Velocity dispersion is only included as a model parameter when fitting spectra.

3.1.1. Inference from Photometry

In this section, we demonstrate inference of the parameters of the simple model described above from only the mock photometric data. The mock photometry is generated using the same model used for the inference, with parameters given in Table 1. We generate photometry in the filters of several large-area sky surveys, including GALEX FUV and NUV, SDSS ugriz, and the 2MASS JHKs bands. We assume uncorrelated uncertainties of 5% in the photometry in each band and add a realization of the noise to the mock photometry.
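For reference, photometry in these bands can be produced with sedpy filter projections, as in this sketch; the filter names follow sedpy's bundled filter list, and the spectrum here is a toy placeholder.

```python
import numpy as np
from sedpy import observate

# toy observed-frame spectrum (illustrative only)
wave = np.linspace(1.0e3, 3.0e4, 5000)    # wavelength [AA]
f_lambda = np.full_like(wave, 1e-17)      # flux [erg/s/cm^2/AA]

fnames = (["galex_FUV", "galex_NUV"]
          + ["sdss_{0}0".format(b) for b in "ugriz"]
          + ["twomass_{0}".format(b) for b in ("J", "H", "Ks")])
filters = observate.load_filters(fnames)

mags = observate.getSED(wave, f_lambda, filterlist=filters)  # AB magnitudes
```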

For this demonstration, we assume, when fitting the photometry, that the redshift is constrained to the true value to within 0.001. That is, we include the redshift as a free parameter, but with a very narrow prior distribution as might be expected from low signal-to-noise ratio (S/N) grism spectra, yielding a model with seven free parameters. We then use Prospector to explore the posterior probability distribution of the free parameters for this model. For the inference, we use nested sampling, with default parameters. In Figure 1, we show the mock spectrum and the mock photometry, the latter perturbed by a realization of the noise. We also show the inferred values for the true photometry as shaded boxes with heights indicating the 16th–84th percentile range of the posterior probability. Uncertainty-normalized residuals for the highest probability posterior sample are also shown. Examining such residuals is a key step in evaluating the performance of any model and identifying any deficiencies that require additional model components or flexibility to adequately describe the data.


Figure 1. An example of simple parameter inference from a mock UV through near-IR photometric SED with Prospector. Lower left panels: several projections of the posterior PDF for five of the seven fitted parameters of a simple delayed-exponential SFH model (blue contours and histograms) compared to the true input values (black points, black dashed lines). The one-dimensional projections also show the prior probability distributions (green dotted lines). Inset top: mock SED for a delayed-exponential SFH. We show the true input mock spectrum, smoothed to R ∼ 500 (black line), the mock photometry with S/N = 20 (gray circles), and the posterior PDFs for the true photometry (blue rectangles). The height of each rectangle represents the 16th–84th percentile range of the posterior PDF for the true photometry. Inset bottom: uncertainty-normalized residual (χ) between the mock photometry and the predicted photometry of the most probable Monte Carlo sample (orange).


In Figure 1, we also show projections of the posterior probability distribution for several of the free parameters (the redshift and the nebular ionization parameter are not constrained by the data, and the posterior distributions are indistinguishable from the prior). The input mock parameter values are also shown in black, and the adopted prior probabilities are indicated in the one-dimensional projections. The posterior PDFs for a number of the parameters are quite broad. Notably, there is a strong degeneracy between tage and τSF such that while neither is well constrained, their ratio is reasonably well constrained. The ratio tage/τ is closely related to the specific star formation rate. The well-known dust-age–metallicity degeneracy is mapped in complete detail.

It is worth pointing out that the peaks of the marginalized posterior probability distributions are not generally centered on the true values. This is a common and expected occurrence, caused in part by the perturbation of the mock fluxes by a realization of the noise and in part by the complicated shape of the posterior in multidimensional parameter space combined with the adopted priors. This highlights the potential drawbacks of simply using the maximum a posteriori parameters.

3.1.2. Incorporating Information from Spectroscopy

We now turn to the inference of mock parameters from mock spectroscopic data, both alone and in combination with photometric data. We generate a mock spectrum from the model, smooth the mock spectroscopy by an additional velocity dispersion of σv = 200 km s−1, and assume an instrumental dispersion of 2 Å per pixel from rest-frame 3800–7100 Å (corresponding to the region of the spectrum covered by the high-resolution empirical MILES stellar library). The applied physical velocity smoothing dominates over the intrinsic resolution of the MILES stellar library, but for clarity and simplicity, we also assume that the instrument has the same LSF as the MILES library redshifted to z = 0.1 (∼2.75 Å FWHM). See Appendix C for approaches to data where the instrumental or library broadening is a significant contributor to the final LSF of either the data or model. We assume 10% uncertainties in the flux of each pixel, and a realization of this noise is added to the data. To mimic realistic observations where the spectrophotometric calibration may not be known, and to highlight that we are not using information from the continuum shape, we divide the mock spectrum by a sixth-order polynomial fitted to regions of the spectrum free from strong absorption or emission lines.

To model the continuum-normalized mock spectrum, we include all free parameters used in the fit to the photometry (Table 1), and we also include the galaxy velocity dispersion as a free parameter. To account for any uncertainty in the spectroscopic calibration, we use the method of multiplying the model spectrum by the maximum-likelihood polynomial that describes the ratio between the spectroscopic data and the model before computing the likelihood of the data. We use a 12th-order polynomial that is determined at each likelihood call. By including this polynomial multiplication, we do not use information from the spectral continuum shape; that information is supplied by the photometry if it is also being fit.

In Figure 2, we compare the posterior PDFs for several model parameters inferred from the continuum-normalized spectroscopy alone (in maroon contours) to those inferred from the photometry alone (blue contours, previous section). Given the strong degeneracy between tage and τ when fitting to photometry alone, we instead show the posterior PDF for the ratio tage/τ. In this comparison, we see that the posterior PDF for the dust attenuation is much broader when inferred from continuum-normalized spectroscopy than when inferred from the photometry, and is essentially the same as the prior. This is expected because all of the continuum shape information in the data is absorbed by the multiplicative polynomial. In other words, the dust attenuation is completely degenerate with the spectrophotometric calibration, which we have assumed to be completely unknown. The stellar mass is similarly unconstrained. The metallicity is tightly constrained by the spectroscopy; this is not surprising, as in addition to the metallicity information provided by absorption lines in the stellar continuum, we have tied the nebular emission model to the stellar metallicity and include the emission lines in the fitted data.


Figure 2. Comparison of parameter inference from photometry, continuum-normalized spectroscopy, and their combination. Left panels: several projections of the posterior PDF for selected parameters of a parametric SFH model. The parameters inferred from photometry only (blue shaded regions), continuum-normalized spectroscopy only (red shaded regions), and photometry plus continuum-normalized spectroscopy (orange shaded regions) are compared to the true input parameters (black points). The prior probability distributions, scaled by an arbitrary factor, are shown in the one-dimensional projections (green dotted lines). Top-right panel: the true spectrum, roughly continuum normalized as described in the text, is shown (black) along with the noisy mock spectrum (gray).


Figure 2 shows that broadband UV through NIR photometry and continuum-normalized spectroscopy carry different information about the model parameters. It is natural then to combine the constraints from each. We do this by simultaneously modeling and fitting the photometry and spectroscopy using the combined likelihood described in Section 2.4. Because we have included a polynomial calibration or normalization vector when modeling the spectra, the information about the continuum shape and any associated parameter is entirely supplied by the photometric data. If we did not include this uncertainty in the spectroscopic calibration, then the much larger number of spectroscopic points with comparable S/N to the photometric data would overwhelm any information in the photometry, unless two models with equally likely spectra somehow had significantly different photometry. This latter situation would likely occur only if the SFH and dust model were very flexible, or the spectra were of much lower S/N than the photometry.

In Figure 2, we show the posterior probability distribution functions of the selected parameters obtained from a simultaneous fit to the photometry and the spectroscopy (gold contours). For all parameters, these are narrower than the posterior PDFs obtained from either photometry or spectroscopy alone. Interestingly, while the continuum-normalized spectroscopy places no constraint on the dust attenuation by itself, its combination with photometry results in even tighter constraints on the attenuation than from the photometry alone. This is because the precision on other parameters, such as $\mathrm{log}(Z/{Z}_{\odot })$, which are partially degenerate with dust attenuation in the photometry alone, is increased by the inclusion of the spectroscopic data.

3.2. Inferring the SFHs of Illustris Galaxies

The assumption of a particular functional form for the SFH of a galaxy amounts to an extremely strong prior on that SFH. However, it is unclear how well real galaxies are represented by these parametric models (e.g., Gallagher et al. 1984). Deviations from the assumed forms affect the accuracy of derived parameters (e.g., Lee et al. 2009; Pforr et al. 2012; Wilkins et al. 2012; Pacifici et al. 2015; Lower et al. 2020), as well as estimates of the uncertainty on the SFH (Carnall et al. 2019a; Leja et al. 2019).

One way to address the complex SFHs in real galaxies is to increase the flexibility of the model SFHs. Flexibility has been introduced previously by adding additional components (i.e., recent bursts), by choosing more flexible forms for the parametric SFH (e.g., Salim et al. 2007; Carnall et al. 2018), or by using a library of SFHs based on semianalytic or hydrodynamic simulations (e.g., da Cunha et al. 2008; Pacifici et al. 2013; Iyer et al. 2019). The library approaches can require large library sizes to densely sample the prior SFH parameter space, and they also implicitly adopt the simulated SFHs as a prior.

Another possibility is to use piecewise constant or δ-function SFHs where the mass in each temporal bin becomes a parameter of the model. These so-called nonparametric—actually many-parameter—models have become the standard in SFH inference from (semi)resolved stellar color–magnitude diagrams (e.g., Dolphin 1997; Harris & Zaritsky 2001; Weisz et al. 2011; Cook et al. 2019) and have been used in maximum-likelihood "inversion" codes for unresolved spectra (e.g., Cappellari & Emsellem 2004; Cid Fernandes et al. 2005; Ocvirk et al. 2006; Tojeiro et al. 2007; Koleva et al. 2009). In principle, any basis function can be used (e.g., Iyer & Gawiser 2017) though the piecewise constant basis has the advantage of being readily interpreted.

Prospector supports nonparametric models using a piecewise constant basis (Section 2.1). In contrast to maximum-likelihood techniques, Prospector allows for the exploration of the full posterior PDF for the component amplitudes. Inference with this kind of nonparametric SFH was explored in detail by Leja et al. (2019), where a number of different priors on the amplitudes of each bin were considered. A prior based on the ratio of the amplitudes in adjacent bins has some desirable qualities, and we adopt such a "continuity" prior in this section. In this model, the sampled parameters are the log of the ratio of the SFRs in adjacent time bins and the log of the total stellar mass. The masses Mi formed in each time bin—the native parameters used to generate the spectrum—depend on these sampled parameters through a transform:

$$M_i = M_\star\, \frac{\Delta t_i \prod_{j=1}^{i-1} r_j^{-1}}{\sum_{k=1}^{N} \Delta t_k \prod_{j=1}^{k-1} r_j^{-1}}, \qquad r_j \equiv \frac{{\rm SFR}_j}{{\rm SFR}_{j+1}}, \quad (3)$$

where Δti is the width of the ith bin of the lookback time, in years. We place a Student's t-distribution prior on each of the $\mathrm{log}\,{r}_{j}$ parameters, centered on 0 (i.e., constant SFR); the Student's t distribution is similar to a Gaussian prior but with more probability in the tails. The effect of this prior is that in the absence of a constraint from the data the bins default to a constant SFR.
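A standalone sketch of this transform is given below; prospector ships an equivalent built-in transformation function, so this version is purely illustrative.

```python
import numpy as np

def logsfr_ratios_to_masses(logmass, logsfr_ratios, bin_widths):
    """Mass formed per bin from the total log mass and adjacent-bin SFR ratios.

    bin_widths : Delta t_i in years, most recent bin first (length N).
    logsfr_ratios : log10(SFR_j / SFR_{j+1}) for adjacent bins (length N-1).
    """
    r = 10.0 ** np.asarray(logsfr_ratios)
    # SFR in each bin relative to the most recent bin
    sfr_rel = np.concatenate([[1.0], np.cumprod(1.0 / r)])
    coeffs = sfr_rel * np.asarray(bin_widths)   # proportional to mass per bin
    return 10.0 ** logmass * coeffs / coeffs.sum()
```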

To demonstrate the capabilities of Prospector to infer both parametric and nonparametric SFHs, we construct mock data based on several galaxies from the Illustris cosmological hydrodynamic simulation of galaxy formation (Vogelsberger et al. 2014; Diemer et al. 2017). The galaxies were selected to have sharp variations in both the recent and the extended SFH. Such SFHs provide the greatest challenge for the inference machinery and are also disfavored by the chosen prior, ensuring that any agreement is truly driven by the data. These SFHs have a temporal resolution of 0.136 Gyr (i.e., tuniv/100), and we linearly interpolate between the provided SFRs to build the mock spectrum. We assume a single stellar metallicity and a simple foreground dust screen. The mock spectra are generated for a redshift z = 0.1 and smoothed by a velocity dispersion of σv = 100 km s−1. As for the mocks in Section 3.1, we assume an observed-frame instrumental LSF equivalent to the MILES spectral library at the assumed redshift. We assume an S/N of 100 for each pixel. For simplicity, we do not include nebular emission. We infer the SFH from the mock spectra with both parametric and nonparametric SFH models. For both models, we simultaneously infer the SFH, the stellar metallicity, the dust attenuation optical depth for a simple foreground screen, the stellar velocity dispersion, and the redshift. To focus the analysis on the physical information provided by ideal data, we assume that the spectrophotometric calibration is perfect.

For the parametric SFH, we use the same delayed-exponential model as in Section 3.1. For the nonparametric SFH model, we adopt 14 bins approximately evenly spaced in $\log t_{\ell}$. This choice of the number of bins and their exact widths is driven by experimentation, by the difficulty of sampling a very high-dimensional parameter space when using an extremely large number of bins, and by the degree of difference between the spectra of adjacent bins.

In Figure 3, we show the input Illustris SFHs and the posterior PDFs for the SFHs inferred with both a parametric and a nonparametric model. In these examples, we find that the parametric model tends to reproduce the shape of the very recent SFH—which often accounts for the stars that dominate the optical light—even at the expense of the SFH shape at earlier epochs when the assumed form cannot accommodate both. In contrast, the flexibility of the nonparametric SFH model is better able to recover the shape of both the recent and the older SFH. However, the influence of the adopted prior on SFR ratios remains evident, though at a lower level—the recovered nonparametric SFH will tend toward approximately constant SFR when the data are not constraining, and obviously, changes in the SFR within a given temporal bin are unrecoverable. We also find that the constraints on the SFH when using a parametric model are too strong when compared to the nonparametric SFH and less able to encompass the true range of SFHs consistent with the data (including the true input mock SFH). Finally, these differences in the inferred SFHs also affect inferred stellar masses, mass-weighted stellar ages, and star formation rates. For more discussion on these points, we refer the reader to Carnall et al. (2019a), Leja et al. (2019), and Lower et al. (2020).


Figure 3. Posterior probabilities for SFHs inferred from S/N ∼ 100 optical spectra with no nebular emission and perfectly known spectrophotometric calibration. The mock data are generated from an Illustris SFH (black). The shaded regions show the 16th–84th percentile credible interval for the inferred parametric SFH (blue) and nonparametric SFH (maroon). The nonparametric SFH uses 14 bins with a continuity prior, and the average input SFR in the same bins is shown by dotted gray lines. The insets in each panel show the recent SFH (<1 Gyr) in more detail.


3.3. A Complex Model: The Effect of More Data

In this section, we explore how the inference of a complex SED model is affected by increasing the number of photometric bands used to constrain it. The inferences we have shown to this point have been of relatively simple models, with few parameters. However, more complex models may be required to adequately describe the increasing quality of observations. Furthermore, making robust inferences from even a small number of low-quality data points—and accurately representing the uncertainties on the underlying parameters—requires modeling all the physical properties that might affect those data. For example, there is evidence that the effective attenuation curve in galaxies is not fixed, so any quantities inferred assuming a fixed attenuation curve will tend to have underestimated uncertainties.

One reason to make a number of simplifying assumptions about the relevant physical ingredients is that generating and storing grids for high-dimensional parameter spaces is computationally challenging. One of the most important features of on-the-fly model generation with Monte Carlo sampling of the posterior is the ability to infer a larger number of physical parameters than can be considered in grid-based approaches, because models are only generated in regions of high probability as needed, instead of over the whole parameter space.

The model we explore is described by a nonparametric SFH with six temporal bins, a single stellar metallicity for all stars, a dust screen attenuation curve affecting all stars and described by a free power-law index and a V-band optical depth, an extra attenuation toward young stars described by a power-law attenuation curve with fixed power-law index, nebular emission described by the gas-phase metallicity and ionization parameter, AGN dust torus emission described by two parameters (Nenkova et al. 2008a; Leja et al. 2018; see Appendix A), and dust emission described by the three parameters of the Draine & Li (2007) model. In total, this model has 16 free parameters, listed in Table 2. We assume a known redshift, z = 0.1. We adopt informative priors on many of these parameters, particularly the dust attenuation parameters and the AGN parameters. These prior distributions favor typical dust attenuation curves and disfavor large AGN contributions to the mid-IR.

Table 2. Summary of Parameters and Priors Used in the Complex Model

Parameter | Mock Value | Prior Function

SFH
$\log M_\star$ a [M⊙] | 10 | Uniform(7, 13)
$\log r_i$ b | 0 | StudentT(0, 0.3, 2)
$\log(Z_\star/Z_\odot)$ | −0.3 | Uniform(−2.0, 0.2)

Dust Attenuation
$\tau_{5500,\rm diffuse}$ | 0.5 | ${\mathcal N}_c(0.3, 1.0, 0, 4)$
$\tau_{\rm young}/\tau_{\rm diffuse}$ | 1 | ${\mathcal N}_c(1, 0.3, 0, 1.5)$
$\Gamma_{\rm dust}$ | 0 | ${\mathcal N}_c(0, 0.5, -1, 0.4)$

Nebular Emission
$\log U_{\rm neb}$ | −2 | Uniform(−4, −1)

Dust Emission c
$U_{\min,\rm dust}$ | 2 | Uniform(0.1, 25)
$Q_{\rm PAH}$ | 1 | Uniform(0.5, 7)
$\gamma_{\rm dust}$ | 10^−3 | LogUniform(10^−3, 0.5)

AGN d
$f_{\rm AGN}$ | 0.05 | LogUniform(10^−5, 3)
$\tau_{\rm AGN}$ | 20 | LogUniform(5, 150)

Notes. Uniform(x, y) indicates a uniform distribution from x to y, ${\mathcal N}_c(\mu, \sigma, x, y)$ indicates a normal distribution clipped to the range (x, y), and LogUniform(x, y) indicates a distribution that is uniform in the log of the parameter over the range (x, y).

a This is the integral of the SFH.
b $r_i$ is the ratio of the SFR in temporal bin i to that in the adjacent bin. There are five such parameters that describe the SFH.
c Parameters for the Draine & Li (2007) dust emission model; see Appendix A.
d Parameters for the Nenkova et al. (2008a) AGN torus emission model; see Appendix A.

We generate broadband photometry in standard UV-FIR filters for a mock SED generated for a particular value of the model parameters. We then infer the model parameters using increasingly large subsets of the photometry, beginning with a single SDSS r-band data point. The other filter sets add SDSS optical bands, then 2MASS near-IR bands, then GALEX UV bands, then WISE mid-IR bands, and finally Herschel far-IR bands.

In Figure 4, we show the resulting posterior distributions for parameters of the model and for the SED. To summarize the inferred SFHs, we derive and present the recent SFR and the mass-weighted stellar age of the population. We see that for a small number of photometric bands, the posteriors of nearly every parameter are determined by the prior distribution. Nevertheless, even with a single photometric point, we are able to obtain a (poor) constraint on the total stellar mass. For this particular mock SED, the posterior PDF is shifted to a slightly larger stellar mass than the true value. This is largely because the true dust attenuation—which is entirely unconstrained except by the prior—is at the lower end of the prior distribution. Therefore, the prior (and posterior) distribution for the SED is redder and has a larger mass-to-light ratio than the true SED. This exercise demonstrates how, through the use of informative priors and Bayesian reasoning, even a single data point can be informative in the context of a very high-dimensional model.

Figure 4. Change in posterior PDF for the SED and parameters of a complex model as the number of photometric bands increases. The model includes a continuity SFH with six temporal bins, a complex dust attenuation model, dust emission parameters, and AGN contribution to the MIR. There are 16 parameters in total. Nebular emission is also included. The model is fit to a varying number of mock photometric data points, indicating how the model becomes more tightly constrained with the addition of multiwavelength data. The static figure shows the case for a single photometric band. (This figure is available as an animation showing, in a 6 s video, the change in posterior when going from 1 photometric band, to 2, 5, 10, 14, and finally to 17 photometric bands.) Left panel: the posterior prediction for the SED (blue shaded region, indicating the 16th–84th percentile range for the flux at every wavelength), resulting from a fit to photometry in the indicated bands (gray circles) and compared to the true input SED (black line). Right panels: the marginalized posterior (blue shaded regions) and prior (dotted green lines) probability distributions for selected parameters of the model, compared to the true input parameters (dashed black lines). Several of the parameters shown are derived from the fundamental fitted model parameters; see the text for details. We show parameters of the stellar population (top row), the dust attenuation model (second row), the dust emission model (third row), and the AGN parameters (bottom row).

(An animation of this figure is available.)


As the number of photometric bands increases, the posterior constraints on many parameters become stronger and, more importantly, distinct from the prior distribution. Adding even a single additional band leads to a much stronger constraint on the dust attenuation, and hence on the mass-to-light ratio, than a single band alone provides (e.g., Bell & de Jong 2001). With the full SED, nearly every parameter has a posterior PDF different from the prior.

4. Demonstrations with Real Data

We now turn to demonstrations of Prospector usage with real data.

4.1. Globular Cluster Spectroscopy and Photometry

Globular clusters have long been used as a strong test of stellar population models (e.g., Bruzual & Charlot 2003; Koleva et al. 2008). They are generally single-age, single-metallicity populations, making them real-world examples of SSPs. Many clusters in the Milky Way have high-quality ages based on the main-sequence turnoff and metallicities determined from high-resolution spectroscopy of individual stars. However, globular clusters present unique challenges—at low metallicities, they have very prominent blue horizontal branches (BHBs), which may be a consequence of the unusual light-element abundance variations that have been detected in nearly all clusters (Carretta et al. 2009). If these unusual features are confined to the cluster population, then they may be less useful as tests of stellar population models.

Here we fit the integrated spectra and photometry of a sample of Milky Way globular clusters with Prospector. The sample is drawn from the spectral atlas of Schiavon et al. (2005), who conducted drift-scan long-slit spectroscopy of 40 globular clusters. These spectra span the wavelength range 3400–6400 Å, with a spectral resolution that is wavelength dependent but averages 3.2 Å FWHM. The nominal S/N is often greater than 100. We take UBVRI photometry from Harris (1996), who provides the total (integrated) magnitudes of clusters. We assign uncertainties of 10% to the photometric fluxes.

To model these data, we treat each cluster as a single-age stellar population. The free parameters of this population are its mass, age, and stellar metallicity $\mathrm{log}(Z/{Z}_{\odot })$ (assuming scaled solar abundances). We fix the distance to the value given by Harris (1996) because, for integrated SEDs at noncosmological distances, the distance and mass are perfectly degenerate. We broaden the model library spectra to match the instrumental resolution as reported by Schiavon et al. (2005) and fit for an additional stellar velocity dispersion σv as well as a systemic redshift. Extinction is modeled with the attenuation curve of Cardelli et al. (1989), with the normalization of this curve left as a free parameter. We also multiply the model spectrum by a maximum-likelihood 15th-order Chebyshev polynomial at each likelihood call to account for imperfections in the spectrophotometric calibration.
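Conditional on the physical model spectrum, the calibration polynomial enters linearly, so a weighted linear least-squares solve suffices at each likelihood call. The following minimal sketch illustrates the idea; the function name and interface are ours, and Prospector's internal implementation differs in detail:

```python
import numpy as np
from numpy.polynomial import chebyshev

def ml_calibration(wave, obs, model, unc, order=15):
    """Maximum-likelihood Chebyshev multiplier c(lambda), such that
    model * c best matches obs under Gaussian uncertainties unc."""
    # map wavelengths onto [-1, 1], the natural Chebyshev domain
    x = 2.0 * (wave - wave.min()) / (wave.max() - wave.min()) - 1.0
    V = chebyshev.chebvander(x, order)       # (n_pix, order + 1) design matrix
    A = V * (model / unc)[:, None]           # calibration enters multiplicatively
    coeffs, *_ = np.linalg.lstsq(A, obs / unc, rcond=None)
    return V @ coeffs                        # calibration vector at each pixel
```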

In Figure 5, we show an example of the spectroscopic and photometric data, the residual of the posterior prediction for the SED from the data, the posterior prediction for the calibration vector, and the inferred posterior probability distribution functions for the key physical parameters. The photometry is typically well fit by the models, and the spectra can be fit simultaneously when accounting for offsets from the photometry that vary smoothly with wavelength by ∼10% and inflation of the nominal noise estimates by factors of ∼2.


Figure 5. Example of a fit to globular cluster data, showing results for NGC 1851. Top-left panel: comparison of the observed photometry (gray circles and error bars) to the posterior probability distributions for the true photometry (blue rectangles; height gives the 16th–84th percentile range) and the true intrinsic spectrum (shaded red, showing 16th–84th percentile range). We also show the intrinsic spectrum for the most probable posterior sample (thin black line). All spectra are smoothed to R ∼ 500 for display purposes. Middle-left panel: uncertainty-normalized residual between the observed spectrum and the model spectrum of the most probable posterior sample (thin red line). Note that these are normalized by the nominal uncertainties (median S/N = 320) and do not include the fitted noise inflation term. The uncertainty-normalized residuals for the photometry are also shown (gray circles). Bottom-left panel: the inferred spectrophotometric calibration vector (red shaded region, 16th–84th percentiles), defined as the ratio of the observed spectrum to the true intrinsic spectrum. Right panels: projection of the posterior PDF for selected parameters of the model, inferred from the combination of photometry and spectroscopy. The priors for the parameters shown are uniform.


In Figure 6, we show a comparison of the parameters that we infer for many globular clusters to literature values as compiled by Koleva et al. (2008) and to values inferred from the same spectra using the ULySS code (Koleva et al. 2009). The literature values are based on stellar CMD fitting and on metallicity measurements of individual stars. We find that the metallicities inferred from combined spectroscopy and photometry with Prospector are in good agreement with literature values, even in the presence of unmodeled α-element enhancements. The median offset is 0.04 dex and the rms scatter is ∼0.2 dex, including outliers at very low metallicity where the stellar libraries in our models are incomplete. Good agreement is also achieved between the inferred and literature values of the extinction (AV).


Figure 6. Comparison of globular cluster parameters inferred with Prospector to literature values and to values inferred with ULySS by Koleva et al. (2008). Literature ages are largely from CMD fits, as tabulated in Koleva et al. (2008). Clusters without reliable age determinations were assigned an age of 10 Gyr. Literature metallicities are from high-resolution studies of individual stars, again as tabulated in Koleva et al. (2008). Literature extinction is computed from the E(B−V) values in Harris (1996). The color coding is a measure of the horizontal branch color (HBR) in each cluster; the bluest or hottest horizontal branches have HBR ∼ +1. Prior limits on metallicity and age are indicated as gray dotted lines. Top row: stellar metallicity. Middle row: globular cluster age. Bottom row: dust extinction.


Previous work has shown that integrated-light spectral fitting of globular cluster data returns generally unreliable ages (Koleva et al. 2008; Conroy et al. 2018). We find similar results with Prospector. The primary issue is the impact of hot BHB stars, which, because they are not included in the modeling, tend to result in artificially young inferred ages (see Koleva et al. 2009 for a discussion of this effect). Indeed, in Figure 6, one sees that the clusters with the most significant BHB populations (as inferred from the HBR parameter) have the youngest ages from Prospector. There are also some cases where integrated-light measurements (both from Prospector and Koleva et al. 2008) result in maximally old ages. The origin of this shortcoming is unclear but could be due to other model systematics, such as unmodeled α enhancement or other limitations in the isochrones. We note that differences between 10 and 14 Gyr are quite subtle in integrated light.

Overall, we find that Prospector performs as well as other stellar population fitting programs when applied to globular clusters.

4.2. Photometric Redshift: GNz-11

Inferring galaxy redshifts from photometric data is one of the core applications of stellar population models. Here, we demonstrate the use of Prospector for this task with the high-redshift galaxy GNz-11. This object was selected as a high-redshift candidate on the basis of photometry with the Hubble Space Telescope and Spitzer in the GOODS-North field, and follow-up grism spectroscopy with HST confirms a redshift of z = 11.09 (Oesch et al. 2016).

The model used to fit the data includes the galaxy redshift as a free parameter. The other free parameters are the total stellar mass formed, the stellar and nebular metallicity, the nebular ionization parameter, and three parameters describing the dust attenuation. Due to the potential for the photometry to probe the rest-frame λ < 1216 Å spectrum at high redshift, we include a redshift-dependent IGM attenuation following Madau (1995). This includes a free parameter that scales the total IGM opacity, intended to account for line-of-sight variations in the total opacity (see Appendix A for details). The SFH is modeled with five temporal bins, with the continuity prior discussed in Section 3.2. The exact width and extent of these bins are dependent on the redshift of a given model, bounded such that no stars are older than the age of the universe. The priors are similar to those given in Table 2. For the multiplicative scaling of the IGM attenuation curve, we adopt a Gaussian prior distribution centered on 1, with a dispersion of 0.3. The ideal redshift prior would be the product of the redshift-dependent differential cosmological volume element and the integral of the redshift-dependent luminosity function above some limiting magnitude. However, for clarity and due to the lack of knowledge about the luminosity function at these epochs, we adopt a uniform prior over the range 1 < z < 13. There are 13 free parameters in the model.

The results of fitting this model to the photometric data are shown in Figure 7. The photometric data are well fit, and the photometric residuals from the sample with the highest posterior probability are consistent with the stated errors. The posterior probability distribution for the redshift shows a broad peak at $z={10.8}_{-0.4}^{+0.4}$, and the highest posterior probability sample has z = 10.35, though many samples with nearly equal probability are distributed over z ∼ 10.3–11.4. This is in excellent agreement with the redshift of z = 11.09 ± 0.1 inferred from HST grism data. We do not find significant additional or secondary peaks in the posterior distribution of redshift using this model, though a much weaker peak is present at z ∼ 2. Posterior distributions for the SFH and several additional model parameters suggest a rising SFH, a formed stellar mass of $\sim {10}^{9.6}\,{M}_{\odot }$, low dust attenuation, a weak constraint on the stellar metallicity, and no constraint on the IGM opacity scaling factor beyond the imposed prior distribution.


Figure 7. Photometric redshift inference with Prospector. It is possible to include the redshift as a free parameter in the model. Here we show the results of inference from the photometry of GNz-11. Top row: the fitted observed photometry (gray with black error bars) from Oesch et al. (2016) as well as the posterior PDFs for the true photometry (blue rectangles) and the predicted spectrum of the most probable sample from the Monte Carlo chain (red), which has a redshift of 10.35. Second row: uncertainty-normalized residual (χ) between the observed photometry and the predicted photometry for the most probable Monte Carlo sample. Third row: the posterior PDF for the redshift (blue shaded region). The prior is uniform over the range 2 < z < 13. Also indicated is the redshift inferred by Oesch et al. (2016) from HST grism data (black vertical dashed line). The text gives the median of the redshift PDF, with uncertainties derived from the 16th and 84th percentiles. Fourth row: marginalized posterior PDFs for selected model parameters (blue histograms) as well as the prior probabilities for each parameter (green dotted lines). Fifth row: inferred SFH. The SFH is modeled with five bins in lookback time, which change in width depending on the model redshift.


4.3. SDSS Post-starburst Spectrum

We fit a post-starburst galaxy from the SDSS IV (Gunn et al. 2006; Blanton et al. 2017), chosen to highlight Prospector's capability to perform a simultaneous fit to spectra and photometry. A post-starburst galaxy is characterized by strong hydrogen absorption lines, indicating that its star formation activity decreased sharply within the past ∼500 Myr. We select a post-starburst galaxy to demonstrate both the flexible SFH model and the flexible emission-line modeling available within Prospector.

We fit SDSS J133751.29+354755.7 from the post-starburst catalog of Goto (2007), targeted specifically for its high-S/N spectrum and substantial emission-line infilling. We fit the ugriz photometry and optical spectrum (Smee et al. 2013) from SDSS Data Release 16 (Ahumada et al. 2020). A noise floor of 5% is enforced on the photometry, consistent with uncertainties in the underlying stellar models.

A 19-parameter physical model is used. The total mass, stellar metallicity, and velocity dispersion are left free. The redshift is allowed to vary within ±0.01 of the SDSS catalog value of z = 0.073. The dust model is the two-parameter Charlot & Fall (2000) model, plus a flexible attenuation curve (Noll et al. 2009) with the UV bump strength tied to the slope of the curve (Kriek & Conroy 2013). An eight-bin nonparametric SFH is used, with the time bins spaced logarithmically and a continuity prior applied (Leja et al. 2019). Nebular emission is implemented using the Byler et al. (2017) grid, with the nebular metallicity, ionization parameter, and nebular velocity dispersion left free. Nebular line marginalization is also enabled, as described in Appendix E; in brief, the amplitude of each emission line is marginalized over at each model call to fit the data. The prior on the emission-line fluxes is taken to be Gaussian, with both the center and the standard deviation set to the line luminosities expected from the Byler et al. (2017) physical model. A polynomial vector is used to normalize out the continuum at each model call, following Section 2.3. Finally, the pixel outlier model and jitter term from Appendix D are both adopted.

The results of this fit are shown in Figure 8. Prospector is able to fit the data with high precision; the best-fit model yields excellent χ2 values for both the photometry and the spectrum. Though they are included in the fit, neither the spectral noise inflation nor the pixel outlier model is assigned significant weight. As expected, a significant post-starburst event is detected within the previous ∼300 Myr, with the ongoing rate of star formation decreasing by a factor of ∼20. The mass-weighted age is 1.11 Gyr, indicating a relatively short formation phase for this galaxy, and the dust attenuation is significant at AV = 0.74.


Figure 8. A simultaneous Prospector fit to the spectroscopy and photometry of an SDSS post-starburst galaxy (Goto 2007). Top row: the observed ugriz photometry (black), compared to the model posterior prediction for the true photometry (blue shaded regions) and the true spectrum (red shaded region). Second row: the observed SDSS spectrum (black), compared to the model posterior prediction for the spectrum (red). The inset panel zooms in on the Hα and [N ii] emission lines, showing the continuum model (gray), emission lines predicted from the CLOUDY grid (blue), and emission lines from the nebular marginalization model (red). The lower panel shows the normalization vector between the spectroscopy and photometry. Third row: the left panel shows the recovered star formation history. The factor of 10 decrease in the star formation rate at ∼300 Myr is the post-starburst event. The right panels show marginalized 1D posterior distributions for selected model parameters (orange histograms), along with the prior distributions (green dotted lines).


The spectral fit includes a zoom-in on the Hα and [N ii] doublet. The [N ii]/Hα ratio is very high, as is typical of post-starburst objects (Yan et al. 2006; Yang et al. 2006; French et al. 2015). This is consistent with SDSS J133751.29+354755.7 being a LINER; in such objects, the nebular emission lines are thought to be partially or fully powered by evolved stars, AGNs, or shocks (e.g., Kewley et al. 2006). This is in contrast to the built-in assumption in FSPS that emission lines are powered by young stars only. This difference has the potential to cause two distinct issues: first, additional young stars would have to be added to fully power this emission, and second, the observed [N ii]/Hα ratio is difficult, and likely impossible, to produce with star formation alone. Indeed, the prediction from the CLOUDY grid, indicated in blue, is a poor fit to the data. The nebular marginalization technique, indicated in red, solves both issues by allowing flexible extensions of the standard CLOUDY grid. We suggest enabling nebular marginalization in spectral fits whenever there is a chance that the emission lines are powered by sources that do not lie on the standard CLOUDY grid.

5. Discussion

5.1. Lessons Learned and Practical Advice

The development of Prospector began in 2014, and the code has undergone continual development, refinement, and refactoring since then. This development is usually guided by specific practical applications, as the diversity and complexity of the questions and data to which SED fitting is applied make general solutions nearly impossible. Because of this, Prospector has tended to become more modular with time, in order to maintain flexibility while adding new capabilities. A benefit of this modularity is that changing or extending the code to tackle new and more detailed data is relatively straightforward; in a sense, Prospector is a toolbox for SED fitting. The tradeoff for flexibility is often computational speed: some tasks that could be optimized for particular problems with a different code structure are instead implemented less efficiently in a modular structure.

Priors are an essential ingredient in SED fitting. The SEDs of single-age stellar populations change subtly with both age and metallicity, making SFH inference from an integrated SED particularly difficult. The SEDs clearly contain information; the question is how to extract the available information in the face of substantial degeneracies. The solution is to impose priors on the SFH, either explicitly or through a particular SFH parameterization, that minimize the impact of these degeneracies where the data are not informative yet still allow the data to "speak." Nevertheless, it is important to map the remaining degeneracies and to consider the effect of the adopted prior.

Priors on dust emission parameters can also play an important role when the far-IR SED is incompletely sampled. The inferred total infrared luminosity depends sensitively on these parameters in this case, and the imposed energy balance between dust emission and dust attenuation can affect many other parameters (e.g., Appendix C of Leja et al. 2017). It is important to adopt priors that reflect prior knowledge about the distribution of these parameters, for example, derived from subsamples with more complete IR coverage. Through the Bayesian framework, this prior knowledge can be propagated through to the posterior PDFs.

While many of the uncertainties and degeneracies that can be explored through a Bayesian framework are important to consider, significant systematic uncertainties in the stellar population synthesis models remain. The modularity of FSPS grants the ability to explore some of these as model parameters, for example, the temperature distribution of horizontal branch stars or the luminosity of the TP-AGB phase. However, many other uncertainties are "baked in" to the stellar population codes: these include binary evolution, stellar rotation, ionizing fluxes (and, more generally, the blue/UV spectra of low-metallicity populations), other isochrone details, and the detailed physics of nebular emission. Results that depend on the details of these calculations should be treated with caution.

5.2. Future Directions

As described above, Prospector is under continual development, and we expect to add new capabilities in the future. Computational speed is a particular emphasis. The computational time required is set by the product of two factors: the speed at which stellar population models are constructed, and the number of models required to adequately sample the posterior distribution (which increases with the dimensionality of the model). For the former, we are exploring the use of emulators (e.g., Alsing et al. 2020) to produce models quickly, with the disadvantage that the emulator must be trained on a precomputed (and therefore largely inflexible) grid. For the latter, we are considering new Monte Carlo techniques based on gradients; such techniques will also allow larger model dimensionality.

While the underlying FSPS stellar population synthesis models used by Prospector include most physical components of a galaxy—stellar emission, nebular emission due to stellar photoionization, dust attenuation, dust emission, and the emission of AGN dust tori—additional components may be added. These include ISM absorption, nebular emission from shocked ISM gas, free–free emission from H ii regions, radio synchrotron emission, and X-ray through optical emission from AGNs. Additionally, Lyα radiative transfer models may prove important for modeling high-redshift galaxy spectra in detail. For some of these components, straightforward ab initio physical models exist. For others, empirical relationships may have to be adopted.

We will also pursue improved treatment of the existing components. For the stellar populations, this includes the addition of α-element enhancement to the underlying SPS models in both the isochrones and the spectra. We also plan to expand the capability to model age-dependent stellar metallicity and dust attenuation, as well as additional SFH forms. Finally, more flexible and self-consistent nebular emission models may prove useful.

Performing optimized SFH inference is a perennial challenge. The SFH, SFR(t, Z), is fundamentally a density; inference of densities from samples or integrals thereof is a notoriously difficult, and perhaps even ill-posed, problem. A variety of solutions to the problem have been proposed, all of which can be considered different choices of prior on the form or properties of the SFH. Ideally, we would combine a prior with a very large number of time bins, or anchor points, without consideration of the uniqueness of the spectra for each bin. The data would then dictate which bins (if any) were well constrained, while the prior would take over where the data do not provide information. However, sampling in such high-dimensional spaces is difficult, though solutions based on scalable gradient-based sampling approaches (Hamiltonian Monte Carlo) are under development. Additionally, more complex priors on the shape of the SFH, for example, corresponding to the correlation function between SFRs in different time periods (e.g., Kelson et al. 2016; Tacchella et al. 2020), may be implemented.

Multiple spectra described by the same physical parameters but with substantially different instrumental parameters (e.g., spectral resolution or calibration) cannot yet be fit simultaneously in Prospector, though this capability will be added in the future. Such data can arise from multiple exposures of the same object taken under different observing conditions, from spectra taken with different instruments or spectroscopic orders, or both.

In summary, as the available data increase in quality and diversity, SED models and inference techniques have had to improve as well, while growing in complexity. Significant progress has been made, leading to the ability to model and make inferences from many different kinds and combinations of galaxy observations. Efforts are underway to tackle the remaining challenges. These efforts are necessary to fully exploit the information content of rich data sets that will be provided by future facilities, such as the James Webb Space Telescope.

We are grateful for discussions early in the project with Dan Weisz, Dan Foreman-Mackey, and David Hogg, as well as conversations with Adam Carnall and Sandro Tacchella. We are also grateful for early testing of the code and suggestions by John Moustakas, Antara Basu-Zych, Dylan Nelson, Imad Pasha, Tom Zick, Song Huang, and Johnny Greco. B.D.J. thanks Catherine Zucker for assistance with the animated figure. B.D.J. and C.C. acknowledge support from the Packard Foundation and NSF grants AST-1313280 and AST-1524161. This research made extensive use of NASA's Astrophysics Data System Bibliographic Services. Computations in this paper were run on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University.

Funding for the Sloan Digital Sky Survey IV has been provided by the Alfred P. Sloan Foundation, the US Department of Energy Office of Science, and the Participating Institutions. SDSS acknowledges support and resources from the Center for High-Performance Computing at the University of Utah. The SDSS website is www.sdss.org.

Software: prospector (v1.0.0, Johnson et al. 2021), sedpy (v0.2.0, Johnson 2019), dynesty (v1.0.0, Speagle 2020), emcee (v2.2.1, Foreman-Mackey et al. 2013), FSPS (v3.1, Conroy et al. 2009, 2010), python-FSPS (v0.3.0, Foreman-Mackey et al. 2014), Astropy (v4.0, Astropy Collaboration et al. 2013, 2018), NumPy (v1.19.1, Harris et al. 2020), SciPy (v1.5.2, Virtanen et al. 2020), matplotlib (v3.3.0, Hunter 2007), IPython (Perez & Granger 2007), f2py (Peterson 2009).

Appendix A: FSPS Updates

There have been many updates and improvements to the FSPS code since 2010. In this section, we provide a brief overview of the main features now available in the code.

Isochrone libraries are available from the MIST (Choi et al. 2016), PARSEC (Bressan et al. 2012), BaSTI (Cordier et al. 2007), Padova (Marigo et al. 2008), Geneva (Meynet et al. 1994), and BPASS (Eldridge et al. 2017) models. The BPASS models are provided as SSPs and are therefore treated differently from the other isochrone sets: for example, the isochrone-based parameters that affect the strength of the TP-AGB cannot be applied to the BPASS models, nor can the IMF be changed.

As in Conroy & Gunn (2010), the available base spectral libraries include BaSeL and MILES. Those libraries are supplemented with the TP-AGB library from Lançon & Wood (2000), the post-AGB library from Rauch (2003), and the WMBASIC hot star library from Eldridge et al. (2017).

There are a variety of options for the dust absorption model including the Milky Way extinction curve of Cardelli et al. (1989) with a variable slope and UV bump strength, empirical attenuation curves from Calzetti et al. (2000) and Kriek & Conroy (2013), the two-component dust model of Charlot & Fall (2000) with separate normalization of each component and adjustable attenuation curve slopes, and the radiative transfer models from Witt & Gordon (2000). "Diffuse" dust emission is provided by the Draine & Li (2007) templates. Circumstellar dust around AGB stars is implemented via the Villaume et al. (2015) templates, while dust associated with AGN is modeled using a simplified implementation of the Nenkova et al. (2008b) templates.

Nebular emission associated with H ii regions is included using Cloudy lookup tables computed by Byler et al. (2017). Attenuation by the IGM follows the prescription described in Madau (1995), with an optional overall scaling factor to account for line-of-sight variations.

Appendix B: SFH Integrals and Parametric Forms

The fundamental equation for computing the spectra of composite stellar populations is

$$F_\lambda = \int\!\!\int \psi(t, Z)\, s_\lambda(t, Z)\, e^{-\tau_\lambda(t, Z)}\, dt\, dZ \qquad (B1)$$

where t is the lookback time, ψ(t, Z) is the SFH, sλ (t, Z) is the spectrum of a unit mass of stars of age t and metallicity Z, and τλ (t, Z) is the effective dust opacity toward stars of age t and metallicity Z.

In practice, this is approximated by a sum over discrete SSPs. These SSPs are the sλ (t, Z) for J specific values of t (indexed as tj ) and K specific values of Z (indexed as Zk ). They are calculated using isochrones and stellar spectral libraries. The sum is then

$$F_\lambda \approx \sum_{j=1}^{J} \sum_{k=1}^{K} m_{j,k}\, s_\lambda(t_j, Z_k)\, e^{-\tau_\lambda(t_j, Z_k)} \qquad (B2)$$

or, dropping the metallicity index for a single-metallicity population,

$$F_\lambda \approx \sum_{j=1}^{J} m_j\, s_\lambda(t_j)\, e^{-\tau_\lambda(t_j)} \qquad (B3)$$

where the mj,k are the formed masses or weights of each SSP.

It is tempting to use trapezoidal integration of the SFR at tj , Zk to calculate these weights. However, if the SFR has a discontinuous derivative or is highly nonlinear between the SSP points, this can lead to substantial errors. Increasing the effective temporal resolution of the SSPs via interpolation can mitigate (but not solve) this problem, at the expense of the increased memory needed to store the interpolated SSPs and the increased computational time needed to sum them with the appropriate weights.

A more robust and scalable solution is to integrate the product of the interpolation weights and the SFH. That is, if a spectrum at time tj < t < tj+1 can be written as

$$s_\lambda(t) = \left[1 - w_j(t)\right] s_\lambda(t_j) + w_j(t)\, s_\lambda(t_{j+1}), \qquad w_j(t) = \frac{\log t - \log t_j}{\log t_{j+1} - \log t_j} \qquad (B4)$$

where we have chosen a linear interpolation in $\mathrm{log}\,t$ (because changes in the SSP fluxes are generally linear in $\mathrm{log}\,t$, not in linear time), then we wish to calculate

$$m_j = \int_{t_{j-1}}^{t_j} w_{j-1}(t)\, \psi(t)\, dt + \int_{t_j}^{t_{j+1}} \left[1 - w_j(t)\right] \psi(t)\, dt \qquad (B5)$$

This gives the contribution to the weights of the SFR in the bins that are older and younger than tj , and for parametric SFHs, this can be solved analytically for each j. This is what is used in the FSPS code for the available parametric SFHs, as of v3.0. Combinations of parameterized SFHs can be easily considered by calculating the above quantity for each component and summing the mj values with the appropriate scaling to retain the proper mass fractions in each component. Similarly, for tabular SFHs, we make the assumption that the SFR is a linear function between the tabulated time points and solve the equation above for each tj and segment separately, accumulating the weights mj from each segment.
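As a concrete (numerical, rather than analytic) illustration of this weighting scheme, the sketch below assigns the mass formed by a tabular, piecewise-linear SFH to a grid of SSP ages using interpolation weights that are linear in log t. The function and variable names are ours, and a fine quadrature grid stands in for the analytic per-segment integrals used in FSPS:

```python
import numpy as np

def ssp_weights(ssp_ages, sfh_t, sfh_sfr, n_quad=100_000):
    """Distribute the formed mass of a tabular SFH onto SSP ages.

    ssp_ages : increasing SSP ages t_j (yr), all > 0
    sfh_t, sfh_sfr : tabulated lookback times (yr) and SFRs (Msun/yr),
        with the SFR assumed linear between the tabulated points
    """
    t = np.linspace(max(sfh_t.min(), ssp_ages[0]), sfh_t.max(), n_quad)
    psi = np.interp(t, sfh_t, sfh_sfr)           # piecewise-linear SFH
    dm = psi * np.gradient(t)                    # mass formed per quadrature cell
    # interpolation weights, linear in log t (Eq. B4)
    logt, logages = np.log10(t), np.log10(ssp_ages)
    j = np.clip(np.searchsorted(logages, logt) - 1, 0, len(logages) - 2)
    w_hi = np.clip((logt - logages[j]) / (logages[j + 1] - logages[j]), 0, 1)
    weights = np.zeros_like(ssp_ages, dtype=float)
    np.add.at(weights, j, (1.0 - w_hi) * dm)     # contribution to younger SSP
    np.add.at(weights, j + 1, w_hi * dm)         # contribution to older SSP
    return weights                               # the m_j of Eq. (B5)
```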

Appendix C: Smoothing

When comparing model galaxy spectra to observed spectra, there are (at least) three spectral smoothing resolutions (or smoothing kernels) that must be considered: (1) the line-of-sight velocity distribution (LOSVD) of stars in the rest frame of the galaxy (Lg ); (2) the observed-frame LSF of the instrument (Li ); and (3) the rest-frame LSF of the model spectral library (Lℓ ). After smoothing of the model spectrum, these combine to create the LSFs of the data (Ld ) and the model (Lm ) as follows:

$$L_d(\lambda_o) = L_g(\lambda_r) \otimes L_i(\lambda_o), \qquad L_m(\lambda_o) = L_\ell(\lambda_r) \otimes L_k(\lambda_o) \qquad (C1)$$

where λo is the observed-frame wavelength, λr is the rest-frame wavelength, Lk is the kernel corresponding to extra smoothing applied while modeling the data, and ⨂ denotes convolution. While the LOSVD of the galaxy can be handled by straightforward and fast smoothing in velocity space (e.g., Cappellari 2017), the instrument and library resolution are both potentially complicated functions of observed-frame and rest-frame wavelength, respectively. This can be difficult to implement in a speedy convolution algorithm.

When Li = Lℓ , or when both are narrow compared to Lg , they can be ignored, and the best match between the data and model is obtained when Lk ∼ Lg . Varying Lk to determine the best match can be done using FFT-based smoothing in velocity space. However, when the LOSVD of the galaxy is comparable to or only slightly broader than the instrumental or library LSF (at any observed wavelength), then these latter effects must be taken into account. The possible wavelength dependence of the instrumental and library resolutions makes this difficult, as standard FFT-based algorithms do not admit smoothing kernels that are wavelength dependent. The usual solution is to use slower brute-force convolution algorithms to smooth the library spectra once, before any other calculations are made or any fitting is done, such that $L{{\prime} }_{{\ell }}({\lambda }_{r})\sim {L}_{i}({\lambda }_{o})$. This is possible provided that the library everywhere has a higher resolution than the instrument, that Li is well known, and that the redshift of the model does not vary significantly.

If any of these conditions are not met—for example, if one wishes to infer and marginalize over uncertainties in the instrumental LSF—then another approach must be used. A solution is to include, in each generation of the model, a convolution by the kernel Lk defined by the difference between the rest-frame library resolution and the observed-frame instrumental and physical resolution. This potentially wavelength-dependent difference kernel may depend on the modeled systemic velocity. Because the convolution is done at each model generation step, this scheme also allows for simultaneous fitting of the LSF.

In Prospector, we allow for both options: either smoothing the library once, before fitting, by the difference between the observed-frame instrumental and rest-frame library resolution or smoothing at each model generation step by this difference, calculated on the fly for the model systemic velocity, velocity dispersion, and instrumental LSF parameters. We assume that both the instrumental and library resolution can be approximated by Gaussians at each wavelength so that the difference kernel is also represented by a wavelength-dependent Gaussian.
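Under the Gaussian approximation, the width of the on-the-fly difference kernel follows from subtracting resolutions in quadrature: the smoothed model must have a total LSF matching that of the data. A minimal sketch, with all widths expressed as Gaussian σ in velocity units and with hypothetical callables standing in for the instrument and library resolution curves:

```python
import numpy as np

def difference_kernel_sigma(lam_obs, zred, sigma_v, inst_res, lib_res):
    """Sigma (km/s) of the Gaussian kernel to apply to the library spectrum.

    lam_obs  : observed-frame wavelengths (AA)
    zred     : model systemic redshift
    sigma_v  : physical (LOSVD) velocity dispersion (km/s)
    inst_res : callable giving instrumental sigma (km/s) at observed wavelength
    lib_res  : callable giving library sigma (km/s) at rest wavelength
    """
    lam_rest = lam_obs / (1.0 + zred)
    s2 = sigma_v**2 + inst_res(lam_obs)**2 - lib_res(lam_rest)**2
    # negative values mean the library is already broader than the target
    # resolution, which cannot be corrected by smoothing; clip to zero
    return np.sqrt(np.clip(s2, 0.0, None))
```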

It is desirable to have algorithms that can quickly and accurately produce spectra smoothed by a wavelength-dependent kernel, so that instrumental smoothing can be done during the fitting process in a flexible way. Because the difficulty with FFT-based algorithms in this case is that the kernel is varying as a function of wavelength, the solution is to resample the model spectrum onto a space in which the kernel is not varying (analogous to resampling a spectrum from a regular wavelength grid to a regular velocity grid when smoothing by an LOSVD).

Assuming the LSF can be described by a Gaussian, this can be accomplished by defining a coordinate system from the cumulative distribution function of the inverse of the dispersion as a function of wavelength:

$$x(\lambda) = K \int_{\lambda_{\min}}^{\lambda} \frac{d\lambda'}{\sigma(\lambda')} \qquad (C2)$$

where λ is the wavelength, σ(λ) is the width of the Gaussian LSF at any wavelength, and K is a normalization constant. A regular sampling in x will provide an equal number of samples per σ, even as σ changes with wavelength. Thus, convolving a spectrum resampled onto an even grid in x with a single Gaussian (in x) will produce a spectrum with a smoothing in λ that is λ dependent.

The integral that defines x can be done analytically or, for more flexibility, numerically. The constant K and the final spacing in x should be chosen so that the interpolated spectrum adequately samples at least the output LSF, and preferably the input LSF, for some definition of adequate. Such an algorithm is implemented within Prospector. Further improvements to the smoothing functionality in Prospector may include a larger variety of smoothing kernels (including higher-order LOSVD moments and kernels appropriate for pixelization).
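The following sketch implements this resampling scheme with a numerical evaluation of the integral for x, setting K = 1 so that one unit of x corresponds to one σ of the (assumed Gaussian) LSF; the convolution in x is then with a fixed, unit-σ Gaussian. The names and oversampling choices are ours, not Prospector's:

```python
import numpy as np

def varying_gaussian_smooth(lam, flux, sigma_lam, samples_per_sigma=4):
    """Smooth `flux` by a Gaussian whose width sigma_lam (same units as lam)
    varies with wavelength, via the coordinate x = int dlam / sigma (Eq. C2)."""
    # x(lambda) by trapezoidal integration of 1/sigma, with K = 1
    inv_sig = 1.0 / sigma_lam
    x = np.concatenate(([0.0],
        np.cumsum(0.5 * (inv_sig[1:] + inv_sig[:-1]) * np.diff(lam))))
    # uniform grid in x, oversampling the LSF
    nx = int(samples_per_sigma * x[-1]) + 1
    xg = np.linspace(0.0, x[-1], nx)
    fg = np.interp(xg, x, flux)
    # unit-sigma Gaussian kernel in x, truncated at +/- 5 sigma
    dx = xg[1] - xg[0]
    kx = np.arange(-int(5.0 / dx), int(5.0 / dx) + 1) * dx
    kern = np.exp(-0.5 * kx**2)
    kern /= kern.sum()
    smoothed = np.convolve(fg, kern, mode="same")
    return np.interp(x, xg, smoothed)  # back onto the input wavelength grid
```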

Appendix D: Alternative Noise Models

In addition to the default assumption of known Gaussian independent noise described in Section 2.4, we have also implemented more complex likelihood calculations. These allow for properties of the noise itself to be inferred, or for the presence of correlated noise to be modeled. This is accomplished through a flexible noise covariance matrix construction. The ln-likelihood for the more complex noise model is

$$\ln \mathcal{L}(d \mid \Theta, \gamma) = -\frac{1}{2}\, \Delta^{\rm T}\, C^{-1}(\gamma)\, \Delta - \frac{1}{2} \ln\left[(2\pi)^n \det C(\gamma)\right] \qquad (D1)$$

$$\Delta \equiv d - \mu(\Theta) \qquad (D2)$$

where d is the data vector of length n, μ is the mean physical and instrumental model that depends on the parameters Θ, Δ is the residual, and C is the variance–covariance matrix that depends on the noise model parameters γ.

The parameters Θ generally include the physical parameters θ. They may also include instrumental effects such as wavelength-scale distortions or instrumental resolution as parameters ϕ.

In Prospector, the covariance matrix is specified as a sum of weighted kernels applied to given metrics, where the kernels, weight vectors, and metrics can be specified at run time. More precisely, we use the following formulation of C:

$$C_{i,j} = \sum_\ell w_{\ell,i}\, w_{\ell,j}\, K_\ell\!\left(x_{\ell,i}, x_{\ell,j} \mid \gamma_\ell\right) \qquad (D3)$$

where wℓ is the weight vector of length n, xℓ is the metric (also of length n), Kℓ is the ℓth kernel, and γℓ are the noise parameters for that kernel.

The kernels Kℓ can in principle be anything that results in a positive semidefinite matrix when multiplied by the weight vectors wℓ. The following are available by default in Prospector:

  • 1.  
    uncorrelated: $K({x}_{i},{x}_{j}\,|\,a)=a\,{\delta }_{i,j}$. This is a simple diagonal matrix parameterized by an amplitude a for the diagonal elements. All other elements are 0.
  • 2.  
    photocal: $K({x}_{i},{x}_{j}\,|\,a)=a$ for i, j corresponding to user-specified photometric bands. This adds correlated noise to these bands with amplitude a. All other matrix elements are 0.
  • 3.  
    exp-squared: $K({x}_{i},{x}_{j}\,| \,a,L)=a\,{e}^{-{\left({x}_{i}-{x}_{j}\right)}^{2}/(2{L}^{2})}$. This is the exponential squared kernel, parameterized by an amplitude a and scale length L (a sketch is given following this list).
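As an illustration, the exponential squared kernel can be evaluated on a metric vector as follows (a sketch with names of our choosing):

```python
import numpy as np

def exp_squared(x, a, L):
    """K(x_i, x_j | a, L) = a * exp(-(x_i - x_j)^2 / (2 L^2)),
    evaluated for all pairs of the metric vector x."""
    return a * np.exp(-0.5 * ((x[:, None] - x[None, :]) / L) ** 2)
```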

A simple use for this more complicated noise model is to incorporate uncertainty in the noise estimates themselves. This is accomplished by allowing the matrix C to depend on parameters γ that are then inferred and can be marginalized over. For example, to allow for the possibility that the supplied uncertainties are underestimated by a constant but unknown factor b, a diagonal kernel K1(xi , xj ) = b2 δi,j can be combined with weights wi,1 given by the nominal supplied uncertainties, where δi,j is the Kronecker δ. Increasing b will decrease the first term in the likelihood of Equation (D1), but this is eventually balanced by an increase in the normalization given by the determinant of C, preventing the uncertainties from becoming arbitrarily high. Inferring such an inflation in the uncertainties can be useful for providing more accurate final parameter uncertainties. An additive noise component, i.e., a noise floor, can also be incorporated with an additional diagonal kernel K2(xi , xj ) = a δi,j and weights wi,2 = 1.
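Putting these pieces together, a covariance of the form of Equation (D3) with a multiplicative inflation kernel and an additive noise floor, and the corresponding ln-likelihood of Equation (D1), might be sketched as follows (illustrative only; the names are ours, not Prospector's API):

```python
import numpy as np

def build_cov(unc, b, a):
    """C from Eq. (D3): diagonal kernel K1 = b^2 * delta_ij weighted by the
    nominal uncertainties (noise inflation), plus a diagonal noise floor K2 = a."""
    return np.diag((b * unc) ** 2) + a * np.eye(len(unc))

def lnlike(d, mu, C):
    """Gaussian ln-likelihood of Eq. (D1) for data d, model mean mu, covariance C."""
    delta = d - mu
    _, logdet = np.linalg.slogdet(C)
    chi2 = delta @ np.linalg.solve(C, delta)
    return -0.5 * (chi2 + logdet + len(d) * np.log(2.0 * np.pi))
```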

The covariance matrix can be used to account for correlated noise in the data. Correlated noise can be induced during extraction of 2D spectra, smoothing of the data, or interpolation to a particular wavelength grid (e.g., Bolton & Schlegel 2010). When the pixel covariances are known from error propagation during extraction, they can be supplied in matrix form as the kernel K, with weights wi = 1.

Sky subtraction residuals are likely to be correlated over at least several pixels and may also be treated as additive covariant noise (e.g., Carnall et al. 2019b). The scale length and amplitude of these correlations may be supplied or inferred.

Finally, the use of weight vectors also allows calibration to be modeled as covariant or correlated noise. Here the covariant noise may be due to calibration uncertainties or to inaccuracies in the spectral models. Because calibration is multiplicative with respect to flux, modeling calibration noise can be accomplished by using the predicted model flux vector μ(Θ) as the weight vector w. This approach was used for photometry, where calibration uncertainties from different instruments or observatories are often correlated, by Gordon et al. (2014), and it can be extended to spectroscopic data. It has the benefit of being more flexible than polynomials or other parameterized functions, which will introduce systematic effects if they are not a good representation of the actual spectrophotometric calibration function. It also has the benefit that a model in which the data can be described well purely by parameters unrelated to the calibration (e.g., absorption lines) will have a higher likelihood than one in which correlated noise is necessary. This mitigates the problem whereby physical information in the data may be lost to the spectrophotometric calibration model if the latter is overly flexible. However, this approach requires a realistic model for the correlations induced by miscalibration, which can be complex and multiscale and must be separated from correlations induced solely by data processing (e.g., by interpolation or spectral extraction). It may also incur a high computational cost.

Finally, we also include a rudimentary outlier model in Prospector. This model approximately accounts for the presence of data that deviate from the model much more than should be allowed by the noise, due, for example, to cosmic rays or strong, unmasked sky subtraction residuals. The outliers are modeled as a small fraction of data points that are drawn from a normal distribution centered on the model fluxes but with widths much larger than the nominal supplied uncertainties. By analytically marginalizing over the exact location of the outliers (e.g., Hogg et al. 2010), this can be expressed as

$$\ln \mathcal{L}(d \mid \Theta) = \sum_{i=1}^{n} \ln\left[\left(1 - f_{\rm out}\right) \mathcal{N}\!\left(d_i \mid \mu_i, \sigma_i\right) + f_{\rm out}\, \mathcal{N}\!\left(d_i \mid \mu_i, s_{\rm out}\, \sigma_i\right)\right] \qquad (D4)$$

where fout is the fraction of outlier data points, and sout is the inflation of the nominal noise for outlier points, which defaults to 50. Both of these parameters may be supplied or inferred. At the present time, this outlier model cannot be used with correlated noise models.
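A sketch of this marginalized outlier mixture (Equation (D4)), assuming independent Gaussian noise and using names of our choosing:

```python
import numpy as np

def lnlike_outliers(d, mu, unc, f_out, s_out=50.0):
    """Mixture ln-likelihood of Eq. (D4): each point is an outlier with
    probability f_out, with its uncertainty inflated by the factor s_out."""
    def ln_gauss(x, m, s):
        return -0.5 * ((x - m) / s) ** 2 - np.log(s * np.sqrt(2.0 * np.pi))
    ln_in = np.log(1.0 - f_out) + ln_gauss(d, mu, unc)
    ln_out = np.log(f_out) + ln_gauss(d, mu, s_out * unc)
    return np.sum(np.logaddexp(ln_in, ln_out))
```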

Appendix E: Emission-line Marginalization

As described in Appendix A, the emission-line model included with FSPS is based on Cloudy (Ferland et al. 2013; Byler et al. 2017). This treatment is sufficient in many contexts, but the emission-line ratios and luminosities may deviate from the two-parameter model implemented there for a variety of reasons, including the presence of AGNs, shocks, or other nonstellar ionizing sources; abundance variations; and complicated geometries. Furthermore, the kinematics of the ionized gas may differ from those of the stars. It is therefore desirable to include some additional flexibility in the emission-line modeling.

This is a challenge: there are 128 emission lines included in FSPS, and it is not practical to sample directly for the amplitude of each line. With some reasonable assumptions, it is possible to introduce analytic Gaussian emission-line modeling into the Prospector likelihood that does not require explicitly sampling the emission-line amplitudes. When the prior on the amplitudes is Gaussian, the posterior probability for the amplitudes is also Gaussian and can be marginalized analytically in each likelihood call. The emission-line width and central wavelength offset are also sampled; the offset and width are assumed to be the same for every emission line to ease the sampling, though this assumption can be relaxed by the user if necessary.

For clarity, we define here a Gaussian distribution ${ \mathcal N }$ of the variable x with mean μ and covariance matrix Σ as

$$\mathcal{N}(x \mid \mu, \Sigma) \equiv \frac{\exp\left[-\frac{1}{2}\, (x - \mu)^{\rm T}\, \Sigma^{-1}\, (x - \mu)\right]}{\sqrt{(2\pi)^{n} \det \Sigma}} \qquad (E1)$$

E.1. Analytically Marginalizing over Emission Lines

First, we write the likelihood of the data, conditioned on the parameters θ, marginalized over several emission-line amplitudes αj , and including the prior on these amplitudes, as

$$\mathcal{L}(D \mid \theta) = \int \mathcal{L}(D \mid \theta, \boldsymbol{\alpha})\, p(\boldsymbol{\alpha})\, d\boldsymbol{\alpha} \qquad (E2)$$

where D is the data, θ are the parameters of the model not including the line amplitudes, and α is the vector of all line amplitudes {αj }. We can then write the log-likelihood, conditional on both θ and α , as

$$\ln \mathcal{L}(D \mid \theta, \boldsymbol{\alpha}) = -\frac{1}{2} \left(\Delta - \sum_j \alpha_j\, G_j\right)^{\rm T} \Sigma^{-1} \left(\Delta - \sum_j \alpha_j\, G_j\right) + \mathrm{const.}, \qquad \Delta \equiv D - M_\lambda \qquad (E3)$$

where Σ is the variance–covariance matrix of the data, Mλ is the model prediction without emission lines, Δ is the vector of residuals of the data from the line-free model, and Gj is a normalized Gaussian profile for line j, with a given width σv and center (specified as part of θ).

The key is that by using αj only for the amplitude of the lines (i.e., not for the shape or center), the conditional log-likelihood is linear with respect to α . The likelihood can then be factored into two components. The first component is the likelihood of the model conditional on the maximum-likelihood estimate (MLE) of the line amplitudes, $\hat{{\boldsymbol{\alpha }}}$. This scales the second component, which is a multidimensional Gaussian describing the dependence of the likelihood on the line amplitudes α . This Gaussian is centered on $\hat{{\boldsymbol{\alpha }}}$ and has a width described by the covariance matrix ${{\rm{\Sigma }}}_{\hat{{\boldsymbol{\alpha }}}}$. Our expression for the likelihood function is then

$$\mathcal{L}(D \mid \theta, \boldsymbol{\alpha}) = \frac{\mathcal{L}(D \mid \theta, \hat{\boldsymbol{\alpha}})\; \mathcal{N}(\boldsymbol{\alpha} \mid \hat{\boldsymbol{\alpha}}, \Sigma_{\hat{\boldsymbol{\alpha}}})}{\mathcal{N}(\hat{\boldsymbol{\alpha}} \mid \hat{\boldsymbol{\alpha}}, \Sigma_{\hat{\boldsymbol{\alpha}}})} \qquad (E4)$$

The denominator is a non-unity value required to make the equality hold for ${\boldsymbol{\alpha }}=\hat{{\boldsymbol{\alpha }}}$. When the prior is uniform (p( α ) is constant), the integral in the marginalized likelihood of Equation (E2)—ignoring a constant factor that is independent of θ and α —reduces to

$$\mathcal{L}(D \mid \theta) \propto \frac{\mathcal{L}(D \mid \theta, \hat{\boldsymbol{\alpha}})}{\mathcal{N}(\hat{\boldsymbol{\alpha}} \mid \hat{\boldsymbol{\alpha}}, \Sigma_{\hat{\boldsymbol{\alpha}}})} \qquad (E5)$$

We now derive the MLE for α . By taking the derivative of the likelihood with respect to α and setting the result to 0, we find the MLE value $\hat{{\boldsymbol{\alpha }}}$ as

$$\hat{\boldsymbol{\alpha}} = \left(G^{\rm T}\, \Sigma^{-1}\, G\right)^{-1} G^{\rm T}\, \Sigma^{-1}\, \Delta \qquad (E6)$$

where G is the matrix whose columns are the individual line profiles Gj.

We are also interested in the widths or uncertainty on the MLE value, described by the covariance matrix ${{\rm{\Sigma }}}_{\hat{{\boldsymbol{\alpha }}}}$. This is obtained from the inverse second derivative of the likelihood with respect to α , resulting in

$$\Sigma_{\hat{\boldsymbol{\alpha}}} = \left(G^{\rm T}\, \Sigma^{-1}\, G\right)^{-1} \qquad (E7)$$

Notably, by doing this calculation in vector space, we explicitly model possible covariances between the line amplitudes; this is important in cases where the emission lines overlap.
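In matrix form, Equations (E6) and (E7) amount to a generalized least-squares solve. A minimal sketch (names ours), exploiting the symmetry of Σ:

```python
import numpy as np

def line_amplitude_mle(delta, G, C):
    """MLE emission-line amplitudes (Eq. E6) and their covariance (Eq. E7).

    delta : (n,) residual of the data from the line-free model
    G     : (n, m) unit-amplitude line profiles, one column per line
    C     : (n, n) data covariance matrix Sigma
    """
    Ci_G = np.linalg.solve(C, G)           # Sigma^-1 G
    cov = np.linalg.inv(G.T @ Ci_G)        # Sigma_alpha-hat (Eq. E7)
    alpha_hat = cov @ (Ci_G.T @ delta)     # Eq. (E6), since Sigma is symmetric
    return alpha_hat, cov
```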

Using the MLE solutions for the emission-line amplitudes is correct for a uniform prior on α . In some cases, an informative prior on α is desirable. This is discussed in the next section.

E.2. Gaussian Priors on the Emission-line Amplitudes

To incorporate priors we can multiply the factorized likelihood in Equation (E4) by the prior. This is straightforward if the prior is another Gaussian with mean $\breve{{\boldsymbol{\alpha }}}$ and covariance ${{\rm{\Sigma }}}_{\breve{{\boldsymbol{\alpha }}}}$: that is, $p({\boldsymbol{\alpha }})={ \mathcal N }({\boldsymbol{\alpha }}\,| \,\breve{{\boldsymbol{\alpha }}},{{\rm{\Sigma }}}_{\breve{{\boldsymbol{\alpha }}}})$. The product of two Gaussians is another Gaussian; using well-known formulae for the products of normal distributions, we obtain

$$\mathcal{N}(\boldsymbol{\alpha} \mid \hat{\boldsymbol{\alpha}}, \Sigma_{\hat{\boldsymbol{\alpha}}})\; \mathcal{N}(\boldsymbol{\alpha} \mid \breve{\boldsymbol{\alpha}}, \Sigma_{\breve{\boldsymbol{\alpha}}}) = \mathcal{N}(\hat{\boldsymbol{\alpha}} \mid \breve{\boldsymbol{\alpha}}, \Sigma_{\hat{\boldsymbol{\alpha}}} + \Sigma_{\breve{\boldsymbol{\alpha}}})\; \mathcal{N}(\boldsymbol{\alpha} \mid \bar{\boldsymbol{\alpha}}, \Sigma_{\bar{\boldsymbol{\alpha}}}) \qquad (E8)$$

where $\bar{{\boldsymbol{\alpha }}}$ and ${{\rm{\Sigma }}}_{\bar{{\boldsymbol{\alpha }}}}$ are the mean and covariance of the new Gaussian. In fact, this new Gaussian does not need to be evaluated, as the integral in the marginalized likelihood of Equation (E2) reduces to

$$\mathcal{L}(D \mid \theta) \propto \frac{\mathcal{L}(D \mid \theta, \hat{\boldsymbol{\alpha}})\; \mathcal{N}(\hat{\boldsymbol{\alpha}} \mid \breve{\boldsymbol{\alpha}}, \Sigma_{\hat{\boldsymbol{\alpha}}} + \Sigma_{\breve{\boldsymbol{\alpha}}})}{\mathcal{N}(\hat{\boldsymbol{\alpha}} \mid \hat{\boldsymbol{\alpha}}, \Sigma_{\hat{\boldsymbol{\alpha}}})} \qquad (E9)$$

We do not explore non-Gaussian priors, as these are considerably more difficult to implement.
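The product of the MLE Gaussian and the prior Gaussian can be evaluated with the standard formulae; a sketch (names ours) returning the mean and covariance of the combined Gaussian of Equation (E8):

```python
import numpy as np

def combine_gaussians(a_hat, S_hat, a_prior, S_prior):
    """Mean and covariance of the product of two Gaussians in alpha:
    N(alpha | a_hat, S_hat) * N(alpha | a_prior, S_prior) (cf. Eq. E8)."""
    S_hat_inv = np.linalg.inv(S_hat)
    S_prior_inv = np.linalg.inv(S_prior)
    S_bar = np.linalg.inv(S_hat_inv + S_prior_inv)
    a_bar = S_bar @ (S_hat_inv @ a_hat + S_prior_inv @ a_prior)
    return a_bar, S_bar
```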

E.3. Practical Usage

The Cloudy implementation in FSPS includes 128 emission lines; any or all of them may be optimized or marginalized over using this scheme. In practice, we suggest optimizing or marginalizing only over the emission lines covered by the observed spectrum. The marginalization calculation is fast and adds little computational overhead in most cases. Note that neither optimization nor marginalization respects emission-line physics: emission-line doublets may be assigned forbidden ratios if the data prefer it, and the added emission lines may even have negative luminosities. In objects with weak or no nebular emission, the user may wish to turn off the optimization, or limit it to strong emission features, to prevent it from erroneously modeling differences between the model and observed continuum. This is particularly important when a weak or nonexistent emission line lies on top of an informative absorption feature, such as Hα or Hβ.

When applying the Gaussian priors, the center of the prior for each line is taken as the prediction from the Cloudy implementation. This means that the lines are assumed to be powered by normal star formation, but deviations are permitted. The width of the prior is specified by the user, in units of (true/predicted) luminosity.

Footnotes

  • 11  

    In a gridless search with a finite number of samples, the highest posterior probability sample is not the same as the maximum a posteriori solution.

  • 12  

    This could be accomplished by simply multiplying the posterior probabilities from the separate fits to photometry and spectroscopy, but the advantages of joint fitting include a more efficient sampling of the posterior probability distribution and a guarantee of sufficient samples in the region of overlap.
