1 Introduction

Today thousands of exoplanets have been confirmed, revealing a diverse population in size, mass, temperature and orbital properties. The characterization of exoplanets through spectroscopic analysis of their atmospheres is key to fully understanding the properties of the individual planets. Such observations can help to constrain theories of planet migration, formation and evolution, shedding light on the mechanisms underlying the great diversity seen in the exoplanet population.

Transit spectroscopy, theorized by Seager and Sasselov [39] and first demonstrated by Charbonneau et al. [6], has been the major technique used to obtain exoplanet spectra to date. This technique returns the transmission spectrum at the day-night terminator. It has been used to discover atmospheric atomic species such as sodium [6, 31], potassium [7], hydrogen [50] and helium [42], hazes [28] and clouds [17, 18], and molecular species including water [40, 46]. The similar technique of eclipse spectroscopy returns the dayside emission spectra of the exoplanet, and can provide constraints on the vertical temperature-pressure profile of the atmosphere. Eclipse spectra have detected molecular species including water, methane, CO and CO2 [14, 44, 45]. In phase-resolved emission spectroscopy [3, 43] multiple emission spectra are obtained as a function of phase.

Both transit and eclipse spectroscopy operate in the time domain, relying on high precision spectrophotometric light curve measurements. These require high levels of photometric stability to be maintained over the time scale of the planet transit or eclipse. The light curves are used to extract fractional transit depths at different wavelengths. The wavelength-dependent variations in these transit depths trace out the exoplanet spectrum. In primary transit, the transit depth gives (Rp/Rs)2, where Rp is the apparent radius of the planet and Rs is the radius of the star. In secondary eclipse, the eclipse depth gives Fp/Fs, i.e. the contrast ratio, where Fp is the flux from the planet, and Fs is the flux from the star.

These wavelength-dependent variations in transit or eclipse depths can give spectral amplitudes in the order of a few tens to hundreds of ppm of the stellar flux, depending on the planet in question. The detection of such small signals is thus highly vulnerable to noise and systematics. Determination of the experimental uncertainties require accounting for at a similar level of precision. This is complicated by the time domain nature of the observation which make it vulnerable to the effects of time-correlated (‘coloured’) noise and time-dependent systematics that may distort the light curve. Estimating the correct experimental uncertainties on the final light curve measurements and the emergent spectrum is essential for confidence of the final scientific conclusions.

To address the need for better estimation of experimental precision and accuracy, ExoSim, the Exoplanet Observation Simulator, was developed as a generic end-to-end simulator of transit spectroscopy observations. ExoSim is publicly available on GitHub.Footnote 1 ExoSim is designed to be flexible, versatile and applicable to different instruments. ExoSim operates dynamically, modelling the time domain directly in small steps. This gives it the potential to better capture the effects of correlated noise and systematics on the final spectrum. Dynamical simulation permits the investigation of complex noise sources such as pointing jitter or stellar activity on an observation. This gives ExoSim the potential to investigate the performance of both new and established instruments under complex conditions, and thus optimize the scientific potential of space missions.

2 ExoSim

ExoSim draws on the experience gained from the development and application of EChOSim [26], a simulator developed for the EChO mission concept [47]. It adopts a similar modular structure and nomenclature, and is written in Python 2.7. ExoSim however improves upon EChOSim in several ways, both in function and structure. The algorithmic differences are described in this paper. Although EChOSim was capable of simulating different instruments, the ease of changing the instrumental parameters is improved in ExoSim. ExoSim has also been more comprehensively validated than EChOSim.

EChOSim’s fidelity and versatility were in part limited by the fact it did not directly simulate the detector array in 2-D, and did not produce images using 2-D point-spread functions (PSFs). This limited its ability to realistically represent the effects of PSF abberations or spectral traces that had complex shapes or distributions. In addition detector field-dependent effects such as variations in dark current, hot pixels, or the impact of a cosmic ray could not be simulated. While ExoSim does not currently include all of these effects, it has been designed such that all these effects can be added to its basic structure. Furthermore the complex interaction between spatial and temporal processes was not captured with high fidelity in EChOSim, for example it could not simulate spectral jitter, or the blurring effects of intra-exposure jitter. In the design phases of space instruments, many alternate spacecraft and optical systems will be under consideration and thus a simulator such as ExoSim that has the framework to simulate a wide range of spatial and temporal processes and their interactions, will aid greatly at these stages. Additionally, a simulator that can produce realistic image outputs can be used to design and test data pipeline steps and full pipelines. EChOSim was not designed with such a goal in mind, whereas we wished ExoSim to be capable of producing synthetic data at a level of realism where it could be used for such a purpose.

The development of ExoSim has been driven by the Atmospheric Remote-sensing Exoplanet Large-survey (Ariel) mission [48], with the goal of producing a reliable, realistic and accurate end-to-end simulator. Ariel is a 0.9 m space telescope which will perform the first space-based and large scale survey of about 1000 exoplanets. The simulator was needed to direct and test design iterations of the science payload, and perform simulations to assess the impact of complex noise sources on scientific return. ExoSim was used extensively in the performance evaluation of Ariel during its Phase A study [36]. However ExoSim has been designed to be generic. This feature not only extends its use to other observatories, but also allows it to be validated against existing instruments which are already producing data.

In this paper we describe the ExoSim algorithm, highlighting where it differs from that previously described for EChOSim. We also present validation testing results. In terms of validation, we aimed to achieve noise simulations which matched a given validation comparison to level consistent with the expected statistical variation on the noise. For signal simulations, there may be many confounding factors that could lead to differences between ExoSim and comparisons, however we expected these differences to be small producing no significant differences in noise. We firstly compare ExoSim signal and noise against equations. The pointing jitter model is validated against an analytic expression and a simple independent simulation. We then perform a cross-validation of ExoSim with an independently-created radiometric simulator, the ESA Radiometric Model [48]. Finally, we compare ExoSim focal plane counts and simulated light curves with those from published studies using the Hubble Wide Field Camera 3 (WFC3) IR instrument.

3 Overview

ExoSim models the host star and planet transit event, simulating the temporal change in stellar flux due to this, i.e. the light curve. It does this in a wavelength-dependent manner, using an input planet spectrum to determine the light curve depth for any given wavelength. It also models the optical system, consisting of the telescope (‘common optics’), one or more instrument channels, and their associated detectors. The instrument channels can be spectroscopic or photometric. ExoSim simulates the modification of the signal as it passes through the optical system, and generates a focal plane image of the star, either as a photometric or a spectral image. ExoSim simulates an observation as a series of exposures consisting of up-the-ramp non-destructive reads (NDRs). The detectors are assumed to be mercury-cadmium-telluride infra-red detectors (e.g. Jerram and Beletic 16). Noise and systematics are simulated and added to the images. This image time series thus contains the planet transit event, as well as the effects of numerous noise sources and systematics. The inputs to ExoSim are an Input Configuration File, which contains user-defined parameters for the observation and instrument. This in turn references several instrument-specific reference files. The final output is a data cube of NDRs packaged in FITS format. This requires processing by a data reduction pipeline. The simulations can be run without a transit event (‘out-of-transit’), with various noise sources switched on or off, and can also be run as a Monte Carlo simulation to obtain a probability distribution of transit depths as a way of finding the ‘error bar’ on the spectrum. The data pipeline is not part of ExoSim- but is required to process the ExoSim output, extract the required signal and noise information, and reconstruct the planet spectrum. A list of inputs is given in Table 1 which correspond to the main user-adjustable parameters in the Input Configuration File.

Table 1 Input parameters for ExoSim as represented in the Input Configuration file

4 Algorithm

The ExoSim algorithm simulates a physical model which is described within the modular structure of the code (Fig. 1) as follows.

Fig. 1
figure 1

ExoSim modular architecture and informational flow. The core of the algorithm is completely generic. The Input Configuration File sets the simulation parameters and calls on instrument-specific reference files. Modules shown in purple can be upgraded. Object classes are shown in red circles. SpotSim, a dedicated star spot simulator, interacts with ExoSim in the Timeline generator module (indicated as ‘Timeline’ in the diagram)

4.1 Astroscene

The host star radiates isotropically with a surface flux density, Fs(λ). After reaching the telescope aperture at a distance D, the flux density falls to Ftel(λ) = Fs(λ)(Rs/D)2. The Astroscene module instantiates Star and Planet object classes. The Star class contains a PHOENIX stellar model [2] matched to the host star parameters, and calculates Ftel(λ).

As the planet transits the star, the flux from the star is modulated in time forming a light curve. The shape of the light curve depends on: z (the time grid of the projected distance between the centre of the star and the centre of the planet in units of the star radius), stellar limb-darkening, and the planet-star radius ratio. The latter two are wavelength-dependent parameters. In eclipse, the light curve depends on z and the contrast ratio between the star and the planet. z is a function of the period, the semi-major axis, the star radius, orbital inclination, eccentricity, and time. The Planet class calculates z from the orbital parameters and generates wavelength-dependent light curves using the formulation of Mandel and Agol [20]. An input planet spectrum is used to give the planet-star radius or flux ratios. This can be produced from third party radiative transfer models such as TauREx [51], NEMESIS [15] or CHIMERA [19].

Compared to EChOSim, ExoSim can now select star and planet parameters automatically from databases given a chosen planet identified in the Input Configuration File. ExoSim can also generate ‘integrated’ light curves, where the variation of the light curve within the integration time of an image is accounted for. This may be more accurate for long integrations than using ‘instantaneous’ light curves, where the value of the light curve at an instant of time is applied. Compared to EChOSim, ExoSim uses improved limb-darkening coefficients generated from ATLAS and PHOENIX models [22].

An additional astrophysical source of photons is the zodiacal light, which is modeled in the Zodi object class. The zodiacal light is a diffuse source (rather than a point source). ExoSim utilizes a zodi model based on Glasse et al. [13] and is processed in a similar way to that described in Pascale et al. [26].

4.2 Instrument

4.2.1 Telescope

The starlight then enters the telescope aperture of area Atel, to give a radiant power per unit wavelength, Ptel(λ) = AtelFtel(λ). It subsequently passes through or is reflected off a series of optical surfaces, e.g. the primary mirror, folding mirror, pickoff mirror, dichroics etc. These are termed the ‘common optics’. On encountering each optical surface, the stellar spectrum is attenuated by a transmission or reflectance factor, i.e. throughput. The power per unit wavelength after passing through the common optics is Pch(λ) = Ptel(λ)ηtel(λ), where ηtel(λ) is the total throughput of the common optics. The wavelength-dependent throughput of each optical surface is defined in a reference file for the specific instrument. This constitutes the telescope model and is implemented in the Instrument module. The module then continues the modulation of the signal as follows.

4.2.2 Channel

From the common optics, light is passed into one or more instrument channels, which are specialised optical systems designed to examine a specific aspect of the signal. ExoSim allocates a Channel object class to each instrument channel. Each channel has its own set of optical surfaces, which further attenuates the signal, with its own optical prescription. An instrument channel can be a spectrometer or photometer. Light may enter the spectrometer through a slit (to reduce background and stray light), or it may be slitless. It is then collimated, dispersed (by a grism, grating or prism) and then focussed onto the detector by camera optics. Different instrument channels may have different wavelength coverages, different spectral resolving (R) power for the dispersion element, and different detector characteristics. Here we do not discuss Fourier transform or integral field spectrometers as currently ExoSim does not simulate these. Throughput files are used for each channel optical surface. The power per unit wavelength reaching the detector is then Pdet(λ) = Pch(λ)ηch(λ), where ηch(λ) is the total throughput of the channel optics.

4.2.3 Detector

Before falling on the detector, the light has been convolved with the 2-D optical PSF (which is a function of wavelength), and in the case of a spectrum, a spectral trace is projected onto the detector. These factors determine the spatial distribution of photons on the detector array, i.e. the spectral image. The ‘wavelength solution’ for each pixel λ(x), where x is the pixel coordinate in the spectral direction, is obtained as an external input from optical modelling of the dispersion element. For each pixel column x, the corresponding value of λ(x) is used to generate a 2-D PSF. The volume of the PSF equals the power falling over the wavelength span of the pixel column. The PSF convolution is completed by the coadding of these 2-D PSF images onto a 2-D array representing the detector pixel array. Each 2-D PSF is centered on a pixel x,y, where x is the corresponding pixel column, and y is the pixel coordinate in the spatial direction. For a straight spectral trace parallel to the x axis, the y coordinate is the same for all PSFs.Footnote 2 A spectral image is thus produced as shown in Fig. 2. As shown, PSFs can not only be constructed from Airy functions, but also abberated PSFs provided by third-party software or from real instruments can be used. This is an improvement over EChOSim which used only Gaussian functions for the PSF. ExoSim can utilise non-linear wavelength solutions (e.g. from a prism) in additional to linear solutions (e.g. from a grating), whereas EChOSim modeled only linear solutions. The convolution with the PSF is performed in 2-D in ExoSim, rather than just 1-D in EChOSim. In practice, the PSF convolution is performed on a sub-pixelised grid, which ensures Nyquist sampling of the PSF, however for clarity we describe the algorithm using whole pixels. The degree of subpixelisation, the oversampling factor, is set by default to 3 (i.e. 9 subpixels per whole pixel), which Nyquist samples the full-width half maximum of most PSFs. The oversampling factor can be adjusted if desired in the Input Configuration File, although in most cases we expect the default factor of 3 to be adequate.

Fig. 2
figure 2

Spectral images in ExoSim, produced from the co-addition of 2-D PSFs. Top: Spectral image constructed from Airy function PSFs. Bottom: Spectral image constructed from an aberrated PSF generated from third-party software; with a few samples provided at different wavelengths, the intervening PSFs can be constructed by interpolation

As a result of the PSF convolution and spectral dispersion, each pixel will have a radiant power, Ppix(x,y). The detector is a 2-D pixel array, with a given quantum efficiency per pixel,Footnote 3QE(λ[x]), which affects the conversion rate of incident photons to electrons. The number of photoelectrons per second produced per pixel is then: Q(x,y) = Ppix(x,y)QE(λ[x])hc/λ(x), where h and c are the Planck constant and speed of light respectively, and λ(x) is the central wavelength of the pixel column, x. The final electron count rate will also depend on the intra-pixel response function. We take the 1-D intra-pixel response function described in Eq. 12 of Pascale et al. [26] and generalise this to 2-D (example shown in Fig. 3). After normalising this to a volume of unity, we convolve it with the detector array, such that each point on the convolved 2-D array gives the electron count rate over a pixel-sized area and incorporates the effect of the intra-pixel response function. Downsampling to whole pixel positions then gives the count rates for each pixel on the focal plane detector array.

Fig. 3
figure 3

2-D pixel response function used in ExoSim to simulate intra-pixel variation in responsivity. x and y axes are show distance in units of m; z axis shows the responsivity after normalizing the volume to unity

Compared to EChoSim, ExoSim can model both photometers as well as spectrometers. In a photometer there is no dispersing element and an image of the star is formed from all wavelengths over a given bandpass. A select number of wavelength-dependent 2-D PSFs are generated covering the wavelength band of the photometer, which are then coadded to the 2-D detector pixel array over the same location to produce the final photometric image.

Another source of photons is the thermal emission from telescope and channel optical surfaces. Like zodiacal light this is a diffuse source. ExoSim uses essentially the same algorithm for diffuse radiation as explained in Pascale et al. [26].

4.3 Timeline generator

An observation will attempt to follow the entire transit (or eclipse) event, usually including a similar duration of time out-of-transit. The observation is divided into a series of exposures, between which the detector array is reset. Within each exposure, the pixels will accumulate electrons as a ‘ramp’. Each exposure consists of non-destructive reads (NDRs) where snapshots of the accumulating count are read at various times up the ramp. In data reduction, an image per exposure is obtained, either through fitting the NDR ramp gradient or through last-minus-first processing (correlated double sampling), where the first NDR is subtracted from the last. Up-the-ramp fitting reduces read noise, while correlated-double-sampling removes reset noise. There may be ‘dead time’ within an exposure cycle due to reset time and detector idling, making it less than 100% efficient.

The Timeline generator module allows for dedicated control of time domain elements: the observational timeline, the timing of exposures, and the timing of exposure cycle elements, such as NDRs and detector ‘dead’ times. The focal plane detector array generated in Instrument is used to setup an 3-D array of NDR images against time. Multiplying by the integration time for each NDR, the pixel photoelectron counts per NDR can be generated. The time-dependent transit light curves are then applied to these counts.Footnote 4 Application of the transit light curve can be omitted if a simple out-of-transit simulation is needed.

Time-dependent instrumental systematics or astrophysical processes such as star spots, can bias or distort the light curve, and therefore impact on the signal variations captured in the NDR time series. The Timeline generator module allows ExoSim to interface with external time domain simulators that model such processes. Timelines of modulated wavelength-dependent variations in the signal from various processes can be produced from these external models and then applied to ExoSim light curves in this module. An example are the effects from stellar pulsation and granulation [37]. Another example is the effect of star spots and faculae on the light curve. We have developed a dedicated star spot simulator called ‘SpotSim’, described further below.

4.4 Noise

All observations will be subject to random statistical noise. These include Poisson noise processes, also known as ‘shot’ noise. Possion noise is generated from the target itself, as well as from the diffuse sources and from the dark current within each detector pixel. Read noise occurs from the uncertainty in the conversion of charge in the on-chip amplifier to analogue voltage. It can be modelled as a Gaussian distribution around the final pixel count. The above noise types are ‘white noise’ processes, where there is no correlation between different reads. Hence ExoSim applies these noise effects in a Noise module where the individual pixel counts per NDR are randomly adjusted to simulate the effect of these noise processes. For correlated noise, there is stastistical dependence between different measurements and thus a different approach is taken when simulating these. If the correlated noise is wavelength-dependent (but not dependent on spatial position) one way is to apply the effects of the noise on the light curves in the Timeline generator module. The noise timelines are generated by an external model of the correlated noise process. This approach was used to apply correlated noise from stellar pulsation and granulation in Sarkar et al. [37].

Pointing jitter is a type of instrumental correlated noise that is complex in origin, depending on several factors: the power spectrum of the jitter, integration time, source brightness, intra- and inter-pixel responsivity variations, and the application of apertures or bins (Fig. 4). ExoSim incorporates a jitter simulation subroutine which improves on that in EChOSim in the following ways. The EChOSim jitter simulation only considers spatial jitter noise, whereas ExoSim treats jitter in 2-D modeling both spectral and spatial jitter simulataneously. EChOSim used a parametric model prediction for the jitter noise, whereas in ExoSim jitter is simulated in a dynamical model which more accurately represents the physical process.

Fig. 4
figure 4

Factors affecting the noise from pointing jitter. In this simplified case, a 1-D signal (black line) is jittering over a 1-D detector array with pixels of varying quantum efficiency. A narrower aperture, larger QE variation, larger signal gradient and higher pointing jitter (left) results in more noise compared to a wider aperture, smaller QE variation, smaller signal gradient and lower pointing jitter (right)

Pointing jitter timelines are generated for x and y directions from a power spectral density (PSD) frequency spectrum. The PSD is randomised in amplitude and phase so that random jitter timelines for each axis are produced after inverse Fourier transformation. The time step duration in the jitter timeline is calculated to ensure Nyquist sampling of the jitter PSD. The convolved 2-D image array generated in the Instrument module is first interpolated (if required) to a higher spatial frequency to Nyquist sample the rms of the jitter in each axis. Downsampling of this array to a grid of positions corresponding to the central position of each whole pixel gives a 2-D array containing the count rate on each whole pixel. The code then iterates through all jitter time steps. Each jitter time step has associated jitter offsets in x and y directions. These spatial offsets are applied to the positions where the array is downsampled, so that the grid of points corresponding to whole pixel positions is shifted at each jitter step. This is equivalent to shifting the image relative to the pixels. Multiplying the count rates (in e/s) per pixel by the jitter time step duration gives the count (in e) per jitter time step. For each NDR, the cumulated counts per jitter time step are co-added to give a final NDR count. An inter-pixel variation in sensitivity (inter-pixel gain variation) is modelled (typically 5% rms) as a Gaussian distribution upon the baseline QE of the detector; this is an important factor in producing jitter noise and is applied to the NDR counts. Therefore, unlike EChOSim which accounted only for the photometric variation from jitter between integrations, ExoSim in addition captures the effects of jitter within the integration time. This manifests as blurring of the image.

Another innovation in ExoSim is the ability to simulate scanning mode observations (Fig. 5), as performed on the Hubble WFC3 IR instrument for exoplanet transit spectroscopy (e.g. Kreidberg et al. [18]). This is implmented by applying a sawtooth profile to the y jitter timeline.

Fig. 5
figure 5

Example of simulated scanning mode in ExoSim. ExoSim models the Hubble WFC3 IR instrument observing the super-Earth GJ 1214b. This exposure consists of 13 NDRs. The progression of the scan is evident in the sequence of NDR images

4.5 Output

ExoSim uses the same FITS file format for its output as EChOSim. In ExoSim however, the output files will contain multiple NDRs per exposure, which is closer to the usual raw image output of an instrument. EChOSim, in contrast, generated images per exposure that assumed post-processing had been performed, e.g read noise was reduced assuming up-the-ramp processing but without the production of actual ramps. Thus the output from ExoSim provides more realistic mock data for the validation of data reduction pipelines.

5 SpotSim

A dedicated star spot simulator has been developed for use with ExoSim: ‘SpotSim’ [34]. SpotSim interacts with ExoSim in the Timeline generator module, where it modulates the light curves produced in ExoSim. SpotSim models a spotted star surface, complete with wavelength-dependent limb darkening. A transiting planet is simulated, with the planet/star radius and orbital parameters obtained via the ExoSim inputs. Additional inputs for SpotSim include: 1) inclusion or exclusion of faculae, 2) spot coverage, 3) choice of spatial distribution (equatorial, longitudinal, latitudinal), 4) choice of spot area distribution (log-normal, uniform).

The planet transit light curve is obtained by moving the planet in small time steps across the stellar surface. The resulting light curve will capture the effects of both occulted and unocculted star spots. In the Timeline generator module, these SpotSim light curves are applied as corrections to the unspotted ExoSim light curves, which therefore incorporate these effects and are used downstream in the remaining ExoSim simulation [34]. SpotSim can also model faculae, which have opposing effects to spots. The unspotted photosphere, spots and faculae, have different brightness spectra respectively, with spots being colder than the photosphere, and faculae being hotter. The modeling of faculae currently does not include limb distance brightness dependency [24]. The spatial distribution of spots can be modelled as random uniform, clustered as a latitudinal band, or clustered into longitudinal groupings (Fig. 6). The size distribution of the spots can be modelled as a log-normal distribution [41] (Fig. 6a), or as fixed sizes (Fig. 6b). The effects of spots and faculae will be to distort the transit light curve (Fig. 7) and potentially bias the final recovered transit depth. This can manifest on the final reconstructed spectrum as a wavelength-dependent bias, the level of which will depend on many factors, such as size of spots and faculae, the balance between occulted and unocculted features, the spot filling factor, the stellar class, relation of the transit chord to the spatial distribution, etc. The effects of spots and faculae are thus complex and their impact depends on many factors. A simulator like ExoSim with SpotSim can thus help to understand the impact of spots and faculae on recovered planet spectra under different conditions. The ExoSim data output, which will contain spot-contaminated time series spectra, can also be used as test data for spot correction algorithms in data pipelines.

Fig. 6
figure 6

Example stellar surface simulations from ExoSim’s dedicated star spot simulator, ‘SpotSim’. Limb darkening is included, and both spots and faculae can be modelled. Spots are shown as blue areas, and faculae as red. Different spatial and size distribution cases are shown. a) Uniform random spatial distribution with log-normal size distribution. b) Uniform random spatial distribution with fixed size distribution. c) Latitudinal spatial distribution. d) Longitudinal grouping spatial distribution

Fig. 7
figure 7

Example of SpotSim planet transit chord (top) with resulting effect on an ExoSim-simulated spectral light curve at 2.9 μ m (bottom) due to star spot and faculae occultations. In the right figure, red dots show the light curve in the unspotted case, and blue dots in the spotted case. The blue line is a model curve fit to the spotted light curve, showing an underestimation of the true transit depth

The fidelity of SpotSim is considered first-order, since many unknowns exist regarding spots and faculae on other stars such as their spatial and size distribution patterns and temperatures. In unconstrained cases, we can use SpotSim to assess the additional uncertainties that could result from spots and faculae to a first order using the characteristics of the star to predict certain parameters (such as spot temperature) and considering different possible size and spatial distributions. If some of the parameters can be constrained for individual cases, e.g spatial distributions through Doppler imaging studies, then SpotSim can be used to assess a more specific impact on the light curve.

6 Validation

6.1 Validation of focal plane signal

ExoSim simulations were performed without noise for an out-of-transit observation of the star 55 Cancri. The star was modeled using a Planck black body spectrum (T = 5196 K) rather than using a PHOENIX spectrum, since this permitted an easier comparison with the validating equation which also uses a black body function. The instrument model included a 0.5 m primary mirror telescope incorporating a grating spectrometer with F-number = 18.5. From the focal plane image produced in the Instrument module, the summed photoelectron count rates per pixel column, x(λ), were obtained. The signal is expected to agree with the following equation:

$$ Q(x) = \pi B_{\lambda(x)}(5196 K)\left( \frac{R_{s}}{D} \right)^{2}A_{tel} \eta(x) QE(\lambda[x]){\Delta}\lambda(x) \frac{\lambda(x)}{hc} $$
(1)

where Q(x) is the photoelectron count rate in pixel column x(λ), Bλ(x)(5196K) is the Planck function of the star, η(x) is the total optical transmission, and Δλ(x) is the wavelength interval over the pixel column width. In this simulation η(x) was set to a constant value of 0.44, and QE(λ[x]) was set to a constant value of 0.55. The comparison between the ExoSim signal and the prediction from (6.1) is shown in Fig. 8 (where the count rates are plotted against the wavelength of the corresponding pixel column). The wavelength range examined (2.1-3.4 μ m) avoids regions at the extreme edges. At the extreme edges of the image array in ExoSim, the signal will be artefactually low due to PSFs not being added right to the edges (the PSF closest to the edge is added such that it is entirely within the image array). This does not normally impact simulations since in most cases instrument transmission bands will not extend to the extreme edges of the array. The lower plot in Fig. 8 shows that the percentage difference of ExoSim from the equation prediction is within 0.003%. For a 1 sec integration the maximum absolute difference is 0.4 e, and for 300 sec, this is 120.3 e. In both cases the differences are well within the 1σ limit due to Poisson noise. This shows that under these highly matched conditions ExoSim generates its focal plane image replicating the expected count to high level of agreement.

Fig. 8
figure 8

ExoSim focal plane signal compared to analytical prediction. Upper plot: signal/time vs wavelength where the x-axis shows the wavelength on a pixel column. Lower plot: Percentage difference

6.2 Validation of uncorrelated noise

The same instrument and star models as in Section 6.1 were used in ExoSim, and an out-of-transit observation simulated, where the exposure cycle consisted of two NDRs. Following the correlated double sampling (CDS) method, the first NDR is subtracted from the final NDR, to give a final CDS exposure. The noise variance was obtained for the summed count in each pixel column x, and divided by the CDS time to obtain a noise variance per unit time in e/s. The results were obtained for different types of uncorrelated, ‘white’ noise sources in isolation: photon noise from the target star, dark current noise and read out noise, and for all these noises sources combined. The results were tested against predictions from (2) to (5).

$$ {\sigma_{photon}}^{2}(x)/t = Q(x) $$
(2)
$$ {\sigma_{dark}}^{2}(x)/t = N_{pix}I_{dc} $$
(3)
$$ {\sigma_{cds}}^{2}(x)/t = (2N_{pix}{\sigma_{ro}}^{2})/t $$
(4)
$$ {\sigma_{total}}^{2}(x)/t = [{\sigma_{photon}}^{2}(x) + {\sigma_{dark}}^{2}(x) + {\sigma_{cds}}^{2}(x)]/t $$
(5)

where Npix is the number of pixels in the column (in this case 64), and Idc is the dark current on a pixel (in this case 20 e/s). σphoton is the target star photon noise, σdark is the dark current shot noise, σro is the read out noise for a single pixel read (in this case 20 e), and σcds is the read out noise for a CDS exposure per pixel column.

Since ExoSim uses stochastically-generated values in a dynamic simulation, the noise variance measurements reflect these random variations from pixel column to pixel column. Figure 9, upper plots, show the stochastic ExoSim noise in comparison to predicted noise based on the above equations. To ascertain the fidelity of these noise results, we calculate for each case the variance on the sample variance.Footnote 5 For a sample of n values taken from a population, the variance on the sample variance \(\left \langle s^{2} \right \rangle \) is given by :

$$ Var(s^{2}) = \frac{(n-1)^{2}}{n^{3}} \mu_{4} - \frac{(n-1)(n-3)}{n^{3}} {\mu_{2}}^{2} $$
(6)

where μ4 and μ2 are the 4th and 2nd moments of the distribution respectively. In these simulations we run 500 exposures to obtain the results per wavelength, and thus n = 500. If the mean sample variance is \(\left \langle s^{2} \right \rangle \), which we take to be the predicted noise variance, then \(\mu _{2} = n \left \langle s^{2} \right \rangle / (n-1) \). For Poisson noise (stellar photon noise, and dark current noise)Footnote 6μ4 = μ2(1 + 3μ2). For read noise and total noise, we obtain a value for μ4 numerically assuming a Gaussian distribution of standard deviation \(\sqrt { \left \langle s^{2} \right \rangle }\). In the middle plots in Fig. 9, we show the residual differences compared to 1 and 3σ limits where σ is \(\sqrt {Var(s^{2})}\). This shows that the differences mostly lie within the 1σ limit and virtually all are within 3σ. Thus the variation of ExoSim values from the prediction is consistent with the expected statistical variation on the noise. The lower plots show that the predicted value of V ar(s2) is consistent with that measured from the ExoSim data. In the case of dark current and read noise, the measured variance on all the residuals is obtained and compared to V ar(s2), and these two single values represented as two lines covering the wavelength range. The measured variance is within 6.4% of the prediction for dark current noise and 0.8% for read noise. Since the noise is wavelength- dependent for stellar photon noise and total noise, for the measured variance we use a rolling sample of 40 values across the wavelength band (centred on a given wavelength), obtaining the variance for each sample vs wavelength. This is then compared to the V ar(s2) prediction. We find that the measured variance for stellar photon noise has a mean difference of 16.7% with the prediction, and for total noise this is 8.7%.

ExoSim can simulate full ramps with multiple NDRs as well as CDS reads. Compared to CDS, up-the-ramp (UTR) reads where slopes are fitted to the NDR counts, the photon (Poisson) noise is increased slightly, and the read noise reduced. For n NDRs, the photon noise is expected to increase by a factor of \( \sqrt {(6/5)(n^{2}+1)/(n[n+1])}\) and the read noise expected to decrease by a factor \(\sqrt {6(n-1)/(n[n+1])}\) [30]. Figure 10 shows a comparison between CDS noise and UTR noise. A simulated OOT observation of GJ 1214 was performed, where in full ramp mode each NDR lasted 1.5 seconds, and each exposure consisted of 46 NDRs (n = 46) up-the-ramp. In a second simulation using the same instrument model, CDS mode was selected, so that only the first and last NDRs are simulated. This set up was first run for photon noise only, and then read noise only. To convert the UTR slope values to units of e from e/s, the slope gradients were multiplied by the integration time (the duration of the last NDR minus the duration of the first NDR). CDS exposures were already in units of e. 500 exposures were simulated in each case, processed through a pipeline and binned to R= 100. The standard deviation (noise) of the timelines in each spectral bin were obtained. The results were averaged over 50 realisations for each case and are shown in Fig. 10. The predicted UTR noise is the average CDS noise multiplied by the factors given above. We find that for photon noise the difference with the prediction is within 8 e and for read noise this is 0.5 e. Both these differences are well within the expected statistical standard deviation of the noise based on the above variance on the sample variance calculation. This shows that ExoSim can simulate the noise from up-the ramp reads to high fidelity.

Fig. 10
figure 10

CDS and UTR noise compared from ExoSim simulations. Each point is the average of 50 realisations. Left: photon noise. Right: read noise. Lower plots give the difference between the prediction for UTR noise and the measured noise

6.3 Validation of pointing jitter model

The mechanism of the jitter code was tested for validity against a prediction from an analytic expression. To accomplish this, ExoSim was run under a simplified set of conditions: 1) a focal plane with all pixels having an identical flat response (i.e. no intra- or inter-pixel variations), 2) a source consisting of a monochromatic beam with a Gaussian PSF with full-width-at-half-maximum (FWHM) of 2.27 pixels, 3) jitter with a flat PSD giving a jitter timelines in each axis with standard deviation, rmsjit. The beam was centered on a central pixel, given a pixel coordinate (0,0), and allowed to jitter in 2-D around this. The count on this pixel at each jitter time step was monitored and a timeline of counts obtained. Pixels at progressively further distances from the central pixel were similarly monitored. These were pixel (1,1) (i.e. one pixel row above and one pixel column to the right of the central pixel), pixel (2,2) and pixel (3,3). The measured counts were all normalised to the central pixel count obtained with a stationary beam. The standard deviation of the normalised counts in these pixels resulting from the jittering beam were compared to a prediction from (7). This equation is based on a second order Taylor expansion, and gives the expected variation in the value (‘count’) of a 2-D Gaussian function (of unity maximum) as the measurement point is jittered in 2-D.

$$ \sigma_{(a,a)} \approx \left[ \exp(-a^{2}/2s^{2}) \left( 4{rms_{jit}}^{4}\left[\frac{a^{2}-s^{2}}{s^{4}}\right]^{2} + {rms_{jit}}^{2}\left[\frac{2a^{2}}{s^{4}}\right] \right) \right]^{1/2} $$
(7)

where σ(a,a) is the standard deviation of the count predicted at position (a,a), and s is the standard deviation of the 2-D Gaussian (s ≈ FWHM/2.355). This expression will be most accurate for small values of rmsjit. The predicted values of σ(a,a) for a 2-D Gaussian matching the ExoSim beam were calculated and compared to the ExoSim results. The comparison is shown in Fig. 11, top. In this figure, σ(a,a) is the standard deviation of the normalised counts in each ExoSim pixel (a,a), and is compared to the predicted standard deviation from (7). ExoSim matches the prediction best at lower values of rmsjit. If rmsjit is 0.1 pixels, ExoSim is within 5% of the prediction for a < 3. However the percent deviations worsen with larger values of rmsjit, as may be expected due to the Taylor approximation, and are worst at a = 3.

Fig. 11
figure 11

ExoSim jitter noise compared to predictions for pixel positions shown. Top: ExoSim compared to (7). Bottom: Exosim compared to independent simulation. Grey plots show percentage difference of ExoSim from the prediction

To assess validity of the ExoSim jitter simulation at higher values of rmsjit, we performed a separate computer simulation of a 2-D Gaussian function, matched in size to the ExoSim beam, with a maximum of unity. The 2-D Gaussian was sampled randomly 106 times around points (a,a) [a= 0,1,2,3], with a standard deviation of rmsjit in each axis. The resulting standard deviations of the sample values, σ(a,a), at each point (a,a) were compared with ExoSim results, as shown in Fig. 11, bottom. We obtain a good agreement upto the higher values of a and rmsjit, ExoSim always being within 3% of the independent simulation. These results verify the accuracy of the baseline mechanism used in ExoSim’s pointing simulation.

6.4 ExoSim vs ESA radiometric model

The European Space Agency Radiometric Model (ERM) was developed independently as simulator for the EChO and Ariel missions [29], [48]. Unlike ExoSim, the ERM does not perform a dynamical simulation, but a ‘static’ one, using a set of parametric equations to obtain an estimate of signal and noise for transiting exoplanet observations. Results are rapid which make it ideal for estimating the SNR of a large number of exoplanet observations in a survey, as demonstrated on the Ariel target list [52]. A disadvantage of static simulators is that since they do not model the time domain directly, they cannot capture the effects of correlated noise or time-dependent systematicsFootnote 7. Obtaining similar results across different simulators for the same conditions adds confidence to the results of both simulators. We therefore set up a cross-validation test to see if similar results were obtained between ExoSim and the ERM.

The same stellar parameters and Phoenix models were used in both simulators for the stars 55 Cancri (a bright source) and GJ 1214 (a dim source). An early iteration of the ARIEL instrument description was modeled in each simulator. This was a modified Offner grating design with 2 infra-red channels between 1.9 and 7.8 μ m, with the final spectra binned to R= 100 and R= 30 in each channel respectively. This design was from the Ariel proposal document [1], and does not represent the Phase A design of the instrument [48].Footnote 8 However, for the purpose of cross-validating the two simulation tools this approach is adequate.

Noise sources included in both simulators were the photon noise from the target star and the dark current shot noise. In ExoSim, the average noise variance per unit time per spectral bin was obtained over 50 realizations of a noisy out-of-transit observation of each star. Each observation consisted of 500 exposures. The images were processed including an aperture mask of width 2.44 f λ (to minimise dark current noise)Footnote 9, where (f ) is the image space F-number. The ERM performed an equivalent calculation of photon noise and dark current noise, assuming the same aperture. The final results are shown in Fig. 12, bottom. The ExoSim average noise variance is always within 5% of the ERM for both 55 Cancri and GJ 1214. We again use the variance on the sample variance calculation to obtain an expected 1σ error on the ERM variance, where σ is \(\sqrt {Var(s^{2})}\). We find that the differences in noise variance are well within this limit indicating a good level of agreement across the two simulators. To compare the signal obtained in each simulator, noiseless simulations were run in both ExoSim and the ERM. No apertures were used or assumed. The signal per unit time per spectral bin was obtained for both 55 Cancri and GJ 1214 using each simulator. The results are shown in Fig. 12, top. The ExoSim signal is within 2% of the ERM for 55 Cancri, and within 4% for GJ 1214.

Fig. 12
figure 12

ExoSim and ESA radiometric model cross-validation test. Top: signal per unit time. Bottom: noise variance per unit time. Results are binned to R = 100 for 1.9-3.9 μ m and R = 30 for 3.9-7.8 μ m. The ExoSim noise results show the average for 50 realizations. Grey plots show percent difference of ExoSim from the prediction. Lower plots show the absolute difference compared to the predicted noise on the ERM variance

6.5 ExoSim vs Hubble WFC3

We next validated ExoSim against Hubble WFC3 measurements by implementing a WFC3 IR simulation [34] within ExoSim. ExoSim simulated a primary spectroscopic transit of the super-Earth, GJ 1214 b. The results from ExoSim were compared to results from two published transit spectroscopy studies with WFC3 IR that observed GJ 1214 b: [5] (hereafter B12) observing in staring mode, and [18] (herafter K14) observing in spatial scanning mode [21]. GJ 1214 was simulated with a PHOENIX model spectrum (T = 3000 K, logg= 5.0, [Fe/H]= 0) and all remaining stellar and planet parameters obtained via the Open Exoplanet Catalogue [32].

The Hubble WFC3 IR instrument was modelled in ExoSim using transmissions obtained from publically available synphot files [9] . Other instrument parameters were obtained or derived from the WFC3 data and instrument handbooks [8, 10]. These included a reciprocal linear dispersion of 0.00025 µm/µm, plate scale of 0.13 mas/pixel, dark current of 0.048 e/s, and a read noise of 14.7 e per pixel [34]. The F-number used to generate the PSFs is adjusted per wavelength to match the model PSF FWHM values in Dressel [10]. In the current model we have not taken into account the small variations in the wavelength solution with row due to the geometric distortion of the focal plane.

Firstly, we compare the focal plane spectral image count rates, in e/s per pixel column, from ExoSim to those published in B12 (Fig. 13).Footnote 10 Over the range 1.10-1.67 μ m we find that the ExoSim spectrum is sometimes higher and sometimes lower than the B12 spectrum, averaging 2% lower, with a peak-to-peak variation of + 8 to -11%. Considering that a model is being compared to the real star and instrument, the spectral shape and count rates are remarkably similar. Integrated over all pixel columns, the total photoelectron count is 2.665x106 e/s for ExoSim, compared to 2.707x106 e/s for B12, a 1.6% difference.

Next, ExoSim was compared to results from K14. The ExoSim simulation was performed with observational parameters closely matched to those used in K14: spatial scanning at 12\(^{\prime \prime }\)/s, 90 s integration time per exposure,Footnote 11 160 s cadence and 12 subexposures per exposure (where a subexposure is the difference image between adjacent NDRs). The count per exposure was obtained by summing the subexposure counts. The characteristic ‘ramp’ systematic due to detector persistence, and gaps in data due to Earth occultation were not simulated. ExoSim utilized the same linear wavelength-dependent limb-darkening coefficients obtained in K14,Footnote 12 with the average (0.2674) used outside the published wavelength range. A flat planet transmission spectrum (consistent with known results for this planet) was used, with (Rp/Rs)2 = 0.0135. 20 realizations of this simulation were performed.

We first compare the absolute photoelectron counts in the ‘white light’ curves, i.e. the full array counts per exposure. We find that the average out-of-transit photoelectron count from ExoSim to be 2.35 x 108 e, compared to 2.34 x 108 e from orbits 2 and 4 in K14. ExoSim is thus within 0.5% of the K14 value. This is within the reported 1% peak-to-peak stellar variability for the parent star in the visual range [4]. One of the 20 ExoSim realizations is shown along side the K14 data in Fig. 14, top.

Fig. 14
figure 14

Comparison of white light curves from ExoSim and K14. Shown are the results from one of 20 ExoSim realizations. K14 data was obtained by resampling the published graph with WebPlotDigitizer [33]. Top: absolute photoelectron counts. Bottom: normalized light curves (with systematic correction in K14). Only orbits 2-4 are shown from K14

Next we compare the white light curves from ExoSim normalised to the out-of-transit signal, with the example from K14. In K14, in addition to out-of-transit normalisation, the ‘ramp’ systematic is also detrended (Fig. 14, bottom). The photometric noise on the residuals matches closely: 70 ppm reported in K14, and 68 ± 7 ppm from ExoSim.Footnote 13 Comparing the K14 transit curve to one of the 20 ExoSim simulations shown in Fig. 14 (bottom), visually there is a marked similarity in the transit profiles.

To further evaluate the similarity of the transit light curves, the 17 data points from K14 orbit 3 (the partially transiting portion of the normalised white light curve) were compared to the 17 time-matched data points in each of the 20 ExoSim simulations, using the two-sample Kolmorov-Smirnoff (K-S) test. The two-sample K-S test is used to test the null hypothesis that the two samples come from the same distribution. The test was performed for each of the 20 ExoSim simulations. An average K-S statistic was obtained of 0.16 ± 0.04, and an average p-value of 0.9 ± 0.1. Assuming a significance level of 5% (α = 0.05) for rejection of the null hypothesis, these results fail to reject the null hypothesis that the two distributions are different, and are thus supportive of the similarity between the ExoSim and K14 light curves.

We have therefore found a number of metrics in good agreement between ExoSim and the real data from these two studies. This gives us additional confidence in the accuracy of the complete end-to-end ExoSim simulation.

7 Computational requirements

ExoSim can be run on most laptop and desktop computers. It has been tested on machines ranging from 4Gb RAM / 1.8 GHz CPU with 4 virtual cores, to 64Gb RAM / 3.5 GHz CPU with 12 virtual cores. Simulation times increase as the size of the image stack produced (i.e. the 3-D array containing the time series of images) increases. This can occur in the setting of increased size of the detector array, or increased number of total NDRs. Large numbers of exposures may occur for bright targets with short exposure times (to avoid detector saturation) resulting in more exposures being produced, and this can increase simulation time compared to a dim target if CDS mode is being used. In addition if CDS is not used, then many intermediate NDRs will be simulated that add to the size of the image stack and will tend to slow the simulation time. In such situations we recommend using higher RAM and CPU specifications. Another simulation parameter that impacts speed and memory is the level of jitter. For non-zero jitter, the smaller the jitter rms, the more the focal plane image is upsampled (in order to Nyquist sample the jitter rms), resulting in a larger temporary array during the jitter simulation. This can slow the simulation and add to memory pressure. Longer jitter timelines, either from longer simulations or more steps due to higher maximum PSD frequency, can also slow the run time of the jitter code. Table 2 gives indicative running times for single transit simulations where the transit time is 0.889 hours, (with total simulated observing time twice this duration). We vary the exposure time (and hence number of exposures), and the jitter rms, giving indicative run times for two machines with different specifications. The advantage of higher RAM and CPU specification becomes most apparent when the number of exposures is high and the rms is small. The simulations were run with all sources activated and in CDS mode. The times shown were for simulation only and did not include any time in data reduction.

Table 2 Time to complete single transit simulation under different simulation conditions and with different machine specifications

8 Limitations and future improvement

The time domain simulation in ExoSim results in simulation times that can vary widely from a few seconds to minutes, to potentially hours depending on the simulation circumstances as described above. In addition, if Monte Carlo simulations are used, then the overall time to complete a simulation is increased and dependent on the above factors as well as the number of realisations. Future upgrades can consider using parallelisation routines as well as optimisation of Python methods to improve speed. Improving speed and reducing memory limitations will result in increased usability of the code and greater application of Monte Carlo simulation. Alternate approaches to the jitter code are being explored to improve speed such as convolution with kernels to represent the jitter over the NDR duration rather than looping through each jitter time step. The pointing jitter algorithm generates a timeline from a single randomisation of the PSD at the beginning of the simulation, whereas in reality the PSD is changing itself dynamically. A dynamically changing PSD would be extremely complex to implement algorithmically, however we have simulated the effects of varying high frequency jitter by splicing together timelines from two different PSDs, and other approaches such as a time-varying envelope function can be considered. The range of systematics in ExoSim is actively being increased with non-linearity and hot pixel effects being prototyped. We are also investigating how processes such as cosmic rays and persistence can be added to the simulator. Such additional systematics will require a corresponding pipeline steps to be developed. Systematic processes could exist that pertain especially to specific instruments. In such situations instrument-specific modifications or modules may be need to be developed and added. As characteristics of specific instruments become known, such as their flat-field characteristics and dark current distributions, these may be incorporated into the simulator with some simple modifications. As such ExoSim serves as an excellent generic starting point for simulating observations from putative instruments, but as these instruments become better characterised, dedicated modifications and functions may be need to be developed to simulate specific systematics.

9 Conclusions

In this paper we presented ExoSim, a generic simulator of exoplanet transit spectroscopy observations. ExoSim has been extensively validated and its end-to-end simulation is able to reproduce results obtained from an existing instrument, the Hubble WFC3 IR instrument. ExoSim is accurate to within about 5% of most comparisons. This gives confidence in the further use of ExoSim to simulate signal and noise in a variety of instruments and for many different purposes. We also have demonstrated in this paper how ExoSim can be used to model different instruments, both future and existing. ExoSim has already been applied in the performance evaluation and design of the Ariel instrument [36], and in finding the noise impact from stellar pulsations and granulations [37]. ExoSim has played a key role in validating the Ariel mission science feasibility [12, 27, 48, 52]. It has also been used to assess noise and feasibility in the Twinkle space mission [11] and the EXCITE balloon mission concept [23]. A derivative of ExoSim, an independent development which has been modified and optimized for the James Webb Space Telescope, called JexoSim, has also been produced [38].

With its dedicated star spot simulator, ExoSim can be used to assess the impact of spots and faculae on transit spectroscopy observations. Since ExoSim can also model existing instruments it can potentially be used to verify the error bars obtained from previously published studies. ExoSim, through its data products, can be used to test the development of data reduction pipelines. ExoSim is thus a highly versatile tool that will aid the future development of exoplanet transit spectroscopy.