A Distinct Population of Small Planets: Sub-Earths

and

Published 2021 March 25 © 2021. The American Astronomical Society. All rights reserved.
, , Citation Yansong Qian and Yanqin Wu 2021 AJ 161 201 DOI 10.3847/1538-3881/abe632

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

1538-3881/161/4/201

Abstract

The sizes of small planets are known to be bimodal, with a gap separating planets that have lost their primordial atmospheres (super-Earths) and the ones that retain them (mini-Neptunes). Here, we report evidence for another distinct population at smaller sizes. By focusing on planets orbiting around GK dwarfs inward of 16 days and correcting for observational completeness, we find that the number of super-Earths peaks around 1.4 Earth radii and disappears shortly below this size. Instead, a new population of planets (sub-Earths) appears to dominate at sizes below ∼1 Earth radius, with an occurrence that increases with decreasing size. This pattern is also observed in ultra-short-period planets. The end of super-Earths supports earlier claims that super-Earths and mini-Neptunes, planets that likely form in gaseous protoplanetary disks, have a narrow mass distribution. Sub-Earths, in contrast, can be described by a power-law mass distribution and may be explained by the theory of terrestrial planet formation. We therefore speculate that they are formed well after the gaseous disks have dissipated. The extension of these sub-Earths toward longer orbital periods, currently invisible, may be the true terrestrial analogs. This strongly motivates new searches.

Export citation and abstract BibTeX RIS

1. Introduction

Most extrasolar planets known to date are likely formed in gaseous protoplanetary disks. These include the Jovian planets, super-Earths, and their siblings, mini-Neptunes—the latter two are sometimes jointly called "Kepler planets" for short (e.g., Borucki et al. 2010; Lopez et al. 2012; Wu & Lithwick 2013; Fulton et al. 2017). However, there is little evidence for the existence of terrestrial planets outside our own solar system. Terrestrial planets are thought to have formed from collisional debris, well after the dispersion of protoplanetary disks. We term these Generation II planets, to distinguish them conceptually from the first group (Generation I planets).

Currently, evidence for Gen-II planets are best sought using data from the Kepler mission. The final Kepler mission transit planet search (DR25; Twicken et al. 2016) includes nearly 200,000 stellar targets and yields ∼4700 planet candidates. Such a large sample allows us to examine the exoplanet populations in detail. A majority of these candidates are super-Earths and mini-Neptunes. It has been convincingly shown that these two populations are separated by a radius gap (Fulton et al. 2017; Van Eylen et al. 2018), readily explained by the theory of photoevaporation (Lopez et al. 2012; Owen & Wu 2013, 2017; Jin et al. 2014), or the theory of core-powered mass loss (Ginzburg et al. 2018). As such, the two populations have the same origin, with super-Earths being former mini-Neptunes that happen to lie too close to their host stars. Moreover, the sizes of these planets appear to be narrowly distributed 1 and increase with the masses of the host stars (Fulton & Petigura 2018; Wu 2019; Cloutier & Menou 2020).

On the other hand, the initial aim of Kepler, Earth-like planets, remains elusive. While there are a smattering of small planets at short orbital periods that are Earth-sized, Kepler has a very low survey completeness for such planets at astronomical-unit distances. It is also unclear if the close-in ones are a low-mass extension of super-Earths (and are therefore Gen-I planets) or are a distinct population. This is the very question we set out to answer in this work.

Our effort is made possible thanks to two recent advances. The planet detection efficiency of the DR25 pipeline has now been characterized for every star by injection tests at the flux level (Burke & Catanzarite 2017a) and characterized for the group as a whole by injection tests at the pixel level (Christiansen et al. 2020). Meanwhile, uncertainties in the radii of Kepler Objects of Interest (KOIs) have now been dramatically reduced using Gaia DR2 stellar radii (Berger et al. 2020a).

In this work, we focus on planet candidates that orbit around GK dwarfs, with periods shortward of 16 days. We are able to show that super-Earths, like mini-Neptunes, have a narrow size distribution. Their occurrences fall off rapidly below a peak. This feature enables us to detect the rise of a new population at smaller sizes, sub-Earths. The size distribution exhibits clearly an inflection point between super-Earths and sub-Earths, most likely a gap.

Results similar to ours, that there is a new population of small planets, have been obtained or alluded to previously. Neil & Rogers (2020) found that Kepler data are best explained by three populations of exoplanets: gaseous (mini-Neptunes), evaporated cores (super-Earths), and intrinsically naked. However, their joint distribution shows neither an inflection point nor a gap, as we find here. In a different approach, Hsu et al. (2019) studied the planet occurrence rates across a wide range of orbital periods and planet radii. Their radius bins are broader than ours here, so any possible gap feature is somewhat smeared. They do, however, report an excess of small planets, consistent with our results here.

Both of these works aim to determine the more general landscape of small planets, while our work drills down to one specific question. This allows us to focus on the most revealing parameter space and to slim down the model. We are also able to test the robustness of our results by experimenting with different sample selections and completeness functions.

2. Planet and Star Samples

Starting from the Kepler DR25 catalog of 200,038 host stars and 4034 planet candidates (Thompson et al. 2018; excluding false positives, NASA Exoplanet Archive), we introduce a number of cuts to obtain a suitable sample for this study. We give our justifications for these cuts and list the resulting sample sizes in Table 1. In Section 5.1, we explore how results may be impacted by different sample cuts.

  • 1.  
    Gaia-able: We only consider stars that are in the Gaia-Kepler Stellar (GKS) Catalog (Berger et al. 2020b). In addition, we follow Berger et al. (2020a) to exclude stars with a Gaia DR2 renormalized unit-weight error larger than 1.2 and with isochrone-derived goodness-of-fit parameters lower than 0.99. These cuts can remove binary stars and ensure the reliability of parameters. Compared to the much larger radius uncertainty in the KIC catalog (typically 25%; Fulton et al. 2017), the GKS catalog employs Gaia parallaxes to achieve a much lower uncertainty of ∼4% in stellar radius (Berger et al. 2020b). This allows us to separate planets into smaller radius bins and to observe the fine details in planet occurrence.
  • 2.  
    GK Dwarf: It has been noticed before (Fulton et al. 2017; Wu 2019; Berger et al. 2020a; Cloutier & Menou 2020) that the sizes of both super-Earths and mini-Neptunes rise systematically with stellar mass. Such a trend is as yet unexplained. For our purpose, this tends to smear any features in the planet size distribution. This leads us to focus only on GK dwarfs within a stellar radius range of R* ∈ [0.75R, 1R] (and a similar stellar mass range). It is beneficial to study smaller stars, as small planets are more readily detectable around them. This choice of GK dwarfs also ensures a sufficiently large sample. Finally, we choose our sample based on the stellar radius rather than stellar mass as provided by Berger et al. (2020b). The former has a median uncertainty of 4%, while the latter is larger at 7%.
  • 3.  
    Brightness: We discard stars with Kp magnitudes above 15. Very few Earth-sized planets are discovered around dim stars, and both the false-positive probabilities and radius errors rise for planet candidates around these stars (Petigura et al. 2013).
  • 4.  
    NoisyTargetList is a list of Kepler stars with noise too high to apply the detection efficiency model (Burke & Catanzarite 2017b). Thus, we exclude stars on this list.
  • 5.  
    False Positives: we remove planet candidates with a false-positive probability FPP > 0.1, based on the false-positive probability (FPP) table produced by Morton et al. (2016). This alleviates confusion by eclipsing binaries. For the remaining ones, we assume a reliability of 100% (more on this below).
  • 6.  
    Planet Radius: we remove large planets with radii above 4R and small planets below 0.6R. Below 0.6R, the detection efficiency drops too low to allow any secure statement.
  • 7.  
    Period range: a major distinction in this study is our focus on short-period planets, inward of 16 days. By zooming in to the region where small planets can potentially be discovered (detection efficiency above ∼10%), we contain the uncertainties inflated by rare detections. While this reduces our planet sample, it actually helps to increase our sensitivity to small planets.

Table 1. Star and Planet Sample Cuts

Cut np N*
Exoplanet Archive4034200,038
Gaia-able3270162,657
0.75 < R*/R < 1111234,399
Kp ≤ 1559013,498
NoisyTargetList58013,297
FPP < 0.1505
0.6 ≤ Rp ≤ 4R 469
1 ≤ Porb ≤ 16298

Download table as:  ASCIITypeset image

Eventually, we are left with 298 planet candidates and 13,297 qualified stars. In the following, we consider the pipeline completeness for this sample.

3. Completeness

For every planet in the Kepler field, its detection depends on a number of factors: geometric probability of transit, pipeline efficiency in identifying the transit as a threshold-crossing event (TCE), and loss during the vetting process. In the following, we discuss the last two factors, together with the issue of reliability.

3.1. Pipeline Detection Efficiency

By injecting simulated transit signals into real light curves and then recovering them using the same search pipeline, one can determine the so-called "pipeline efficiency" as a function of planet size, orbital period, and stellar properties. The Kepler team has published multiple studies of pipeline completeness (e.g., Christiansen et al. 2016; Thompson et al. 2018). The most recent ones are those by Burke & Catanzarite (2017a) and Christiansen et al. (2020) for the DR25 pipeline. The Christiansen et al. (2020) work is based on the so-called pixel-level transit injection tests. Limited by the computational capacities, each Kepler star only receives one planet injection, and the resulting completeness is an average over the entire star sample. In contrast, the much simpler flux-level tests (Burke & Catanzarite 2017a) can inject many different signals into a given star, and one can survey a range of planet parameters (radius and period) on a per-star basis, to obtain the detection efficiency, DE(p, r, θ*), as a function of the orbital period, planet radius, and host-star properties. Such a model captures the target-to-target variations and is recommended for studies involving a small subset of Kepler targets, as is the case here. We will adopt this model and use the KeplerPorts python code (Burke & Catanzarite 2017b) to generate the pipeline efficiency.

In detail, KeplerPorts calculate the detection efficiency, DE(P, Rp , θ*), based on the so-called multiple-event statistics, $\mathrm{MES}(P,{R}_{p},\bar{\theta })$ (Jenkins 2002; Christiansen et al. 2012), with $\bar{\theta }$ being stellar parameters (R*, $\mathrm{log}g$ and Teff). This statistic is closely related to the transit signal-to-noise ratio. The detection efficiency is also empirically found to correlate strongly with the so-called CDPP slope, the temporal behavior of the stellar noise. This dependence is taken into account by the Burke & Catanzarite (2017b) model, based on injection experiments. Furthermore, the method of "MES Smearing" is applied to transits with nonzero impact parameters.

The resultant detection efficiencies are then averaged over the N* = 13,297 GK dwarfs in our sample. For the radius and period ranges of interests, these efficiencies are plotted in the left-hand panel of Figure 1. One observes that, inward of 16 days, transiting planets larger than 1.3R can be detected fairly completely (DE ≥ 90%), and the efficiency drops below ∼50% for planets smaller than 0.9R. The right-hand panel presents the overall efficiency, η(p, r), one that also includes the geometric transit probability. This is calculated using Equation (1):

Equation (1)

where Pgeo,i = 2R*,i /a, and we adopt an estimate for the star mass from Berger et al. (2020b).

Figure 1.

Figure 1. Detection completeness. The left panel shows the averaged pipeline detection efficiency for the GK dwarfs in our sample, as a function of planet radius and period, obtained using the Burke & Catanzarite (2017a) model. Within 16 days, nearly all transiting planets with radii greater than 1.3R are recovered. Moreover, at least half of the planets with radii greater than 0.9R are recovered. The right panel shows the overall efficiency (Equation (1)) when transit probability is also considered.

Standard image High-resolution image

3.2. Vetting Completeness

Binary stars and some astrophysical noise can masquerade as transit signals. So, every TCE needs to be vetted before qualifying as a planet candidate. This can introduce a reduction in completeness as some genuine planets may be vetted out as false positives. This vetting efficiency is presented in Figure 2 for our ranges of interest, produced using the robotic vetting code Robovetter (Coughlin 2017; Thompson et al. 2018). For the short-period planets we are interested in, the vetting efficiency has an average of 93%, with no obvious dependence on planet radius or period. So, in this study, we simply assume this to be 100% for the entire space.

Figure 2.

Figure 2. Vetting efficiency for our ranges of planet radius and orbital period, obtained using Robovetter by Coughlin (2017). On average, 93% of simulated planets within this range pass the vetting tests.

Standard image High-resolution image

3.3. Reliability

Morton et al. (2016) assigned an FPP for each planet candidate using an automated validation procedure. This uses a full range of observational constraints to calculate the probabilities of various hypothetical scenarios (genuine planets, brown dwarfs, eclipsing binaries, etc). We have discarded all false positives, but retained candidates in both the "confirmed" and "candidate" categories, with the proviso that FPP < 0.1. We follow convention and define a reliability index =(1 − FPP). These are shown in Figure 3 by the colors of the dots. Because reliability is high in our sample, we simply ignore this distinction and assume all candidates are genuine.

Figure 3.

Figure 3. The sample of 280 planet candidates in our analysis, plotted as their observed radii vs. periods. The contours indicate the total detection efficiency (including transit probability), and the color of each dot indicates the reliability index (1 − FPP). The size histogram and period histogram are shown to the left and top of the main figure. Even without correcting for completeness, one can already see the two gaps, one between super-Earths and mini-Neptunes at around 2R, and one between sub-Earths and super-Earths at around 1R.

Standard image High-resolution image

4. Size Distribution of Small Planets—A New Gap

Equipped with the planet sample and their detection efficiencies (Figure 3), we proceed to recover the underlying distribution. We first present our analysis using the straightforward IDEM method before elaborating on a Bayesian approach. The two are shown to produce compatible results.

In the following, we divide our planet sample on a logarithmic grid of period and radius. To ensure sufficient radius resolution, we choose to use 20 radius bins from 0.6 to 4R. To ensure a sufficient number of candidates in each bin, we choose a relatively crude period resolution, five bins from 1 to 16 days. Even with this choice, there are some bins that contain zero or only one candidate, and their information contents have to be carefully evaluated (see below).

4.1. IDEM: Inverse Detection Efficiency Method

One straightforward, nonparametric way of obtaining the occurrence rate is called "IDEM" (Howard et al. 2012; Petigura et al. 2013). Assuming the observed population is drawn from an underlying population weighted by the detection efficiency, one can retrieve the differential occurrence rate by dividing the number of planets detected in the bin ${\rm{\Delta }}\mathrm{ln}P{\rm{\Delta }}\mathrm{ln}{R}_{P}$, n(P, RP ), by the detection efficiency,

Equation (2)

where η(P, R) is the detection efficiency, suitably averaged over the relevant grid, and N* is the stellar sample size. We then sum over the period bins to obtain a size distribution

Equation (3)

The dominant source of error is Poisson error, as opposed to measurement errors. To obtain an estimate on the uncertainty, we note that because we are dealing with small candidate numbers in each bin (some bins even contain zero detection), the confidence interval cannot be estimated using the usual $1/\sqrt{n}$ approach. Instead, we adopt Table 6 of Feldman & Cousins (1998) to obtain the 95% confidence limits when presented with a detection n(P, RP ) ≥ 0. When summing over different period bins, we assume that the confidence limits from each bin are independent and adds the errors quadratically. The uncertainty range will then increase with bin number. This, we believe, yields a maximal uncertainty range.

This approach departs from previous IDEM studies (e.g., Howard et al. 2012; Petigura et al. 2013; Kunimoto & Matthews 2020) in providing a more conservative error estimate. To obtain more realistic and smaller uncertainties, one needs to adopt a model-based approach (e.g., Bayesian; see below) that accounts for correlated information among bins, while paying a price for extra assumptions.

The result of this analysis is presented in tandem with the Bayesian model in Figure 5.

4.2. Bayesian Modeling

4.2.1. Model Ansatz

To conduct a Bayesian analysis, we need to make an array of assumptions for the underlying distribution.

First, we will depart from tradition and consider the distribution in planet mass and period. We use the planet core radius (Rc ) as a proxy for its mass (roughly, ${M}_{p}\propto {R}_{c}^{4}$; Fortney et al. 2007). Because mini-Neptunes and super-Earths are of the same origin and therefore likely share the same mass distribution, it seems sensible to combine their signals together by working on planet mass. Moreover, this enables us to separate the distribution in period–radius space (see below).

While Rc RP for super-Earths, mini-Neptunes need some explanation. According to the theory of photoevaporation, the radii of the two populations are spaced by a ratio that is nearly 2. We adopt the number 1.7 taken from Owen & Wu (2017) and Wu (2019). This ratio is the result of fitting the observed distributions with evaporation theory. It is also the ratio between the observed super-Earth peak (∼1.4 R) and mini-Neptune peak (∼2.4 R) in our sample. So in our study, for any planets lying above 2R (the so-called "Fulton" gap), we simply divide their observed radii by 1.7 to obtain their core radii. When correcting for detection completeness, we note that while super-Earths are smaller and are in principle more incomplete compared to mini-Neptunes, the difference is minor and can be ignored (Figure 3).

One naturally wonders if this procedure produces artificial features. We show that folding Neptunes into super-Earths, while enhancing the super-Earth peak, is not critical to our conclusions on the existence and characters of sub-Earths.

Next, we assume that the planet distribution in the PRc plane is separable,

Equation (4)

Such a separability assumption, often adopted in the literature (e.g., Silburt et al. 2015; Mulders et al. 2018; Neil & Rogers 2020), is problematic when describing the distribution of super-Earths and/or mini-Neptunes individually, because their relative proportion depends on distances from the star. It is, however, reasonable when describing the joint distribution. Studies of the planet-mass distribution (e.g., Wu 2019) have shown that it is relatively distance dependent. This is another reason behind our adoption of Rc as the relevant parameter.

In the following, we will experiment with different forms of fR , while using the same broken power law for fP ,

Equation (5)

where Pb , α, and β are free parameters to be determined from data. This form is motivated by previous studies of super-Earths/mini-Neptunes (e.g., Silburt et al. 2015; Mulders et al. 2018; Neil & Rogers 2020). We assume that any new population of planets also satisfies the same rules. We relax this assumption in Model 4.

For the planet-mass (or core-radius) distributions, we adopt the following three models, each with a different physical significance.

Model 1: one Gaussian. Here, we assume that nature produces only one single population of planets, for core radius from 0.6 ⊕ to 2 ⊕ (or from ∼0.2 to 16M, if these planets all share the terrestrial composition). We describe this using a log-normal distribution in core radius,

Equation (6)

We adopt broad linear priors for both μ0 and σ0: μ0 ∈ [0.6, 2]R and σ0 ∈ [0, 1].

Model 2: two Gaussians. To allow for a separate population of planets, we assume the core-radius distribution is a superposition of two Gaussians,

Equation (7)

where the first term describes the super-Earth/Neptune population, and we adopt linear priors of μ1 ∈ [1.3, 1.6]R and σ1 ∈ [0, 0.5]. These narrower priors are intuited from both the above IDEM analysis and previous studies (e.g., Hsu et al. 2019; Neil & Rogers 2020). For the second term, employed to describe a new lower-mass population, we assign broad linear priors of μ2 ∈ [0.6, 1.2] R and σ2 ∈ [0, 1].

Model 3: power-law+Gaussian. While super-Earth/Neptunes appear to be well described by (narrow) Gaussians, this does not appear so for the new population of planets we are after. The above two-Gaussian results, as well as our IDEM analysis, suggest that they may have a broad radius range, compatible with a power-law distribution. We attempt the following power-law/Gaussian combination,

Equation (8)

We select a linear prior of γ ∈ [ − 3, 1]. This is not restrictive, as it allows for both ascending and descending trends toward small sizes. Moreover, neither this model nor the two-Gaussian model presumes the presence of a gap-like feature.

Model 4: two periods. The previous three models adopt the same period distribution for any planet population. Here we relax this assumption by allowing for a separate period distribution for each population (based on model 3),

Equation (9)

Here both period distributions have the same form as Equation (5) but they can have different parameters. Moreover, because super-Earths/Neptunes extend to well beyond 16 days and their period distribution has been well determined (see, e.g., Neil & Rogers 2020), we can adopt narrower priors on their parameters based on previous studies (see Table 2).

Table 2. The Best-fit Value of Parameters for Each Occurrence Rate Density Model with 68% Credible Interval and the Corresponding Akaike Information Criteria

Parameter1 Gaussian2 GaussianPower-Gaussian2 PeriodPrior
A ${0.008}_{-0.002}^{+0.003}$ [0, 0.05]
μ0 ${1.31}_{-0.07}^{+0.13}$ [0.6, 2]
σ0 ${0.50}_{-0.08}^{+0.17}$ [0, 1]
B ${0.0029}_{-0.0010}^{+0.0013}$ ${0.0027}_{-0.0008}^{+0.0010}$ ${0.0031}_{-0.0012}^{+0.0014}$ [0, 0.01]
μ1 ${1.44}_{-0.03}^{+0.02}$ ${1.43}_{-0.02}^{+0.02}$ ${1.43}_{-0.02}^{+0.02}$ [1.3, 1.6]
σ1 ${0.14}_{-0.02}^{+0.03}$ ${0.14}_{-0.02}^{+0.02}$ ${0.13}_{-0.02}^{+0.02}$ [0, 0.5]
C ${0.005}_{-0.003}^{+0.003}$ [0, 0.01]
μ2 ${0.74}_{-0.09}^{+0.12}$ [0.6, 1.2]
σ2 ${0.6}_{-0.3}^{+0.3}$ [0, 1]
D ${0.0028}_{-0.0008}^{+0.0010}$ ${0.0030}_{-0.0009}^{+0.0015}$ [0, 0.01]
γ $-{1.2}_{-0.7}^{+0.5}$ $-{0.9}_{-0.6}^{+0.4}$ [−3, 1]
α ${2.4}_{-0.264}^{+0.293}$ ${2.3}_{-0.2}^{+0.3}$ ${2.3}_{-0.3}^{+0.3}$ ${2.5}_{-0.4}^{+0.3}$ [1, 3]
β ${0.72}_{-0.18}^{+0.16}$ ${0.73}_{-0.19}^{+0.18}$ ${0.73}_{-0.19}^{+0.18}$ $-{0.0}_{-0.5}^{+0.4}$ [−1, 1]
Pb ${4.1}_{-0.5}^{+0.5}$ ${4.2}_{-0.5}^{+0.5}$ ${4.2}_{-0.6}^{+0.5}$ ${4.4}_{-0.6}^{+0.6}$ [2, 10]
αI ${1.9}_{-0.2}^{+0.2}$ [1.5, 3]
βI ${0.3}_{-0.3}^{+0.2}$ [−0.5, 0.5]
Pb,I ${8.8}_{-1.3}^{+0.9}$ [5, 10]
AIC2841280227992801
relative likelihood∼10−10 0.210.4

Note. Best models are the power-Gaussian as it has the lowest AIC. The last row is the likelihoods of other models relative to the power-Gaussian one.

Download table as:  ASCIITypeset image

4.2.2. Bayesian Operations

Here, we apply Bayesian inference to determine the parameters of the maximum posterior probability.

Given the observed properties of a planet sample ("PCs"), the posterior probability of a hypothesis θ (with its attendant parameters) can be calculated using the Bayes' theorem as

Equation (10)

where P({PCs}∣ θ ) is the likelihood and π( θ ) the prior. We use exclusively flat priors.

As the planets are discrete points in the period–radius space, we treat them as a Poisson point process, following earlier works (e.g., Youdin 2011; Burke et al. 2015; Bryson et al. 2020). Defining the occurrence rate within a bin ${\rm{\Delta }}\mathrm{ln}P\,{\rm{\Delta }}\mathrm{ln}{R}_{C}$ as

Equation (11)

where θ stands for an ansatz and its attendant parameters, we can write the total expected number of planet detection as

Equation (12)

where η(P, Rc ) is the averaged detection completeness within the bin (Section 3). We choose a fine grid, with 20 period bins (1–16 days) and 30 core-radius bins (0.6–2R).

The likelihood function is then

Equation (13)

where the product is carried over each detected candidate.

In detail, we first use the package scipy.optimize to obtain values that maximize the likelihood function. We then sample the posteriors using the Markov Chain Monte Carlo (MCMC) algorithm provided by the emcee package (Foreman-Mackey et al. 2013). We use 50 walkers, 1000 "burn-in" steps, and 5000 MCMC steps to find convergence, where the walkers of "burn-in" are initiated in a narrow Gaussian whose mean is the maximum likelihood solution. The parameters corresponding to the median of posterior distribution are shown in Table 2, with Figure 4 illustrating the corresponding size distributions for our best models.

Figure 4.

Figure 4. Corner plot for the power-Gaussian model, showing the covariance between different parameters and their posterior distributions.

Standard image High-resolution image

To compare the relative merits of various models, we employ the Akaike information criterion (AIC), a measure of information loss. This takes account of both the quality of the fit and the simplicity of the model to avoid both under-fitting and over-fitting. Let the latter be quantified by the number of free parameters, n θ , we write

Equation (14)

The best model has the lowest AIC score, and the relative merit (likelihood) between models is quantified as e(−ΔAIC/2). So, any model that has an AIC score higher by ∼10 relative to the best model is clearly rejected.

4.3. Results

Here, we describe the results of our Bayesian inferences and compare them against those from IDEM. Their AICs and relative likelihoods are presented in Table 2.

First, among the models we tested, the Bayesian analysis strongly rejects Model 1 (single population) and requires the presence of at least one other population of planets, in addition to super-Earths. Model 1 is extremely unlikely, with a relative likelihood of $\exp (-40/2)\sim {10}^{-10}$ compared to the two-population models.

Second, the two-period model (Model 4), while allowing for more flexibility in the period distributions, performs slightly worse than the models that insist on a single period distribution. In other words, there is no statistical evidence suggesting that the two populations (super-Earths and sub-Earths) have different period distributions within the limited range that we probe (1–16 days). In fact, the obtained period distributions all feature a power-law rise to a few days, and a roughly logarithmically flat distribution afterwards, analogous to earlier studies using more extended samples (Silburt et al. 2015; Mulders et al. 2018; Neil & Rogers 2020).

Third, Models 2 and 3 are statistically indistinguishable (Model 3 is slightly preferred). Their size and period distributions also look similar (Figure 5). This is because the Gaussian distribution of sub-Earths, in the two-Gaussian model, is very broad and is compatible with a power law. From now on, we choose to focus on Model 3, the so-called power-Gaussian model.

Figure 5.

Figure 5. Comparison of results from both the Bayesian analysis (power-Gaussian model in black line, dark-gray and light-gray zones indicating 68% and 95% credible intervals, respectively) and the parameter-free IDEM analysis (red curve, pink shadow zone being 95% credible interval). The blue curve shows the best-fit two-Gaussian model. All models agree on the fall of super-Earths and the emergence of a separate population at smaller sizes. In addition, the broken cyan line shows the contribution of the power-law component (sub-Earths) in the power-Gaussian model.

Standard image High-resolution image

In Model 3, super-Earths are centered around 1.4R with a narrow width of σ1 ∼ 0.13. In planet mass, these correspond to a centroid of ∼4M and an FWHM of a factor of 3.2. Given that we are focusing on host stars of lower masses (median mass 0.89M), this centroid falls below that obtained by Wu (2019), Mp ≈ 8M(M*/M). The reason for the discrepancy is unclear for now. The FWHM does agrees with that in Wu (2019). The super-Earth population appears to peak narrowly in core mass, a mystery of the formation.

What is of most interest to us is the size distribution of sub-Earths. The power-Gaussian model yields a size index $\gamma =-{1.2}_{-0.7}^{+0.5}$, or the number of sub-Earths most likely rises toward small sizes. This is in drastic contrast to super-Earths, which fall off steeply toward small sizes. In addition, not all objects bigger than Earth belongs to the super-Earth group. Even at the super-Earth peak (∼1.4 R), sub-Earths make a non-negligible contribution (∼24%). Lastly, when we integrate all planets inward of 16 days and with core sizes from 0.6 to 2R, we find that super-Earths and sub-Earths are comparable in number (21% ± 4% versus 26% ± 4%).

There is a possibility that the negative γ value is driven not by sub-Earths below 1R, but by larger planets. Following the advice of the reviewer, we experiment with a more sophisticated model that includes a Gaussian (for super-Earths) and a broken power law. This introduces two new parameters, the break radius, and a new power-law index. We find a break radius of 0.77R, with the power index above this size of −1.8 ± 0.8, even steeper than our original result ($-{1.2}_{-0.7}^{+0.5}$). Due to the lack of objects below this radius, the index below is very uncertain (${0.7}_{-1.7}^{+1.0}$). This allows us to conclude that the rising power law we find is natural to the data and is not an artifact of our adopted model.

To compare the parameter-free IDEM results against the Bayesian results, we first convert the observed planet radii to core radii, as detailed above. We then interpolate the IDEM results between bins, using a third-degree spline smoothing to obtain the desired core-size distribution, ${df}/d\mathrm{ln}{R}_{C}$. The comparison is shown in Figure 5. They generally agree with each other. And as expected, our treatment of uncertainties in the IDEM approach yields larger error bars compared to those from forward models.

5. Discussions

The decisive evidence for the new sub-Earth population is the clear inflection point in the size distribution, located around 1R (Figure 5). From the peak of super-Earths (1.4R) to this point, the occurrence rate of super-Earths drops markedly by a factor of a few, while the sub-Earth population takes over gradually in dominance. The latter is only revealed thanks to the narrow mass distributions of the former. The mass distribution of sub-Earths appear much wider and can be described by a rising power-law toward small sizes. So, in our model, this inflection point is actually a "gap."

Here, we discuss the robustness of these results and compare them against previous works.

5.1. Sample Cuts and Other Variations

Our analysis is performed on a small set of "clean" planet samples (298 in total). Here, we vary the sample criteria and observe how the results for the power-Gaussian model change. The priors remain the same as before. For each experiment, we remove (or add) one constraint on the property of the stars or PCS, keeping other cuts unchanged.

We also experiment with alternative priors and completeness treatment. All experiments yield similar results compared to our default case (see Figure 6). We comment on each briefly below.

  • 1.  
    No stellar radius cut: we experiment with employing all stars, regardless of radii. We exclude stars that satisfy the following crude cuts,
    Equation (15)
    as they are most likely evolved. There are then 48,642 stars and 794 PCs. The stellar sample has expanded by a factor of 3.7, while the planet sample only a factor of 2.7, leading to a slightly lower occurrence (Figure 6). Moreover, the super-Earth peak is broadened, due to the known correlation between planet mass and stellar mass. The sub-Earth slope is γ ≈ −1.1.
  • 2.  
    No Kp cut: while dimmer stars return fewer super-Earths and sub-Earths, the completeness correction largely accommodates this correctly. There is no perceptible change to the occurrence rates.
  • 3.  
    No FPP cut: retaining all objects, regardless of their FPP values, adds 36 new candidates to our list, but produces no change to the results. The newly added candidates do not make the gap features more conspicuous, presumably because they are more likely to be potential false positives. However, we believe that unconfirmed PCs with a low FPP value are reliable enough and the sub-Earth distribution in our default case is likely genuine.
  • 4.  
    Alternative priors: we have chosen broad and flat priors previously. Here, we follow the insights from Wu (2019) and use a narrower prior on super-Earths: μ1 is now normally distributed with a mean at 1.43R and a variance of 0.05R, and σ1 is normally distributed with a mean of 0.1 and a variance of 0.1. The results remain the same.
  • 5.  
    Alternative completeness: while we have adopted Burke & Catanzarite (2017a) to obtain the detection completeness, there exists an alternative approach (Christiansen et al. 2020). In this method, the completeness of a detection is expressed as a function of its MES (multiple-event-statistics) value, as well as some parameters a, b, c that are empirically obtained from pixel-level injection experiments,
    Equation (16)
    Because pixel-level injections are computationally expensive, typically one injection per star is possible. These empirical parameters are therefore the best fits for the entire Kepler stellar sample. This type of correction is used by, e.g., Neil & Rogers (2020) and Hsu et al. (2019). Here, we employ KeplerPORTS to map radius and period into MES for every planet host in our sample, and then adopt a = 29.41, b = 0.284, and c = 0.891 from Neil & Rogers (2020). Running the Bayesian analysis returns results that do not differ significantly from our original ones (see Figure 6).
  • 6.  
    Reliability correction: as some planet candidates are less reliable (a higher false-positive probability) than 100%, we experiment with down-weighting them in the analysis and repeat the IDEM analysis. This makes negligible difference, presumably because all candidates are above 90% reliable, with most close to 100%.

Figure 6.

Figure 6. Comparing the core-radius distribution across models. The black curve and the shaded belts are the same as those in Figure 5, while the colored curves, and the error bars marking their respective 68% credible interval, are from different experiments (only at two select radii for clarity). Regardless of sample selection or other variations, super-Earths are found to have a narrow size distribution. With the exception of the "confirmed" cut, a new component (sub-Earth) that rises toward small sizes is required in all models.

Standard image High-resolution image

5.2. Comparison with Other Works

There are two recent works that employ updated completeness corrections and have included sub-Earths into their modeling efforts (Hsu et al. 2019; Neil & Rogers 2020). We compare our results with theirs here for the period range from 1 to 16 days. The Neil & Rogers (2020) work infers distributions in planet masses. We use simple scaling relations of $M/{M}_{\oplus }={({R}_{C}/{R}_{\oplus })}^{4}$ for RC > 1.2R and $M/{M}_{\oplus }={({R}_{C}/{R}_{\oplus })}^{3}$ for RC ≤ 1.2R to translate their mass distribution into the core-radius distribution. The other study (Hsu et al. 2019) infers the distribution in planet transit radius. We convert from these to the core radius as laid out in Section 4.2 and interpolate between bins where necessary. The comparisons are displayed in Figure 7, and we briefly describe any similarities or differences here.

Figure 7.

Figure 7. Comparison of our core-radius distribution (same model as that in Figure 5) and that from Hsu et al. (2019; green) and Neil & Rogers (2020; blue), for planets with periods from 1 to 16 days. The color-shaded regions indicate their respective 68% credible interval. Three studies differ in sample selection, completeness correction, and inversion method.

Standard image High-resolution image

Similar to our study here, both works show a dropping off of super-Earths toward small sizes, and a new population below about 1R.

In the parameter-free, grid-based inversion model of Hsu et al. (2019), one observes a radius gap between super-Earths (including Neptunes) and sub-Earths. However, because their stellar sample covers a wide range of spectral types (mostly FGK dwarfs with no cuts in stellar radius), their super-Earth peak is broader and lower in amplitude, resembling closely our "no-radius-cut" case in Figure 6. By focusing on a narrow range of spectral types, we end up with a smaller planet sample but can detect more prominent features.

The forward-modeling approach of Neil & Rogers (2020) is very similar in spirit to our Bayesian models. By parameterizing the size distributions as a number of distinct Gaussians, they find that a new population of small planets (which they term "intrinsically rocky") is strongly demanded by the data. However, their overall size distribution does not show a clear inflection point, let alone a "gap" as we find here. We compare their results with our two-Gaussian model in detail. Their best-fit super-Earth peak lies at RC = 1.53R (slightly larger than our 1.44R), but with a much broader dispersion of σ = 0.63 (compared to our 0.14 ± 0.03). We attribute this latter difference to their wide range of stellar properties. They include almost all dwarfs, and because super-Earth properties are known to correlate with stellar properties, this tends to broaden the super-Earth peak. More significantly, their sub-Earths are described by a log-normal distribution with a peak at size RC ≈ 0.96R (compared to our 0.74) and a radius dispersion of σ ∼ 0.4 (same as our 0.4). The fact that their sub-Earths are more massive than ours, and that their super-Earths are more broadly distributed, give rise to a gap-free transition from super-Earths to sub-Earths (Figure 7), as well as a dropping occurrence toward very small sizes. These are qualitatively different from our results and those in Hsu et al. (2019). We believe that our choice of parameter space (narrower while preserving almost all sub-Earth candidates) is better suited for resolving any possible feature in the size distribution. A recent investigation, using their original model but with more relaxed assumptions, leads to better agreements with our results (A. Neil, 2020 private communication).

5.3. Insights from Ultra-short-period Planets

We have selected an optimum range to study sub-Earths. Our choices of 1–16 days and bright hosts minimize the uncertainties due to extreme completeness correction, without sacrificing many candidates. We have high confidence in our results because the corrected model actually does not look dramatically different from the raw data (Figure 8).

Figure 8.

Figure 8. Comparing the observed size distributions of ultra-short-period planets (<1 day; red histogram, uncorrected for completeness) with our default sample (blue histogram, 1–16 days, uncorrected), and our power-Gaussian model (gray curve, corrected). The latter two are scaled to have the same heights as the USPs at 1.3R. The USPs show an excess of sub-Earths than our planet sample, due to better completeness. They therefore provide support to our model.

Standard image High-resolution image

Another support for our model comes from an intriguing class of objects, the so-called ultra-short-period planets (USP for short), planets that have orbital periods shortward of one day (e.g., Rappaport et al. 2013; Sanchis-Ojeda et al. 2014; Winn et al. 2018). At these short periods, Kepler can be sensitive to much smaller planets than otherwise possible. The origin for these boiling worlds is still under debate (e.g., Lee & Chiang 2017; Petrovich et al. 2019; Pu & Lai 2019; Millholland & Spalding 2020), though scenarios involving dynamical migration by companion planets have received some observational support (Sanchis-Ojeda et al. 2014; Adams et al. 2020). If so, their mass distribution should be similar to those farther away. Here, we exploit this advantage to shed light on our model. 2

Figure 8 compares the observed size distribution of USPs (<1 day) against that of our default sample (1–16 days) and against our preferred model. These 127 USPs are discovered by the Kepler main mission, are smaller than 2R, and include both "confirmed" and "candidate" dispositions. Their host magnitudes are mostly between 14 and 16 mag. We have also updated their radii using Gaia stellar parameters (Berger et al. 2020b).

Compared to our planet sample, the USPs show an excess of sub-Earths, thanks to their better completeness in these small sizes. More importantly, one can see a near-perfect agreement between the uncorrected USPs and our power-Gaussian model. If anything, the USPs suggest that the sub-Earth rise may be steeper than our current model, after completeness correction is properly applied.

Other than supporting our model, this result also suggests that USPs are indeed formed similarly as planets farther away.

5.4. Siblings of Sub-Earths

In our sample, there are 71 PCs with radii smaller than 1.1R. These most likely belong to the sub-Earth population, according to our power-Gaussian model. Among them, 33 sub-Earths have at least one (though frequently more than one) transiting sibling planet. These multiple-planet systems are shown in Figure 9. We see that sub-Earths appear to happily coexist with larger planets (super-Earths, mini-Neptunes, even Jovians), with no substantial differences in both their period distribution and their stellar mass distribution.

Figure 9.

Figure 9. About half of sub-Earths in our default sample have transiting planet siblings. They are plotted here according to their sizes. The systems are arranged from top to bottom by decreasing host-star masses.

Standard image High-resolution image

The fact that about half of all sub-Earths have transiting siblings may indicate that the majority of them prefer to reside in systems with other larger planets.

6. Conclusion

The primary goal of the Kepler mission is to find Earth analogs in the habitable zones (Koch et al. 2010). It has had some success in finding rocky planets around low-mass stars where the habitable zones are close in (see e.g., Borucki et al. 2013; Kane et al. 2016), but there are few such solid detections around Sun-like stars (see Figure 3 in Borucki et al. 2019). While Kepler did find an abundance of super-Earths, these are likely stripped-down versions of mini-Neptunes and do not exist beyond a few tens of days.

In this study, by choosing a mostly complete planet sample, by assuming simple models with broad priors, we are able to establish that the core radii of super-Earths peak narrowly around 1.4R (corresponding to ∼4M for a terrestrial composition). At smaller sizes, we find that these planets disappear rapidly in number, with the data showing a clear inflection point and most likely a gap, between super-Earths and a new population of small planets (sub-Earths). The existence of an inflection point is robust, regardless of sample selection, inversion method, or completeness correction. The size distribution of USPs provides additional support to our model.

We find that sub-Earths, dominant in number below ∼1 R, have a similar period distribution as super-Earths, coexist with larger planets in the same planetary systems, and, most interestingly, can be described by a rising power law toward small sizes. This is in contrast to the nearly singular distribution of super-Earths.

This implies that sub-Earths and super-Earths are formed differently. While super-Earths (and their cousins that still retain the primordial H/He envelopes, mini-Neptunes) are most likely formed in gaseous protoplanetary disks (and therefore may be called Generation I planets), sub-Earths, we speculate, may form later in a collisional debris disk, much like the terrestrial planets are theorized to do (e.g., Chambers & Wetherill 1998; Kokubo & Ida 1998; Morishima et al. 2008). In such a debris disk, the final planetary outcome depends on the initial mass budget. One therefore expects the planetary masses to be broadly distributed. We argue that these planets may be thought of as Generation II planets.

If our speculation that these planets are true terrestrial analogs is correct, we expect them to extend to at least the astronomical-unit region. They are then likely the only population of rocky planets in the habitable zones, and they are the most likely harborer of life. While Kepler just fell short of discovering them at distances of astronomical units, more sensitive new missions should be organized to go after these targets.

To provide rough guidance for these future searches, we boldly estimate the occurrence rate of sub-Earths in the habitable zone (Table 3), extrapolating from models that apply only within 16 days. The estimates show substantial uncertainties, largely because of our short lever arm in constraining the period distribution. Even an extension by a factor of 2 in period would have helped firming down the η value substantially.

Table 3. Estimated Occurrences of Sub-Earths (0.6 − 2R) around GK Dwarfs

Models f (1–16 days) f (200–400 days, η)
Power-Gaussian ${26}_{-4}^{+4} \% $ ${143}_{-73}^{+138} \% $
Two-Gaussian ${24}_{-7}^{+6} \% $ ${129}_{-67}^{+135} \% $
Two-period ${23}_{-4}^{+4} \% $ ${8}_{-7}^{+30} \% $

Download table as:  ASCIITypeset image

This work makes use of the NASA Exoplanet Archive, and results from the NASA Kepler mission, the ESA Gaia mission, and the California-Kepler survey. Danley Hsu, Andrew Neil, Eric Ford, and an anonymous referee provided helpful feedback. We acknowledge NSERC for a research grant, including support for Q.Y.S. as a summer undergraduate researcher.

Software: Robovetter (Coughlin 2017; Thompson et al. 2018), KeplerPorts (Burke & Catanzarite 2017b), emcee (Foreman-Mackey et al. 2013), corner (Foreman-Mackey 2016), Scipy (Virtanen et al. 2020), Numpy (Harris et al. 2020), Matplotlib (Hunter 2007), pickle (Van Rossum 2020).

Footnotes

  • 1  

    This is already clear for mini-Neptunes and becomes clear for super-Earths as well in this work.

  • 2  

    There is another advantage to using USPs. Ultra-short-period candidates are less likely to be false positives due to background eclipsing binaries. At such short periods, ellipsoidal variations would have been clearly visible.

Please wait… references are loading.
10.3847/1538-3881/abe632