
Data-driven Spectroscopic Estimates of Absolute Magnitude, Distance, and Binarity: Method and Catalog of 16,002 O- and B-type Stars from LAMOST


Published 2021 March 10 © 2021. The Author(s). Published by the American Astronomical Society.
Citation: Maosheng Xiang et al 2021 ApJS 253 22. DOI: 10.3847/1538-4365/abd6ba


0067-0049/253/1/22

Abstract

We present a data-driven method to estimate absolute magnitudes for O- and B-type stars from the LAMOST spectra, which we combine with Gaia DR2 parallaxes to infer distance and binarity. The method applies a neural network model trained on stars with precise Gaia parallax to the spectra and predicts Ks-band absolute magnitudes ${M}_{K{\rm{s}}}$ with a precision of 0.25 mag, which corresponds to a precision of 12% in spectroscopic distance. For distant stars (e.g., >5 kpc), the inclusion of constraints from spectroscopic ${M}_{K{\rm{s}}}$ significantly improves the distance estimates compared to inferences from Gaia parallax alone. Our method accommodates emission-line stars by first identifying them via principal component analysis reconstructions and then treating them separately for the ${M}_{K{\rm{s}}}$ estimation. We also take into account unresolved binary/multiple stars, which we identify through deviations of the spectroscopic ${M}_{K{\rm{s}}}$ from the geometric ${M}_{K{\rm{s}}}$ inferred from Gaia parallax. This method of binary identification is particularly efficient for unresolved binaries with near equal-mass components and thus provides a useful supplementary way to identify unresolved binary or multiple-star systems. We present a catalog of spectroscopic ${M}_{K{\rm{s}}}$, extinction, distance, flags for emission lines, and binary classification for 16,002 OB stars from LAMOST DR5. As an illustration, we investigate the ${M}_{K{\rm{s}}}$ of the enigmatic LB-1 system, which Liu et al. had argued consists of a B star and a massive stellar-mass black hole. Our results suggest that LB-1 is a binary system that contains two luminous stars with comparable brightness, and the result is further supported by the parallax from Gaia eDR3.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

O-type and B-type (OB) stars constitute the population of massive, young, and luminous stars in a galaxy. They play a significant role in many aspects of astrophysics. They are important factories for element production (Thielemann & Arnett 1985; Timmes et al. 1995; Woosley & Weaver 1995; Chieffi & Limongi 2004; Nomoto et al. 2006) and act as major sources of ionization and energetic feedback to the interstellar medium and intergalactic medium (Freyer et al. 2003, 2006; Hopkins et al. 2014; Mackey et al. 2015; Struck 2020). They often form in binaries and are candidate companions for, or precursors of, black holes and associated gravitational wave events (Abbott et al. 2016a, 2016b, 2017; Belczynski et al. 2016; Liu et al. 2019a). In our Galaxy and elsewhere, they serve as signposts of star formation and diagnostics of the initial mass function (e.g., Lequeux 1979; Humphreys & McElroy 1984; Reed 2005; Bartko et al. 2010). They also serve as tracers of the structure and dynamics of the Galactic disk, including features like spiral arms (e.g., Torra et al. 2000; Shu 2016; Xu et al. 2018; Chen et al. 2019a; Cheng et al. 2019; Li et al. 2019; Wang et al. 2020). Knowledge of the luminosities and distances of the OB stars in the Milky Way is fundamental to all such analyses.

For OB stars close to the Sun, high-precision distance estimates are obtainable using parallaxes from Gaia DR2 (Gaia Collaboration et al. 2016, 2018a; Lindegren et al. 2018), provided that the stars are not binaries. But for distances larger than ∼1.5 kpc, the parallax-based distance estimates for OB stars might become suboptimal (Shull & Danforth 2019). This calls for developing alternative ways to accurately determine the distances to such stars and detect binarity systematically.

Photometric distance estimation has a long and successful history: for stars on the lower main sequence, e.g., G and K dwarfs, unreddened broadband colors alone serve as excellent luminosity predictors, as long as the metallicity is known to within ≲0.5 dex (e.g., Ivezić et al. 2008; Jurić et al. 2008). For OB stars, on the other hand, the time spent on or near their zero-age main sequence is so short that, at a given color, their luminosities can range over an order of magnitude. Distant young OB stars also tend to be in directions of high dust extinction with ongoing star formation, further limiting the accuracy of the derived dereddened colors. On top of that, there is also a lack of color variation beyond the Rayleigh–Jeans tail of the OB stars at ∼4000 Å. This combination of factors makes photometric luminosity and distance estimates for hot stars particularly challenging.

The spectra of OB stars contain much more information than photometric colors and can yield powerful constraints on luminosities and distances. There are two conceptually different approaches to infer spectroscopic distances. The first is to derive the basic stellar parameters, ${T}_{\mathrm{eff}}$, $\mathrm{log}\,g$ and $[\mathrm{Fe}/{\rm{H}}]$, and then, for example, employ the "flux-weighted gravity luminosity" relationship (FGLR; Kudritzki et al. 2003), which relates the ${g}_{f}\,\equiv \,g/{T}_{\mathrm{eff}}^{4}$ of a star to its bolometric luminosity (Kudritzki et al. 2020). With this relationship, the ${T}_{\mathrm{eff}}$ and $\mathrm{log}\,g$ derived from the high-resolution spectra of single stars yield luminosities and distances with precisions of <20% and 10%, respectively, for luminous hot stars. A similar method is from Shull & Danforth (2019), who estimated the spectrophotometric distance of 139 O-type dwarfs using absolute magnitudes inferred by empirical relation with spectral types from the Galactic O-star Spectroscopic Survey (GOSSS; Maíz-Apellániz et al. 2004). Stellar isochrones have also been used to infer stellar absolute magnitudes and distances from stellar parameters ${T}_{\mathrm{eff}}$, $\mathrm{log}\,g$, and $[\mathrm{Fe}/{\rm{H}}]$, as has been widely implemented on spectroscopic survey data sets (e.g., Carlin et al. 2015; Yuan et al. 2015b; Wang et al. 2016; Xiang et al. 2017a; Coronado et al. 2018a; Queiroz et al. 2020; Green et al. 2021). So far the application has been mostly focused on FGK stars, but it should provide decent distance estimates akin to the FGLR method for hot luminous stars.

The second approach is to learn the luminosity and distance directly from the data, without the detour of first deriving the stellar parameters. This can be done if numerous examples of the spectra of stars with independently known distances exist, which is the case in the age of Gaia and extensive spectroscopic surveys. Jofré et al. (2015) inferred stellar distances with spectroscopically identified twin stars that have accurate parallax measurements, and achieved a distance precision better than 10% from high-resolution spectra. Hogg et al. (2019) demonstrated that the distances of red giant branch stars with SDSS/APOGEE (Majewski et al. 2017; Abolfathi et al. 2018) can be inferred to within $\lt 10\%$ using a data-driven model trained on Gaia parallaxes. Leung & Bovy (2019) also estimated stellar distances from SDSS/APOGEE spectra but with a "deep-learning" method, which simultaneously generates the zero-point offset of Gaia DR2. From the LAMOST low-resolution spectra, Xiang et al. (2017b) deduced absolute magnitudes for AFGK stars by taking stars in common with the Hipparcos catalog (Perryman et al. 1997) as the training set. Verification with Gaia parallaxes suggests that absolute magnitudes inferred from the LAMOST low-resolution spectra with such a data-driven approach can be precise to 0.26 mag, corresponding to a distance precision of 12% (Xiang et al. 2017a).

For OB stars, such a data-driven approach for spectroscopic luminosity estimation comes with two additional complications. First, a non-negligible fraction of massive stars are in close binaries, often with comparable luminosities (Sana et al. 2012, 2013). Second, the optical spectra of OB stars, such as those from the LAMOST survey, frequently show emission lines arising from their coronae, their surrounding disks, or nearby H II regions. Eliminating these outliers is crucial for achieving robust spectroscopic luminosity estimates.

In this work, we set up a data-driven method to estimate the spectroscopic luminosity and, by implication, the distance of individual OB stars. We employ a training set built from stars with precise parallaxes from Gaia DR2 in a neural network model to map observed spectra to Ks-band absolute magnitudes ${M}_{K{\rm{s}}}$. The method is designed to account for issues stemming from binarity and emission lines. We combine spectroscopic ${M}_{K{\rm{s}}}$ and Gaia parallaxes to identify binary stars as those that are overluminous compared to the single stars. We apply the method to the LAMOST low-resolution ($R\simeq 1800$) spectra for a set of 16,002 OB stars from Liu et al. (2019b), which is by far the most extensive set of spectra for luminous, hot stars. We prefer this data-driven approach over the FGLR, given the complications of validating the accuracy of the stellar parameters determined for OB stars from low-resolution spectra.

Our validation of the results suggests that the combination of the spectroscopic ${M}_{K{\rm{s}}}$ with the Gaia DR2 parallax yields a median distance uncertainty of only 8% for the LAMOST OB star sample. These distances are presented together with a catalog of ${M}_{K{\rm{s}}}$, extinction, and flags for binarity and emission lines for the 16,002 LAMOST OB stars. The results allow a reassessment of the nature of the binary system LB-1, which has been recently suggested to hold a $70{M}_{\odot }$ black hole, the most massive stellar-mass black hole ever found (Liu et al. 2019a).

This paper is laid out as follows: Section 2 gives an overview of the methods. Section 3 introduces the identification of emission lines in LAMOST OB star spectra with a principal component analysis (PCA) reconstruction method; Section 4 introduces our extinction estimation; and Section 5 presents the data-driven method for ${M}_{K{\rm{s}}}$ estimation. Section 6 introduces our method for binary identification. Section 7 describes the inference of distance using the spectroscopic ${M}_{K{\rm{s}}}$ together with Gaia parallaxes. Section 8 discusses our estimates for the absolute magnitude and distance to the LB-1 system. We summarize in Section 9.

2. Method Overview

We derive absolute magnitudes ${M}_{K{\rm{s}}}$ from survey spectra with a data-driven neural network model trained on a subset of selected stars with precise parallaxes from Gaia DR2 (Section 5). In light of the non-negligible impact from emission lines on our data-driven models, we identify the spectra containing emission lines using a PCA reconstruction method (Section 3). For these stars, we derive ${M}_{K{\rm{s}}}$ using a separate neural network model, with the emission-line wavelength regions masked. Because the neural network model is sensitive to spectral noise, we adopt the PCA-reconstructed spectra, for both emission and non-emission stars, as an approximation of the "noiseless" spectra for ${M}_{K{\rm{s}}}$ estimation. With the spectroscopic ${M}_{K{\rm{s}}}$ estimates, we infer distances in combination with the apparent magnitudes and Gaia DR2 parallaxes (Section 7). Binary and multiple-star systems are discarded iteratively from the training sets used for ${M}_{K{\rm{s}}}$ estimation. To identify these systems, which should be overluminous compared to their single-star counterparts, we measure the deviation between the spectrophotometric parallax and the Gaia astrometric parallax (Section 6). To correct for the extinction of individual stars, we use intrinsic colors estimated from synthetic photometry (Section 4). A schematic description of our method is shown in Figure 1.


Figure 1. Schematic illustration of our data-driven approach for deriving absolute magnitudes and distances for OB stars, including the identification of stars with emission lines and stars in binary or multiple-star systems. Astrometric parallaxes, spectra, and apparent magnitudes are adopted as inputs.


Our approach leverages the information contained in spectra, astrometric parallax, and photometric magnitudes, and can be applied to a wide range of stellar types. For the present study, however, we restrict our analysis to LAMOST spectra of OB stars. We adopt the OB star catalog of Liu et al. (2019b), which contains a total of 16,032 OB stars from the fifth data release (DR5) of the LAMOST Galactic surveys (Deng et al. 2012; Zhao et al. 2012; Liu et al. 2014). These OB stars are identified with line indices and further inspected by eye, leading to a high purity of the sample (Liu et al. 2019b). We also adopt the astrometric parallax measurements from Gaia DR2 (Gaia Collaboration et al. 2018a), following Leung & Bovy (2019) to correct for the zero-point offset as a function of G-band magnitude. For the photometric input, we use the 2MASS Ks-band measurements (Skrutskie et al. 2006).

3. Identifying Emission Lines with PCA Reconstruction

The spectra for a considerable fraction of OB stars can exhibit emission lines, either from their associated H II regions or from surrounding gas disks. Because this emission does not necessarily correlate with the properties of the star, or its ${M}_{K{\rm{s}}}$, these emission lines can confound our estimation of ${M}_{K{\rm{s}}}$. We therefore opt to remove all objects with emission lines in the spectra from our main training set and devise a separate strategy to estimate ${M}_{K{\rm{s}}}$ for these objects, using only the part of the spectrum free of emission lines. Almost all of the spectra with emission lines are found to exhibit ${{\rm{H}}}_{\alpha }$ emission. Some of them also exhibit other hydrogen lines in the Balmer series and Paschen series, as well as some metal lines. Nonetheless, in most cases, the ${{\rm{H}}}_{\alpha }$ emission line is the most prominent. As such, we adopt ${{\rm{H}}}_{\alpha }$ as a key diagnostic to identify OB stellar spectra with emission lines.

To automatically identify spectra with emission lines, we adopt a PCA reconstruction method as described below. In a nutshell, we first define a set of clean wavelength windows that suffer minimal impact from emission lines with which we will determine the coefficient of the principal component of the spectrum. We then reconstruct the full spectrum using the coefficients as well as the full spectrum eigenbases derived from a sample of non-emission stars. We identify the stars with emission lines through the ${{\rm{H}}}_{\alpha }$ difference between the observed spectra and the PCA reconstruction counterparts. This process is implemented iteratively to obtain a sample without emission lines. In practice, we found that only one iteration is sufficient because further iterations will not make a significant change to the results.

Given a set of M spectra ${{\boldsymbol{x}}}_{i}$ (with i = 1, ..., M) each containing N pixels, PCA projects the spectra in spectral space to an eigenspace, where the eigenvalues represent the amount of variance held by each orthonormal eigenbasis. The eigenbases are obtained by diagonalizing the covariance matrix,

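$${\boldsymbol{C}}=\frac{1}{M}\,{\boldsymbol{X}}{{\boldsymbol{X}}}^{T},\tag{1}$$

$${\boldsymbol{C}}\,{\boldsymbol{\xi }}=\lambda \,{\boldsymbol{\xi }},\tag{2}$$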

where ${\boldsymbol{X}}$ is the N × M-dimensional array of spectra and λ is the eigenvalue associated with the eigenbasis (eigenvector) ${\boldsymbol{\xi }}$. Entries in ${\boldsymbol{X}}$ are standardized by subtracting, from each pixel, the mean flux of the training set and then normalized by the standard deviation. The eigenvalues and eigenvectors are computed using the trired.pro and triql.pro scripts in IDL.

In practice, we first consider only pixels in the clean windows (as shown in Figure 2) and use those pixels to construct the matrix ${\boldsymbol{X}}$ of the training spectra. We adopt the matrix ${\boldsymbol{X}}$ to determine the principal components (eigenvectors/eigenbases) $\{{\boldsymbol{\xi }}\}$ in this restricted space. For any given training spectrum, ${\boldsymbol{x}}$, we calculate the principal component coefficients by projecting ${\boldsymbol{x}}$ onto each principal component ${\boldsymbol{\xi }}$. Because the eigenvectors are by definition normalized, the projection is simply the dot product of the two vectors, $p={\boldsymbol{x}}\cdot {\boldsymbol{\xi }}$. The collection of all principal coefficients ${\boldsymbol{p}}$ for all training spectra constitutes an M × K matrix, ${\boldsymbol{P}}$, where K is the number of principal components and M is the number of training spectra. Here we choose only the top K = 100 principal components in order to denoise and omit irrelevant information in the spectra.


Figure 2. Two example spectra reconstructed with a PCA method for a non-emission (top) and an emission (bottom) star. In each panel, black is the LAMOST spectrum, while red is the PCA-reconstructed spectrum. Marked in blue are the clean wavelength windows with which we construct the principal component coefficients for the PCA reconstruction. Note that, throughout this work, we normalize the spectra with a pseudo-continuum derived by smoothing the spectra with a Gaussian kernel 50 Å in width; as such, some pixels in the continuum-normalized spectra can exceed unity.


With matrix ${\boldsymbol{P}}$ in place, we then reconstruct the full spectra as follows. Let $\bar{{\boldsymbol{X}}}$ be the corresponding full spectra matrix for the training spectra; we search for an array ${\boldsymbol{B}}$ such that ${\boldsymbol{PB}}={\bar{{\boldsymbol{X}}}}^{T}$. In other words, we approximate the eigenbases for the full spectra such that $\bar{{\boldsymbol{X}}}$ has the same principal component coefficients in the full spectral space as ${\boldsymbol{X}}$ has in the restricted space. Practically, the matrix ${\boldsymbol{B}}$ is solved by inverting ${\boldsymbol{P}}$ with the Gram–Schmidt orthogonalization method. Let ${\boldsymbol{P}}^{\prime} $ be the principal coefficients of the test spectra ${\boldsymbol{X}}^{\prime} $, determined by projecting ${\boldsymbol{X}}^{\prime} $ onto the eigenvectors $\{{\boldsymbol{\xi }}\}$; we can then reconstruct the full test spectra via ${{\boldsymbol{X}}^{\prime\prime}}^{T}={\boldsymbol{P}}^{\prime} {\boldsymbol{B}}$. Here ${\boldsymbol{X}}^{\prime\prime} $ denotes the PCA-reconstructed counterparts of the test spectra ${\boldsymbol{X}}^{\prime} $.
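The reconstruction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' IDL implementation: the function and variable names are hypothetical, spectra are stored one per row (the transpose of the N × M convention in the text), ${\boldsymbol{B}}$ is obtained by ordinary least squares rather than Gram–Schmidt orthogonalization, and the full spectra are mean-centered before fitting ${\boldsymbol{B}}$, which is a choice made here for simplicity.

```python
import numpy as np

def fit_pca_reconstructor(train_full, clean_mask, n_components=100):
    """Learn clean-window eigenvectors and the matrix B mapping coefficients to full spectra.

    train_full : (M, N_full) array of normalized training spectra (one per row).
    clean_mask : boolean (N_full,) array marking the clean wavelength windows.
    """
    train_full = np.asarray(train_full, dtype=float)
    X = train_full[:, clean_mask]                  # restrict to clean windows
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std > 0, std, 1.0)              # guard against constant pixels
    Z = (X - mean) / std                           # standardize each pixel
    cov = Z.T @ Z / Z.shape[0]                     # pixel-by-pixel covariance
    eigval, eigvec = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_components]
    xi = eigvec[:, order]                          # top-K eigenvectors, (N_clean, K)
    P = Z @ xi                                     # coefficients, (M, K)
    full_mean = train_full.mean(axis=0)
    # Assumption: solve P B = (full spectra - mean) by least squares.
    B, *_ = np.linalg.lstsq(P, train_full - full_mean, rcond=None)
    return {"mask": clean_mask, "mean": mean, "std": std,
            "xi": xi, "B": B, "full_mean": full_mean}

def reconstruct(model, spectra_full):
    """Project clean-window pixels onto the eigenvectors and rebuild the full spectra."""
    spectra_full = np.asarray(spectra_full, dtype=float)
    Z = (spectra_full[:, model["mask"]] - model["mean"]) / model["std"]
    return model["full_mean"] + (Z @ model["xi"]) @ model["B"]
```

Because the coefficients are computed only from the clean windows, an emission line in the observed spectrum does not propagate into the reconstruction, which is what makes the residual at ${{\rm{H}}}_{\alpha }$ a usable diagnostic.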

Figure 2 shows that the full stellar spectra can be well reconstructed using the PCA method. Objects with emission lines are identified using residuals between the PCA reconstructions and the LAMOST spectra. Figure 3 shows the mean ${{\rm{H}}}_{\alpha }$ flux residual determined with our technique versus the reduced ${\chi }^{2}$ measured around the ${{\rm{H}}}_{\alpha }$ line. A clear branch of stars with flux excess at the position of the ${{\rm{H}}}_{\alpha }$ line is present. We deem a star to have ${{\rm{H}}}_{\alpha }$ emission if the flux excess exceeds the red vertical dashed line as delineated in Figure 3. Such a criterion identifies 2074 of the 16,002 unique LAMOST OB stars (13%) as emission-line stars. Figure 3 also shows a small fraction of stars with large ${\chi }^{2}$. These stars are found to have erroneous spectra due to multiple reasons, such as data artifacts or wrong wavelength calibration. Figure 15 in Appendix A shows a few examples of such erroneous spectra.
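The two diagnostics plotted in Figure 3 could be computed along the following lines. This is a hedged sketch: the wavelength windows are those quoted in the Figure 3 caption, but the function names, the use of the pixel count as the number of degrees of freedom, and in particular the `threshold` value are hypothetical, since the text defines the cut only graphically.

```python
import numpy as np

def halpha_diagnostics(wave, flux, flux_err, recon,
                       line_window=(6557.0, 6571.0),
                       side_windows=((6400.0, 6550.0), (6590.0, 6690.0))):
    """Return (mean H-alpha flux excess, reduced chi-square around H-alpha).

    wave, flux, flux_err, recon are 1D arrays on the same wavelength grid (in Angstrom);
    recon is the PCA-reconstructed spectrum.
    """
    in_line = (wave >= line_window[0]) & (wave <= line_window[1])
    excess = np.mean(flux[in_line] - recon[in_line])

    in_side = np.zeros_like(wave, dtype=bool)
    for lo, hi in side_windows:
        in_side |= (wave >= lo) & (wave <= hi)
    resid = (flux[in_side] - recon[in_side]) / flux_err[in_side]
    # Assumption: degrees of freedom approximated by the number of pixels.
    chi2_red = np.sum(resid ** 2) / in_side.sum()
    return excess, chi2_red

def is_emission(excess, threshold=0.05):
    """Flag a star as an emission-line star; the threshold is a placeholder value."""
    return excess > threshold
```

A large reduced ${\chi }^{2}$ with no ${{\rm{H}}}_{\alpha }$ excess would instead point to a problematic spectrum, matching the second population described above.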


Figure 3. Observed ${{\rm{H}}}_{\alpha }$ flux excess for emission spectra with respect to the PCA reconstruction. The horizontal axis shows the differences in mean ${{\rm{H}}}_{\alpha }$ fluxes in λ6557–6571 Å between the observed LAMOST spectra and the reconstructed PCA spectra. The vertical axis shows the reduced ${\chi }^{2}$ across the wavelength range that encapsulates the ${{\rm{H}}}_{\alpha }$ features (λ6400–6550 Å, λ6590–6690 Å). Spectra with emission lines exhibit a large difference in the ${{\rm{H}}}_{\alpha }$ line between the LAMOST and PCA-reconstructed spectra and are clearly separated from the majority of stars. The vertical dashed line delineates our criterion to select OB spectra with emission lines.


4. Extinction

As OB stars are mostly located in the Galactic disk with ongoing star formation, they can suffer from serious interstellar extinction. Accurate correction for extinction is thus necessary to obtain accurate geometric ${M}_{K{\rm{s}}}$. Throughout this study, we refer to geometric ${M}_{K{\rm{s}}}$ as the "apparent", distance-corrected, ${M}_{K{\rm{s}}}$ inferred using Gaia parallax and 2MASS Ks apparent magnitudes. We note that extinction is an issue even when we work with the infrared Ks band. A reddening ${E}_{B-V}$ of 1 mag, which is not uncommon for distant OB stars, can cause a ∼0.3 mag extinction in the Ks band (e.g., Yuan et al. 2013; Wang & Chen 2019), leading to a distance bias of 13%.
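The relation between parallax, apparent magnitude, extinction, and the geometric ${M}_{K{\rm{s}}}$, and the way a magnitude offset translates into a distance offset, follow from the standard distance modulus. The short sketch below illustrates both; the function names are hypothetical, and the parallax is assumed to be positive, in milliarcseconds, and already zero-point corrected.

```python
import math

def geometric_abs_mag(m_app, parallax_mas, extinction=0.0):
    """Absolute magnitude from apparent magnitude, parallax (mas), and extinction (mag).

    Uses d = 1000 / parallax_mas pc and M = m - A - 5 log10(d / 10 pc).
    """
    if parallax_mas <= 0:
        raise ValueError("a positive parallax is required for a direct inversion")
    return m_app - extinction + 5.0 * math.log10(parallax_mas) - 10.0

def distance_ratio(delta_mag):
    """Factor by which the inferred distance changes for a distance-modulus shift delta_mag."""
    return 10.0 ** (0.2 * delta_mag)

# A 0.3 mag error in the Ks-band extinction changes the inferred distance by
# roughly 13-15%, depending on its sign.
shrink = 1.0 - distance_ratio(-0.3)
grow = distance_ratio(0.3) - 1.0
```

The same relation links the 0.25 mag precision in ${M}_{K{\rm{s}}}$ to a distance precision of about 12%, since `distance_ratio(0.25)` is about 1.12.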

There are a variety of possible ways that we could obtain extinction for our OB star sample: e.g., through direct use of an existing 3D reddening map (e.g., Green et al. 2019; Chen et al. 2019b), the application of the Rayleigh–Jeans Color Excess method with infrared colors (RJCE; Majewski et al. 2011), or deduction from the intrinsic colors of OB stars, obtained either empirically (e.g., Deng et al. 2020) or theoretically from synthetic models. We adopt the last of these, intrinsic colors from synthetic models, in this study.

Deriving the intrinsic color requires a robust estimation of the stellar parameters. To achieve that, we leverage the stellar parameters derived for our OB star sample by applying the spectral fitting code The Payne (Ting et al. 2019) in the hot-star regime (M. Xiang et al. 2021, in preparation). Adopting ${T}_{\mathrm{eff}}$, $\mathrm{log}\,g$, and $[\mathrm{Fe}/{\rm{H}}]$ derived in this companion work, we estimate the intrinsic colors of the stars with the MIST isochrones (Choi et al. 2016). We have tested that using the PARSEC isochrones (Bressan et al. 2012) or the empirical ${T}_{\mathrm{eff}}$–color relation of Deng et al. (2020) only incurs a difference of <0.03 mag for the ${E}_{B-V}$ estimate, which is negligible for the purposes of this study.

We estimate ${E}_{B-V}$ through the observed color excesses in the 2MASS J and H magnitudes (Skrutskie et al. 2006), Gaia DR2 BP/RP (Evans et al. 2018), and the $g,r,i$ magnitudes from the XSTPS-GAC survey (Zhang et al. 2014; Liu et al. 2014). For bright ($r\lt 13$ mag) stars for which the XSTPS-GAC photometry saturates, we adopt the APASS $B,V,g,r,i$ photometric magnitudes (Henden et al. 2012; Munari et al. 2014) instead. To estimate ${E}_{B-V}$ (and subsequently ${A}_{K{\rm{s}}}$), we opt to avoid using Ks-band photometry as it can be contaminated by emission from surrounding gas and dust disks. With all these N photometric bands, we construct $N-1$ colors using only the photometry of adjacent bands. When compared to the intrinsic color, each observed color gives an estimate of ${E}_{B-V}$. We take the average of these ${E}_{B-V}$ estimates weighted by the uncertainties of the colors, which are assumed to be the quadratic sum of the uncertainties of the two photometric bands. On top of that, the uncertainties in the ${E}_{B-V}$ estimates are further derived through the propagation of uncertainties in photometric colors. Note that we have assigned a minimal uncertainty of 0.014 mag in all the colors, including the Gaia BP − RP.
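A compact way to express the adjacent-color averaging is sketched below. It is an illustration only: the function name and argument layout are hypothetical, inverse-variance weighting is assumed as the meaning of "weighted by the uncertainties of the colors", and the extinction coefficients (extinction in each band per unit ${E}_{B-V}$) must be supplied by the caller.

```python
import math

def ebv_from_colors(mags, errs, intrinsic_colors, coeffs, floor=0.014):
    """Weighted-mean E(B-V) from N bands, using the N-1 colors of adjacent bands.

    mags, errs       : observed magnitudes and their uncertainties, ordered by wavelength.
    intrinsic_colors : N-1 intrinsic colors, intrinsic_colors[i] = (band i - band i+1).
    coeffs           : coeffs[i] = A_band_i / E(B-V), e.g. from a model extinction curve.
    floor            : minimum color uncertainty in mag (0.014 in the text).
    """
    num = 0.0
    den = 0.0
    for i in range(len(mags) - 1):
        color_err = max(math.hypot(errs[i], errs[i + 1]), floor)
        dk = coeffs[i] - coeffs[i + 1]            # color excess per unit E(B-V)
        ebv_i = ((mags[i] - mags[i + 1]) - intrinsic_colors[i]) / dk
        sigma_i = color_err / abs(dk)
        weight = 1.0 / sigma_i ** 2               # assumption: inverse-variance weights
        num += weight * ebv_i
        den += weight
    return num / den, math.sqrt(1.0 / den)
```

Colors built from bands with a larger difference in extinction coefficient constrain ${E}_{B-V}$ more tightly and therefore receive more weight in this scheme.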

To convert color excesses into ${E}_{B-V}$, we adopt extinction coefficients determined by convolving the public Kurucz model spectra (Castelli & Kurucz 2003; Kurucz 2005) with the Fitzpatrick (1999) extinction curve, assuming ${R}_{V}=3.1$. We note that the ${R}_{V}$ value may vary among different environments, from about 2 to 5 (e.g., Fitzpatrick & Massa 2007). In the case of ${R}_{V}=5$, there will be a 0.2 mag difference in the extinction estimate for a star with ${E}_{B-V}=1$, an extreme case for our sample stars. This will lead to a 9% bias in the distance estimate. We note, first, that a constant shift of ${R}_{V}$ will not change the precision of our distance estimate. Furthermore, we expect such cases to be rare for our sample stars, both because ${R}_{V}=3.1$ has been shown to be a good approximation in most cases (for the diffuse interstellar medium; e.g., Fitzpatrick & Massa 2007; Schlafly et al. 2016) and because the ${E}_{B-V}$ values of our sample stars are moderate, as they are mostly in the Galactic anticenter direction. In addition, the extinction coefficient can vary among stars with different stellar parameters according to the SEDs (e.g., Green et al. 2021). However, considering that we only focus on hot stars, this effect is negligible because all the SEDs for hot stars are similar in the Ks band (the Rayleigh–Jeans tail). With these taken into account, we argue that the effect of ${R}_{V}$ variation is likely to be subdominant, and we choose to derive the extinction coefficients using a fixed Kurucz spectrum with ${T}_{\mathrm{eff}}=12,000$ K, $\mathrm{log}\,g=4.5$, and $[\mathrm{Fe}/{\rm{H}}]=0$.

Figure 4 shows a comparison of the derived ${E}_{B-V}$ with ${E}_{B-V}$ interpolated from the 3D map of Green et al. (2019), using the Gaia distance from Bailer-Jones et al. (2018). The differences for the majority of stars are smaller than 0.1 mag, which corresponds to a negligible extinction difference of ≲0.03 mag in the Ks band. There is a small fraction of stars with large ${E}_{B-V}$ differences. Many of them turn out to be stars with emission lines. The large discrepancy could be caused by the fact that these stars are distant objects that are outside the feasible range of the Bailer-Jones et al. (2018) distance and/or the Green et al. (2019) reddening map. Alternatively, especially for stars with emission lines, they could suffer from additional extinction from the dense H II regions or surrounding disks. Due to the possibility of contamination in the Ks-band flux by the surrounding environment, we advise caution when using distance estimates for stars with an emission-line flag in this study.


Figure 4. Comparison of the ${E}_{B-V}$ derived in this work and the ${E}_{B-V}$ from the 3D map of Green et al. (2019). We only consider stars with reliable Gaia parallax measurements ($\varpi /{\sigma }_{\varpi }\gt 5$). For stars without emission lines, the mean and dispersion of the ${E}_{B-V}$ difference are 0.04 mag and 0.09 mag, respectively, whereas for stars with emission lines in the spectra we find a shifted mean of 0.11 mag and a standard deviation of 0.13 mag. The larger ${E}_{B-V}$ for stars with emission lines indicates that these stars are more extincted by the associated H II regions or by their surrounding gas and dust.


5. Data-driven ${M}_{K{\rm{s}}}$ Estimation

The absolute magnitude (luminosity) is an intrinsic astrophysical property of a star that is derivable from a stellar spectrum. It is related to the stellar parameters via the Stefan–Boltzmann equation

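$$L=4\pi {R}^{2}{\sigma }_{\mathrm{SB}}{T}_{\mathrm{eff}}^{4},\tag{3}$$

as well as the gravity equation

$$g=\frac{GM}{{R}^{2}},\tag{4}$$

where $L$ is the bolometric luminosity, ${\sigma }_{\mathrm{SB}}$ is the Stefan–Boltzmann constant, $G$ is the gravitational constant, and ${T}_{\mathrm{eff}}$, R, M, and g are, respectively, the effective temperature, radius, mass, and surface gravity of the star. Equations (3) and (4) together yield

$$L=\frac{4\pi G{\sigma }_{\mathrm{SB}}M{T}_{\mathrm{eff}}^{4}}{g}.\tag{5}$$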

Recall that ${T}_{\mathrm{eff}}$ and $\mathrm{log}\,g$ are basic stellar parameters that are derivable from stellar spectra. Furthermore, the stellar mass itself also implicitly depends on ${T}_{\mathrm{eff}}$, $\mathrm{log}\,g$, the abundances [X/H], and the rotation velocity, all of which are stellar properties that are readily measurable from stellar spectra. Consequently, one would expect that there exists an empirical relation that connects stellar spectra to the absolute magnitude of the stars. In this study, we will model this relation with a neural network.
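To make the scaling concrete: combining the Stefan–Boltzmann law with the definition of surface gravity gives a bolometric luminosity proportional to $M{T}_{\mathrm{eff}}^{4}/g$, which can be evaluated in solar units as in the sketch below. The function name is hypothetical, the solar reference values are the commonly used nominal ones, and converting the result to ${M}_{K{\rm{s}}}$ would additionally require a bolometric correction, which is not modeled here.

```python
import math

TEFF_SUN = 5772.0   # K, nominal solar effective temperature
LOGG_SUN = 4.438    # dex (cgs), nominal solar surface gravity

def log_luminosity_solar(mass_msun, teff, logg):
    """log10(L / L_sun) from mass (solar units), Teff (K), and log g (cgs dex).

    Follows from L proportional to M * Teff^4 / g.
    """
    return (math.log10(mass_msun)
            + 4.0 * math.log10(teff / TEFF_SUN)
            - (logg - LOGG_SUN))

# At fixed mass and temperature, lowering log g by 0.5 dex raises log L by 0.5 dex,
# i.e. makes the star 1.25 mag brighter bolometrically.
delta_logl = log_luminosity_solar(5.0, 15000.0, 3.5) - log_luminosity_solar(5.0, 15000.0, 4.0)
```

This sensitivity to $\mathrm{log}\,g$ illustrates why gravity-sensitive spectral features carry most of the luminosity information at a given temperature.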

5.1. The Neural Network Model

We consider a feed-forward multilayer perceptron neural network model that maps the LAMOST spectra to the absolute magnitude ${M}_{K{\rm{s}}}$ of the stars. Adopting the Einstein summation convention, our neural network contains two layers and can be succinctly written as

$${M}_{K{\rm{s}}}={\hat{w}}_{i}\,\sigma \left({\tilde{w}}_{ij}\,\sigma \left({w}_{j\lambda }{f}_{\lambda }+{b}_{j}\right)+{\tilde{b}}_{i}\right)+\hat{b},\tag{6}$$

where σ is the sigmoid activation function, the $w$ and $b$ terms (with their accented variants) are the weights and biases of the network to be optimized, ${f}_{\lambda }$ is the normalized flux in wavelength pixel λ, the indices i and j denote the neurons of the two layers, and λ denotes the wavelength pixels. We adopt 100 neurons for both layers. The training process is carried out with the PyTorch package in Python. To reduce overfitting in the training process, we also employ the dropout method (Srivastava et al. 2014) with a dropout parameter of 0.2.
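The functional form of such a two-layer network can be illustrated with a plain NumPy forward pass. This is only a sketch of the architecture described above (two sigmoid layers of 100 neurons followed by a linear output); the parameter names, the initialization scales, and the use of NumPy instead of PyTorch are assumptions made here, and dropout is omitted because it acts only during training.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_pixels, n_hidden=100, seed=0):
    """Randomly initialize weights and biases (the scales are arbitrary placeholders)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, (n_pixels, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
        "b2": np.zeros(n_hidden),
        "w3": rng.normal(0.0, 0.1, n_hidden),
        "b3": 0.0,
    }

def predict_mks(params, flux):
    """Map normalized spectra of shape (n_stars, n_pixels) to M_Ks of shape (n_stars,)."""
    h1 = sigmoid(flux @ params["W1"] + params["b1"])   # first layer
    h2 = sigmoid(h1 @ params["W2"] + params["b2"])     # second layer
    return h2 @ params["w3"] + params["b3"]            # linear output
```

In PyTorch, the same structure would typically be assembled from linear, sigmoid, and dropout layers and optimized against the geometric ${M}_{K{\rm{s}}}$ of the training stars; the loss function and optimizer are not specified in this section.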

Considering the impact of emission lines, we set up two neural network models. One neural network is constructed for stars without emission lines, using the whole wavelength range except for λ5720–6050 Å, λ6270–6380 Å, λ6800–6990 Å, λ7100–7320 Å, λ7520–7740 Å, and λ8050–8350 Å; these wavelength windows contain prominent absorption bands of the Earth's atmosphere and strong Na I D absorption lines from the interstellar medium. The other neural network is for stars with emission lines, using only wavelength windows that are devoid of strong emission lines as demonstrated in Figure 2. Finally, considering that the neural network model is sensitive to spectral noise, we attempt to denoise the spectra through the PCA reconstruction, using only the first 100 principal components. For non-emission stars, the PCA reconstruction uses the full wavelength range, while for emission stars, it uses only the clean wavelength windows.
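Selecting the usable pixels for the non-emission model amounts to excluding the six windows listed above. A minimal sketch, with a hypothetical function name and wavelengths in Å:

```python
# Windows (in Angstrom) excluded for the non-emission model, as listed in the text.
MASKED_WINDOWS = [
    (5720.0, 6050.0), (6270.0, 6380.0), (6800.0, 6990.0),
    (7100.0, 7320.0), (7520.0, 7740.0), (8050.0, 8350.0),
]

def usable_pixels(wavelengths, masked=MASKED_WINDOWS):
    """Return a list of booleans: True where a pixel lies outside every masked window."""
    return [not any(lo <= w <= hi for lo, hi in masked) for w in wavelengths]

flags = usable_pixels([5000.0, 5890.0, 6563.0, 7600.0, 8400.0])
```

Here the pixel at 5890 Å (Na I D) and the one at 7600 Å (a telluric band) are rejected, while the others are kept.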

5.2. The Training and Test Set

We define two training sets that correspond to the two neural network models set up above, for stars with or without emission lines. We also define a test set that we use to verify the ${M}_{K{\rm{s}}}$ estimation.

The training and test sets adopt stars with good parallax measurements ($\varpi /{\sigma }_{\varpi }\gt 10$). To derive a more robust empirical relation, we require the training stars to have a spectral S/N (per pixel) higher than 50. Stars that meet these requirements are divided into two groups: four-fifths of them are adopted as the training set to train the neural network model, while the remaining one-fifth constitute the test set, in combination with stars with good parallaxes but lower spectral S/N. Roughly 200 stars that are not in the Liu et al. (2019b) sample but have ${M}_{K{\rm{s}}}\lt -1.5$ mag are also added to the training set. The inclusion of these stars is to enlarge the sample size at the brighter end, where there is a limited number of training stars. Although not in the original LAMOST OB star catalog, all of these additional training stars have ${T}_{\mathrm{eff}}\gt 7000$ K according to the LAMOST DR5 stellar parameter catalog of Xiang et al. (2019). Their spectra are further manually inspected to ensure they are early-type stars. About half of them are OB stars that exhibit He absorption lines, while the others are likely late-B or early A-type stars. In total, we obtain 6861 stars for the training set for stars without emission lines and 7161 stars for the training set for stars with emission lines.
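The selection and split described above can be sketched as follows. The dictionary keys, the function name, and the random assignment of the four-fifths/one-fifth split are assumptions; the text does not state how the split is drawn, and the additional bright stars added to the training set are not handled here.

```python
import random

def split_training(stars, snr_min=50.0, plx_snr_min=10.0, train_fraction=0.8, seed=0):
    """Split stars into (training set, test set) following the cuts in the text.

    stars : list of dicts with keys 'snr', 'parallax', 'parallax_err' (hypothetical schema).
    """
    good_plx = [s for s in stars
                if s["parallax_err"] > 0
                and s["parallax"] / s["parallax_err"] > plx_snr_min]
    high_snr = [s for s in good_plx if s["snr"] > snr_min]
    low_snr = [s for s in good_plx if s["snr"] <= snr_min]

    rng = random.Random(seed)
    rng.shuffle(high_snr)                      # assumption: random split
    n_train = int(round(train_fraction * len(high_snr)))
    train = high_snr[:n_train]
    test = high_snr[n_train:] + low_snr        # held-out high-S/N stars plus lower-S/N stars
    return train, test
```

Stars failing the parallax cut are excluded from both sets, since their geometric ${M}_{K{\rm{s}}}$ is too uncertain to serve as a reference.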

Because the geometric "apparent" ${M}_{K{\rm{s}}}$ for binaries are biased, the training process is iterated to exclude the binaries. Binaries are singled out based on significant deviation between the predicted ${M}_{K{\rm{s}}}$ and the geometric ${M}_{K{\rm{s}}}$ (see Section 6 for details). In practice, two iterations are implemented, as we find negligible change in the resultant ${M}_{K{\rm{s}}}$ estimates after these iterations.

Figure 5 shows the comparison of the resultant spectroscopic ${M}_{K{\rm{s}}}$ estimates with the geometric ${M}_{K{\rm{s}}}$ for the test stars with ${\rm{S}}/{\rm{N}}\gt 20$. The figure shows good overall consistency between the two sets of ${M}_{K{\rm{s}}}$ estimates across the range roughly from −4 mag to 1.5 mag. Below ${M}_{K{\rm{s}}}\sim 0$ mag, more stars have fainter spectroscopic ${M}_{K{\rm{s}}}$ than the geometric ${M}_{K{\rm{s}}}$. This is due to a contribution to the geometric ${M}_{K{\rm{s}}}$ from binaries at these magnitudes. Recall that the geometric ${M}_{K{\rm{s}}}$ is derived from the apparent magnitudes, where both stars contribute.

Figure 5. Comparison between the geometric ${M}_{K{\rm{s}}}$ inferred from Gaia parallaxes and the spectroscopic ${M}_{K{\rm{s}}}$ derived from LAMOST spectra for test stars. We only show results for test stars that have precise Gaia parallax measurements ($\varpi /{\sigma }_{\varpi }\gt 10$). The left panel demonstrates the spectroscopic ${M}_{K{\rm{s}}}$ derived from the full spectrum for stars without emission lines. The right panel illustrates the spectroscopic ${M}_{K{\rm{s}}}$ derived using only wavelength pixels that are devoid of emission lines for both non-emission stars (blue/green background) and emission-line stars (red dots). In both panels, the solid line delineates the one-to-one line. The dashed line shows an offset of 0.75 mag from the one-to-one line, the offset one would expect from equal-mass binaries. Stars that are close to the dashed line have geometric ${M}_{K{\rm{s}}}$ brighter than the spectroscopic ${M}_{K{\rm{s}}}$, signaling the possibility of binaries or multiple systems.

We note that there are some subdwarfs and white dwarfs in the LAMOST OB star sample. They typically have geometric ${M}_{K{\rm{s}}}$ fainter than 2 mag and are not shown in the figure. For these stars, our spectroscopic ${M}_{K{\rm{s}}}$ estimates, which are trained primarily on normal OB star spectra, can be problematic. The spectroscopic ${M}_{K{\rm{s}}}$ for stars fainter than 2 mag should therefore be used with caution.

Figure 5 also demonstrates that the scatter between the spectroscopic ${M}_{K{\rm{s}}}$ and the geometric ${M}_{K{\rm{s}}}$ is larger for stars with emission lines than for the non-emission stars. In particular, a number of emission-line stars exhibit spectroscopic ${M}_{K{\rm{s}}}$ much brighter than their geometric ${M}_{K{\rm{s}}}$. Curiously, we found that both the spectroscopic ${M}_{K{\rm{s}}}$ and the geometric ${M}_{K{\rm{s}}}$ of these stars appear to be robust. On the one hand, as illustrated in Figure 6, the spectroscopic ${M}_{K{\rm{s}}}$ for these stars is consistent with the luminosity predicted by the temperature-weighted gravity (Kudritzki et al. 2020). On the other hand, for about half of these stars, the Gaia renormalized unit weight error (RUWE)$^{13}$ values are around 1.0, suggesting that at least half of these outliers have decent astrometric measurements. Investigating the nature of these stars is beyond the scope of this study, but one possibility is that they might be stripped stars as a consequence of binary evolution (e.g., Götberg et al. 2018). As a result of the stripping, they are in reality fainter (as probed by the geometric ${M}_{K{\rm{s}}}$) while exhibiting spectra similar to those of a main-sequence or subgiant star, and hence a brighter spectroscopic ${M}_{K{\rm{s}}}$ (see also Section 8).

Figure 6. Temperature-weighted gravity vs. ${M}_{K{\rm{s}}}$ for test stars with emission lines. The temperature-weighted gravity of a star is an indicator of its bolometric luminosity. Spectroscopic ${M}_{K{\rm{s}}}$ measurements are shown with dot symbols while geometric ${M}_{K{\rm{s}}}$ are shown with star symbols. The solid line indicates the best-fit linear relation from the sample of nonemission stars.

Figure 7 illustrates the difference between the spectroscopic ${M}_{K{\rm{s}}}$ and the geometric ${M}_{K{\rm{s}}}$ for the test stars as a function of stellar parameters ${T}_{\mathrm{eff}}$, $\mathrm{log}\,g$, and $[\mathrm{Fe}/{\rm{H}}]$. The stellar parameters were derived from LAMOST spectra in a parallel work (M. Xiang et al. 2021, in preparation). Only results from single stars without emission lines are shown. This figure demonstrates that our spectroscopic ${M}_{K{\rm{s}}}$ estimates do not exhibit bias with respect to stellar parameters.

Figure 7. Differences between spectroscopic ${M}_{K{\rm{s}}}$ and geometric ${M}_{K{\rm{s}}}$ as a function of stellar parameters. Only results for single stars are shown. The solid lines delineate the median and standard deviation as a function of stellar parameters.

5.3. Measurement Uncertainty and Intrinsic Uncertainty

In this section, we evaluate the quality of our spectroscopic ${M}_{K{\rm{s}}}$ estimates. The nature of the uncertainty can be aleatoric or epistemic. For the former, the uncertainty in ${M}_{K{\rm{s}}}$ is caused by uncertainties in the spectra. The latter can be caused by multiple sources. For example, the spectra simply do not contain the full information of the luminosity of the stars. On top of that, our data-driven models might also be suboptimal in extracting such information. In the following, we will quantify both the aleatoric and epistemic uncertainties using the test set.

To make a complete accounting of the uncertainties in our ${M}_{K{\rm{s}}}$ estimates requires careful characterization of both the contribution from uncertainties in the geometric ${M}_{K{\rm{s}}}$ and the additional scatter raised as a result of unresolved binaries. In order to minimize the impact of binary stars, we calculate the measurement uncertainty and intrinsic uncertainty with an iterative approach. In each iteration, we identify and discard all likely binaries that have more than 2σ difference between their spectroscopic ${M}_{K{\rm{s}}}$ and geometric ${M}_{K{\rm{s}}}$.

The measurement uncertainty and intrinsic uncertainty are estimated in a Bayesian framework. In the following, we will denote the ground-truth geometric absolute magnitude of each star as ${M}^{g}$ and the ground-truth spectroscopic absolute magnitude as ${M}^{s}$. We assume that there is a linear relation between the expected mean spectroscopic absolute magnitude ${\bar{M}}^{s}$ and ${M}^{g}$, with a slope close to 1 (up to a correction term ε) and an intercept of δ,

Equation (7)

We further assume that, for a given ${\bar{M}}^{s}$, due to epistemic uncertainty, the ${M}^{s}$ of the individual stars are distributed as a Gaussian with an intrinsic uncertainty of ${\sigma }_{\mathrm{int}}$, i.e.,

Equation (8)

The estimated absolute magnitudes ${M}_{o}^{s}$ are themselves assumed to be distributed around a given ${M}^{s}$ as a Gaussian with a width set by the aleatoric measurement uncertainty ${\sigma }_{s}$,

Equation (9)

For simplicity, we assume that the aleatoric measurement uncertainty depends only on, and scales linearly with, the spectral S/N. In particular, we have

Equation (10)

where ${\sigma }_{s}^{100}$ is the uncertainty for spectra with S/N = 100.
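The published forms of Equations (7)-(10) are not reproduced in this text. One reading that is consistent with the assumptions stated above is written out here for orientation. The notation $\mathcal{N}(x;\mu ,{\sigma }^{2})$ for a Gaussian is introduced for this sketch only, and the last line assumes that the measurement uncertainty decreases in inverse proportion to the spectral S/N, normalized at S/N = 100.

```latex
\begin{align}
\bar{M}^{s} &= (1+\epsilon)\,M^{g} + \delta, \\
P(M^{s}\mid \bar{M}^{s}) &= \mathcal{N}\!\left(M^{s};\ \bar{M}^{s},\ \sigma_{\rm int}^{2}\right), \\
P(M_{o}^{s}\mid M^{s}) &= \mathcal{N}\!\left(M_{o}^{s};\ M^{s},\ \sigma_{s}^{2}\right), \\
\sigma_{s} &= \sigma_{s}^{100}\times\frac{100}{\mathrm{S/N}} .
\end{align}
```

Under these assumptions, marginalizing over ${M}^{s}$ gives a Gaussian for ${M}_{o}^{s}$ centered on ${\bar{M}}^{s}$ with variance ${\sigma }_{\mathrm{int}}^{2}+{\sigma }_{s}^{2}$, because the convolution of two Gaussians is a Gaussian whose variance is the sum of the two.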

Finally, for the geometric absolute magnitude, we assume a flat prior on ${M}^{g}$. We can deduce that

Equation (11)

Equation (12)

where ϖ and δϖ are the Gaia parallax and its uncertainty in units of milliarcseconds, and ${m}_{0}$ and $\delta {m}_{0}$ are the dereddened apparent magnitude and its uncertainty, respectively. The latter is computed for each star as the quadratic sum of the uncertainties in the photometric magnitude and the extinction estimate.
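For orientation, the geometric absolute magnitude follows from the standard distance modulus with the parallax in milliarcseconds, ${M}^{g}={m}_{0}+5{\mathrm{log}}_{10}\varpi -10$. The sketch below evaluates this relation together with a first-order (Gaussian) error propagation. The function name is hypothetical, and the linearized uncertainty is a simplification that is reasonable only for $\varpi /{\sigma }_{\varpi }\gt 10$; it does not reproduce the full probabilistic treatment of Equations (11) and (12).

```python
import numpy as np

def geometric_abs_mag(m0, m0_err, parallax_mas, parallax_err_mas):
    """Geometric absolute magnitude from a dereddened magnitude and a parallax.

    Uses M = m0 + 5 log10(parallax / mas) - 10, i.e. d[pc] = 1000 / parallax[mas].
    The uncertainty is a first-order propagation (an assumption of this sketch).
    """
    m0 = np.asarray(m0, dtype=float)
    m0_err = np.asarray(m0_err, dtype=float)
    parallax_mas = np.asarray(parallax_mas, dtype=float)
    parallax_err_mas = np.asarray(parallax_err_mas, dtype=float)

    m_geo = m0 + 5.0 * np.log10(parallax_mas) - 10.0
    # d(5 log10 plx) = (5 / ln 10) * d(plx) / plx
    plx_term = (5.0 / np.log(10.0)) * parallax_err_mas / parallax_mas
    m_geo_err = np.sqrt(m0_err ** 2 + plx_term ** 2)
    return m_geo, m_geo_err
```

A 10% parallax error alone corresponds to about 0.22 mag in ${M}^{g}$, which is comparable to the intrinsic precision of the spectroscopic estimate.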

Combining all these ingredients, we arrive at the final posterior probability distribution for the parameters ε, δ, ${\sigma }_{\mathrm{int}}$, and ${\sigma }_{s}^{100}$,

Equation (13)

where N is the number of stars, and the likelihood reads

Equation (14)

In this work, we adopt a flat prior for all the parameters and sample the posterior with Markov Chain Monte Carlo (MCMC).
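A simplified log-posterior of this kind could look as follows. It is a sketch under explicit assumptions, not the likelihood of Equation (14): the uncertainty of the geometric magnitude is treated as Gaussian in magnitude, the measurement uncertainty is assumed to scale as 100/(S/N), and the flat priors are restricted to positive scatter terms. All names are hypothetical.

```python
import numpy as np

def log_posterior(theta, m_spec_obs, snr, m_geo, m_geo_err):
    """Log-posterior for (epsilon, delta, sigma_int, sigma_s100) with flat priors.

    Assumes Gaussian errors on the geometric magnitudes, so that all Gaussian
    terms can be combined analytically into one variance per star.
    """
    eps, delta, sigma_int, sigma_s100 = theta
    if sigma_int <= 0.0 or sigma_s100 <= 0.0:
        return -np.inf  # flat prior limited to positive scatter terms

    sigma_s = sigma_s100 * 100.0 / np.asarray(snr, dtype=float)
    mean = (1.0 + eps) * np.asarray(m_geo, dtype=float) + delta
    # Intrinsic scatter + spectral noise + propagated geometric uncertainty
    var = (sigma_int ** 2 + sigma_s ** 2
           + ((1.0 + eps) * np.asarray(m_geo_err, dtype=float)) ** 2)
    resid = np.asarray(m_spec_obs, dtype=float) - mean
    return -0.5 * np.sum(resid ** 2 / var + np.log(2.0 * np.pi * var))
```

A function with this signature can be handed to any general-purpose MCMC sampler; the choice of sampler is not specified in the text.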

Figure 8 displays the results of the MCMC fitting to non-emission single stars. The posterior mean suggests an aleatoric measurement uncertainty of ${\sigma }_{s}^{100}=0.10$ mag for spectra with S/N = 100 and an epistemic intrinsic uncertainty of ${\sigma }_{\mathrm{int}}=0.25$ mag. We find a moderate slope correction of ε = −0.18, which is likely due to the lingering presence of binaries, considering that our 2σ cut can leave in the sample a considerable number of binaries with ${M}_{K{\rm{s}}}$ excess smaller than 2σ (≲0.5 mag). The presence of lingering binaries also implies that the intrinsic uncertainty from the MCMC posterior is a conservative estimate, as binaries can contribute part of the scatter.

Figure 8. MCMC fitting of the intrinsic uncertainty and measurement uncertainty in our ${M}_{K{\rm{s}}}$ estimates. Binaries were excluded iteratively. The intrinsic uncertainty ${\sigma }_{\mathrm{int}}$ indicates the epistemic uncertainty in ${M}_{K{\rm{s}}}$ estimates, whereas the measurement uncertainty ${\sigma }_{s}$ quantifies the aleatoric uncertainty induced by spectral noise. The MCMC results show an intrinsic uncertainty of ${\sigma }_{\mathrm{int}}=0.25$ mag and a typical measurement uncertainty of ${\sigma }_{s}=0.10$ mag at S/N = 100.

6. Binary Identification

Binaries are ubiquitous and play important roles in astrophysics (Abt 1983; Duchêne & Kraus 2013; Moe & Di Stefano 2017). In star clusters, where all member stars share the same distance and the same age, binary stars in the lower main sequence are recognizable as they are brighter than all other single stars that distribute along a well-described locus in the color–magnitude diagram (e.g., Hurley & Tout 1998; Kouwenhoven et al. 2005; Li et al. 2013). The identification of binary stars in the field is more complicated due to the mixture of multiple stellar populations. There are a number of tailored approaches that have been implemented for binary identification in the field, such as interferometry (e.g., Raghavan et al. 2010), eclipsing transits (e.g., Qian et al. 2017; Zhang et al. 2017; Liu et al. 2018; Yang et al. 2020), color displacements (e.g., Pourbaix et al. 2004; Yuan et al. 2015a), astrometric noise excess (e.g., Kervella et al. 2019; Penoyre et al. 2020; Belokurov et al. 2020), common phase space motion (e.g., Andrews et al. 2017; Oh et al. 2017; Coronado et al. 2018b; El-Badry & Rix 2018; Hollands et al. 2018), radial velocity variations (e.g., Matijevič et al. 2011; Gao et al. 2014, 2017; Price-Whelan et al. 2017; Badenes et al. 2018; Tian et al. 2018, 2020), spectroscopic binaries with double lines (e.g., Fernandez et al. 2017; Merle et al. 2017; Traven et al. 2017; Skinner et al. 2018; Traven et al. 2020), and full spectral fitting (El-Badry et al. 2018b; Traven et al. 2020).

The various methods listed above have been extensively employed to identify and characterize binaries from large surveys. Belokurov et al. (2020) demonstrated that Gaia RUWE is an efficient tool for identifying unresolved, short-period binaries with low-to-intermediate-mass ratios. Short-period binaries have also been characterized through their double-line spectra from high-resolution spectroscopic surveys (e.g., Fernandez et al. 2017; Merle et al. 2017; Traven et al. 2020), or through radial velocity variations with both high- and low-resolution surveys (e.g., Gao et al. 2017; Badenes et al. 2018; Tian et al. 2018). Unresolved binaries with longer period, which exhibit single lines in their spectra, are generally harder to identify, but not impossible. Based on the full spectral fitting technique, El-Badry et al. (2018b) have characterized thousands of main-sequence binaries from the APOGEE spectra, many of them long-period binaries.

Nonetheless, for long-period binaries, the method presented in El-Badry et al. (2018b) is mostly effective only for systems with intermediate mass ratios ($0.4\lesssim q\lesssim 0.85$, where $q={m}_{2}/{m}_{1}$). It remains a challenge to identify unresolved single-line binaries with higher mass ratios ($q\gtrsim 0.85$). In this regime, identifying binaries via the binary sequence in the H-R diagram is possible for cool stars with ${T}_{\mathrm{eff}}\lesssim 5200$ K (e.g., Gaia Collaboration et al. 2018b; Coronado et al. 2018a; Liu 2019), but this method is less applicable to hotter (≳5200 K) stars or giants because of their larger intrinsic variation of luminosity (at a given ${T}_{\mathrm{eff}}$).

Here we present a method of binary identification that tackles this challenging regime (long-period binaries with hot stars), leveraging differences between spectroscopic ${M}_{K{\rm{s}}}$ and geometric ${M}_{K{\rm{s}}}$, or analogously, differences between the spectrophotometric parallaxes deduced from the spectroscopic ${M}_{K{\rm{s}}}$ and the parallaxes determined with Gaia. The method is particularly efficient for identifying single-line binaries with large mass ratios, e.g., binaries with equal-mass components, thus serving as a complement to the aforementioned approaches. A brief introduction on the philosophy of the method has been laid out and applied to LAMOST AFGK stars in Xiang et al. (2019). Here we present a more detailed description based on the application to LAMOST OB stars.

The basic idea is that, because the observed apparent magnitude of an unresolved binary/multiple-star system is brighter than any individual star in the system,$^{14}$ we should expect the geometric ${M}_{K{\rm{s}}}$ for binary systems to be brighter than the spectroscopic ${M}_{K{\rm{s}}}$. This is possible because, while the geometric ${M}_{K{\rm{s}}}$ reflects the contributions from both stars faithfully, the spectroscopic ${M}_{K{\rm{s}}}$ mostly reflects the dominant star in the system. To demonstrate the latter, we build an empirical library of mock binary spectra using the LAMOST spectra for single stars. The mock test allows us to measure the differences between the spectroscopic ${M}_{K{\rm{s}}}$ derived from the composites and that derived from the individual components.
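The size of the expected excess follows directly from adding fluxes: if the secondary contributes a fraction $f={F}_{2}/{F}_{1}$ of the primary's Ks-band flux, the unresolved system is brighter than the primary by $2.5{\mathrm{log}}_{10}(1+f)$ mag. A short sketch (the function name is hypothetical):

```python
import math

def binary_mag_excess(flux_ratio):
    """Magnitudes by which an unresolved pair outshines its primary.

    flux_ratio is F2/F1 in the band of interest (0 for a single star,
    1 for two components of equal brightness).
    """
    if flux_ratio < 0:
        raise ValueError("flux_ratio must be non-negative")
    return 2.5 * math.log10(1.0 + flux_ratio)
```

For two components of equal brightness this gives about 0.75 mag, the offset marked by the dashed line in Figure 5, and the excess shrinks quickly toward zero as the secondary becomes faint.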

To generate the composite spectra, we scale the fluxes of the LAMOST spectra to the same distance. We restrict our spectral library to stars with robust Gaia parallax ($\varpi /{\sigma }_{\varpi }\gt 10$) and spectral S/N > 100, to ensure high data quality. Accurate spectral flux calibration is necessary to generate realistic composite spectra. For this purpose, we adopt the LAMOST spectra deduced with the flux calibration method of Xiang et al. (2015). The calibrated spectral SEDs have a relative precision of ∼10% in the wavelength range of λ4000–9000 Å. We ensure that, for spectra in our library, the scaled fluxes are consistent with the photometry in all individual passbands, including the Gaia G band and the XSTPS-GAC and APASS g, r, and i bands. Each spectrum is further dereddened using the reddening estimates in Section 4, assuming the extinction curve from Fitzpatrick (1999). We assemble a set of binaries by taking an OB star for the primary star and a star of any (O/B/A/F/G/K) spectral type for the secondary. Figure 16 in Appendix B shows a few examples of the composite spectra.

Figure 9 shows the difference between the ${M}_{K{\rm{s}}}$ derived from the composite binary spectra by simply treating them as single-star spectra and the ${M}_{K{\rm{s}}}$ derived from the spectra of the primaries in these composites. In most cases, the spectroscopic ${M}_{K{\rm{s}}}$ for the binary is comparable to that of the single, primary component. As expected, the spectroscopic ${M}_{K{\rm{s}}}$ for equal-mass binary systems are identical to those of the primary component; the normalized spectra of the two component stars are identical. A similar result also applies to binary systems with small mass ratios. In this case, the secondary contributes minimally to the spectrum. Our mock test suggests that their spectroscopic ${M}_{K{\rm{s}}}$ are fainter than that of the primary (by ∼0.2 mag on average). Note that this is consistent with the findings of El-Badry et al. (2018a), which suggest that, for AFGK stars in binaries, the binary spectrum yields a larger $\mathrm{log}\,g$ than the single star. In any case, this effect facilitates the identification of binaries, as the difference between the geometric ${M}_{K{\rm{s}}}$ and spectroscopic ${M}_{K{\rm{s}}}$ would be even larger. In short, our experiment concludes that, unlike geometric ${M}_{K{\rm{s}}}$, spectroscopic ${M}_{K{\rm{s}}}$ mostly reflects the contribution from the dominant stars, regardless of the mass ratio of the binary systems.

Figure 9. Differences between the spectroscopic ${M}_{K{\rm{s}}}$ estimates derived from mock composite binary spectra vs. those for the primaries of the composites. Red symbols show OB+OB binary systems and the gray symbols are binary systems composed of an OB star and a companion with another (AFGK) spectral type. In most cases, the composite spectra cause the inferred spectroscopic ${M}_{K{\rm{s}}}$ to be slightly fainter than the primary, further facilitating the identification of binaries through the difference between geometric ${M}_{K{\rm{s}}}$ and spectroscopic ${M}_{K{\rm{s}}}$. For example, the geometric ${M}_{K{\rm{s}}}$ for equal-mass binaries are 0.75 mag brighter than their primaries (dashed line in blue), due to the contribution from the secondary. We note that for some binaries composed of a late-B type primary with ${M}_{K{\rm{s}}}\gtrsim 0.5$ mag and an AFGK-type secondary, the spectroscopic ${M}_{K{\rm{s}}}$ could be brighter than that of the primary. As such, the binary identification in this regime is less efficient (see text for details).

Nonetheless, we note that for some binary systems composed of an OB-type primary star with ${M}_{K{\rm{s}}}\gtrsim 0.5$ mag and an A/F/G/K-type secondary star, our spectroscopic ${M}_{K{\rm{s}}}$ estimates from the composite spectra could be brighter than the primary by more than 0.2 mag. This is particularly common for systems with a late-B-type primary. For these systems, the Balmer lines of the composite are shallower than those of the primary (Figure 16). The neural network model then predicts a brighter ${M}_{K{\rm{s}}}$ because it is trained on single OB stars, for which the strength of the Balmer lines decreases with increasing temperature, so that shallower lines are read as a hotter and hence brighter star.

To identify binary stars, we compute the spectrophotometric parallax (in milliarcseconds),

Equation (15)

and derive the S/N of the parallax excess of ${\varpi }_{s}$ with respect to the Gaia astrometric parallax ϖ,

Equation (16)

where δϖs and δϖ are the measurement uncertainty of the spectrophotometric parallax and the Gaia parallax, respectively. The δϖs is defined via

Equation (17)

where ${\sigma }_{s}$ and ${\sigma }_{\mathrm{int}}$ are the (S/N-dependent) measurement uncertainty and the intrinsic uncertainty, derived in Section 5.3; ${\sigma }_{{m}_{{Ks}}}$ is the photometric uncertainty for the 2MASS Ks magnitude, and ${\sigma }_{{A}_{{Ks}}}$ the uncertainty of the extinction estimate. We adopt a 2σ criterion, i.e., we assign a star to be a binary if ${\rm{S}}/{{\rm{N}}}_{{\rm{\Delta }}\varpi }\gt 2$. In total, 1597 of the 16,002 LAMOST OB stars in our sample (10.0%) are identified as binary stars, 13,257 (82.8%) are marked as single stars, and 1148 (7.2%) stars are unclassified due to a lack of either a Gaia parallax or 2MASS photometric magnitudes.
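The procedure can be sketched as below. Because Equations (15)-(17) are not reproduced in this text, the forms used here are assumptions: the spectrophotometric parallax is obtained from the distance modulus, its uncertainty from first-order propagation of the four magnitude error terms added in quadrature, and the excess is normalized by the quadratic sum of the two parallax uncertainties. The function name is hypothetical.

```python
import numpy as np

def parallax_excess_snr(m_spec, m_ks, a_ks, plx, plx_err,
                        sigma_s, sigma_int, sigma_m, sigma_a):
    """S/N of the spectrophotometric parallax excess over the Gaia parallax.

    Magnitudes in mag, parallaxes in mas. Assumed forms (see lead-in):
      plx_spec     = 10 ** (0.2 * (M_Ks - m_Ks + A_Ks) + 2)
      plx_spec_err = 0.2 ln(10) * plx_spec * sqrt(sum of squared mag errors)
    """
    plx_spec = 10.0 ** (0.2 * (m_spec - m_ks + a_ks) + 2.0)
    sigma_mag = np.sqrt(sigma_s ** 2 + sigma_int ** 2
                        + sigma_m ** 2 + sigma_a ** 2)
    plx_spec_err = 0.2 * np.log(10.0) * plx_spec * sigma_mag
    # Positive values: the star looks closer spectroscopically than
    # astrometrically, i.e. it is overluminous for its spectrum.
    return (plx_spec - plx) / np.sqrt(plx_spec_err ** 2 + plx_err ** 2)
```

With ${\sigma }_{\mathrm{int}}=0.25$ mag, a pair of equally bright stars at 1 kpc with a 5% Gaia parallax error yields an excess only slightly above 2 under these assumptions, which illustrates why the method depends on precise parallaxes.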

Figure 10 shows the differences between the spectrophotometric parallax and Gaia parallax for individual LAMOST OB stars. We show results from stars with robust Gaia parallaxes ($\varpi /{\sigma }_{\varpi }\gt 10$) and decent spectral quality (${\rm{S}}/{\rm{N}}\gt 50$). Especially among stars with ${M}_{K{\rm{s}}}\lt 0$ mag, there is a clear positive tail, contributed by binary/multiple-star systems. These stars are overluminous compared to their single star counterparts.

Figure 10. The normalized differences between the inferred spectrophotometric parallax and the Gaia parallax for individual LAMOST OB stars. Only the results from stars with robust Gaia parallaxes ($\varpi /{\sigma }_{\varpi }\gt 10$) and decent spectral quality (${\rm{S}}/{\rm{N}}\gt 50$) are shown. The histogram in black shows results from stars of all geometric ${M}_{K{\rm{s}}}$, while the red one shows only results for stars with geometric ${M}_{K{\rm{s}}}\lt 0$. Our method is more effective for finding binaries for the latter. The vertical dashed line delineates the criterion adopted for binary identification.

Most of the binary stars selected with our method have a small RUWE value (∼1). This partly reflects that our method is efficient for identifying binaries with large mass ratios, especially equal-mass binaries, for which the Gaia RUWE value is small due to negligible wobbles of the light centroids. Viewed in this way, our method complements approaches that identify binaries through large astrometric wobbles quantified by the Gaia RUWE value (e.g., Belokurov et al. 2020).

Nonetheless, a few caveats apply. As discussed above, many binaries with late-B-type primaries (with ${M}_{K{\rm{s}}}\gtrsim 0.5$ mag) can be missed. This is also illustrated in Figure 10, where the positive tail diminishes as we consider the full sample. The quality of the Gaia parallax is another limit to the effectiveness of this method. Because our method relies on the comparison between the spectrophotometric parallax and the Gaia astrometric parallax, the results are less robust in recognizing distant binary systems that have larger Gaia parallax uncertainty.

7. Distance

The distance to the star can be estimated by combining its Gaia parallax with the spectrophotometric distance derived from the spectroscopic ${M}_{K{\rm{s}}}$. We estimate the distance using the Bayesian scheme presented below.

In terms of ${M}_{K{\rm{s}}}$, ${m}_{K{\rm{s}}}$, extinction ${A}_{{K}_{{\rm{s}}}}$, and Gaia parallax ϖ, the probability distribution function of distance d is

Equation (18)

where

Equation (19)

and

Equation (20)

The likelihood function can be written as

Equation (21)

Equation (22)

Equation (23)

The extinction ${A}_{K{\rm{s}}}$ is derived by

Equation (24)

where ${R}_{K{\rm{s}}}$ is the extinction coefficient in the 2MASS Ks passband. We adopt flat priors for P(d) and $P(m,A| d)$. Note that, for binary systems, we adopt the Gaia parallax alone for distance estimation, as their spectrophotometric distances might be biased.

We sample the posterior distribution function (PDF) for individual stars with a fine distance step of 0.1 pc and adopt the mode of the PDF as the distance estimate and the 16th and 84th percentiles as the 1σ bounds. We also sample the PDF in logarithmic distance scale. The PDFs in logarithmic distance are close to Gaussian, and we thus adopt the PDF-weighted mean and standard deviation as the estimates of the logarithmic distance and its uncertainty, respectively. The left panel of Figure 11 shows the distribution of distances to our sample of LAMOST OB stars. While the majority of stars are located within 3 kpc from the Sun, a number of them could lie beyond 10 kpc. The right panel of Figure 11 illustrates the relative distance uncertainty as a function of distance. The median distance uncertainty of the sample is 8%, and the distance uncertainty increases only moderately with distance, reaching ∼14% at 15 kpc. The figure also shows the distance uncertainty when only the Gaia parallax is adopted. It illustrates that including the spectroscopic ${M}_{K{\rm{s}}}$ yields more precise distances than the Gaia parallax alone for stars farther than ∼1.5 kpc from the Sun, and the improvement becomes critical for stars more distant than about 5 kpc. This is consistent with Shull & Danforth (2019), who found substantial differences between spectrophotometric and parallax distances at $d\gt 1.5$ kpc. Although not shown, we have also checked the distances of Bailer-Jones et al. (2018) for our sample stars and found good consistency for stars with $d\lesssim 2\,\,\mathrm{kpc}$, a regime where their distance estimates are robust and are not dominated by the priors imposed in their study.

Nonetheless, we note that for binaries that are not identified with our method due to large uncertainties in the geometric and/or spectroscopic ${M}_{K{\rm{s}}}$, the distance estimates may suffer large systematics, which can reach 35% in the case of unidentified equal-mass binaries when only spectroscopic ${M}_{K{\rm{s}}}$ are available.
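A grid-based version of such a distance estimate can be sketched as follows. The Gaussian likelihood terms, the lumping of photometric and extinction errors into a single magnitude uncertainty, the grid limits, and the 1 pc default step (coarser than the 0.1 pc quoted above, to keep the example light) are assumptions of this sketch rather than the exact form of Equations (18)-(24).

```python
import numpy as np

def distance_posterior(m_spec, m_spec_err, m_ks, a_ks, plx, plx_err,
                       d_min=10.0, d_max=30000.0, step=1.0):
    """Mode and 16th/84th percentiles of a gridded distance posterior (pc).

    Combines a Gaia parallax term (mas) with a spectroscopic absolute-magnitude
    term under a flat prior in distance. m_spec_err is assumed to already
    include the photometric and extinction uncertainties.
    """
    d = np.arange(d_min, d_max + step, step)  # distance grid in pc

    # Parallax term: predicted parallax at distance d is 1000 / d mas
    chi2_plx = ((plx - 1000.0 / d) / plx_err) ** 2
    # Spectroscopic term: absolute magnitude implied by m_ks, a_ks at distance d
    m_pred = m_ks - a_ks - 5.0 * np.log10(d) + 5.0
    chi2_spec = ((m_spec - m_pred) / m_spec_err) ** 2

    logp = -0.5 * (chi2_plx + chi2_spec)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    cdf = np.cumsum(p)

    mode = d[np.argmax(p)]
    lo = d[np.searchsorted(cdf, 0.16)]
    hi = d[np.searchsorted(cdf, 0.84)]
    return mode, lo, hi
```

For stars flagged as binaries, the spectroscopic term would be dropped so that only the parallax constrains the distance, as described in the text.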

Figure 11. Left: distribution of heliocentric distance of the LAMOST OB star sample. Right: Relative distance uncertainty (${\sigma }_{d}/d=\mathrm{ln}10\times {\sigma }_{\mathrm{log}d}$) as a function of distance. Only results for single stars are shown. The solid line delineates the median value of the relative distance uncertainty at various distances. The dashed line delineates the relative distance uncertainty in the case where only the Gaia parallax is adopted to infer the distance. The improvement to distance estimates using the spectroscopic ${M}_{K{\rm{s}}}$ is clearly visible for distant stars.

Figure 12 shows the LAMOST OB star sample in the XY plane in Galactic Cartesian coordinates (X, Y, Z). The Sun is assumed to be located at position $X=-8.1\,\,\mathrm{kpc}$, Y = 0 and Z = 0. The figure highlights the wide spatial range $-16\lt X\lt -5\,\,\mathrm{kpc}$, $-4\lt Y\lt 5\,\,\mathrm{kpc}$ covered by the sample. A small number of stars outside these ranges are not shown here. The data exhibit overdensities at approximately the distance of the Perseus arm (e.g., at $X=-10\,\,\mathrm{kpc}$ and $Y\simeq -0.3$ kpc), which is about 2 kpc from the Sun (e.g., Xu et al. 2006).
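For readers who wish to reproduce such a map from Galactic longitude, latitude, and distance, a common convention is sketched below. The axis orientation is an assumption: X is taken to increase from the Sun toward the Galactic center, which matches the Sun being at $X=-8.1$ kpc, while the sign convention for Y is not stated in the text and may differ from the one used in the catalog.

```python
import numpy as np

def galactic_xyz(l_deg, b_deg, d_kpc, x_sun=-8.1):
    """Galactic Cartesian coordinates (kpc) with the Galactic center at the origin.

    Assumed convention: X points from the Sun toward the Galactic center,
    Y toward l = 90 deg, Z toward the north Galactic pole; Sun at (x_sun, 0, 0).
    """
    l = np.radians(l_deg)
    b = np.radians(b_deg)
    x = d_kpc * np.cos(b) * np.cos(l) + x_sun
    y = d_kpc * np.cos(b) * np.sin(l)
    z = d_kpc * np.sin(b)
    return x, y, z
```

With this convention, a star 2 kpc away toward the Galactic anticenter (l = 180°, b = 0°) lands at X = −10.1 kpc, close to the Perseus arm overdensity noted above.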

Figure 12. Spatial distribution of the LAMOST OB star sample in the disk XY plane in Galactic Cartesian coordinates. The plus symbol designates the position of the Sun ($X=-8.1\,\,\mathrm{kpc}$, Y = 0 kpc). The dashed rings delineate constant distances from the Sun in steps of 1 kpc.

Finally, our estimates of ${M}_{K{\rm{s}}}$, extinction, and distance, and flags for binarity and emission lines for the 16,002 LAMOST OB stars are made publicly available.$^{15}$ Table 1 presents a summary of the catalog.

Table 1. Descriptions for the Distance Catalog of 16,002 OB Stars in LAMOST DR5 a

| Field | Description |
| --- | --- |
| specid | LAMOST spectra ID in the format of "date-planid-spid-fiberid" |
| fitsname | Name of the LAMOST spectral .FITS file |
| ra | R.A. from the LAMOST DR5 catalog (J2000; deg) |
| dec | Decl. from the LAMOST DR5 catalog (J2000; deg) |
| uniqflag | Flag to indicate repeat visits; uqflag = 1 means unique star, uqflag = 2, 3, ..., n indicates the nth repeat visit. For stars with repeat visits, the uniqflag is sorted by the spectral S/N, with uqflag = 1 having the highest S/N |
| star_id | A unique ID for each unique star based on its R.A. and decl., in the format of "Sdddmmss±ddmmss" |
| snr_g | Spectral signal-to-noise ratio per pixel in SDSS g band |
| rv | Radial velocity from LAMOST (km s$^{-1}$) |
| rv_err | Uncertainty in radial velocity (km s$^{-1}$) |
| ${M}_{K{\rm{s}}}$ | ${M}_{K{\rm{s}}}$ estimated from LAMOST spectra |
| ${M}_{K{\rm{s}}}$_err | Uncertainty in ${M}_{K{\rm{s}}}$ |
| ${M}_{K{\rm{s}}}$_geo | Geometric ${M}_{K{\rm{s}}}$ inferred from Gaia parallaxes and 2MASS apparent magnitudes |
| ${M}_{K{\rm{s}}}$_geo_err | Uncertainty in ${M}_{K{\rm{s}}}$_geo |
| dis | Distance at the mode of the distance probability density function (PDF) |
| dis_low | Distance at the 16th percentile of the cumulative probability distribution function |
| dis_high | Distance at the 84th percentile of the cumulative probability distribution function |
| logdis | PDF-weighted mean logarithmic distance |
| logdis_err | Uncertainty in logdis |
| ebv | Reddening estimated in this work |
| ebv_err | Uncertainty in ebv |
| snr_dparallax | Excess in spectrophotometric parallax with respect to the Gaia astrometric parallax |
| binary_flag | Flag of binarity; 1 = binary ($\mathrm{snr}\_\mathrm{dparallax}\geqslant 2$), 0 = single ($\mathrm{snr}\_\mathrm{dparallax}\lt 2$), −9 = unknown |
| em_flag | Flag of emission lines; 1 = with emission lines, 0 = no emission lines |
| gaia_id | Gaia DR2 Source ID |
| parallax | Gaia DR2 parallax (mas) |
| parallax_error | Uncertainty in parallax (mas) |
| parallax_offset | Offset of Gaia parallax according to the offset vs. G magnitude relation of Leung & Bovy (2019) |
| pmra | Gaia DR2 proper motion in R.A. direction |
| pmra_error | Uncertainty in pmra |
| pmdec | Gaia DR2 proper motion in decl. direction |
| pmdec_error | Uncertainty in pmdec |
| ruwe | Gaia DR2 RUWE |
| J | 2MASS J-band magnitude |
| J_err | Uncertainty in J |
| H | 2MASS H-band magnitude |
| H_err | Uncertainty in H |
| Ks | 2MASS Ks-band magnitude |
| Ks_err | Uncertainty in Ks |
| X/Y/Z | 3D position in the Galactic Cartesian coordinates (kpc) |

Note.

a Due to multiple visits of common stars, the catalog contains 27,784 entries for 16,002 unique stars. This is slightly different from the original catalog of Liu et al. (2019b), which contains 22,901 spectra entries for 16,032 stars. We have increased the number of spectra by including all repeat visits in the LAMOST DR5 database. All of the spectra can be found on the LAMOST DR5 website.

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.

8. The Distance to the LB-1 B-star System

LB-1 (LS V +22 25; R.A. = 92.95450°, decl. = 22.82575°) is a binary system discovered by Liu et al. (2019a). It was purported to consist of a B-type star that exhibits periodic radial velocity variations with an amplitude of around 50 km s$^{-1}$ and a period of 78.9 days; the system also exhibits broad hydrogen emission lines that show periodic radial velocity variations of ∼10 km s$^{-1}$ (Liu et al. 2019a, 2020). Liu et al. (2019a) interpreted LB-1 as a B star orbiting a ${68}_{-13}^{+11}$ ${M}_{\odot }$ black hole as the unseen primary companion, which, if true, would be the most massive stellar-mass black hole ever found. Since its discovery, there have been various controversies about the origin of this system (Abdul-Masih et al. 2020; El-Badry & Quataert 2020, 2021; Eldridge et al. 2020; Irrgang et al. 2020; Liu et al. 2020; Rivinius et al. 2020; Shenar et al. 2020; Simón-Díaz et al. 2020; Yungelson et al. 2020). Alternative explanations have been proposed. In particular, spectral disentangling (Shenar et al. 2020) has all but demonstrated that LB-1 is an SB2 binary with two luminous components: a Be star, with its broad absorption lines and a surrounding emission-line disk, and a luminous hot star with low log g, presumably recently stripped to its current mass of $\sim 1{M}_{\odot }$. This has led Shenar et al. (2020) and El-Badry & Quataert (2020) to conclude that the low mass and high luminosity of the stripped star lead to the high radial velocity variations in the combined spectrum. There is no need, and presumably no room, for a black hole.

Here we investigate the ${M}_{K{\rm{s}}}$ estimates of LB-1. Our catalog contains spectroscopic ${M}_{K{\rm{s}}}$ and distance measurements from 12 individual LAMOST spectra for the LB-1 system, all with ${\rm{S}}/{\rm{N}}\gt 300$. From these, we obtain a spectroscopic ${M}_{K{\rm{s}}}$ of −0.89 ± 0.30 mag, which is fainter than the geometric ${M}_{K{\rm{s}}}$ based on Gaia parallax and 2MASS photometry (Figure 13). In particular, Gaia eDR3 (Brown et al. 2020) yields a parallax of 0.359 ± 0.030 mas for LB-1. The zero-point offset correction of Lindegren (2020) suggests a zero-point offset of −0.051 mas for LB-1. These lead to a geometric ${M}_{K{\rm{s}}}$ of $-{1.67}_{-0.16}^{+0.15}$ mag, with a significantly better precision compared to the Gaia DR2 value ($-{1.39}_{-0.47}^{+0.39}$ mag).

Figure 13. Comparison of spectroscopic ${M}_{K{\rm{s}}}$ with geometric ${M}_{K{\rm{s}}}$ for a test star sample with precise Gaia parallax ($\varpi /{\sigma }_{\varpi }\gt 10$). The solid line delineates the one-to-one line, while the dashed line delineates an offset of 0.75 mag to the one-to-one line. The inferred spectroscopic ${M}_{K{\rm{s}}}$ and geometric ${M}_{K{\rm{s}}}$ of the LB-1 system are highlighted by the pink circle with error bars. The plus symbol in pink shows the ${M}_{K{\rm{s}}}$ (−2.9 mag) inferred from the distance in Liu et al. (2019a), which is at odds with our spectroscopic ${M}_{K{\rm{s}}}$. The red dot with error bars shows the LB-1 geometric ${M}_{K{\rm{s}}}$ based on the Gaia eDR3 parallax, which has a significantly smaller error bar compared to the Gaia DR2 result. It shows that LB-1 is a binary system, for which the component stars are comparably bright.

These results suggest that the geometric ${M}_{K{\rm{s}}}$ of LB-1 is brighter than the spectroscopic ${M}_{K{\rm{s}}}$ by more than 0.6 mag, implying that LB-1 is a binary system that contains two luminous stars. This is largely consistent with the conclusion of Shenar et al. (2020) and El-Badry & Quataert (2020) and is in tension with the conclusion of Liu et al. (2019a) that LB-1 is a system composed of one luminous B star and a massive stellar-mass black hole.
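The size of this offset can be compared with what two unresolved stars would produce. As a rough sketch, assuming the spectroscopic ${M}_{K{\rm{s}}}$ traces the primary alone (the function names are illustrative), a primary and a secondary with Ks-band flux ratio $q={F}_{2}/{F}_{1}$ are together brighter than the primary by $2.5\,{\mathrm{log}}_{10}(1+q)$ mag:

```python
import math

def binary_mag_excess(flux_ratio):
    """Magnitudes by which primary + secondary outshine the primary alone."""
    return 2.5 * math.log10(1.0 + flux_ratio)

def implied_flux_ratio(mag_excess):
    """Invert binary_mag_excess: secondary-to-primary flux ratio."""
    return 10.0 ** (0.4 * mag_excess) - 1.0

equal_pair = binary_mag_excess(1.0)      # about 0.75 mag
lb1_excess = -0.89 - (-1.67)             # spectroscopic minus geometric: 0.78 mag
lb1_ratio = implied_flux_ratio(lb1_excess)
```

For two equally bright stars the excess is about 0.75 mag, which is the offset of the dashed line in Figure 13. The 0.78 mag difference for LB-1 is of this order, although the 0.30 mag uncertainty on the spectroscopic value leaves the flux ratio only loosely constrained.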

As an independent check, we adopt stellar parameters for LB-1 derived from the LAMOST spectra by fitting the Kurucz ATLAS12 model spectra (Kurucz 1970, 1993) with The Payne (M. Xiang et al. 2021, in preparation) and evaluate the temperature-weighted gravity (Kudritzki et al. 2020) as a luminosity indicator. Note that, for hot stars, rotation may play a significant role in determining the luminosity; here we have ignored the effect of rotation because accurate rotation measurements are lacking for our sample stars. Nonetheless, Figure 14 illustrates that there is a good relation between the inferred spectroscopic ${M}_{K{\rm{s}}}$ and the ${T}_{\mathrm{eff}}$-weighted gravity. The LB-1 stellar parameters are in line with this relation, lending credence to our spectroscopic ${M}_{K{\rm{s}}}$ estimate of LB-1. In short, our exploration suggests that both the geometric ${M}_{K{\rm{s}}}$ from the Gaia parallax and the spectroscopic ${M}_{K{\rm{s}}}$ from the LAMOST spectra for LB-1 are likely to be robust. The intrinsic difference between the geometric ${M}_{K{\rm{s}}}$ and the spectroscopic ${M}_{K{\rm{s}}}$ could be explained if LB-1 is a binary system with two luminous component stars.
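For orientation, a luminosity indicator of this kind can be computed directly from the stellar parameters. The sketch below assumes the flux-weighted gravity form, $\mathrm{log}\,{g}_{F}=\mathrm{log}\,g-4\,\mathrm{log}({T}_{\mathrm{eff}}/{10}^{4}\,{\rm{K}})$; the exact normalization used for Figure 14 is not restated in this section, so this form is an assumption, and the input values are illustrative rather than catalog measurements.

```python
import math

def teff_weighted_gravity(logg, teff_k):
    """Return log g - 4 log10(Teff / 1e4 K) (assumed definition)."""
    return logg - 4.0 * math.log10(teff_k / 1.0e4)

# Illustrative parameters only, not measurements from the catalog.
hot_dwarf = teff_weighted_gravity(logg=4.0, teff_k=20000.0)
hot_giant = teff_weighted_gravity(logg=3.0, teff_k=20000.0)
```

Under this definition the quantity scales with the ratio of mass to luminosity, so at fixed ${T}_{\mathrm{eff}}$ a lower value points to a more luminous star.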

Figure 14. The temperature-weighted gravity vs. the geometric ${M}_{K{\rm{s}}}$ (the left panel) and the spectroscopic ${M}_{K{\rm{s}}}$ (the right panel). The temperature-weighted gravity is adopted as a "ground-truth" luminosity indicator. Only stars with precise Gaia parallax ($\widetilde{\omega }/{\sigma }_{\widetilde{\omega }}\gt 10$) are shown. The solid line is a second-order polynomial fit to the geometric ${M}_{K{\rm{s}}}$ as a function of the temperature-weighted gravity. The circle with error bar in pink highlights the LB-1 B-star companion. The spectroscopic ${M}_{K{\rm{s}}}$ estimate for LB-1 is perfectly consistent with the temperature-weighted gravity. The geometric ${M}_{K{\rm{s}}}$ is brighter than the prediction from the gravity.

9. Summary

In this study, we have presented a data-driven approach for deriving Ks-band absolute magnitudes ${M}_{K{\rm{s}}}$ for OB stars from low-resolution ($R\simeq 1800$) LAMOST spectra. Our method uses a neural network model trained on a set of stars with good parallaxes from Gaia DR2. Applying it to a test data set, we find that the neural network is capable of delivering ${M}_{K{\rm{s}}}$ with 0.25 mag precision from the LAMOST OB star spectra. We have also applied the method separately to stars with emission lines in their spectra. The emission-line spectra are identified by comparing the observed spectra with their PCA reconstructions.
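The link between the 0.25 mag precision and the roughly 12% distance precision quoted in the abstract follows from first-order error propagation through the distance modulus. A minimal sketch, assuming that errors in the photometry and the extinction are negligible (the function names are illustrative):

```python
import math

def spectroscopic_distance_pc(ks_mag, a_ks, abs_mag):
    """Distance from the distance modulus m - A - M = 5 log10(d / 10 pc)."""
    return 10.0 ** ((ks_mag - a_ks - abs_mag + 5.0) / 5.0)

def fractional_distance_error(sigma_abs_mag):
    """First-order propagation: sigma_d / d = (ln 10 / 5) * sigma_M."""
    return math.log(10.0) / 5.0 * sigma_abs_mag

frac = fractional_distance_error(0.25)   # about 0.115, i.e. roughly 12%
```

Because the fractional error does not depend on the distance itself, a spectroscopic estimate keeps its relative precision for distant stars, where parallax uncertainties dominate.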

We verify that the ${M}_{K{\rm{s}}}$ estimated from the composite spectrum of a binary system is comparable to, or slightly fainter than, the ${M}_{K{\rm{s}}}$ of the primary star. This is in contrast to the geometric ${M}_{K{\rm{s}}}$ calculated from Gaia parallaxes, as both components of the binary contribute to the geometric ${M}_{K{\rm{s}}}$. We propose a new method of binary identification, leveraging differences between the spectroscopic ${M}_{K{\rm{s}}}$ and the geometric ${M}_{K{\rm{s}}}$. The method is particularly effective for identifying equal-mass binaries or multiple-star systems, because the geometric ${M}_{K{\rm{s}}}$ of these systems is much brighter than that of their primaries. Our method is generic and can be applied to any combined astrometric and spectroscopic data beyond this study.
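A flagging rule of this kind might look as follows. This is only a sketch: the function name, the 0.5 mag minimum offset, and the 3σ significance cut are hypothetical choices made for illustration, not the criteria adopted for the catalog.

```python
import math

def is_binary_candidate(m_spec, err_spec, m_geo, err_geo,
                        min_offset=0.5, n_sigma=3.0):
    """Flag systems whose geometric magnitude is brighter than spectroscopic.

    min_offset and n_sigma are hypothetical thresholds, not catalog values.
    """
    offset = m_spec - m_geo                  # positive if geometric is brighter
    sigma = math.hypot(err_spec, err_geo)    # errors added in quadrature
    return offset > min_offset and offset > n_sigma * sigma

# A star whose two magnitudes agree is not flagged.
single_like = is_binary_candidate(-1.00, 0.10, -1.05, 0.10)
# A 0.75 mag excess with small errors is flagged.
binary_like = is_binary_candidate(-1.00, 0.10, -1.75, 0.10)
```

Requiring both a minimum offset and a minimum significance keeps noisy parallaxes from producing spurious flags, at the cost of missing binaries with faint secondaries.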

With the spectroscopic ${M}_{K{\rm{s}}}$ determinations, we derive accurate distances to 16,002 OB stars from the LAMOST sample of Liu et al. (2019b). The median distance uncertainty for our sample stars is 8%, and the distance uncertainty for the most distant stars at more than 10 kpc away is about 14%. We present a value-added catalog of OB stars for future studies of the structure and dynamics of the Galactic disk. Besides absolute magnitudes and distances, the catalog also presents emission-line flags and binary flags for the LAMOST OB stars, significantly expanding the number of known emission-line objects and binaries for massive stars, especially those with mass ratios close to unity. Our method yields a spectroscopic ${M}_{K{\rm{s}}}$ of −0.89 ± 0.30 mag from the LAMOST spectra of LB-1. However, the geometric ${M}_{K{\rm{s}}}$ of LB-1 derived from the Gaia parallax, both Gaia DR2 and Gaia eDR3, is significantly brighter than the spectroscopic ${M}_{K{\rm{s}}}$, suggesting that LB-1 is likely a binary system that contains two luminous stars with comparable brightness. This supports the previous conclusion of Shenar et al. (2020) and El-Badry & Quataert (2020), and contradicts Liu et al. (2019a), who argued that LB-1 is a system composed of one luminous B star and a massive stellar-mass black hole.

H.-W.R. acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project-ID 138713538—SFB 881 ("The Milky Way System", subproject A03). M.X. is grateful for Dr. Bodem for the successful dental surgery and the attentive care from him during recovery. Y.S.T. is grateful to be supported by the NASA Hubble Fellowship grant HST-HF2-51425.001 awarded by the Space Telescope Science Institute.

This work has made use of data acquired through the Guoshoujing Telescope. Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope; LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.

This work has also made use of data from the European Space Agency (ESA) mission Gaia, processed by the Gaia Data Processing and Analysis Consortium (DPAC). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.

Appendix A: Examples of Problematic LAMOST Spectra

As shown in Figure 2 in the main text, some stars have large ${\chi }^{2}$ in the residuals between the LAMOST spectra and the PCA reconstruction. For these stars, we find that the LAMOST spectra are often problematic, for reasons that include instrument problems and erroneous wavelength calibration. Figure 15 shows a few examples of such erroneous LAMOST spectra.
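The idea of using the residual between a spectrum and its PCA reconstruction as a quality or peculiarity indicator can be sketched as follows. This is a toy illustration with NumPy on synthetic spectra; the number of components, the noise level, and the injected defect are all assumptions made for the example and do not reproduce the setup used for the catalog.

```python
import numpy as np

def pca_basis(train_flux, n_components):
    """Mean spectrum and leading principal components from training spectra."""
    mean = train_flux.mean(axis=0)
    # Rows of vt are orthonormal eigen-spectra ordered by explained variance.
    _, _, vt = np.linalg.svd(train_flux - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_chi2(flux, flux_err, mean, components):
    """Reduced chi-square between a spectrum and its PCA reconstruction."""
    coeffs = components @ (flux - mean)
    recon = mean + components.T @ coeffs
    return np.sum(((flux - recon) / flux_err) ** 2) / flux.size

# Synthetic training set: flat continuum with one absorption line of varying depth.
rng = np.random.default_rng(42)
wave = np.linspace(4000.0, 5000.0, 500)
line = np.exp(-0.5 * ((wave - 4340.0) / 5.0) ** 2)
depths = rng.uniform(0.1, 0.5, size=(100, 1))
noise = 0.01
train = 1.0 - depths * line + rng.normal(0.0, noise, size=(100, wave.size))
mean, comps = pca_basis(train, n_components=3)

# A well-behaved spectrum, and a copy with a corrupted stretch of pixels.
normal = 1.0 - 0.3 * line + rng.normal(0.0, noise, size=wave.size)
bad = normal.copy()
bad[200:260] += 0.2
err = np.full(wave.size, noise)
chi2_normal = reconstruction_chi2(normal, err, mean, comps)
chi2_bad = reconstruction_chi2(bad, err, mean, comps)
```

The basis learned from ordinary spectra cannot reproduce the corrupted stretch, so the defect stays in the residual and inflates the ${\chi }^{2}$; the same mechanism makes features absent from the training set, such as emission lines, stand out.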

Figure 15. A few typical examples that exhibit abnormal residuals between the LAMOST spectrum and the PCA reconstruction. The top panel shows a spectrum with problematic fluxes over the wavelength range of λ4700–5000 Å, leading to poor PCA reconstruction. The middle panel shows a spectrum with artifacts across λ5000–7000 Å. The PCA reconstruction is visually reasonable, but the ${\chi }^{2}$ between the LAMOST and PCA-reconstructed spectra remains large. The bottom panel shows a spectrum that has a problematic wavelength calibration from the LAMOST pipeline in the wavelength range of λ5800–7000 Å.

Appendix B: Examples of Mock Binary Spectra

As discussed in Section 6, in order to study the spectroscopic ${M}_{K{\rm{s}}}$ derived from composite spectra, we build an empirical library of binary spectra using the LAMOST spectra of single stars. Figure 16 shows a few examples of our mock composite spectra as well as their inferred spectroscopic ${M}_{K{\rm{s}}}$. The spectroscopic ${M}_{K{\rm{s}}}$ estimates of the binary spectra typically agree with those from the spectra of the primary stars, with a difference of ≲0.2 mag.
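One simple way to build such a composite is to add two single-star spectra on a common wavelength grid after scaling the secondary to a chosen flux ratio. The sketch below is a toy version of that idea using NumPy; the function name, the use of a mean-flux ratio as the scaling, and the four-pixel arrays are assumptions for illustration and are not the procedure of Section 6.

```python
import numpy as np

def mock_composite(flux_primary, flux_secondary, flux_ratio):
    """Add a secondary spectrum scaled to a chosen mean-flux ratio.

    flux_ratio is the secondary-to-primary mean flux over the shared
    wavelength grid; how the ratio is set is an assumption of this sketch.
    """
    flux_primary = np.asarray(flux_primary, dtype=float)
    flux_secondary = np.asarray(flux_secondary, dtype=float)
    scale = flux_ratio * flux_primary.mean() / flux_secondary.mean()
    return flux_primary + scale * flux_secondary

# Toy spectra on a common grid (placeholders, not LAMOST data).
primary = np.array([1.0, 0.8, 1.0, 1.0])
secondary = np.array([2.0, 2.0, 1.0, 2.0])
composite = mock_composite(primary, secondary, flux_ratio=0.5)
```

In this picture a faint secondary changes the composite only slightly, which is one way to see why the magnitude inferred from the composite stays close to that of the primary.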

Figure 16. A few typical examples of mock composite spectra. We focus on the wavelength range of λ3800–5100 Å. The spectra of the primary, secondary, and binary are shown in black, gray, and red, respectively. In all of these cases, the primary is a B-type star. The secondaries are B-, A-, and G-type stars from top to bottom, respectively. The spectroscopic ${M}_{K{\rm{s}}}$ estimates for the primary, secondary, and composite binary spectra are also shown in the figure. The spectroscopic ${M}_{K{\rm{s}}}$ estimate of the binary spectrum typically agrees with that of the primary spectrum, with a difference of ≲0.2 mag.

Footnotes

  • 11  
  • 12  

    We note that there could be multiple visits for the same star in the LAMOST database. For those stars, we adopt the results from the visit with the highest signal-to-noise ratio (S/N).

  • 13  

    A detailed explanation on the RUWE can be found in the public DPAC document from L. Lindegren, titled "Re-normalizing the astrometric chi-square in Gaia DR2", via https://www.cosmos.esa.int/web/gaia/public-dpac-documents (at the bottom of the page).

  • 14  

    In general, the light centroids of binaries exhibit only little variation, resulting in only minor systematics in the astrometry, especially for binaries with similar mass companions.

  • 15  

    The catalog is published online as a machine-readable version of Table 1. It can also be accessed via a temporary link at https://keeper.mpdl.mpg.de/f/56d86145cfb0417eb8a8/?dl=1.
