Introduction

The genus Gemella was first described by Berger in 1961 (Berger 1961). Members of this genus are usually Gram-positive, coccoid, facultatively anaerobic and catalase-negative (Collins 2006). The earliest described species of this genus, such as Gemella morbillorum and Gemella haemolysans, are commensals of human mucous membranes but are sometimes responsible for human infections (Kilpper-Bälz and Schleifer 1988).

Gemella bergeri and Gemella sanguinis were recovered from human clinical specimens (Collins et al. 1998a, b), whereas Gemella palaticanis was isolated from a dog (Collins et al. 1999). Although the pathogenicity of members of this genus is not yet proven, it seems likely that they are also residents of the mucous membranes.

During a project on the human microbiota, we studied sputum samples by culturomics as previously described (Lagier et al. 2018), which allowed us to isolate a new bacterial strain belonging to the phylum Firmicutes. Herein, we report a taxonogenomic description (Fournier et al. 2015) of Gemella massiliensis sp. nov., which was previously announced by our research group (Fonkou et al. 2018).

Materials and methods

Growth conditions

A bacterial strain was isolated by culturomics from a sputum sample of a healthy Frenchman as part of an exploration of the human microbiome. The study was approved by the ethics committee of the Institut Federatif de Recherche IFR48 under number 09-022, and the donor gave his formal agreement by signing the informed consent. The optimal growth conditions of strain Marseille-P3249 were then evaluated using various culture conditions. Culture assays were done at 28, 37, 45 and 55 °C under anaerobic (GENbag anaer, bioMérieux), microaerophilic (GENbag Microaer, bioMérieux) and aerobic conditions. pH tolerance and halotolerance were evaluated independently in growth assays at pH 6, 6.5, 7 and 8.5 and at NaCl concentrations of 0, 5, 10, 50, 75 and 100 g/L, respectively.

Morphological, biochemical and antibiotic susceptibility analysis

The main biochemical features of strain Marseille-P3249T were tested using API ZYM, 50 CH and 20A strips (bioMérieux, France). Motility and Gram staining were checked using a DM1000 light microscope (Leica Microsystems, Nanterre, France). Additionally, sporulation was evaluated after exposing a bacterial suspension to a 20 min heat shock at 80 °C. Cell morphology images were obtained using a scanning electron microscope (SEM) (TM4000 Plus, Hitachi High-Technologies Corp., Tokyo, Japan).

Cellular fatty acid methyl ester (FAME) analysis was performed by GC/MS, using 10 mg of bacterial biomass per tube, as previously reported (Elsawi et al. 2017).

The minimal inhibitory concentrations (MIC) of strain Marseille-P3249 were evaluated using Etest (bioMérieux) for benzylpenicillin, amoxicillin, cefotaxime, ceftriaxone, imipenem, erythromycin, daptomycin, amikacin, rifampicin, minocycline, teicoplanin, vancomycin, metronidazole, and colistin.

DNA extraction and genome sequencing

Genomic DNA (gDNA) was extracted from strain Marseille-P3249 as previously described (Elsawi et al. 2017), at a concentration of 82.1 ng/µL. The gDNA was sequenced using the MiSeq technology (Illumina Inc, San Diego, CA, USA) with the mate-pair strategy. The library was prepared with the Nextera Mate-Pair sample prep kit (Illumina), then barcoded and run together with 11 additional projects, as formerly described (Elsawi et al. 2017). The DNA fragment size ranged from 1.5 kb to 11 kb with an optimal size of 6.29 kb. No size selection was done and 177.24 ng of tagmented fragments were circularized. The circularized DNA was mechanically sheared into smaller fragments with an optimal size of 1393 bp on the Covaris S2 device in T6 tubes (Covaris, Woburn, MA, USA). The library profile was visualized on a High Sensitivity Bioanalyzer LabChip (Agilent Technologies Inc, Santa Clara, CA, USA), and the final concentration was 15.59 nmol/L. The library was normalized to 2 nM, pooled with the other samples, and finally diluted to 15 pM. Automated cluster generation and sequencing were performed in a single 2 × 251-bp run. A total of 9.5 Gb of information was obtained from a cluster density of 1050 K/mm2, with 92.5% of clusters passing quality control filters (18,644,000 passing-filter paired reads). Within this run, the index representation for strain Marseille-P3249T was determined to be 4.67%. The 870,362 paired reads were trimmed, assembled, annotated and analyzed.
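As a rough consistency check on the run statistics above, the share of the run attributed to the strain can be recomputed from the two reported read counts. The sketch below is illustrative only; the constant and function names are hypothetical and are not part of the authors' pipeline.

```python
# Paired-read counts reported in the text.
TOTAL_PASSING_FILTER_PAIRS = 18_644_000  # whole run, after quality filters
STRAIN_PAIRS = 870_362                   # reads assigned to strain Marseille-P3249


def index_representation(strain_pairs: int, total_pairs: int) -> float:
    """Return the strain's share of the run, as a percentage."""
    return 100.0 * strain_pairs / total_pairs


share = index_representation(STRAIN_PAIRS, TOTAL_PASSING_FILTER_PAIRS)
print(f"Index representation: {share:.2f}%")  # prints "Index representation: 4.67%"
```

The recomputed value agrees with the reported 4.67% once rounded to two decimals.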

The Genome-to-Genome Distance Calculator (http://ggdc.dsmz.de) was used to obtain digital DNA–DNA hybridization (dDDH) estimates with confidence intervals under recommended settings (Formula 2, BLAST+).

Phylogenetic analysis

For phylogenetic analyses, 16S rRNA gene sequences of closely related species were retrieved from the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/). MUSCLE was used for sequence alignment and phylogenetic inferences were generated using the approximately-maximum-likelihood method within the FastTree software (Edgar 2004; Price et al. 2009). In addition, a phylogenetic tree based on the housekeeping genes groES, groEL, recA, gyrA, and rpoB was produced using the iTOL online software (https://itol.embl.de/). The genes were extracted from annotated genomic sequences and then concatenated for each strain.
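The per-strain concatenation step described above can be sketched as follows. This is a minimal illustration, assuming each strain's gene sequences are already available as strings; the function name, the strain labels and the toy sequences are hypothetical and do not come from the study.

```python
# Gene order used for the concatenation (gene names taken from the text).
GENE_ORDER = ["groES", "groEL", "recA", "gyrA", "rpoB"]


def concatenate_genes(genes_by_strain: dict) -> dict:
    """Join each strain's gene sequences in one fixed order.

    A KeyError is raised if a strain lacks one of the genes, so that
    incomplete strains are not silently concatenated.
    """
    return {
        strain: "".join(genes[name] for name in GENE_ORDER)
        for strain, genes in genes_by_strain.items()
    }


# Made-up three-base "genes", purely for illustration (not real sequence data).
toy = {
    "strain_A": {"groES": "ATG", "groEL": "GGC", "recA": "TTA", "gyrA": "CCG", "rpoB": "AAT"},
    "strain_B": {"groES": "ATG", "groEL": "GGT", "recA": "TTG", "gyrA": "CCG", "rpoB": "AAC"},
}
concatenated = concatenate_genes(toy)
print(concatenated["strain_A"])  # prints "ATGGGCTTACCGAAT"
```

Using the same gene order for every strain keeps homologous positions aligned across the concatenated sequences.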

Results and discussion

Strain identification

MALDI-TOF MS failed to identify strain Marseille-P3249T. Therefore, 16S rRNA gene sequencing was performed. In a BLAST comparison against the NCBI nucleotide database, strain Marseille-P3249T exhibited 98.3% sequence similarity with Gemella bergeri strain 617-93T, the phylogenetically closest species with standing in nomenclature (Fig. 1) (Collins et al. 1998a). This corresponds to a sequence divergence of 1.7%, which exceeds the 1.35% divergence from the closest species with a validly published name that Kim et al. proposed for delineating species (Kim et al. 2014). The strain may therefore be classified as a new species within the genus Gemella. Furthermore, the MLSA tree built from concatenated genes shows that G. massiliensis strain Marseille-P3249T is positioned among the Gemella species but occupies a clearly distinct branch (Fig. 2).
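The 16S rRNA argument above reduces to a simple comparison, sketched here for clarity. The function and constant names are hypothetical; only the 98.3% similarity and the 1.35% divergence cutoff come from the text.

```python
# Divergence cutoff cited in the text (Kim et al. 2014), in percent.
DIVERGENCE_CUTOFF_PERCENT = 1.35


def divergence_percent(similarity_percent: float) -> float:
    """Convert a 16S rRNA similarity (%) into a divergence (%)."""
    return 100.0 - similarity_percent


def exceeds_species_cutoff(similarity_percent: float,
                           cutoff: float = DIVERGENCE_CUTOFF_PERCENT) -> bool:
    """True when divergence from the closest species is above the cutoff."""
    return divergence_percent(similarity_percent) > cutoff


similarity_to_g_bergeri = 98.3  # reported similarity with G. bergeri 617-93T
print(round(divergence_percent(similarity_to_g_bergeri), 2))  # prints 1.7
print(exceeds_species_cutoff(similarity_to_g_bergeri))        # prints True
```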

Fig. 1

16S rRNA gene sequence phylogenetic analysis highlighting the position of strain Marseille-P3249 relative to other species. This tree was previously published and has been redrawn with slight changes (Fonkou et al. 2018). Sequence alignment and phylogenetic inferences were obtained using the maximum likelihood method within MEGA 7 software, with 1000 replicates. The scale bar represents 2% sequence divergence. GenBank accession numbers are indicated in parentheses

Fig. 2

Neighbour-joining tree displaying the relationships among species of the genus Gemella based on concatenated groES, groEL, recA, gyrA, and rpoB sequences

General characteristics of strain Marseille-P3249

Cells of strain Marseille-P3249T were Gram-positive cocci. Colonies grew optimally at 37 °C under aerobic conditions, at pH values between 6 and 8.5 and NaCl concentrations below 50 g/L, and measured 0.5 to 1.2 mm in diameter after 24 h of incubation. Cells were non-motile and non-spore-forming, with a mean diameter of 0.78 µm. They metabolized d-fructose, amygdalin, and l-sorbose and possessed enzymes such as esterase, leucine arylamidase, and naphthol-AS-BI-phosphohydrolase. The biochemical characteristics of strain Marseille-P3249T are compared with those of closely related species with standing in nomenclature (Table 1).

Table 1 Differential characteristics of Gemella massiliensis strain Marseille-P3249T (GMA), Gemella bergeri 617-93T (GBE) (Collins et al. 1998a), Gemella asaccharolytica EU427463T (GAS) (Ulger-Toprak et al. 2010), Gemella cuniculi AJ251987T (GCU) (Hoyles et al. 2000), Gemella morbillorum L14327T (GMO) (Kilpper-Bälz and Schleifer 1988), and Gemella sanguinis Y13364T (GSA) (Collins et al. 1998b)

The major fatty acids were hexadecanoic acid (34%), 9-octadecenoic acid (28%), octadecanoic acid (15%) and 9,12-octadecadienoic acid (13%). A wide variety of other fatty acids were detected, but only in low amounts (Table 2).

Table 2 Cellular fatty acid composition (%)

The MICs determined for strain Marseille-P3249 were as follows: benzylpenicillin (0.012 µg/mL), amoxicillin (0.016 µg/mL), cefotaxime (0.016 µg/mL), ceftriaxone (0.016 µg/mL), imipenem (0.016 µg/mL), erythromycin (0.19 µg/mL), daptomycin (> 6 µg/mL), amikacin (0.125 µg/mL), rifampicin (0.03 µg/mL), minocycline (0.64 µg/mL), teicoplanin (0.032 µg/mL), vancomycin (0.75 µg/mL), metronidazole (> 256 µg/mL) and colistin (> 256 µg/mL).

Genome characteristics of strain Marseille-P3249

The genome was 1,804,813 bp long with a 30.5 mol% G + C content (Fig. 3). It comprises 7 scaffolds (8 contigs). Of the 1727 predicted genes, 1677 were protein-coding genes and 50 were RNA genes (five 5S rRNA, two 16S rRNA, two 23S rRNA, and 41 tRNA genes). A total of 1276 genes (76.09%) were assigned a putative function (by COGs or by BLAST against NR). Twenty-six genes (1.55%) were classified as ORFans. The remaining 304 genes (18.13%) were annotated as hypothetical proteins. The distribution of genes into COG functional categories is detailed in supplementary Table S1.
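The percentages quoted above can be reproduced with a short calculation. This sketch assumes, as the arithmetic suggests but the text does not state explicitly, that each percentage is taken relative to the 1677 protein-coding genes; the variable names are hypothetical.

```python
# Gene counts reported in the text.
PROTEIN_CODING = 1677
category_counts = {
    "putative function": 1276,
    "ORFans": 26,
    "hypothetical proteins": 304,
}
rna_counts = {"5S rRNA": 5, "16S rRNA": 2, "23S rRNA": 2, "tRNA": 41}

# Assumption: percentages are relative to the protein-coding genes.
percentages = {
    name: round(100.0 * count / PROTEIN_CODING, 2)
    for name, count in category_counts.items()
}
total_rna = sum(rna_counts.values())
total_genes = PROTEIN_CODING + total_rna

print(percentages)             # percentage of protein-coding genes per category
print(total_rna, total_genes)  # RNA gene total and overall predicted gene total
```

Under that assumption the script returns 76.09%, 1.55% and 18.13%, together with 50 RNA genes and 1727 predicted genes, matching the reported figures.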

Fig. 3

Graphical circular map of the chromosome. From the outside to the center: genes on the forward strand colored by COG categories (only genes assigned to COG), genes on the reverse strand colored by COG categories (only genes assigned to COG), RNA genes (tRNAs green, rRNAs red), GC content and GC skew

Genome comparison

The draft genome sequence of strain Marseille-P3249T was larger than those of Gemella cuniculi DSM 15828T, Gemella sanguinis ATCC 700632T and Gemella haemolysans ATCC 10379T, but smaller than those of Gemella asaccharolytica WAL 1945JT, Gemella bergeri 617-93T and Gemella morbillorum NCTC11323T (Table 3).

Table 3 Genome information of the species involved in the genomic comparative analyses

Additionally, the G + C content of strain Marseille-P3249T is lower than those of G. asaccharolytica WAL 1945JT, G. cuniculi DSM 15828T, G. sanguinis ATCC 700632T and G. bergeri 617-93T, but higher than those of G. morbillorum NCTC11323T and G. haemolysans ATCC 10379T. The gene content of strain Marseille-P3249T was likewise compared with that of the closely related Gemella species.

Strain Marseille-P3249T shared the highest number of orthologous proteins with G. cuniculi (1039). Furthermore, this bacterium shared 1031, 1032, 1054, and 778 orthologous proteins with G. haemolysans, G. morbillorum, G. sanguinis and G. asaccharolytica, respectively. The highest OrthoANI value for strain Marseille-P3249T was 94.8%, with G. bergeri, and the lowest was 70.3%, with G. asaccharolytica (Fig. 4). dDDH values did not exceed 59.7%, the value obtained between G. massiliensis strain Marseille-P3249 and G. bergeri (Table 4). Because these dDDH values are below 70%, the recommended threshold for delineating a new species (Wayne 1988; Tindall et al. 2010), we consider strain Marseille-P3249 to represent a new species of the genus Gemella.
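The dDDH criterion applied above can be written as a one-line check. The sketch below uses only the two figures given in the text (the 59.7% maximum and the 70% threshold); the names are hypothetical.

```python
# Species delineation threshold for DDH cited in the text, in percent.
DDH_SPECIES_THRESHOLD = 70.0


def below_species_threshold(dddh_percent: float,
                            threshold: float = DDH_SPECIES_THRESHOLD) -> bool:
    """True when a dDDH estimate falls below the species threshold."""
    return dddh_percent < threshold


highest_dddh = 59.7  # highest reported value, obtained with G. bergeri
print(below_species_threshold(highest_dddh))  # prints True
```

Because the highest pairwise value is already below the threshold, every other comparison in Table 4 is below it as well.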

Fig. 4

Heatmap generated with OrthoANI values calculated using the OAT software between Gemella massiliensis sp. nov., strain Marseille-P3249 and other closely related species with standing in nomenclature

Table 4 Digital DNA–DNA hybridization values (%) obtained between strain Marseille-P3249T and other closely related species using GGDC formula 2 (dDDH estimates based on identities/HSP length)

Description of Gemella massiliensis sp. nov.

Gemella massiliensis (mas.si.li.en’sis. L. fem. adj. massiliensis, pertaining to Massilia, the Latin name of the city of Marseille, where this bacterium was discovered). Strain Marseille-P3249T is a facultatively anaerobic bacterium that grows optimally at 37 °C under aerobic conditions. Using an API 50 CH strip, this strain exhibits positive reactions for d-fructose, amygdalin, and l-sorbose. Positive reactions are also observed for esterase (C4), esterase lipase (C8), leucine arylamidase, acid phosphatase, and naphthol-AS-BI-phosphohydrolase. In addition, using an API 20A strip (bioMérieux), a positive reaction is observed for esculin ferric citrate only. The genome is 1.80 Mbp long with a 30.5 mol% G + C content.

The type strain Marseille-P3249T (= CSUR P3249 = DSM 103940) was isolated from the sputum sample of a healthy French man.

The 16S rRNA gene and whole-genome sequences of G. massiliensis sp. nov. were deposited in EMBL-EBI under accession numbers LT628479 and FQLS00000000, respectively.