1 Introduction

Oral diseases such as dental caries, tooth decay, periodontal disease, and tooth loss are among the most serious healthcare issues that cause significant medical and economic burdens in our lives (Peres et al. 2019). In 2017, approximately 2.3 billion people, including 530 million children, were affected by dental caries, which cause pain, tooth loss, and, in extreme cases, death (James et al. 2018). Many children, particularly those in developing countries, continue to suffer from oral diseases such as dental caries, gingivitis, and dental malocclusion (Giriraju and Lakshminarayan 2017). In most cases, dental caries can be effectively addressed by regularly monitoring and managing dental health (World Health Organization 2020).

However, because dentists and dental clinics are not evenly distributed in most countries, a significant number of potential dental patients do not have adequate access to dental clinics and services (Peres et al. 2019). Moreover, because the economic and medical burdens of odontopathy increase exponentially with its severity, home dental care systems for checking and managing dental conditions are among the most in-demand services in several countries for individual dental treatment (World Health Organization 2020). The home dental care system has the potential to increase access to preventive oral health services, thereby reducing disease disparities.

Furthermore, the global pandemic caused by the coronavirus disease outbreak in 2019 has severely hampered the public’s access to routine dental care (Yakubov et al. 2020). Appointment availability at private medical and dental practices has been severely reduced as practices have closed their doors in response to stay-at-home orders and reduced clinical volume (Contrera 2020). In these circumstances, dental care delivered through a home dental care system can help to promote the public’s health and quality of life, and the demand for home dental care systems is expected to increase.

In light of this, a number of innovative and unique technologies, such as the emergence of the Internet of Things (IoT) and artificial intelligence technologies, have been applied to the operation of home dental care systems to enhance their convenience (Hwang et al. 2019). In line with this trend, the current study introduces a new smart system that uses state-of-the-art deep learning technologies to accelerate home dental care services.

The main contributions of this paper are summarized as follows:

  • We propose a home dental care system consisting of an oral image acquisition device, deep learning models, and a mobile controller. The system allows users to easily take general oral/teeth images and can contribute to home dental management by facilitating the early detection of tooth problems in daily life.

  • We construct a dataset for tooth disease. The dataset contains 5251 tooth ROI images classified as normal or diseased, extracted from 610 maxillary and mandibular teeth images of 305 subjects. The collected images were annotated pixel-wise and evaluated by two dental professionals.

  • The experimental results using our dataset show that our system has accuracies greater than 96% and 89% in the tooth disease and NPDT classifications, respectively. Based on these results, the system has the potential to more effectively manage users’ dental healthcare by providing effective and appropriate dental information.

The remainder of this paper is organized as follows: an overview of previous studies is first presented, followed by an introduction to the new system. Next, both the study methodologies and the results are described. Finally, the academic and practical implications of this study, and some notable limitations are discussed.

2 Literature review

2.1 Smart system

Several studies on smart systems in fields such as interdisciplinary security (Sangaiah et al. 2019b), energy (Sangaiah et al. 2019a), and healthcare (Chen et al. 2018) have been conducted to address industry-related problems, making our lives easier. Dimitrov (2016) presented a framework to support smart healthcare systems in a cloud environment that is based on the Internet of Things, big data, and smart sensors. Furthermore, the author argued that a smart healthcare system can help users avoid disease and live healthier lives in general. Smart healthcare systems, in particular, are emerging in the field of dentistry for diagnostic and treatment purposes. Balaji Ganesh and Sugumar (2021) introduced the concept of smart healthcare systems, as well as their benefits and drawbacks, and their potential role in dentistry in the future. They argued that in the coming decades, the smart healthcare system will play a paramount role in the clinical advancement of diagnosis and in the management of various oral diseases.

2.2 Deep learning in the field of dentistry

Since the introduction of artificial intelligence and deep learning technologies, notable approaches to address recent dental issues at the clinic level have been proposed (Hwang et al. 2019). The majority of these approaches employ machine learning methodologies such as convolutional neural networks (CNNs) on collected tomographic and radiographic image datasets to assist dentists in providing dental treatments (Park and Park 2018). Miki et al. (2017) applied AlexNet to classify tooth types in dental cone-beam computed tomographic images (CT images), based on which they performed dental filing procedures for forensic identification. Based on 52 CT cases divided into 42 training and 10 testing cases, the trained network could classify seven tooth types with an accuracy of 88.8%.

Vinayahalingam et al. (2019) employed deep learning approaches to assess potential risks, such as nerve damage and subsequent sensory disturbances, in dental patients whose third molars were removed. They proposed a U-Net model that was developed using 81 dental panoramic radiographic inspections. This CNN-based deep learning model was designed to detect and segment third molar (M3) roots and inferior alveolar nerves (IANs), and achieved Dice coefficients of 94.7% (M3) and 84.7% (IAN).

As a representative example, Kim et al. (2019) presented a CNN-based transfer learning method, DeNTNet, in an automated diagnostic support system for detecting periodontal bone losses in panoramic dental radiographic images. The trained DeNTNet achieved a 75% F1-score across 12,179 radiographic images, which was higher than the average performance of dentists (69%). Chen et al. (2019) proposed a CNN-oriented tooth detection model for detecting tooth types in 1250 dental periapical X-ray images. The model outperformed a junior dentist in the divided datasets of 800 training and 250 testing images, achieving more than 90% precision and recall with approximately 91% mean intersection over union (IoU) between the detected boxes and ground truths.

Several researchers have recently attempted to use CNNs in deep learning applications for classifying, detecting, and segmenting dental radiographic and near-infrared transillumination images. Ali et al. (2016) used deep neural networks with stacked sparse autoencoders and softmax methods to detect and identify dental caries in X-ray images, achieving 97% precision and recall levels. Prajapati et al. (2017) used a CNN and a VGG neural network to classify tooth disease types, achieving an overall accuracy of approximately 88% across 251 radio visiographic (RVG) X-ray images. Casalegno et al. (2019) proposed a CNN-based semantic segmentation model for detecting and localizing dental lesions in near-infrared transillumination light images. On a five-class segmentation task, the proposed model with 185 images achieved a 72.7% overall mean IoU score. Although previous studies have found significant potential for deep learning approaches in dental research, the majority of these studies have focused on how these approaches can help and collaborate with dentists. Moreover, they used medical datasets that can only be obtained with specialized medical equipment.

Because radiographic images are not available for home dental care, several researchers have attempted to analyze general oral images for dental disease detection rather than radiographic images. Liu et al. (2019) presented a smart dental health IoT system built using portable hardware, deep learning, and mobile terminal modules, as well as a Mask R-CNN diagnosis model with 10,080 images. The model can detect and classify seven different tooth diseases, including decayed teeth, dental plaque, fluorosis, and periodontal disease, with approximately 90% model accuracy across 2520 test images. Moreover, the authors demonstrated both academic and practical possibilities for home-based dental care services. However, exact images of each tooth were required. This implies that more patient-friendly dental management approaches should be developed. As a result, the current study introduces a smart home dental care system that includes mobile devices and applications, as well as deep learning techniques. The system allows users to easily take general oral/teeth images and detect tooth diseases from the images. The system can contribute to home dental management by facilitating early detection of tooth problems in our daily lives.

3 Study method

3.1 Smart home dental care system

The proposed smart home dental care system is divided into multiple components, which include an oral image acquisition device, data collection/annotation, teeth detection via a region of interest (ROI) detection network, and classifications of tooth disease and the need for professional dental treatment (NPDT). First, an oral image acquisition device was designed and developed to obtain images of maxillary and mandibular teeth. To create the training dataset, the device was used to capture images of patients’ maxillary and mandibular teeth at a dental clinic in Seoul, Korea. The collected images were annotated pixel-wise and evaluated by a dentist with a Doctor of Dental Surgery (DDS) and another with a Doctor of Dental Medicine (DMD). We then trained an ROI detection network model to extract the teeth region of each image. Using the model’s results, a tooth disease model and an NPDT model were trained to predict the presence of dental diseases on each tooth and to recommend whether a visit to a dental clinic for treatment was necessary. Finally, Poisson image blending techniques were applied to distinguish the areas that required dental treatments (Pérez et al. 2003). The detailed procedures are described in the following sections.

Fig. 1 Printed circuit board and system diagram of the oral image acquisition device

3.2 Oral image acquisition device

The oral image acquisition device was developed so that patients can take pictures of their maxillary and mandibular teeth, with the following design goals in mind: (1) image generation of all the maxillary teeth in one take, and all the mandibular teeth in another, (2) confirmation of the occlusal surface of each tooth in the generated images, (3) a thin profile and narrow diameter suited to the patient’s oral cavity, (4) a viewing angle of about \(120^{\circ }\) and high-definition resolution (1280 \(\times\) 720) with a focal length of 20 mm, and (5) biocompatibility to prevent harmful effects and waterproofing for demisting.

The oral image acquisition device consists primarily of a camera module and a control board. The printed circuit board is integrated with a low-power and ultra-compact complementary metal oxide semiconductor (CMOS) sensor (OmniVision 2017), light emitting diodes (LEDs), and a macro lens to form a module. The main board, which includes a processing chip, memory, switches, LEDs, and power management, performs functions such as interface control using a USB video class (UVC) driver (The Linux Kernel documentation 2021) and task processing. The system architecture and components are shown in Fig. 1.

3.3 Data collection and annotation

Because there are no standard image datasets for tooth disease, we collected 610 maxillary and mandibular teeth images of 305 subjects from a private dental clinic in Korea and removed all identifiable patient information from the images. Informed consent was obtained from all patients who provided their tooth images before collection.

We generated 8140 tooth images for analysis using the teeth ROI detection deep learning model. After removing damaged and low-resolution images, the remaining 5251 tooth ROI images were classified into two types: normal tooth and diseased tooth (e.g., occlusal caries, proximal caries, and cavitation corrosion). Dental professionals with more than five years of dental clinical experience performed pixel-wise annotation of each class (type) and provided corresponding class numbers as the ground truth (GT). The 5251 annotated images were randomly divided into three groups: 3357 for the training dataset (63.9%), 946 for the validation dataset (18.0%), and 948 for the testing set (18.1%). Table 1 presents an overview of the datasets collected. All datasets were anonymized and processed in accordance with the guidelines of Sungkyunkwan University’s Institutional Review Board (IRB).
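The random three-way split described above can be sketched as follows. This is a minimal illustration, assuming the images are identified by integer indices and shuffled with a fixed seed; the function name `split_indices` and the seed value are hypothetical and are not taken from the study.

```python
import random

def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train/val/test lists."""
    if n_train + n_val > n_total:
        raise ValueError("train and validation sizes exceed the dataset size")
    indices = list(range(n_total))
    # The seed is an assumption, used here only to make the split reproducible.
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

# Sizes reported in the text: 5251 images -> 3357 / 946 / 948.
train, val, test = split_indices(5251, 3357, 946)
# Percentage share of each subset, rounded to one decimal place.
shares = [round(100 * len(part) / 5251, 1) for part in (train, val, test)]
```

With the reported sizes, `shares` reproduces the percentages quoted in the text, and every image index appears in exactly one subset.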

Table 1 A summary of the collected data

3.4 Deep learning model for teeth ROI detection, tooth disease and NPDT classification

In this subsection, we describe the details of the proposed end-to-end pipeline for teeth ROI detection, tooth disease and NPDT classification. The overall pipeline of the teeth ROI detector, tooth disease, and NPDT classifiers is presented in Fig. 2.

The variability of the visual and contextual information in each tooth due to the diversity of the patients’ oral environments created significant difficulties in classifying whether each tooth had specific diseases and whether there was NPDT from all images of the maxillary and mandibular teeth. This impeded the classification of tooth diseases. Thus, it was essential to extract the distinguishable area of each tooth from the entire set of images. To meet this requirement, we trained a detection network model to automatically extract the ROI of the teeth.

Fig. 2 The overall pipeline of the proposed teeth ROI detector, and tooth disease and NPDT classifiers

The overall architecture of the teeth ROI detector, tooth disease, and NPDT classifiers is presented in Fig. 3. We used the RetinaNet (Lin et al. 2017b) deep learning detection network to detect the ROI of the teeth accurately and rapidly. ResNet (He et al. 2016) with 152 layers and a feature pyramid network (FPN) (Lin et al. 2017a) were used as the backbone, and the off-the-shelf PyTorch RetinaNet convolutional networks were used to extract potential features from the entire set of images. To train, validate, and test the teeth ROI detection network, we used 430, 90, and 90 images of maxillary and mandibular teeth for the training, validation, and test sets, respectively. All images were pre-processed with the multiscale Retinex with color restoration (MSRCR) algorithm (Petro et al. 2014) to adaptively enhance local image contrast, and were then resized to a resolution of 1024 \(\times\) 800. The focal loss (Lin et al. 2017b) and the smooth L1 loss (Girshick 2015) were used in the teeth ROI detection network. The network was trained for up to 50 epochs with early stopping, using the Adam optimizer (Kingma and Ba 2015) with an initial learning rate of 0.0001 and a batch size of 4; these settings were selected through hyper-parameter tuning.
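The two loss terms named above can be illustrated for a single prediction as follows. This is a minimal sketch of the published definitions, not the training code of the study: the focusing parameter `gamma=2.0` and the weighting factor `alpha=0.25` are the defaults from the focal loss paper and are assumptions here, since the text does not state the values that were used.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class; y: label (0 or 1).
    gamma and alpha are assumed defaults, not values reported in the study.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class weighting factor
    p_t = min(max(p_t, 1e-12), 1.0)             # guard against log(0)
    # (1 - p_t)^gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def smooth_l1(x):
    """Smooth L1 loss for one box-regression residual x."""
    a = abs(x)
    # Quadratic near zero, linear for large residuals.
    return 0.5 * a * a if a < 1.0 else a - 0.5
```

A confidently correct prediction (for example `focal_loss(0.9, 1)`) contributes far less than a confidently wrong one (`focal_loss(0.1, 1)`), which is how the focal loss keeps the many easy background examples from dominating training.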

Fig. 3 The overall architecture of the proposed teeth ROI detector, and tooth disease and NPDT classifiers. The detector was based on a RetinaNet with a ResNet152 backbone (a) and a feature pyramid network (b). Subnetworks (c) were fine-tuned with maxillary and mandibular teeth images. Non-maximum suppression (NMS) (d) was used to select one entity (e.g., a bounding box) out of many overlapping entities, yielding the detected teeth ROIs (e). To classify each tooth, the ROI of the teeth was cropped (f) from the detected images. Both tooth diseases and NPDT were classified using a ResNeXt network with a convolutional layer, pooling layer, batch normalization, and ResNeXt blocks. The results were classified as normal or diseased teeth (g), and NPDT was also classified
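The NMS step in the caption above can be sketched as a greedy procedure: keep the highest-scoring box, discard boxes that overlap it too strongly, and repeat. This is a minimal illustration; the box format `(x1, y1, x2, y2)` and the IoU threshold of 0.5 are assumptions, as the text does not report the threshold that was used.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Hypothetical detections: two overlapping boxes on one tooth, one on another.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.82, so it is suppressed and one box per tooth remains.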

After training, the RetinaNet-based teeth ROI detection model generated a total of 8140 tooth ROIs from the 610 images of maxillary and mandibular teeth. The 5251 tooth ROI images that remained after filtering were divided into 4804 normal tooth images and 447 diseased tooth images and resized to a resolution of 224 \(\times\) 224 for training the tooth disease and NPDT classifiers. The tooth disease images were classified into three types (Table 1).

Tooth disease and NPDT classification layers were used. Because the system is designed to assist users with no dental knowledge in independently determining whether dental treatments are required, the classification results from the amalgamated maxillary and mandibular teeth images are presented to the users. Based on these images, users can easily visually identify any teeth that require dental treatment, as well as their locations and numbers.

To determine the tooth disease and NPDT classes, a ResNeXt (Xie et al. 2017) with 101 convolutional layers, a cardinality of 32, and a bottleneck width of 9 was trained, validated, and tested with 3357, 946, and 948 images, respectively. All images were augmented with random horizontal and vertical shifts applied with 50% probability, an 80–120% scale adjustment, and color jitter with modulations of the brightness, contrast, and saturation in the 80–120%, 70–130%, and 70–130% ranges, respectively. We then used a focal loss function (Lin et al. 2017b) to correct the class imbalance in the tooth disease classification model. Both the tooth disease and NPDT models were trained for up to 300 epochs with early stopping, using stochastic gradient descent (Ruder 2016) with an initial learning rate of 0.003, a momentum of 0.9, and a batch size of 16; these settings were selected through hyper-parameter tuning.
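The augmentation ranges listed above can be made concrete by sampling one set of parameters per image. This is only a sketch of the parameter draw, not of the image transformation itself. It assumes that the 50% probability applies to the shifts and that all factors are drawn uniformly; the maximum shift `max_shift` is a hypothetical value, since the text does not state the shift magnitude.

```python
import random

def sample_augmentation(rng, max_shift=0.1):
    """Draw one set of augmentation parameters.

    max_shift (fraction of the image size) is a hypothetical value;
    the study does not report the shift magnitude.
    """
    apply_shift = rng.random() < 0.5  # shifts applied with 50% probability
    shift_x = rng.uniform(-max_shift, max_shift) if apply_shift else 0.0
    shift_y = rng.uniform(-max_shift, max_shift) if apply_shift else 0.0
    return {
        "shift_x": shift_x,
        "shift_y": shift_y,
        "scale": rng.uniform(0.8, 1.2),       # 80-120% scale adjustment
        "brightness": rng.uniform(0.8, 1.2),  # 80-120%
        "contrast": rng.uniform(0.7, 1.3),    # 70-130%
        "saturation": rng.uniform(0.7, 1.3),  # 70-130%
    }

rng = random.Random(0)  # fixed seed, an assumption for reproducibility
params = [sample_augmentation(rng) for _ in range(1000)]
```

Each dictionary would then be passed to whichever image library performs the actual shift, rescaling, and color adjustment.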

3.5 Baseline models

Three widely used object detection models were used as our baseline models for teeth ROI detection: YOLOv3, Faster R-CNN, and RetinaNet.

  • YOLOv3: is a real-time, single-stage object detection model. It adopts a Darknet-53 backbone network with residual connections and extracts features at three different scales (Redmon and Farhadi 2018).

  • Faster R-CNN: is a two-stage object detection model that utilizes a region proposal network with the CNN model. Both ResNet with 101 layers and FPN were used in the backbone network (Ren et al. 2015).

  • RetinaNet: is a single-stage object detection model that utilizes a focal loss function to address class imbalance during training. Both ResNet with 152 layers and FPN were used in the backbone network (Lin et al. 2017b).

As our baseline models for tooth disease and NPDT classification, we used four widely used classification models: ResNet, ShuffleNet V2, DenseNet, and ResNeXt.

  • ResNet: is a CNN that learns residual functions with reference to the layer inputs, instead of learning unreferenced functions. We stacked 101 layers using these residual blocks (He et al. 2016).

  • ShuffleNet V2: is a CNN designed specifically for mobile devices. It utilizes pointwise group convolutions, bottleneck-like structures, and a channel shuffle operation to reduce the computational cost while maintaining accuracy (Ma et al. 2018).

  • DenseNet: is a CNN that uses dense blocks to connect all layers directly with each other, so that each layer feeds its own feature maps forward to all subsequent layers (Huang et al. 2017).

  • ResNeXt: is a CNN that repeats a building block that aggregates a set of transformations with the same topology. We stacked 101 layers using these ResNeXt blocks with a set of 32 transformations (Xie et al. 2017).

3.6 Evaluation metrics

The tooth ROI detection model was tested on the maxillary and mandibular dental arch datasets. In particular, the average precision (AP) (Everingham et al. 2010) evaluation metric (Eq. 1) was used to evaluate the model performance across 90 images. The AP summarizes the area under the precision-recall curve (Davis and Goadrich 2006), and is defined as the mean precision at a set of 11 equally spaced recall levels:

$$\begin{aligned} {AP=\frac{1}{11}\sum _{r\in \left\{ 0,0.1,\ldots ,1\right\} }P_{interp}(r)} \end{aligned}$$
(1)

The interpolated precision \(P_{interp}(r)\) is obtained by taking the maximum precision measured at any recall value greater than or equal to r:

$$\begin{aligned} {P_{interp}(r)=\max _{{\tilde{r}}:{\tilde{r}}\ge r}p({\tilde{r}})} \end{aligned}$$
(2)

where \(p(\tilde{r})\) is the measured precision at recall \(\tilde{r}\).
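Equations 1 and 2 can be computed directly from a list of measured (recall, precision) points. The sketch below follows the two definitions term by term; the function names and the sample points are hypothetical, and the interpolated precision is taken to be 0 when no measured recall reaches the requested level.

```python
def interpolated_precision(recalls, precisions, r):
    """Eq. 2: maximum precision over all points whose recall is >= r."""
    candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
    # No measured point reaches recall r: the interpolated precision is 0.
    return max(candidates) if candidates else 0.0

def average_precision_11pt(recalls, precisions):
    """Eq. 1: mean interpolated precision at recall levels 0, 0.1, ..., 1."""
    levels = [i / 10 for i in range(11)]
    total = sum(interpolated_precision(recalls, precisions, r) for r in levels)
    return total / 11

# Hypothetical detector that reaches only 50% recall, at perfect precision.
ap_half = average_precision_11pt([0.5], [1.0])
# Hypothetical detector that reaches full recall at perfect precision.
ap_full = average_precision_11pt([0.5, 1.0], [1.0, 1.0])
```

In the first case only the six recall levels from 0 to 0.5 receive a precision of 1, so the AP is 6/11; in the second case all eleven levels do, giving an AP of 1.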

The tooth disease and NPDT models were both evaluated using datasets containing 447 images and four evaluation metrics (Hossin and Sulaiman 2015): precision (Eq. 3), recall (Eq. 4), F1-score (Eq. 5), and accuracy (Eq. 6). TP, FN, TN, and FP represent true positive, false negative, true negative, and false positive, respectively.

$$\begin{aligned} Precision= & {} \frac{TP}{TP + FP} \end{aligned}$$
(3)
$$\begin{aligned} Recall= & {} \frac{TP}{TP + FN} \end{aligned}$$
(4)
$$\begin{aligned} F1{\text {-}}score= & {} \frac{2 \times precision \times recall}{precision + recall}, \end{aligned}$$
(5)
$$\begin{aligned} Accuracy= & {} \frac{TP + TN}{TP + FN + TN + FP} \end{aligned}$$
(6)
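Equations 3 to 6 translate directly into code once the four confusion-matrix counts are known. The following sketch uses hypothetical counts chosen for easy checking, not results from the study, and returns 0 for a ratio whose denominator is zero.

```python
def classification_metrics(tp, fn, tn, fp):
    """Precision, recall, F1-score, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # Eq. 3
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # Eq. 4
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # Eq. 5
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)         # Eq. 6
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts for an imbalanced test set of 100 teeth.
metrics = classification_metrics(tp=5, fn=5, tn=85, fp=5)
```

With these counts the accuracy is 0.9 even though precision, recall, and F1-score for the diseased class are all 0.5, which is why the per-class metrics are reported alongside accuracy for an imbalanced dataset.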

4 Results

Tables 2 and 3 summarize the results of the teeth ROI detection and tooth disease classification models, respectively. Among the teeth ROI detection models, RetinaNet achieved the highest AP (97.26%).

Among the four baseline tooth disease classification models, ResNeXt achieved the highest scores for the non-tooth disease class (98.09% precision, 97.86% recall, and 97.98% F1-score). ResNeXt also achieved the highest scores for the tooth disease class (83.49% precision, 85.05% recall, and 84.26% F1-score).

Table 2 Model evaluation of teeth ROI detection
Table 3 Model evaluation of tooth disease classification

The NPDT model, which indicates whether a patient has a tooth disease and should visit a dental clinic, was then evaluated. Table 4 provides the summary of the NPDT evaluation results. Among the four baseline NPDT classification models, ResNeXt achieved the highest scores for the non-NPDT class (97.14% precision, 82.93% recall, and 92.24% F1-score). ResNeXt also achieved the highest scores for the NPDT class (83.33% precision, 97.22% recall, and 89.74% F1-score). The proposed models can thus be used to detect potential users suffering from tooth diseases. Moreover, users who need to visit dental clinics can be assisted by the proposed system.

Table 4 Model evaluation of NPDT classification

5 Discussion

We proposed a smart home dental care system comprising an oral image acquisition device and a deep learning architecture. The trained deep learning models achieved an AP above 97% in teeth ROI detection, as well as 96% and 89% accuracy in tooth disease and NPDT classifications, respectively. The system evaluation metrics demonstrate the system’s potential for efficiently managing patients’ dental healthcare by providing effective and appropriate dental information.

As presented in Table 4, the results indicate that the proposed system can detect whether there is a tooth disease and classify whether professional dental treatments are required using general oral images instead of radiographic images. The results suggest the potential and possibility of using mobile camera modules in the dental industry, which currently uses radiographic images to determine whether a specific tooth disease exists.

Oral diseases, such as dental caries, periodontal disease, and tooth loss, not only cause inconvenience in people’s daily lives, but also place significant strains on national healthcare systems and services. Fortunately, most dental diseases are treatable when they are detected early (World Health Organization 2020). Prior research on the applications of deep learning approaches to dental problems (Chen et al. 2019; Vinayahalingam et al. 2019; Kim et al. 2019) has focused on assisting dentists in patient screening. Thus, most of these approaches utilized radiographic images of dental patients, which are not easily accessible in the patients’ daily lives.

The main goals of the current study were to allow potential dental patients to know their dental conditions, to detect tooth diseases in their daily lives, and to judge whether professional dental treatments are required at the right time. To meet these goals, a specially designed oral image acquisition device for maxillary and mandibular teeth, as well as deep learning models for image analysis were used. Moreover, the proposed system can be extended by combining the system with patient profiling approaches and mobile applications.

Although the current study presented notable findings, several limitations remain. One notable limitation is that the study did not consider the individual characteristics of the users and their effects. Moreover, only a limited number of oral images were used for training, which may lead to class imbalance issues. To address this potential issue, we used focal loss for weight balancing and various image augmentation techniques (horizontal and vertical shifts, scale adjustments, and color jitter with modulations of the brightness, contrast, and saturation). Nonetheless, class imbalance may still impair the performance of the proposed system. Based on these limitations, future research should focus on improving the model accuracy through additional maxillary and mandibular teeth image collection and small dataset processing methods such as synthetic image augmentation and semi-supervised learning for image classification. Furthermore, since diseased teeth are generally less common than normal teeth, future research should address not only imbalanced classification but also anomaly detection approaches.