Introduction

According to the World Report on Vision [27], cataract is the leading cause of blindness and vision impairment, and approximately 65.2 million people suffer from moderate or severe cataract. These patients can improve their vision and quality of life through efficient cataract surgery or early intervention, which also reduces the societal burden of bilateral cataract blindness.

Nuclear cataract (NC) is one of the most common cataract types; its clinical manifestations include gradual clouding and progressive hardening of the nuclear region of the crystalline lens [25]. Over the past years, ophthalmologists have used several types of ophthalmic images for NC diagnosis based on gold-standard cataract grading protocols. The Lens Opacities Classification System III (LOCS III) [34] is a widely accepted cataract grading protocol built on slit-lamp images; in clinical diagnosis, ophthalmologists usually grade NC severity levels from slit-lamp images according to LOCS III. This manual NC classification mode is subjective and error-prone, and it is easily affected by the ophthalmologist’s experience and professional knowledge.

Anterior segment optical coherence tomography (AS-OCT) is an OCT imaging technique that can capture the whole anterior segment structure, including structural information about the crystalline lens. Compared with other ophthalmic images such as slit-lamp images, AS-OCT imaging is non-invasive, objective, user-friendly, high-resolution, and fast. Furthermore, it can measure lens opacities quantitatively and objectively. According to the opacity pathology development of NC, NC can generally be divided into three stages based on LOCS III [28]: (1) Stage 0, normal (non-nuclear cataract), without nuclear opacity; (2) Stage 1, low-grade (NC grade \(=1\) or \(2\)), which is asymptomatic; and (3) Stage 2, high-grade (NC grade \(\ge 3\)). For subjects with low-grade nuclear cataract, clinical intervention such as Kary Uni eye drops can slow nuclear cataract progression, while subjects with high-grade nuclear cataract need to undergo cataract surgery and follow-up. Figure 1 shows the three severity levels of nuclear cataract on AS-OCT images.

Over the past years, ophthalmologists have increasingly used AS-OCT images to diagnose anterior segment diseases, e.g., glaucoma and corneal diseases [1, 11, 12]. Researchers have recently begun to study, quantitatively and objectively, the opacity relationship between NC grades and the lens nucleus region on AS-OCT images. Wong et al. [33] first used a linear fitting method to build an opacity relationship between NC grades and the mean density of the nuclear region on AS-OCT images, and their statistical results showed that this relationship is strong. Refs. [5, 6, 15, 26] obtained similar statistical results in clinical research, although [26] reported a weaker opacity relationship on the down nucleus region than [5, 6] reported on the whole nucleus region. These statistical results provide potential support for AS-OCT image-based cataract surgery planning and clinical evidence for automatic NC classification. Motivated by clinical AS-OCT image-based NC research, [43] applied a deep learning model to automatic NC classification on the whole lens region of AS-OCT images. It obtained only about 58% accuracy, indicating that automatic NC classification on AS-OCT images is challenging.

This paper presents a simple yet effective nuclear cataract classification framework on AS-OCT images to assist ophthalmologists in diagnosing nuclear cataract accurately and objectively. It includes three steps, feature extraction, feature importance analysis, and classification, as shown in Fig. 2. In the feature extraction step, we devise a clinically motivated global–local feature extraction method that extracts 20 image features from each of the whole, up, and down nucleus regions. It is motivated by clinical NC research [5, 6, 15] and the opacity locations of nuclear cataract subtypes. Moreover, following [19], two nuclear size features are also extracted: nuclear thickness and nuclear diameter. Hence, 62 features (20 × 3 + 2) are extracted from each AS-OCT image. In the feature importance analysis step, we use Pearson’s correlation coefficient (PCC) and the recursive feature elimination (RFE) method to analyze feature importance, considering both clinical research and classification performance requirements. In the classification step, we then use an ensemble multiclass logistic regression (EMLR) to further improve NC classification performance, in which two multiclass logistic regression classifiers are trained with two different optimization methods. Finally, a clinical AS-OCT image dataset containing 543 subjects and 11,442 AS-OCT images is used to evaluate the proposed feature extraction-based framework. The results demonstrate that the proposed framework is simple and effective compared with strong baselines, and it could potentially serve as a computer-aided diagnosis (CAD) tool for AS-OCT image-based cataract diagnosis and cataract surgery planning.

Fig. 1

Three nuclear cataract severity levels on AS-OCT images: a normal, the nuclear region without nuclear opacity; b low-grade, the nuclear region with slight nuclear opacity but asymptomatic; c high-grade, the nuclear region with nuclear opacity and symptomatic

Fig. 2

Flowchart of the proposed feature extraction-based framework. First, we crop the nucleus region from the AS-OCT image and use the global–local feature extraction method to extract features from the up, whole, and down nucleus regions. Then, we use the PCC and RFE methods to analyze feature importance. Finally, we present an ensemble multiclass logistic regression to distinguish three nuclear cataract severity levels

The main contributions of this paper are summarized as follows:

  • Inspired by clinical research on nuclear cataract, this paper proposes a global–local feature extraction method to obtain more useful features from the nuclear region on AS-OCT images. Furthermore, we extract two nuclear size features to boost NC classification results.

  • We use the PCC and RFE methods to analyze feature importance, eliminating less important features and selecting useful ones. To further enhance overall NC classification results, we propose an ensemble multiclass logistic regression classifier that accounts for the effects of different optimization methods on a single multiclass logistic regression classifier.

  • The results on the AS-OCT image dataset demonstrate that the proposed feature extraction-based framework achieves state-of-the-art performance compared with strong baselines.

The rest of this paper is organized as follows. The section “Related work” reviews related work. The section “AS-OCT image dataset” introduces the AS-OCT image dataset. The section “Methodology” elaborates the proposed feature extraction-based framework for automatic AS-OCT image-based NC classification. Experiment settings and evaluation measures are presented in the section “Experiment settings and evaluation measures”. We analyze and discuss NC classification results in the section “Result analysis and discussion”. The section “Conclusion and future work” presents conclusions and future work.

Related work

In this section, we review recent advances in automatic cataract classification and AS-OCT-based ocular disease diagnosis.

Automatic cataract classification

Over the past years, researchers have developed various artificial intelligence (AI) algorithms for automatic cataract classification based on several ophthalmic imaging modalities (slit-lamp images and fundus images), ranging from conventional machine learning methods to deep learning methods.

Conventional machine learning methods. Refs. [20,21,22,23, 42] developed an automatic nuclear cataract grading system based on slit-lamp images, comprising lens contour detection, feature extraction, and classification. They used linear regression as the grading model and achieved a mean error of 0.36. Ref. [38] adopted the bag-of-words (BOW) method to extract features and achieved 82.5% accuracy on slit-lamp images with a group sparsity regression (GSR) method. Cheng [7] presented a sparse range-constrained learning (SRCL) method for slit-lamp image-based nuclear cataract classification and obtained higher accuracy than previous works [38, 39]. Caixinha et al. [2] used ultrasound images for automatic cataract classification in an animal model and achieved 95% accuracy for nuclear cataract hardness classification using a multiclass SVM classifier on a small dataset. Ref. [4] proposed an improved Haar wavelet method for cataract screening on fundus images. However, fundus images cannot capture detailed opacity information for different cataract types and can only be used for cataract screening.

Deep learning methods. Gao et al. [14] combined a convolutional neural network (CNN) and a recurrent neural network (RNN) for automatic slit-lamp image-based nuclear cataract classification and achieved 84.2% accuracy. Ref. [36] proposed an end-to-end deep learning framework that automatically performs both nuclear region contour detection and nuclear cataract classification; using Faster R-CNN, it achieved 84.7% accuracy. Wu et al. [35] designed a deep learning platform for slit-lamp image-based cataract screening. Xu et al. [37] proposed a hybrid CNN model for cataract screening on retinal (fundus) images that fuses information from different image regions, and the results showed that the hybrid CNN improved cataract screening. In [41], researchers applied a deep convolutional neural network (DCNN) to fundus image-based cataract screening and achieved good screening results.

AS-OCT-based ocular disease diagnosis

As stated in the section “Introduction”, AS-OCT imaging is non-contact, non-invasive, user-friendly, objective, and quantitative. Moreover, it can capture two-dimensional (2D) and three-dimensional (3D) information about the eye’s anterior structure. Owing to these characteristics, ophthalmologists have increasingly used AS-OCT images for ocular disease diagnosis (e.g., corneal diseases) and scientific research. Refs. [9, 16] proposed deep CNN-based methods for corneal structure segmentation, which can help clinicians diagnose corneal diseases accurately. Fu et al. [11,12,13] applied deep learning models to AS-OCT images to diagnose angle-closure glaucoma, which can assist ophthalmologists in diagnosing glaucoma objectively. Wong et al. [33] studied the correlation between nuclear cataract grades and the mean density of the whole nucleus region through linear fitting, and the statistical results showed a strong relationship. Refs. [5, 6, 15] obtained similar results between nuclear cataract grades and the whole nucleus region on AS-OCT images, whereas [26] used the down nucleus region to study the opacity relationship between nuclear cataract grades and mean density and obtained a weak relationship. Overall, these clinical AS-OCT image-based cataract studies could contribute to nuclear cataract surgery planning and provide clinical support for automatic nuclear cataract classification.

From this review of related work, we draw the following observations. (1) Previous works have achieved high cataract classification performance with different ophthalmic images, but most of them focused on cataract screening. (2) Feature extraction methods can achieve performance competitive with deep learning methods; moreover, deep learning methods need massive data to train a good model, and the clinical explainability of their learned feature representations is poor. (3) Existing automatic nuclear cataract classification works are mainly based on slit-lamp images, which cannot measure nuclear cataract opacity objectively and quantitatively. (4) AS-OCT images overcome these shortcomings of slit-lamp images, but AS-OCT image-based nuclear cataract classification has not been widely studied.

AS-OCT image dataset

This paper collects a clinical AS-OCT image dataset using a CASIA2 ophthalmology device (Tomey Corporation, Japan). An AS-OCT image captures the whole anterior structure of an eye, as shown in the top left corner of Fig. 2. According to clinical cataract research [6, 15, 32], only the lens nucleus region is essential for NC classification, as shown in Fig. 1. We use a deep segmentation network [3] to obtain coarse segmentation results of the nuclear region and then correct them manually with ImageJ software to obtain accurate nuclear region segmentations.

Because no clinical nuclear cataract classification system has been built on AS-OCT images, we construct a mapping between AS-OCT images and slit-lamp images through LOCS III to obtain nuclear cataract grades for the AS-OCT images. Three experienced ophthalmologists labeled each subject’s NC grade using slit lamps, which ensures the quality and reliability of the labels for the AS-OCT images. Following clinical AS-OCT-based classification research, this paper converts NC severity into the three stages introduced in the section “Introduction”: subjects whose lens nuclear region shows no opacity are normal (Stage 0, non-nuclear cataract); subjects with NC grade 1 or 2 are asymptomatic (Stage 1, low-grade); and subjects with NC grade greater than or equal to 3 are symptomatic (Stage 2, high-grade).
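The grade-to-stage conversion above amounts to a small lookup. The sketch below is illustrative only: it assumes integer LOCS III NC grades with 0 standing for a nucleus without opacity, and the helper name `nc_grade_to_stage` is hypothetical.

```python
def nc_grade_to_stage(grade: int) -> int:
    """Map an integer NC grade to the three-stage label (hypothetical helper).

    Assumes grade 0 denotes a nucleus without opacity.
    Stage 0: normal; Stage 1: low-grade (grade 1 or 2); Stage 2: high-grade (grade >= 3).
    """
    if grade < 0:
        raise ValueError(f"NC grade must be non-negative, got {grade}")
    if grade == 0:
        return 0  # normal, non-nuclear cataract
    if grade <= 2:
        return 1  # low-grade, asymptomatic
    return 2      # high-grade, symptomatic


if __name__ == "__main__":
    print([nc_grade_to_stage(g) for g in range(7)])  # [0, 1, 1, 2, 2, 2, 2]
```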

The AS-OCT image dataset contains 543 subjects, including 422 right eyes and 440 left eyes. Gender and age information is missing for some subjects: there are 135 male and 335 female subjects, and 494 subjects have age information, with ages ranging from 15 to 94. Each subject contains 128 images. Considering the repetitiveness of adjacent AS-OCT images, this paper selects images at intervals, so 64 AS-OCT images per subject are used. Because poor-quality images were manually removed under an ophthalmologist’s guidance, the number of available AS-OCT images per eye ranges from 1 to 64. Because the opacity levels of a subject’s two eyes may affect each other, we split the dataset by subject into disjoint training and testing subsets, which contain 7831 and 3611 AS-OCT images, respectively (11,442 in total). Table 1 summarizes the distribution of the three NC severity levels in the AS-OCT image dataset.
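A subject-level split like the one described can be reproduced with scikit-learn’s `GroupShuffleSplit`, which keeps all images sharing a group ID on the same side of the split. This is a minimal sketch on synthetic data; the array names, the 50 synthetic subjects, and `test_size=0.3` are assumptions, since the text reports image counts rather than the subject-level ratio.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical per-image metadata: one subject ID per AS-OCT image.
rng = np.random.default_rng(0)
n_images = 1000
subject_ids = rng.integers(0, 50, size=n_images)  # 50 synthetic subjects
features = rng.normal(size=(n_images, 59))        # placeholder feature matrix

# Split by subject so that no subject's images appear in both subsets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(features, groups=subject_ids))

train_subjects = set(subject_ids[train_idx])
test_subjects = set(subject_ids[test_idx])
print(len(train_idx), len(test_idx), train_subjects.isdisjoint(test_subjects))
```

Splitting on subjects rather than on images prevents near-duplicate scans of the same eye from leaking between training and testing.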

Methodology

In this paper, we propose a simple yet effective NC classification framework on AS-OCT images, illustrated in Fig. 2, which comprises feature extraction, feature importance analysis, and classification. In the feature extraction part, we apply the global–local feature extraction method to obtain features from three nuclear regions: the whole, up, and down regions. Additionally, two nuclear size features, nuclear thickness and nuclear diameter, are extracted. In the feature importance analysis part, we use both PCC and RFE to retain useful features and remove redundant ones. Finally, an ensemble multiclass logistic regression classifier is presented to distinguish the different NC severity levels.

Table 1 The distribution of NC stages on AS-OCT image dataset

Global–local feature extraction

Refs. [5, 6, 33] and [26] studied the opacity relationship between NC grades and mean density on the whole nucleus region and the down nucleus region of AS-OCT images, respectively. We observe that the reported opacity relationship is stronger on the whole nucleus region than on the down nucleus region, which we attribute to the opacity locations of nuclear cataract subtypes. Motivated by this clinical finding, this paper extracts features from three different regions, whole, up, and down, as shown in Fig. 2. We extract 20 features from each region using the intensity-based statistics method and the intensity histogram method [17, 24, 44,45,46]. The obtained features can thus be divided into intensity statistics features and intensity histogram features.
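The text does not state how the up and down regions are delimited. As a hedged illustration, the sketch below assumes they are the parts of the nucleus mask above and below the midpoint of its vertical extent; `split_nucleus_regions` and the toy arrays are hypothetical.

```python
import numpy as np


def split_nucleus_regions(image: np.ndarray, mask: np.ndarray) -> dict:
    """Return pixel intensities of the whole, up, and down nucleus regions.

    Assumption (not specified in the text): the up and down regions are the parts
    of the nucleus mask above and below the midpoint of the mask's vertical extent.
    """
    mask = mask.astype(bool)
    rows = np.flatnonzero(mask.any(axis=1))
    if rows.size == 0:
        raise ValueError("empty nucleus mask")
    mid = (rows[0] + rows[-1] + 1) // 2            # first row of the lower half
    row_index = np.arange(mask.shape[0])[:, None]  # column vector of row numbers
    up_mask = mask & (row_index < mid)
    down_mask = mask & (row_index >= mid)
    return {"whole": image[mask], "up": image[up_mask], "down": image[down_mask]}


# Toy 6x6 image with a 4x4 "nucleus" in the middle.
image = np.arange(36, dtype=np.uint8).reshape(6, 6)
mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 1:5] = True
regions = split_nucleus_regions(image, mask)
print({name: values.size for name, values in regions.items()})  # {'whole': 16, 'up': 8, 'down': 8}
```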

Intensity-based statistical features

Using the intensity-based statistics method, we extract the following 17 intensity-based statistical features from each lens nucleus region:

  1. Mean \(\mu \): the average intensity of the nucleus region on the AS-OCT image, which is an important indicator in clinical AS-OCT image-based nuclear cataract diagnosis:

    $$\begin{aligned} \mu =\frac{1}{N}\sum _{k=1}^{N}X_{k}, \end{aligned}$$
    (1)

    where \(X_{k}\) denotes the intensity of the kth pixel in the nucleus region and N denotes the total number of pixels in the region.

  2. Minimum: the lowest intensity value of the nucleus region on the AS-OCT image.

  3. Maximum: the highest intensity value of the nucleus region on the AS-OCT image.

  4. Median (M): the intensity value that separates the higher half of the nucleus region intensities from the lower half.

  5. 10th intensity percentile (\(P_{10}\)): the 10th percentile of the nucleus region intensities sorted in ascending order. \(P_{10}\) is a more robust alternative to the minimum intensity value.

  6. 25th intensity percentile (\(P_{25}\)): the 25th percentile of the nucleus region intensities sorted in ascending order.

  7. 75th intensity percentile (\(P_{75}\)): the 75th percentile of the nucleus region intensities sorted in ascending order.

  8. 90th intensity percentile (\(P_{90}\)): the 90th percentile of the nucleus region intensities sorted in ascending order. \(P_{90}\) is a more robust alternative to the maximum intensity value.

  9. Intensity range: the difference between the maximum and minimum intensity values of the nucleus region on the AS-OCT image.

  10. Intensity interquartile range (IQR): the interquartile range of the nucleus region intensities, defined as

    $$\begin{aligned} \mathrm{IQR} = P_{75}-P_{25}, \end{aligned}$$
    (2)

    where \(P_{75}\) and \(P_{25}\) denote the 75th and 25th percentile nucleus region intensity values, respectively.

  11. Energy: because nuclear sizes differ, energy is defined here as the mean of the squared nucleus region intensities:

    $$\begin{aligned} \mathrm{Energy} =\frac{1}{N}\sum _{k=1}^{N}X_{k}^{2}. \end{aligned}$$
    (3)

  12. Variance: measures how far the nucleus region intensity values spread out from the mean intensity.

  13. Standard deviation (SD): measures the dispersion of the nucleus region intensity values.

  14. Mean absolute deviation (Mad): measures the average absolute dispersion around the mean intensity:

    $$\begin{aligned} \mathrm{Mad} = \frac{1}{N}\sum _{k=1}^{N} \left| X_{k} - \mu \right| . \end{aligned}$$
    (4)

  15. Skewness \(\tilde{\mu }_{3}\): measures the asymmetry of the nucleus region intensity distribution and is computed as

    $$\begin{aligned} \tilde{\mu }_{3}=\frac{\frac{1}{N}\sum _{k=1}^{N}(X_{k}-\mu )^{3}}{\left( \frac{1}{N}\sum _{k=1}^{N}(X_{k}-\mu )^{2}\right) ^{3/2}}. \end{aligned}$$
    (5)

  16. Kurtosis \(\tilde{\mu }_{k}\): measures the peakedness [46] of the nucleus region intensity distribution on the AS-OCT image and is computed through Eq. (6):

    $$\begin{aligned} \tilde{\mu }_{k}=\frac{\frac{1}{N}\sum _{k=1}^{N}(X_{k}-\mu )^{4}}{\left( \frac{1}{N}\sum _{k=1}^{N}(X_{k}-\mu )^{2}\right) ^{2}} -3. \end{aligned}$$
    (6)

  17. Root-mean-square intensity (RMS): also called the quadratic mean, computed as

    $$\begin{aligned} \mathrm{RMS} =\sqrt{ \frac{1}{N}\sum _{k=1}^{N}X_{k}^{2}}. \end{aligned}$$
    (7)
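The 17 statistics above can be computed from a region’s pixel intensities with NumPy and SciPy. This is a minimal sketch rather than the authors’ code: the dictionary keys are illustrative, population variance (ddof=0) and NumPy’s default linear percentile interpolation are assumptions, and `scipy.stats.skew` and `scipy.stats.kurtosis` with their defaults (`bias=True`, and `fisher=True` for kurtosis) match Eqs. (5) and (6).

```python
import numpy as np
from scipy import stats


def intensity_statistics(x: np.ndarray) -> dict:
    """Compute the 17 intensity-based statistics of Eqs. (1)-(7) for one region.

    `x` holds the nucleus region pixel intensities (any shape; flattened here).
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    mu = x.mean()
    p10, p25, p75, p90 = np.percentile(x, [10, 25, 75, 90])
    energy = np.mean(x ** 2)                 # Eq. (3): mean of squared intensities
    return {
        "mean": mu,                          # Eq. (1)
        "minimum": x.min(),
        "maximum": x.max(),
        "median": np.median(x),
        "p10": p10,
        "p25": p25,
        "p75": p75,
        "p90": p90,
        "range": x.max() - x.min(),
        "iqr": p75 - p25,                    # Eq. (2)
        "energy": energy,
        "variance": x.var(),                 # population variance (ddof=0), an assumption
        "std": x.std(),
        "mad": np.mean(np.abs(x - mu)),      # Eq. (4)
        "skewness": stats.skew(x),           # biased estimator, Eq. (5)
        "kurtosis": stats.kurtosis(x),       # excess kurtosis, Eq. (6)
        "rms": np.sqrt(energy),              # Eq. (7)
    }


feats = intensity_statistics(np.array([0, 10, 20, 30, 40]))
print(len(feats), feats["mean"], feats["iqr"], feats["energy"])  # 17 20.0 20.0 600.0
```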

Intensity-based histogram features

Apart from the above 17 intensity-based statistical features, we also apply the intensity histogram method to extract three intensity histogram features from AS-OCT images. Nuclear region intensity (density) values range from 0 to 255; with a bin width of 25, the histogram has 11 bins.

  18. Uniformity: the sum of the squared bin probabilities of the histogram [24]; it measures the randomness of the histogram.

  19. Entropy: an information-theoretic measure of the intensity information in the nucleus region, which reflects nuclear cataract severity levels. This paper computes it as follows:

    $$\begin{aligned} \mathrm{Entropy} = -\frac{1}{N}\sum _{i=1}^{N}P_{i}\log P_{i}, \end{aligned}$$
    (8)

    where \(P_{i}\) denotes the probability of the ith bin, i.e., the proportion of nucleus region intensity values that fall into that bin, and N here denotes the number of histogram bins.

  20. Histogram-based energy (HBE): measures the intensity distribution; large values imply that the intensity distribution is uneven.
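A sketch of the histogram features with the 11 bins of width 25 described above. Assumptions: bin edges 0, 25, ..., 250, 256 (so the last bin is narrower), base-2 logarithms, and N in Eq. (8) taken as the bin count; the sketch returns both the standard Shannon entropy and the 1/N-scaled value. HBE is omitted because the text gives no formula for it.

```python
import numpy as np


def histogram_features(x: np.ndarray, bin_width: int = 25) -> dict:
    """Uniformity and entropy from an intensity histogram (hypothetical helper).

    Edges 0, 25, ..., 250, 256 give 11 bins covering intensities 0-255.
    """
    x = np.asarray(x).ravel()
    edges = np.append(np.arange(0, 256, bin_width), 256)  # 12 edges -> 11 bins
    counts, _ = np.histogram(x, bins=edges)
    p = counts / counts.sum()            # P_i: fraction of intensities in bin i
    nz = p[p > 0]                        # empty bins contribute 0 (0 * log 0 := 0)
    entropy = -np.sum(nz * np.log2(nz))  # standard Shannon entropy in bits
    return {
        "n_bins": len(counts),
        "uniformity": np.sum(p ** 2),
        "entropy": entropy,
        "entropy_eq8": entropy / len(counts),  # with the 1/N factor of Eq. (8)
    }


h = histogram_features(np.array([0, 0, 30, 30]))
print(h["n_bins"], h["uniformity"], h["entropy"])  # 11 0.5 1.0
```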

Nuclear size-based features

Ref. [19] studied the correlation between nuclear size features and nuclear cataract severity levels through linear fitting, and the statistical results showed a strong relationship between them. In this paper, we extract two nuclear size features, thickness and diameter, represented by the height and width of the nucleus region on AS-OCT images. Figure 3 shows the nuclear thickness (red) and nuclear diameter (green) of the nuclear region on an AS-OCT image.
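Nuclear thickness and diameter can be read off a binary nucleus mask as the height and width of its extent. This sketch assumes a 2-D boolean mask (the helper `nuclear_size` is hypothetical) and returns pixel counts; converting to millimetres would need the device’s pixel spacing, which the text does not give.

```python
from typing import Tuple

import numpy as np


def nuclear_size(mask: np.ndarray) -> Tuple[int, int]:
    """Return (thickness, diameter) in pixels as the height and width of the mask's extent."""
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing nucleus pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing nucleus pixels
    if rows.size == 0:
        raise ValueError("empty nucleus mask")
    thickness = int(rows[-1] - rows[0] + 1)  # vertical extent (height)
    diameter = int(cols[-1] - cols[0] + 1)   # horizontal extent (width)
    return thickness, diameter


mask = np.zeros((10, 20), dtype=bool)
mask[2:6, 3:15] = True
print(nuclear_size(mask))  # (4, 12)
```

Because the diameter depends directly on the detected left and right edges, any error in locating those edges propagates straight into this feature.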

Fig. 3

Two nuclear size features: nuclear thickness (red) and nuclear diameter (green) of nuclear region

Overall, 62 features (20 features from each of the three regions plus 2 nuclear size features) are extracted from the nuclear region; for detailed feature information, see Table 2.

Table 2 Nuclear cataract classification performance on different nucleus region features and nuclear size features using four machine learning methods

Feature importance analysis

Considering both clinical research and NC classification performance requirements, this paper uses two feature selection methods to analyze feature importance: Pearson’s correlation coefficient (PCC) [22] and the recursive feature elimination (RFE) method [18]. We use PCC because it is widely used in clinical scientific research; with it, we measure the linear correlation between NC severity levels and each feature extracted from the nuclear region on AS-OCT images. This paper computes the PCC as follows:

$$\begin{aligned} r = \frac{n\sum f_{K}y_{K}-\sum f_{K}\sum y_{K}}{\sqrt{n\sum f_{K}^{2} - \left( \sum f_{K}\right) ^{2}}\sqrt{n\sum y_{K}^{2} - \left( \sum y_{K}\right) ^{2}}}, \end{aligned}$$
(9)

where \(f_{K}\) and \(y_{K}\) denote the value of an extracted feature and the NC severity level of the Kth AS-OCT image, respectively, n is the number of AS-OCT images, and r is the PCC value between the extracted feature and NC severity levels.
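Eq. (9) can be transcribed directly; the sketch below checks it against NumPy’s `np.corrcoef`. For a constant feature such as minimum density, the denominator of Eq. (9) is zero and r is undefined, so this hypothetical helper returns NaN in that case.

```python
import numpy as np


def pearson_r(f, y) -> float:
    """Pearson correlation coefficient computed term by term as in Eq. (9)."""
    f = np.asarray(f, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = f.size
    num = n * np.sum(f * y) - np.sum(f) * np.sum(y)
    sf = n * np.sum(f ** 2) - np.sum(f) ** 2  # n times the sum of squared deviations of f
    sy = n * np.sum(y ** 2) - np.sum(y) ** 2  # n times the sum of squared deviations of y
    if sf <= 0 or sy <= 0:
        return float("nan")                   # constant feature or labels: r is undefined
    return float(num / (np.sqrt(sf) * np.sqrt(sy)))


feature = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy feature values
stage = [0, 0, 1, 2, 2]              # toy NC stages
print(pearson_r(feature, stage), np.corrcoef(feature, stage)[0, 1])
```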

RFE is another widely used feature selection method for feature importance analysis; it selects features by recursively considering smaller and smaller feature sets. We use multiclass logistic regression as the RFE estimator, guided by NC classification performance. Moreover, we use only 59 features for RFE, because the minimum density of every nuclear region is 0, so the three minimum density features carry no information. To compute feature importance efficiently, we apply recursive feature elimination with cross-validation (RFECV) to the training dataset with tenfold cross-validation [29]: the training dataset is divided into ten folds, nine of which are used for training and one for testing in each round, which helps the multiclass logistic regression generalize well. Before feature selection, the Z-score method is used to standardize each feature through the following equation:

$$\begin{aligned} \hat{x}=\frac{x-\mu }{\sigma }, \end{aligned}$$
(10)

where x is an original feature value, \(\hat{x}\) is the standardized value, and \(\mu \) and \(\sigma \) denote the mean and standard deviation of that feature. This maps features with different scales onto a common scale and removes each feature’s offset (its mean). Then, we apply RFECV to analyze feature importance and obtain two feature subsets: an important subset, whose features are used for classification, and an unimportant subset, whose features are not used for classification. Finally, we use multiclass logistic regression to determine the number of selected features based on classification performance.
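The standardize-then-RFECV procedure can be sketched with scikit-learn’s `StandardScaler`, `RFECV`, and `LogisticRegression`. Synthetic data from `make_classification` stands in for the 59 retained features; the sample count, `step=1`, stratified folds, and accuracy scoring are assumptions, not settings reported in the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 59 features and 3 NC stages.
X, y = make_classification(n_samples=300, n_features=59, n_informative=8,
                           n_redundant=4, n_classes=3, random_state=0)

# Z-score each feature, Eq. (10): (x - mean) / std, fitted on the training data.
X_std = StandardScaler().fit_transform(X)

# RFECV with an LBFGS-optimized multiclass logistic regression and tenfold CV.
selector = RFECV(
    estimator=LogisticRegression(solver="lbfgs", max_iter=1000),
    step=1,  # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)
selector.fit(X_std, y)

important = np.flatnonzero(selector.support_)     # indices of kept features
unimportant = np.flatnonzero(~selector.support_)  # indices of eliminated features
print(selector.n_features_, len(important), len(unimportant))
```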

Automatic nuclear cataract classification via ensemble multiclass logistic regression

This paper uses logistic regression (LR) for automatic nuclear cataract classification, because previous works have shown that LR achieves promising results on various learning tasks [18]. Since nuclear cataract classification is a multiclass task, this paper uses multiclass logistic regression (MLR), defined by the following equations:

$$\begin{aligned}&p(y=i|\phi )=y_{i}(\phi )=\frac{e^{a_{i}}}{\sum _{j}e^{a_{j}}}, \end{aligned}$$
(11)
$$\begin{aligned} a_{i}&= w_{i}^{T}\phi \\ &= w_{0,i}x_{0} +w_{1,i}x_{1}+ w_{2,i}x_{2}+\cdots +w_{M,i}x_{M}, \end{aligned}$$
(12)

where \(i\in \{0,1,2\}\) indexes the three NC stages, \(\phi = (x_{0},x_{1},x_{2},\ldots ,x_{M})^{T}\) denotes the feature vector with components \(x_{0}\) through \(x_{M}\), \(w_{i}\) is the learned parameter vector for the ith class, and \(p(y=i|\phi )\) is the predicted probability of the ith class.

During training, the parameters of multiclass logistic regression are optimized by minimizing the following cost function:

$$\begin{aligned} J(w) = -\sum _{j}y_{j}\log p(y=j|\phi ) + \frac{1}{2}w^{T}w. \end{aligned}$$
(13)

The first term of Eq. (13) is the cross-entropy error function, and the second term is an L2 regularization penalty on the weights.
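Eqs. (11)–(13) can be checked numerically for a single sample. In this sketch, `W` stacks one weight vector per NC stage, the target is one-hot, and \(w^{T}w\) is taken over all class weight vectors; these are assumptions about notation the text leaves implicit, and both helper names are hypothetical.

```python
import numpy as np


def softmax_probs(W: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Eqs. (11)-(12): a_i = w_i^T phi, then p(y=i|phi) = exp(a_i) / sum_j exp(a_j).

    W has shape (n_classes, M+1), one row w_i per NC stage; phi has shape (M+1,).
    """
    a = W @ phi
    a = a - a.max()  # subtracting the max leaves the softmax unchanged but avoids overflow
    e = np.exp(a)
    return e / e.sum()


def cost(W: np.ndarray, phi: np.ndarray, label: int) -> float:
    """Eq. (13) for one sample: cross-entropy with a one-hot target plus (1/2) w^T w."""
    p = softmax_probs(W, phi)
    y_onehot = np.eye(W.shape[0])[label]
    cross_entropy = -np.sum(y_onehot * np.log(p))
    penalty = 0.5 * np.sum(W ** 2)  # w^T w over all class weight vectors
    return float(cross_entropy + penalty)


W = np.zeros((3, 4))  # all-zero weights give uniform probabilities
phi = np.ones(4)
print(softmax_probs(W, phi), cost(W, phi, label=1))  # uniform 1/3 each; cost = log(3)
```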

In our experiments, we found that MLR classifiers trained with different weight optimization methods obtain different NC classification results; specifically, different optimization methods lead the MLR classifier to focus on different nuclear cataract severity levels. Therefore, we present an ensemble multiclass logistic regression (EMLR) framework in which two MLR classifiers are trained with two different optimization methods [8, 30, 31], chosen based on classification performance. According to the experimental results, this paper uses SAGA (a variant of the stochastic average gradient method) and LBFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno) [8, 30].

The predicted output of EMLR can be expressed as follows:

$$\begin{aligned} p_\mathrm{EMLR} = p_{\mathrm{MLR}_\mathrm{lbfgs}}+ p_{\mathrm{MLR}_\mathrm{saga}}, \end{aligned}$$
(14)

where \(p_{\mathrm{MLR}_\mathrm{lbfgs}}\) and \(p_{\mathrm{MLR}_\mathrm{saga}}\) denote the predicted class probabilities of the MLR classifiers optimized with LBFGS and SAGA, respectively.
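A minimal EMLR sketch with scikit-learn, assuming the predicted stage is the argmax of the summed probabilities in Eq. (14) (the text does not state the decision rule). The data are synthetic, and `max_iter=5000` is an assumption chosen so that SAGA converges.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=900, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Two MLR classifiers that differ only in the weight optimization method.
mlr_lbfgs = LogisticRegression(solver="lbfgs", max_iter=5000).fit(X_tr, y_tr)
mlr_saga = LogisticRegression(solver="saga", max_iter=5000).fit(X_tr, y_tr)

# Eq. (14): sum the class probabilities, then pick the most probable stage.
p_emlr = mlr_lbfgs.predict_proba(X_te) + mlr_saga.predict_proba(X_te)
y_pred = mlr_lbfgs.classes_[np.argmax(p_emlr, axis=1)]
print("EMLR accuracy:", np.mean(y_pred == y_te))
```

Because both solvers target the same L2-regularized objective, the benefit of this ensemble depends on how differently the solvers behave in practice, for example under finite iteration limits and stopping tolerances.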

Experiment settings and evaluation measures

Experiment settings

We implement the experimental code in Python using the OpenCV package and the PyTorch platform. To evaluate the proposed feature extraction-based framework comprehensively, this paper conducts the following comparative experiments.

  • Performance comparison of different nucleus region features. This paper extracts features from three lens nucleus regions, the whole, up, and down nucleus regions, as shown in Fig. 2. We use four classical machine learning methods to evaluate the NC classification performance of the extracted AS-OCT image-based features: MLR, Gaussian naive Bayes (NB), ridge regression (RE), and random forest (RF). These four methods represent different types of machine learning models and can therefore demonstrate the robustness of the extracted features.

  • Results of feature importance analysis. To identify significant features and remove redundant ones, we use two feature importance analysis methods: PCC and RFE. In RFE, MLR with the LBFGS optimization method is used to select features. To test which optimization method works well for MLR on the extracted features, this paper compares five optimization methods: SAG (stochastic average gradient) [8], LIBLINEAR (A Library for Large Linear Classification) [10], Newton-CG (Newton conjugate gradient) [40], LBFGS, and SAGA.

  • Baseline methods. To verify the performance of the proposed ensemble method comprehensively, this paper uses not only state-of-the-art machine learning methods, namely gradient boosting, AdaBoost, multilayer perceptron (MLP), and support vector machine (SVM), but also advanced convolutional neural networks (CNNs), namely AlexNet, VGGNet, MobileNet, and ResNet. The CNN models take AS-OCT images of the nuclear region as inputs.
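The solver comparison in the second item can be sketched with scikit-learn, whose `LogisticRegression` exposes all five solvers. Synthetic data and tenfold cross-validation stand in for the actual setup. LIBLINEAR solves binary problems, so this sketch wraps it in `OneVsRestClassifier`; the other four solvers fit a multinomial model directly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)

scores = {}
for solver in ["sag", "liblinear", "newton-cg", "lbfgs", "saga"]:
    clf = LogisticRegression(solver=solver, max_iter=5000)
    if solver == "liblinear":
        # liblinear fits binary problems, so wrap it explicitly as one-vs-rest.
        clf = OneVsRestClassifier(clf)
    # Standardize inside the pipeline so each CV fold is scaled on its own training part.
    model = make_pipeline(StandardScaler(), clf)
    scores[solver] = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()

for solver, acc in scores.items():
    print(f"{solver:>10s}: {acc:.4f}")
```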

Table 3 Feature coefficients of 62 features based on PCC

Evaluation measures

To evaluate the overall performance of each method, we calculate the following commonly accepted evaluation measures: accuracy (ACC), macro-precision, macro-sensitivity (Sen), and macro-F1 score. These evaluation measures are defined by the following equations:

$$\begin{aligned}&\mathrm{ACC}=\frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}, \end{aligned}$$
(15)
$$\begin{aligned}&\mathrm{Sen} =\frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \end{aligned}$$
(16)
$$\begin{aligned}&\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \end{aligned}$$
(17)
$$\begin{aligned}&\mathrm{F}1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Sen}}{\mathrm{Precision} + \mathrm{Sen}}, \end{aligned}$$
(18)

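where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively. For the three-class task, these counts are computed for each class in a one-vs-rest manner, and the macro measures average the per-class values over the three classes.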
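The macro measures can be computed with scikit-learn’s metrics, as in the toy example below (the labels are made up for illustration). One caveat: `f1_score(average="macro")` averages per-class F1 scores rather than applying Eq. (18) to macro-precision and Sen; the two conventions can differ (here 0.6556 versus about 0.6933), and the text does not say which is used.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 2, 2]  # toy NC stages
y_pred = [0, 1, 1, 1, 2, 0]  # toy predictions

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro")
sen = recall_score(y_true, y_pred, average="macro")  # macro-sensitivity
f1 = f1_score(y_true, y_pred, average="macro")       # mean of per-class F1
print(f"ACC={acc:.4f} Precision={prec:.4f} Sen={sen:.4f} F1={f1:.4f}")
# ACC=0.6667 Precision=0.7222 Sen=0.6667 F1=0.6556
```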

Result analysis and discussion

Performance comparison of different nucleus region features

Table 2 presents the NC classification performance of four machine learning methods on features from different lens nucleus regions and nuclear size. Compared with RF, NB, and RE, MLR achieves the best NC classification results (86.71% accuracy and 87.44% macro-F1) on the combined features of the three lens nucleus regions and nuclear size, improving accuracy by over 1%. The four machine learning methods generally achieve better NC classification results on the combined whole, up, and down nucleus region features than on features of a single region, indicating that fusing features from different nuclear regions can boost classification performance. The NC classification results in this paper also agree with the linear fitting results of clinical works on different nuclear regions of AS-OCT images.

RE achieves the best accuracy (66.99%) on the two nuclear size features alone, which is consistent with clinical works. The four machine learning methods generally achieve better NC classification results when nuclear size features are added to the three region features, and MLR achieves the largest improvement, about 5%, on the fusion of up nucleus region features and nuclear size features. These results demonstrate that nuclear size features can enhance NC classification. Moreover, the fusion of different nuclear region features and nuclear size features is more robust than single-region features or nuclear size features alone: all four machine learning methods achieve over 80% accuracy, and three of them obtain more than 86.00% accuracy.

Feature importance analysis results

Table 2 presents the PCC values between the 62 features and the NC severity levels. We can see that, across the three nuclear regions, IQR correlates more strongly with the NC severity levels than the other features, and the importance of uniformity is second only to that of IQR. Because these two features are explainable, they have the potential to serve as clinical indicators for NC diagnosis. Moreover, the PCC value of minimum density is 0, because the minimum density value of the nuclear region is 0. Thus, we do not use minimum density in the following feature importance analysis and NC classification; that is, only 59 features are used. The PCC value of nuclear diameter is low, which conflicts with clinical findings. This is mainly because we cannot accurately extract the left and right edges of the nucleus, as shown in Fig. 3, since edge extraction is affected by the scanning angle and environment.
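The PCC between a feature and the ordinal severity label is undefined when the feature never varies, which is the situation described for minimum density. The following NumPy sketch computes per-feature PCC and flags zero-variance columns so they can be dropped; reporting such columns as 0.0 follows the text, while the helper `pcc_with_labels` and the toy data are hypothetical.

```python
import numpy as np

def pcc_with_labels(X, y):
    """Pearson correlation between each feature column of X and ordinal labels y.

    A constant column (such as a minimum density that is always 0) has zero
    variance, so its PCC is undefined; it is reported as 0.0 and flagged.
    y itself must not be constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    constant = X.max(axis=0) == X.min(axis=0)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    pcc = np.zeros(X.shape[1])
    keep = ~constant
    num = Xc[:, keep].T @ yc
    den = np.sqrt((Xc[:, keep] ** 2).sum(axis=0) * (yc ** 2).sum())
    pcc[keep] = num / den
    return pcc, constant

# Hypothetical severity labels (0 = normal, 1 = low-grade, 2 = high-grade).
y = np.array([0, 0, 1, 1, 2, 2])
X = np.column_stack([2 * y + 1, np.zeros(6), -y, [0.1, 0.3, 0.2, 0.5, 0.4, 0.6]])
pcc, constant = pcc_with_labels(X, y)
X_used = X[:, ~constant]  # drop undefined-PCC features before classification
print(np.round(pcc, 3), constant)
```

Detecting constant columns with `max == min` avoids relying on a floating-point variance being exactly zero.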

Table 4 Nuclear cataract classification performance on different features

Table 3 presents the NC classification results of different features. Features from the three nuclear regions with PCC values \(>0.700\) are selected, and for each selected feature, only the version from the region with the highest PCC value is used. It can be seen that features with high PCC values generally achieve better NC classification performance, which also shows that the machine learning-based classification results agree well with clinical studies. Among the single features, MLR achieves its best accuracy of 75.71% using skewness, while RF achieves its best accuracy of 71.73% using IQR.

To further study the effect of PCC values on NC classification performance, for each feature we select the region with the highest PCC value among the three regions and name the resulting feature subset Hybrid. According to Table 4, the Hybrid subset achieves better performance than the other region feature subsets with MLR and RF. This demonstrates that features with high PCC values can improve NC classification performance, and that the three regions carry different feature information, which contributes to better NC classification results.
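Both selection rules above reduce to picking, for each base feature, the region with the highest PCC, optionally followed by a threshold. The following is a minimal pure-Python sketch, assuming the signed PCC is compared as the text states (using the absolute value would be an alternative for negatively correlated features); the helper `select_by_pcc` and all PCC values are made up for illustration.

```python
def select_by_pcc(pcc, threshold=None):
    """Pick, for each base feature, the region whose PCC is highest.

    pcc: dict mapping (region, feature_name) -> PCC value.
    threshold: if given, keep only features whose best PCC exceeds it
    (e.g. 0.700); if None, every feature is kept (Hybrid-style subset).
    Returns a dict feature_name -> (best_region, best_pcc).
    """
    best = {}
    for (region, feature), value in pcc.items():
        if feature not in best or value > best[feature][1]:
            best[feature] = (region, value)
    if threshold is not None:
        best = {f: rv for f, rv in best.items() if rv[1] > threshold}
    return best

# Made-up PCC values, for illustration only.
pcc = {
    ("Whole", "IQR"): 0.82, ("Up", "IQR"): 0.79, ("Down", "IQR"): 0.85,
    ("Whole", "Skewness"): 0.71, ("Up", "Skewness"): 0.74, ("Down", "Skewness"): 0.69,
    ("Whole", "Kurtosis"): 0.30, ("Up", "Kurtosis"): 0.25, ("Down", "Kurtosis"): 0.41,
}
hybrid = select_by_pcc(pcc)         # one region per feature
strong = select_by_pcc(pcc, 0.700)  # only features with some PCC > 0.700
print(hybrid)
print(strong)
```

Thresholding the best value is equivalent to keeping a feature when any of its three regional versions exceeds 0.700.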

Figure 4 presents the feature selection results of the RFE method using MLR. The horizontal axis represents the number of features used, ranked by their coefficient values, and the vertical axis shows how the accuracy changes with the number of features. A total of 46 features (the important feature subset) are selected, at which point MLR achieves the best accuracy on the training data. Table 5 shows the feature importance rankings of the unselected features (the unimportant feature subset). The higher the ranking value, the less important the feature (ranking values start at 2). Figure 5 presents the feature coefficient values of MLR based on RFE for each nuclear cataract severity level.
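The RFE procedure with MLR can be sketched with scikit-learn, assuming that library was used (this section does not say). The sketch runs on synthetic data standing in for the 59 usable features and three severity levels, and fixes the subset size at the 46 features reported above; `sklearn.feature_selection.RFECV` is an alternative that chooses this number by cross-validation. In scikit-learn's `RFE`, retained features receive rank 1 and eliminated features receive ranks from 2 upward, which is consistent with the ranking convention described for Table 5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 59 features and 3 severity levels (not the AS-OCT data).
X, y = make_classification(n_samples=300, n_features=59, n_informative=10,
                           n_classes=3, random_state=0)
# Standardize so coefficient magnitudes are comparable across features
# (in practice, fit the scaler on the training split only).
X = StandardScaler().fit_transform(X)

# Eliminate one feature per step; importance is taken from the MLR coefficients
# (squared and summed over the classes by scikit-learn for multiclass models).
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=46, step=1)
rfe.fit(X, y)

selected = np.flatnonzero(rfe.support_)     # important feature subset (rank 1)
unselected = np.flatnonzero(~rfe.support_)  # unimportant subset (ranks 2, 3, ...)
print(len(selected), sorted(rfe.ranking_[unselected].tolist()))
```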

Fig. 4

Accuracy chart for sorted features based on their coefficients via MLR method on AS-OCT image dataset

Table 5 Unimportant feature subset of MLR on the training dataset
Fig. 5

Values of features’ coefficients for three nuclear cataract severity’s levels using multiclass logistic regression method based on recursive feature elimination

Figure 6 shows the results of MLR with different numbers of features as unimportant features are deleted. MLR achieves the best result (86.82% accuracy) when 51 features are used. The following features are not used: \(P_{10}\), Down \(P_{10}\), Up \(P_{10}\), \(P_{90}\), Down HBE, Kurtosis, Up kurtosis, and Down entropy, which may serve as a reference for future work. Furthermore, the compared machine learning methods also use these 51 features as input in the following experiments.
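An accuracy curve of this kind can be sketched as follows, assuming scikit-learn, that the least important features are removed one at a time in the order of a full RFE ranking, and that accuracy is estimated with 5-fold cross-validation; the evaluation protocol behind Fig. 6 is not specified in this section. The data are synthetic, and for brevity the ranking is computed once on all samples, whereas a leakage-free version would recompute it inside each training fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 59 features and 3 severity levels (not the AS-OCT data).
X, y = make_classification(n_samples=300, n_features=59, n_informative=10,
                           n_classes=3, random_state=0)

# Full elimination ranking: rank 1 is the most important feature.
scaled = StandardScaler().fit_transform(X)
ranking = RFE(LogisticRegression(max_iter=1000), n_features_to_select=1).fit(scaled, y).ranking_
order = np.argsort(ranking)  # feature indices, most important first

# Cross-validated MLR accuracy as the least important features are dropped.
scores = {}
for k in range(40, 60):
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores[k] = cross_val_score(model, X[:, order[:k]], y, cv=5).mean()
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 4))
```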

Fig. 6

Accuracy chart for the number of features on MLR

Fig. 7

Accuracy chart for optimization methods based on MLR

Table 6 Nuclear cataract classification results of machine learning methods and deep learning methods

Performance comparison of machine learning methods and deep learning methods

Figure 7 presents the NC classification results of MLR with different optimization methods. The horizontal axis denotes the optimization methods for MLR, and the vertical axis represents the accuracy of each optimization method. It can be concluded that MLR achieves better performance with the saga and lbfgs optimization methods than with the other optimization methods. Hence, this paper adopts these two optimization methods for EMLR.
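One way to combine two MLR models trained with different solvers is to average their predicted class probabilities. The following scikit-learn sketch builds such an ensemble from the `saga` and `lbfgs` solvers of `LogisticRegression`, which we assume correspond to the two optimization methods selected above. Soft voting is an assumption made for illustration and may differ from the combination rule defined for EMLR in the method section; the data are synthetic, with 51 features to mirror the selected subset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 51 selected features and 3 severity levels.
X, y = make_classification(n_samples=300, n_features=51, n_informative=10,
                           n_classes=3, random_state=0)

def mlr(solver):
    # Standardization helps saga, whose convergence is sensitive to feature scale.
    return make_pipeline(StandardScaler(), LogisticRegression(solver=solver, max_iter=5000))

# Assumed combination rule: average the class probabilities of both models.
emlr = VotingClassifier(
    estimators=[("saga", mlr("saga")), ("lbfgs", mlr("lbfgs"))],
    voting="soft",
)
acc = cross_val_score(emlr, X, y, cv=5).mean()
emlr.fit(X, y)
proba = emlr.predict_proba(X[:5])  # averaged probabilities for five samples
print(round(acc, 4), proba.round(3))
```

Because both members share the same model family and features, the ensemble mainly smooths out differences between the two solvers' fitted coefficients.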

Table 6 presents the NC classification results of the machine learning methods and deep learning methods. We can see that the proposed EMLR achieves the best accuracy and the best precision on the AS-OCT image dataset, at 86.96% and 87.31%, respectively. GoogleNet achieves the best F1 and the best sensitivity, at 88.01% and 89.98%, respectively. The proposed EMLR and GoogleNet achieve better NC classification results than the other machine learning and deep learning methods. The results of EMLR can mainly be attributed to its combining the advantages of different optimization methods for MLR with the feature characteristics identified by the feature importance analysis methods.

Fig. 8

Confusion matrix of EMLR on AS-OCT image dataset

Compared with deep learning methods such as ResNets and VGGNets, EMLR, MLR, RE, and SVM achieve competitive classification performance, which confirms the effectiveness of the proposed global–local feature extraction method. Machine learning methods offer better explainability than deep learning methods because the features they use are interpretable, which is important for clinical disease diagnosis. Moreover, the proposed method outperforms the method in [43] by approximately 30%, because this paper uses the nuclear region for NC classification, while [43] uses the whole lens region as input.

Furthermore, the proposed feature extraction-based framework has lower hardware requirements than deep learning methods; it also requires less training time and is easy to deploy on photography devices. Figure 8 presents the confusion matrix of EMLR. EMLR classifies all normal AS-OCT images correctly, and the specificity is 90.44%. The precision for the low-grade class is 71.58%, which may be caused by class imbalance in the dataset.

All in all, the proposed feature extraction-based framework achieves state-of-the-art nuclear cataract classification results while offering good explainability. Nevertheless, low density values account for a large proportion of all density values, which makes it hard for machine learning methods to distinguish between nuclear cataract severity levels. This challenge will be investigated in future work.

Conclusion and future work

This paper proposes a simple yet effective feature extraction-based framework for distinguishing nuclear cataract severity levels on AS-OCT images, comprising global–local feature extraction, feature importance analysis, and ensemble multiclass logistic regression. The global–local feature extraction method obtains features from three nuclear regions to enhance classification performance. Feature importance analysis helps select useful features. Ensemble multiclass logistic regression combines the advantages of different optimization methods. The results on the AS-OCT image dataset demonstrate that the proposed framework achieves state-of-the-art nuclear cataract classification results compared with advanced machine learning and deep learning methods. Moreover, the proposed framework has the potential to serve as a computer-aided diagnosis tool for nuclear cataract diagnosis and cataract surgery planning.

In future work, we will incorporate information from different nuclear regions of AS-OCT images into deep neural network models, which may further improve nuclear cataract classification results.