Introduction

The quality of food can change along the supply chain from harvest through to processing, quality assessment, and retail display for consumers (Parfitt et al., 2010). Shelf life and ripening time vary among individual fruit and vegetables, and so techniques that evaluate the ripening time of individual fruit are valuable for determining postharvest storage and handling strategies. A total of 14% of food is lost between harvest and retail globally (FAO, 2019). Sorting fresh fruit into homogeneous classes with a similar ripening time could help to reduce food waste and increase customer satisfaction.

Hyperspectral imaging (HSI) has been popular as a non-destructive technology for predicting shelf life or estimating the maturity of many fruits, where partial least squares regression (PLSR) models have been commonly applied to meet these goals (Rajkumar et al., 2012; Wei et al., 2014; Pu et al., 2016; Li et al., 2018). However, HSI generates high-dimensional data that are difficult to analyse due to their high computational complexity (Han et al., 2020; Huang et al., 2014). Deep learning techniques such as deep convolutional neural networks (DCNN) have the ability to deal with high computational complexity and have been used in feature learning and feature extraction from images (Bengio et al., 2013; Schmidhuber, 2015). Various deep learning techniques applied to hyperspectral imagery have been used to detect disease, recognise food and drink images, and estimate nut and meat quality (Mezgec & Koroušić Seljak, 2017; Ma et al., 2018; Han et al., 2020; Liu et al., 2020). However, DCNN approaches have found little application, thus far, in determining the ripening time of fruit (Steinbrener et al., 2019; Garillos-Manliguez & Chiang, 2021).

Global market demand for avocado products has increased exponentially in recent years (New Zealand Avocado, 2022). However, avocado fruit are highly perishable when compared with many other fruits, and the inability to predict avocado ripening time is one factor that leads to consumer dissatisfaction and fruit loss (Gamble et al., 2010; Perkins et al., 2020; Kämper et al., 2020). The maturity of avocado fruit can be assessed based on dry matter concentration. Fruit with low dry matter concentration do not ripen into edible fruit whereas fruit with high dry matter concentration ripen quickly (Sivakumar et al., 2011). Dry matter concentrations of individual fruit can vary greatly within the same farm or the same tree depending on, for example, the location of the fruit in the tree canopy (Alcobendas et al., 2013; Carvalho et al., 2015), affecting the ripening time of mature fruit after harvest.

This study aimed to use two avocado cultivars, ‘Hass’ and ‘Shepard’, to determine the potential of hyperspectral imaging for rapidly predicting the ripening time of mature, unripe avocado fruit. Specifically, PLSR models, DCNN regression models, and DCNN classification models were developed to predict the ripening time of ‘Hass’ and ‘Shepard’ avocado fruit. The performances of DCNN regression and classification models were compared with traditional PLSR models to identify the best approach. This is the first study, to our knowledge, to examine the combination of DCNN and HSI for predicting the ripening time of avocado fruit. Rapid fruit ripening assessment can help to decrease postharvest fruit loss.

Materials and methods

Sample collection and preparation

Avocado fruit were harvested in dry weather from a commercial avocado orchard (25°08′19″ S, 152°15′48″ E) near Childers, Queensland, Australia. A total of 160 ‘Hass’ and 160 ‘Shepard’ fruit were harvested on 15 April 2019 and another 156 ‘Hass’ fruit on 10 June 2019. All fruit were harvested from trees in a large ‘Hass’ block that included a single row of ‘Shepard’ trees. On each collection date, the fruit were harvested from three to five different trees per sampled cultivar and kept in the shade until they were moved into a cold room at 4 °C for ‘Hass’ and 7 °C for ‘Shepard’ on the same day according to the recommended temperature standards for these two different cultivars (Kämper et al., 2020; Ledger et al., 2016). ‘Hass’ and ‘Shepard’ fruit were stored in the cold room for 8 days and 7 days, respectively. All fruit were then moved to room temperature (21 °C), imaged, and kept at room temperature to allow the onset of ripening. The ripeness of each fruit was inspected daily and confirmed by measuring skin firmness with a handheld sclerometer (8 mm head; Lutron Electronic Model: FR-5120). A fruit was considered ripe when the maximum force needed to press the sclerometer tip 1 mm deep was < 15 N (Smith et al., 1997; Flitsanov et al., 2000; Hofman et al., 2013). The day of full ripeness was recorded.

Imaging system and spectral profile extraction

An image of the skin on one side of the avocado fruit was captured using a hyperspectral imaging system. All images were captured with a 12-bit line scanner camera (Pika XC2, USA) containing a lens with 23 mm focal length and four current-controlled wide-spectrum quartz-halogen lights. The spectral resolution of the camera was 1.36 nm, resulting in 462 bands between 388 and 1005 nm. Each fruit was placed on a black tray on a translation stage moving at 1.23 mm s−1 and the exposure time was set to 19.4 ms. The DCNN regression and classification were based on the full images but, for PLSR, the HSI data were then exported using Spectronon Pro software package (Version 2.112) by Resonon. The raw reflectance (\({R}_{0}\)) of each hyperspectral image was extracted by marking a region of interest (ROI), which excluded the fruit centre that had intense light reflection. The corrected relative reflectance \(\left(R\right)\) was calculated within Spectronon using Eq. 1:

$$\begin{array}{c}R=\frac{R_0-D}{W-D}\end{array}$$
(1)

where \(D\) is the reflectance of a reference dark image (camera lens covered) and \(W\) is the reflectance of a white Teflon sheet that reflects around 99% of incident light (Ariana et al., 2006). This method helped to correct the spectral curve of the fruit surface. The 100% reflectivity was scaled to 10,000 (integers) by default.
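
The correction in Eq. 1 can be illustrated with a minimal sketch. It assumes the raw, dark, and white values are available as per-band lists; the function name and the example counts are hypothetical and do not come from Spectronon.

```python
def relative_reflectance(raw, dark, white, scale=10_000):
    """Apply Eq. 1 band by band: R = (R0 - D) / (W - D), scaled so 100% maps to `scale`."""
    corrected = []
    for r0, d, w in zip(raw, dark, white):
        span = w - d
        if span == 0:
            raise ValueError("white and dark references are equal in a band")
        corrected.append(round(scale * (r0 - d) / span))
    return corrected


# Hypothetical raw counts for three bands of one pixel (not measured data).
raw = [1200, 2100, 3050]
dark = [100, 100, 50]
white = [2100, 4100, 4050]
print(relative_reflectance(raw, dark, white))  # [5500, 5000, 7500]
```

Scaling to integers with 10,000 as full reflectivity mirrors the default described above.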

Partial least squares regression (PLSR)

The ROI of each fruit was treated as a sample when PLSR was applied. The spectral outliers in the samples, if any, were detected and removed using a Hotelling’s T2 test at the 95% confidence level (Morellos et al., 2016). The reflectance data were divided into calibration sets and test sets using the Kennard-Stone algorithm (‘Hass’: N = 256 for the calibration set and N = 60 for the test set; ‘Shepard’: N = 128 for the calibration set and N = 32 for the test set). Spectral data transformations such as Savitzky-Golay first, second and third derivatives, standard normal variate (SNV), orthogonal signal correction (OSC) and multiplicative scatter correction (MSC) were performed on the calibration sets. These transformations aim to reduce undesired effects such as light scattering or uncontrolled external factors (Rinnan et al., 2009; Bai et al., 2018). PLSR models were developed using both the raw and transformed spectral data to correlate the number of days until ripe with the relative reflectance measured in the full spectral range (Wold et al., 2001). Models were developed with the raw number of days and the log-transformed number of days until ripe, and the model with the better fit was selected. The root mean square error (RMSE) values were exponentiated if the best model was based on the log-transformed number of days until ripe, so that the RMSE values could be interpreted and compared more easily with the DCNN regression and classification models. The models were cross-validated using a 10-fold cross-validation technique. The coefficient of determination (R2) and RMSE were used as assessment metrics (Guo et al., 2021). For more detail on the PLSR specifics and calculations, see Tahmasbian et al. (2017, 2018a). After finding the best transformation for the data set, the number of spectral wavelengths was reduced stepwise by leaving the wavelengths with low β-coefficients out of the model (Tahmasbian et al., 2017). Removing wavelengths with low β-coefficients was continued until the model fit decreased. Removing unimportant wavelengths can facilitate the computation of the model and increase its accuracy (Wold et al., 1996; Kamruzzaman et al., 2012; Tahmasbian et al., 2018b). Then the ratio of prediction to deviation (RPD) was calculated using Eq. 2:

$$\begin{array}{c}RPD=\frac{{SD}_t}{{RMSE}_t}\end{array}$$
(2)

where \({SD}_{t}\) is the standard deviation of the observed values and \({RMSE}_{t}\) is the root mean square error of the prediction from the test set (t). Transformations, outlier detection and removal, and all parts of model development were performed with Unscrambler software (CAMO, Norway, Version: 10.5.1).
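
As an illustration of Eq. 2, the following sketch computes RMSE and RPD for a small test set. It assumes the sample standard deviation (n − 1 denominator), which may differ from the Unscrambler convention, and the values are made up for the example.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(observed, predicted):
    """Eq. 2: standard deviation of the observed values divided by the test-set RMSE."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.stdev(observed) / rmse(observed, predicted)


# Illustrative days-until-ripe values, not data from this study.
observed_days = [4, 6, 8, 10, 12]
predicted_days = [5, 6, 7, 10, 13]
print(round(rmse(observed_days, predicted_days), 2), round(rpd(observed_days, predicted_days), 2))
```

A higher RPD means the prediction error is small relative to the natural spread of ripening times in the test set.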

Deep convolutional neural network (DCNN)

Regression and classification models based on DCNN were developed separately for ‘Hass’ and ‘Shepard’ fruit. One model per cultivar was designed for each method because the two cultivars have different physical and chemical properties. All DCNNs used in our study were trained on the University of Melbourne’s SPARTAN HPC system (Lafayette et al., 2016).

Data augmentation and preprocessing

The datasets were preprocessed due to size limitations before training the DCNNs. The size of each HSI was over 1000 × 1000 pixels and each HSI had 462 bands, which was too large as input for a DCNN. Principal component analysis (PCA) was performed to reduce the dimensionality of the data (Khoshelham & Oude Elberink, 2012; Sifre & Mallat, 2013; Sun et al., 2019). The first six principal components (PC), which contained over 99.5% of the overall information of each HSI, were selected (Fig. 1).
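
The dimensionality reduction step can be sketched with NumPy as follows. This is a minimal illustration using a singular value decomposition on a small synthetic cube; the function name, the random data, and the choice of SVD are assumptions and not necessarily how PCA was computed in this study.

```python
import numpy as np


def pca_reduce(cube, n_components=6):
    """Project an (H, W, B) cube onto its first principal components.

    Returns the (H, W, n_components) score image and the fraction of total
    variance carried by each retained component.
    """
    h, w, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    # SVD of the centered pixel matrix; rows of vt are the principal axes.
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular**2 / np.sum(singular**2)
    scores = centered @ vt[:n_components].T
    return scores.reshape(h, w, n_components), explained[:n_components]


# Synthetic stand-in for an HSI cube (random data, far smaller than 1000 x 1000 x 462).
rng = np.random.default_rng(0)
cube = rng.random((20, 20, 30))
reduced, explained = pca_reduce(cube, n_components=6)
print(reduced.shape, float(explained.sum()))
```

On real hyperspectral data, neighbouring bands are strongly correlated, so a few components can carry most of the variance; the random cube here will not show that behaviour.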

Fig. 1 The sample (a) HSI of an avocado fruit and (b–k) its 1st to 10th principal components

The reflectivity of the whole surface of each fruit was not homogeneous due to the illumination conditions (Fig. 1). Therefore, each HSI was segmented into smaller sub-images to ensure that the DCNN was trained with sample images exhibiting some variance, and to enable a robust and reliable prediction. An image segmentation method was applied to exclude the tray as background because the spectral reflectance of the tray did not differ significantly from that of the fruit surface in some wavelength bands (Supplementary Fig. 1). The top left corner point of each sub-image was fixed and then these sub-images were cropped out of the raw HSI to eliminate background interference. A total of 12 sub-images from each ‘Hass’ image and 8 sub-images from each ‘Shepard’ image were finally cropped out due to the different shapes of the cultivars (Fig. 2). The size of each sub-image was set to 150 × 150 pixels. The spectral reflectance of all sub-images was normalized to the interval of [0, 1] based on the maximum (150,000) and minimum (− 150,000) pixel values. This batch processing method avoided inconsistent normalization standards between different sub-images. In total, 3792 sub-images for ‘Hass’ and 1280 sub-images for ‘Shepard’ were obtained, and all of these sub-images were randomly split into training (70%), validation (20%), and test (10%) sets (Supplementary Table 1).
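
The sub-image counts, the fixed-bound normalization, and the random split can be reproduced with a short sketch. The helper names, the random seed, and the rounding of subset sizes are assumptions; the exact subset sizes used in the study are given in Supplementary Table 1 and may differ.

```python
import random


def normalize(value, lo=-150_000, hi=150_000):
    """Min-max scale a pixel value to [0, 1] with fixed bounds shared by all sub-images."""
    return (value - lo) / (hi - lo)


def split_indices(n_items, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle indices and cut them into training, validation, and test subsets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_items)
    n_val = round(fractions[1] * n_items)
    return indices[:n_train], indices[n_train:n_train + n_val], indices[n_train + n_val:]


n_hass = (160 + 156) * 12  # 316 'Hass' images x 12 sub-images
n_shepard = 160 * 8        # 160 'Shepard' images x 8 sub-images
train, val, test = split_indices(n_hass)
print(n_hass, n_shepard, len(train), len(val), len(test))
```

Using the same bounds for every sub-image keeps pixel values comparable across fruit, which per-image min-max scaling would not guarantee.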

Fig. 2 Sub-images cropped out from the whole HSIs of (a) ‘Hass’ and (b) ‘Shepard’ avocado fruit

DCNN regression models

A DCNN for regression was developed to predict the number of days until each avocado fruit became fully ripe. The networks for ‘Hass’ and ‘Shepard’ sub-images shared the same architecture except that their fully connected layers differed slightly (Fig. 3). All convolutional kernels were set to 1 × 1 to ensure that the DCNN could be trained sufficiently when the training samples were limited (Han et al., 2020). Selecting a larger kernel size would extract and emphasize the edges, corners, or other spatial features of each sub-image, which may lead to inaccurate predictions because these visual properties of an avocado fruit surface do not accurately reflect fruit quality (Vega Díaz et al., 2020).
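
To make the effect of a 1 × 1 kernel concrete, the sketch below applies one to a tiny image in plain Python. The image, weights, and biases are hypothetical; the point is only that each output pixel depends on the channels of the same input pixel and not on its neighbours.

```python
def conv1x1(image, weights, bias):
    """Apply a 1 x 1 convolution to an image stored as image[row][col][channel].

    weights[k][c] maps input channel c to output channel k, so every pixel is
    transformed independently of its neighbours.
    """
    return [
        [
            [sum(w * x for w, x in zip(w_k, pixel)) + b_k for w_k, b_k in zip(weights, bias)]
            for pixel in row
        ]
        for row in image
    ]


# Hypothetical 2 x 2 image with 3 channels (e.g. principal components) and 2 filters.
image = [
    [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]],
    [[0.0, 1.0, 0.0], [2.0, 2.0, 2.0]],
]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bias = [0.0, 0.5]
out = conv1x1(image, weights, bias)
print(out[0][0])  # [1.0, 2.5]
```

Because no neighbouring pixels enter the sum, edges and corners within a sub-image cannot influence the learned features, which matches the motivation given above.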

Fig. 3 Architecture of deep convolutional neural networks for regression on two avocado fruit datasets

Each convolutional layer in the DCNN for regression was followed by a batch normalization layer except for the last layer of each cultivar. Batch normalization helps to keep the input distribution consistent and accelerates convergence (Cooijmans et al., 2017). Two dropout layers followed by fully connected layers were also inserted before the output layers to avoid overfitting (Bisong, 2019). All activation functions were set to Leaky ReLU instead of the widely used ReLU because Leaky ReLU performs better on small datasets and assigns a non-zero output to negative inputs, which retains the feature information of the input (Xu et al., 2015; Zhang et al., 2017). The two DCNN regression models built for the ‘Hass’ and ‘Shepard’ sets were trained using the same hyperparameters except for the learning rates and batch sizes, which were experimentally set to 0.0002 and 128 for ‘Hass’ and 0.0005 and 96 for ‘Shepard’ avocado fruit, respectively (Supplementary Table 2). The biases of both DCNNs were initialized to zero and the weights were initialized with ‘Glorot_uniform’.
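
The difference between ReLU and Leaky ReLU can be shown in a few lines. The negative slope `alpha=0.01` is an assumed value for illustration; the slope used in the study is not stated in this section.

```python
def relu(x):
    """Standard ReLU: negative inputs are set to zero."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for x >= 0, small slope `alpha` (assumed value) for x < 0."""
    return x if x >= 0 else alpha * x


inputs = [-2.0, -0.5, 0.0, 1.5]
print([relu(x) for x in inputs])
print([leaky_relu(x) for x in inputs])
```

ReLU maps both negative inputs to the same zero output, whereas Leaky ReLU keeps them distinguishable, so some information about negative activations is passed on to later layers.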

The Sigmoid function was chosen as the activation function for the output layer because all the inputs were normalized to the range of 0–1. Mean squared error (MSE) (Eq. 3) was chosen as the loss function for both regression models. A higher MSE means the prediction deviates more from the actual values.

$$\begin{array}{c}MSE=\frac1n\sum_{i=1}^n{(y_i-\widehat{y_i})}^2\end{array}$$
(3)

where \(n\) is the number of samples, \({y}_{i}\) is the measured value of the \(i\)th sample, and \(\widehat{{y}_{i}}\) is the predicted value of the \(i\)th sample.
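
A minimal sketch of the MSE loss follows, together with one possible way of relating a loss computed on [0, 1] targets back to an error in days. The linear label scaling and the 2 to 16 day range are assumptions made for illustration; this section does not state how the labels were scaled for the Sigmoid output.

```python
import math


def mse(measured, predicted):
    """Mean of squared differences between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(measured, predicted)) / len(measured)


def to_unit(days, lo, hi):
    """Hypothetical label scaling to [0, 1], matching a Sigmoid output range."""
    return [(d - lo) / (hi - lo) for d in days]


# Illustrative values only, not results from the study.
measured_days = [4, 9, 16]
predicted_days = [5, 9, 14]
lo, hi = 2, 16  # assumed label range in days

loss = mse(to_unit(measured_days, lo, hi), to_unit(predicted_days, lo, hi))
rmse_days = math.sqrt(loss) * (hi - lo)  # undo the scaling to report days
print(round(rmse_days, 3))
```

Under this assumed scaling, the RMSE in days is the square root of the normalized MSE multiplied by the label range, which gives the same value as computing RMSE directly on the day counts.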

For further assessment, R2 and RMSE of DCNN regression results were also computed to consistently compare with PLSR results.

DCNN classification models

A classification model would be appropriate if the end users prefer an estimate in the form of a discrete variable or category label such as ‘unripe’ and ‘fully ripe’. One DCNN classification model was designed for each cultivar (Fig. 4) to test whether classification performs better than regression. All inputs were normalized to the interval [0, 1] as in the DCNNs for regression. Each of the convolutional layers was followed by a batch normalization layer and all activation functions were chosen to be Leaky ReLU except for the last output layer, which used the Softmax function instead. These two DCNNs were also trained using different batch sizes (Supplementary Table 3) and all ‘Hass’ and ‘Shepard’ fruit were classified into 14 and 7 categories, respectively, representing the number of days until ripe (Supplementary Table 4).

Fig. 4 Architecture of deep convolutional neural network for classification of (a) ‘Hass’ and (b) ‘Shepard’ sub-images

For both DCNN classification models, categorical cross-entropy (CCE) was used as the loss function. CCE represents the difference between two probability distributions and is defined by Eq. 4 (West & O’Shea, 2017):

$$\begin{array}{c}CCE=-\frac1n\sum\nolimits_{i=1}^n\sum\nolimits_{j=1}^m\left(y_{\left(i,j\right)}\times \log\left({\widehat y}_{\left(i,j\right)}\right)\right)\end{array}$$
(4)

where \(n\) is the number of samples, \(m\) is the number of one-hot codes for each class, which equals the number of categories for each dataset, \({y}_{\left(i,j\right)}\) refers to the \(j\)th binary value of the \(i\)th sample and \({\widehat{y}}_{\left(i,j\right)}\) refers to the predicted value of \({y}_{\left(i,j\right)}\).
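
The CCE calculation can be sketched in plain Python with one-hot targets. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero, and the probabilities are invented for the example.

```python
import math


def one_hot(label, n_classes):
    """Encode a class index as a one-hot list of length n_classes."""
    return [1.0 if j == label else 0.0 for j in range(n_classes)]


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average over samples of -sum_j y_ij * log(y_hat_ij)."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        # eps (assumed) guards against log(0) for zero-probability predictions.
        total += sum(t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row))
    return -total / len(y_true)


# Two illustrative samples with three classes (not study data).
y_true = [one_hot(0, 3), one_hot(2, 3)]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
print(round(categorical_cross_entropy(y_true, y_pred), 4))
```

Because the targets are one-hot, only the predicted probability of the true class contributes to each sample's loss.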

Metrics such as overall accuracy, precision, recall, and F1-score are commonly used to evaluate classification performance, but they cannot properly reflect the discrepancy (in days) between the predicted ripening time and the ground truth value (GT). Instead of these typical metrics, RMSE was used to measure the accuracy of the prediction, which also enabled a better comparison between our classification models and the regression models.

Results

Partial least squares regression (PLSR)

PLSR models that predicted the ripening time of ‘Hass’ avocado fruit with RPD ≥ 1.4 (Fig. 5a) were fitted. Models with an RPD value above 1.4 provide good predictions (Bellon-Maurel et al., 2010). The best-fit PLSR model for the ripening time of ‘Hass’ fruit provided R2cal = 0.74 and RMSEcal = 1.18, followed by R2val = 0.68 and RMSEval = 1.20 in the cross-validation using the OSC transformed dataset (Fig. 6). This model provided prediction abilities for the test set of R2 = 0.76 and RPD = 1.82 after wavelength reduction. We could not fit a model with high prediction accuracy of the ripening time for ‘Shepard’ fruit (Fig. 5b). The best-fit PLSR model for ripening time of ‘Shepard’ provided R2cal = 0.47 and RMSEcal = 1.13 for calibration, and R2val = 0.43 and RMSEval = 1.13 in the cross-validation after the spectra were OSC transformed (Fig. 6). This model provided prediction abilities for the test set of R2 = 0.50 and RPD = 1.13 after wavelength reduction.

Fig. 5 Predicted and measured ripening time (days) of (a) ‘Hass’ and (b) ‘Shepard’ avocado fruit by PLSR

Deep convolutional neural network (DCNN)

DCNN regression

The training loss and the validation loss curves converged after 5000 epochs in the DCNN regression models for both cultivars (Supplementary Fig. 2). The DCNN regression models provided R2 of 0.77 and 0.59, and RMSE of 1.43 and 0.94 days for the ‘Hass’ and ‘Shepard’ test sets, respectively (Table 1, Fig. 6).

Table 1 Root mean square error (RMSE) and the coefficient of determination (R2) of predictions by two regression methods for ‘Hass’ and ‘Shepard’ avocado fruit

Fig. 6 Predicted and measured ripening time (days) of (a) ‘Hass’ and (b) ‘Shepard’ avocado fruit by DCNN regression

DCNN classification

The training and validation accuracies of DCNN classification stabilized after 5000 epochs for ‘Hass’ and ‘Shepard’ avocado fruit (Supplementary Fig. 3). Most samples were correctly classified, although some classes had no correct predictions, such as the ground truth class of 4 days to ripen in the ‘Hass’ test set (Fig. 7). A similar problem occurred for ‘Shepard’ fruit that took more than 11 days to ripen (Fig. 7), with the RMSE of this class reaching a maximum of 3.00 (Table 2). However, the classification results were still acceptable because the mean RMSEs of both cultivars were below 2 days (Table 3).

Fig. 7 Confusion matrix of the overall DCNN classification results on the (a) ‘Hass’ and (b) ‘Shepard’ avocado-fruit test sets

Table 2 Root mean square error (RMSE) of each ripening time category for ‘Hass’ and ‘Shepard’ test sets
Table 3 Accuracies, categorical cross-entropy (CCE), and root mean square error (RMSE) of the DCNN classification model for ‘Hass’ and ‘Shepard’ avocado-fruit test sets

Discussion

PLSR and DCNN regressions showed different capabilities of prediction but both performed well, having low RMSE and high R2 and, thus, providing high accuracy in predicting the ripening time of avocado fruit. Predicting the ripening time of ‘Hass’ and ‘Shepard’ avocado fruit is, therefore, possible using HSI and the spectral recognition based upon it. It was also found that DCNN-based classification can be used as an alternative approach to predict the ripening time of avocado fruit, although this method requires improvement.

The accuracy of predicting the ripening time of ‘Hass’ and ‘Shepard’ fruit was model-dependent. The R2 in predicting the ripening time of ‘Hass’ fruit was similar using either PLSR or DCNN regression, but PLSR provided lower RMSE than did the DCNN regression model. In contrast, the DCNN regression model outperformed PLSR for the ‘Shepard’ fruit. The R2 of our DCNN regression model was about 0.1 higher than with PLSR, and the RMSE of the DCNN regression model was about 0.2 days lower than with PLSR (Table 1). In other studies, convolutional neural networks (CNN) provided higher accuracy than PLSR for predicting dry matter concentrations of mango fruit or estimating the moisture concentrations and soluble solids concentrations of pear fruit (Mishra et al., 2021; Mishra & Passos, 2022). Foliar N concentrations of tomatoes and the geographical origins of narrow-leaved oleaster fruits have been predicted with similar accuracies using both CNN and PLSR on hyperspectral images (Gao et al., 2019; Pourdarbani et al., 2021). In our study, the prediction accuracy of DCNN regression was only higher than PLSR when applied to ‘Shepard’ avocado fruit. PLSR applied to ‘Hass’ provided a slightly more precise prediction than that from DCNN regression. Hence, our study revealed that regression performance was cultivar specific in avocados. Moreover, our optimal prediction of ripening time showed greater accuracy than that of another study which predicted ‘Hass’ avocado ripeness based on Vis-NIR spectroscopy, with R2 of 0.77 vs. R2 of 0.63, respectively (Melado-Herreros et al., 2021). Our prediction also directly provided the specific time to ripen rather than indirect indices.

R2 and RMSE values for the ‘Hass’ test sets calculated by either PLSR or DCNN regression were higher than for ‘Shepard’. A larger sample size enables deep learning models to better fit nonlinear relationships (Nasir & Sassani, 2021). In our study, the sample size and the number of sub-images for ‘Hass’ were higher than for ‘Shepard’, potentially explaining the higher R2 for ‘Hass’ compared with ‘Shepard’. Although a balanced dataset usually leads to superior regression performance, the wider range of labels of the ‘Hass’ training sets can explain the higher RMSE values for ‘Hass’ compared to ‘Shepard’. For example, the maximum error possible for a prediction on ‘Hass’ was 14 days (2–16 days) but the maximum error for ‘Shepard’ was only 6 days (6–12 days), and thus the influence of a balanced dataset was reduced due to the differences in data ranges. Each individual prediction could lead to higher RMSE values in ‘Hass’ compared to ‘Shepard’. Despite the poor performance on ‘Shepard’ avocados, the prediction on ‘Hass’ avocados was reliable, with the R2 of all methods above the acceptable standard value of 0.66 (Williams et al., 2019; Posom et al., 2021; Wei et al., 2022).

Our classification results showed similar prediction accuracies between ‘Hass’ and ‘Shepard’, with R2 of the test sets being 67.28% and 64.06%, respectively. The models can be improved in future studies, and will need to include samples from various locations, seasons and cultivars. Some studies have shown that, when the sample size is small and the distribution of values is unbalanced, the classification model provided superior prediction performance when compared with regression (Han et al., 2020). In our study, however, the sample size was large and the distribution of classes was rather balanced, which could explain why the prediction of ripening time in classification was less accurate than regression. Additionally, the larger number of classes may have increased the complexity of the classification tasks, causing inaccurate results in the classification models (Mezgec & Koroušić Seljak, 2017). More importantly, the classification loss did not truly reflect the accuracy of prediction, either in training or in inference. For example, for a ground truth value of x days, a prediction of x + 1 and a prediction of x + 10 both had the same classification error. This resulted in poor classification performance as the model learned to minimize any loss rather than make an accurate prediction. In our study, 14 and 7 classes for ‘Hass’ and ‘Shepard’ cultivars were presented, respectively. Hence, our classification study was more complex than previous studies (Han et al., 2020; Liu et al., 2020), which only had 3 or 4 categories. Our study confirmed that using regression would lead to higher prediction accuracy than would classification when high numbers of samples and classifications exist.
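
The point that cross-entropy does not distinguish a near miss from a far miss can be checked numerically. The sketch below assumes 14 classes whose indices are one day apart and uses invented Softmax outputs; the helper names are hypothetical.

```python
import math


def cce_single(true_class, probs):
    """Cross-entropy for one sample with a one-hot target."""
    return -math.log(probs[true_class])


def make_probs(n_classes, true_class, predicted_class, p_pred=0.9, p_true=0.05):
    """Build an invented Softmax output that peaks at `predicted_class`."""
    rest = (1.0 - p_pred - p_true) / (n_classes - 2)
    probs = [rest] * n_classes
    probs[predicted_class] = p_pred
    probs[true_class] = p_true
    return probs


n_classes = 14  # number of 'Hass' categories; classes assumed one day apart
x = 2           # ground truth class index
near = make_probs(n_classes, x, x + 1)
far = make_probs(n_classes, x, x + 10)

loss_near, loss_far = cce_single(x, near), cce_single(x, far)
error_near = abs(near.index(max(near)) - x)
error_far = abs(far.index(max(far)) - x)
print(loss_near == loss_far, error_near, error_far)  # True 1 10
```

Both predictions receive the same loss because only the probability assigned to the true class enters the cross-entropy, while their errors in days differ by a factor of ten. This is why RMSE in days was used to evaluate the classification models.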

Conclusion

Our study showed that combining HSI technologies with PLSR, DCNN regression, and DCNN classification is useful to predict the ripening time of ‘Hass’ and ‘Shepard’ avocado fruit. Our prediction directly presented the days to ripen, unlike most previous studies that focused on various indirect indices to estimate ripening stages. The optimal prediction of ripening time in days had an RMSE between 0.94 and 1.52 days, regardless of method and cultivar. DCNN was applied for the first time to predict the days to ripen for avocado fruit and it provided acceptable prediction accuracy. Deep learning approaches proved better than PLSR for predicting avocado ripening time, and the prediction on ‘Hass’ avocado was more accurate than that of ‘Shepard’ avocado. A larger and more balanced dataset can help build a more reliable system to predict the ripening time of avocado. Our study highlights the strong potential of HSI to predict the ripening time of mature, unripe avocado fruit, which would allow processors and retailers to optimize the duration of fruit storage, select the most suitable timing for retail display, and minimize losses throughout the food supply chain.