Introduction

Effective storage of potato tubers is important for maintaining the quality attributes of fresh tubers. These attributes include specific gravity, carbohydrate content, glycoalkaloid content, flesh and skin color, and the absence of internal and external defects such as bruises, physiological disorders, and sprouting [1]. Such attributes have significant effects on the final value of the tubers and the quality of any final potato products. Purchasers of fresh potatoes typically look for tubers with no visual signs of defects. For the processed product markets, several quality factors should be preserved during storage, including dry matter, sugar content, and size, in addition to the absence of external damage [1]. Potato tubers, as with other fruits and vegetables, continue their biochemical activity, especially respiration, after harvest and during storage [2]. Potato tubers usually experience a dormancy period that lasts 5 to 19 weeks after harvest, depending mainly on the cultivar and storage conditions. During dormancy, tubers do not sprout, even under optimum sprouting conditions [3, 4]. Following dormancy, when tubers are subjected to relatively optimal conditions for sprouting, including warmer temperatures (10–20 °C), sprouts (from the meristematic regions of the tubers, or eyes) grow at a low rate, which increases until one sprout is dominant over the others [5]. Examples of a sprouted eye and leaf primordia in a potato tuber are shown in Fig. 1 [6]. Sprouting is related to physiological aging, a concept that was first introduced to study the effect of various physiological stages of seed tubers on crop growth [7]. Later, physiological age was defined as “the physiological state of the tuber at any given time which is illustrated by the degree of visible sprout development” [8]. Potato sprouting can be observed through the leaf primordia that are initiated from the tuber shoot apex [9]. 
Leaf primordia production in potato sprouts is usually signaled by the initiation of the inflorescence, which in turn is associated with a change in hormonal production [9]. In rare cases, the hormonal balance may be compromised and the inflorescence is not produced, resulting in continued production of leaf primordia. The number of leaf primordia within the developing sprouts is an indirect indication of tuber maturity or physiological age [9]. The most important factor affecting physiological age during storage is temperature, which controls the biochemical activity in tubers after harvest [10]. Uncontrolled sprouting has several drawbacks, including a reduction in the marketability of tubers and an increase in tuber physiological age [11]. Moreover, sprouting causes weight loss due to dehydration and respiration, increases levels of glycoalkaloids (which are toxic), increases sugar levels (undesirable for frying), and decreases vitamin content [2]. Thus, uncontrolled sprouting decreases the end-use quality of potato tubers, and continuous monitoring during storage is required to determine when sprout suppressants should be applied. Factors that affect sprouting rate include cultivar, previous and current storage conditions, presence of damage and diseases, and degree of maturity [2].

Fig. 1

a A tuber eye and its leaf scar and axillary bud, b Stolon attachment heel end, and c rose end [6]

Several chemical and non-chemical methods are used to inhibit sprouting of potato tubers during storage. Among chemical sprouting inhibitors, isopropyl N-phenylcarbamate (IPC, propham), isopropyl N-(3-chlorophenyl) carbamate (CIPC, chloro-IPC, chlorpropham), and maleic hydrazide (MH) are widely used around the world [12,13,14]. IPC and CIPC cannot be applied to seed potatoes, as their sprouting inhibition is irreversible [2, 14]. Non-chemical sprout inhibition methods include low temperature storage (a common technique for storing seed tubers), irradiation treatment, controlled atmosphere storage, and the development of cultivars that have longer dormancy periods [2, 9, 14]. The choice of non-chemical method depends mainly on the end-use; for example, potato cultivars dedicated for processing should not be stored at low temperatures (< 4 °C), to avoid an increase in reducing sugar levels, which results in browning of tuber products after frying [15]. While sprouting inhibitors are effective in extending the shelf life of potatoes for up to 10 months, they are expensive and should only be applied when the likelihood of sprouting is high [15]. Therefore, there is a need for measurement techniques that can rapidly detect primordial leaf count (LC), and therefore sprouting activity, so that an effective sprout suppression strategy can be developed.

Spectroscopic systems have been used extensively for rapid and noninvasive monitoring of quality attributes in agricultural and food products. Ultraviolet (UV), visible (Vis), and near-infrared (NIR) spectroscopic techniques have been successfully developed into commercial technologies for sorting and grading fruits and vegetables based on shape, size, defect presence, and color. In the case of potatoes, several studies have been conducted for using spectroscopic systems in evaluating various constituents including specific gravity [16], dry matter [16,17,18], and carbohydrates [19,20,21,22].

The evaluation of sprouting activity of potatoes has been studied using diffuse reflectance spectroscopy. Spectra in the range of 400–2500 nm were acquired from tubers of Superior and Atlantic cultivars over the 2004 and 2005 seasons [23]. Reference laboratory measurements were conducted by weighing the sprouts produced by each tuber after it was placed in a dark environment for 4 weeks at 18 °C, and were expressed as the weight of sprouts as a percentage of the fresh tuber weight [24]. Calibration models were cross-validated, and the coefficient of determination (R²) was as high as 0.93 with a standard error of calibration using cross validation (SECV) of 0.40. The potential of using Vis/NIR spectroscopy and hyperspectral imaging to build predictive models for primordial leaf count (LC) for Frito Lay 1879 (FL1879) and Russet Norkotah (R. Norkotah) cultivars has also been investigated [19]. Results showed that the best regression models had correlation coefficient (R) values of 0.89 and 0.77, with Root Mean Square Error of Prediction (RMSEP) values of 0.3285 and 0.3560, for FL1879 and R. Norkotah, respectively.

In another study, Garnett [25] conducted an intensive investigation using spectroscopic systems to track the sprouting process of potato tubers in real time. Several spectroscopic sensors were utilized, covering the following wavelength ranges: 500–1100 nm, 200–1100 nm, and 900–2300 nm. It was deduced that the spectral data could be correlated with the sprouting process through the detection of chlorophyll produced in the tubers’ skin. The study also showed that such spectroscopic systems, in the range of 600–750 nm, could effectively track the change in chlorophyll levels, which is linked to the early sprouting activity of tubers. Ji et al. [26] used a hyperspectral imaging system (400–1000 nm) to detect different defects in potatoes, including sprouting. A Multiclass Support Vector Machine (MSVM) was chosen to classify tubers based on various defects. For sprouting, it was shown that sprouts could be identified with an accuracy of 90.9%. Multi-sensor data fusion is not a new technique and has been used extensively to enhance the quality evaluation of food products. Fusion of Fourier-Transform mid-infrared (FT-MIR) and NIR sensors was implemented to enhance origin identification of Panax notoginseng using ensemble methods, and the classification accuracy increased from 91.2% to 97.1% with fusion methods [27]. Other examples include detecting fish freshness during preservation using fused electronic nose (E-nose) and electronic tongue (E-tongue) data [28], fusing computer vision, NIR, and texture data to assess fish status from fresh to frozen-thawed [29], fusing computer vision, E-nose, and NIR to determine the total volatile basic nitrogen in pork [30], and combining E-nose, E-tongue, NIR, and FT-IR data to classify the botanical origin and evaluate adulteration of honey [31].

While previous studies showed good correlation between electronic measurements and sprouting activity, or primordial leaf count as an indicator of sprouting, no studies have investigated the classification of potatoes based on sprouting activity during storage or investigated multi-sensor data fusion methods for this purpose. Moreover, no generic classification models have been developed that could be used to determine sprouting activity for any cultivar. Such models are valuable for helping storage managers decide the best times for applying sprouting inhibitors to maintain sprouting levels that do not affect tuber end-use. Consequently, the objectives of this study were to:

  1. Compare the ability of three different spectroscopic techniques (Vis/NIR spectroscopic, Vis/NIR hyperspectral imaging, and NIR transmittance) to identify sprouting activity for two different potato cultivars under different storage conditions.

  2. Build generalized classification models to determine sprouting activity in potatoes from measurements from the different spectroscopic systems on whole tubers and sliced samples.

  3. Investigate how different classification methods, spectra pre-processing methods, and sensor fusion techniques affect the performance of the classification models.

Materials and methods

Raw materials, sampling, and measurements of primordial leaf count

The experiments were conducted on cultivar (cv.) Frito Lay 1879 (FL1879), commonly used for frying (chips and French fries), and cv. Russet Norkotah (R. Norkotah), mainly used for baking and boiling. Counts of leaf primordia were conducted by taking samples of meristematic tuber tissue (eyes) from the apical end of each tuber (n = 3 eyes per tuber). The sprouts were stored in 5 mL Eppendorf tubes in an ethanol:acetone solution (1:1) until evaluation. Sprouts were mounted on slides and examined at 10x magnification under a dissecting Olympus microscope (model No. ZT40, Olympus Corp., Tokyo, Japan). Leaf initials were removed sequentially from the outside to the inside of the sprout using a scalpel until the apical dome was exposed [9, 19]. The primordial leaf count per tuber was defined as the average of the leaf primordium counts obtained for the three eyes. The samples were sliced from the stem end. Three slices, each 12.7 mm thick, were produced and the third slice was used for measurements. This slice therefore began 25.4 mm from the stem end of the tuber [19]. There were 200 samples of each cultivar, making a total of 400 tested tubers. While testing whole tubers is beneficial for applications such as tuber sorting or grading, there is also value in performing measurements on sliced samples. Slices are not affected by the tuber skin or by any dirt on the external surface. In addition, there is often a need to measure key quality parameters of peeled and sliced potatoes during processing, e.g. chip production.

Spectroscopic measurement

Electronic measurements were conducted on whole tubers and sliced samples. For whole tubers, incident light was directed at the middle area of the tuber, which was approximately the center point of the longitudinal axis. Each tuber was scanned once with each spectroscopic system. Each sliced sample was scanned such that the light was directed towards the middle area of the slice. Three systems were used to acquire measurements: a Vis/NIR spectroscopic system operating in interactance mode, a Vis/NIR hyperspectral imaging system, and an NIR transmittance system. Table 1 shows a comparison between the spectroscopic systems used for measurements in this work.

Table 1 The main characteristics of the electronic systems used to scan the potato samples to determine primordial leaf count

Vis/NIR interactance system

The interactance system used in the experiments contained a spectrometer (model No. USB 4000, Ocean Optics, Inc., Dunedin, FL, USA) equipped with a 200 µm diameter optical fiber and having an optical resolution of 0.3 nm (full width at half maximum, FWHM), a radiometric power supply with a maximum power of 250 W (model No. 68931, Oriel Inst., Irvine, CA, USA), and a light source (model No. 66881, Oriel Inst., Irvine, CA, USA) with the same maximum power and a wavelength range of 446 to 1125 nm (Fig. 2a). More information on the system and on the calculation of the relative reflectance can be found in Rady et al. [20]. The system was used to scan whole tubers and sliced samples.

Fig. 2

Schematic representation of a the visible/near-infrared interactance system, b the near-infrared transmittance system, and c the visible/near-infrared hyperspectral imaging system (after Rady et al. 2019 with modification)

NIR transmittance system

The transmittance system used in this study, shown in Fig. 2b, comprises a radiometric power supply that has a maximum power of 300 W (model No. 68931, Oriel Inst., Irvine, CA, USA), a light source that includes a quartz tungsten halogen lamp (model No. 66881, Oriel Inst., Irvine, CA, USA), and an InGaAs spectrometer (model No. NIR512L-1.7T1, Control Development, Inc., South Bend, IN, USA). The signals were acquired in the wavelength range of 900–1685 nm [19]. The relative transmittance was calculated using a reference disk made from Teflon®, following the same technique used for calculating the relative interactance. The reference disk had an approximate diameter and thickness of 25 mm and 15 mm, respectively. Only sliced samples were scanned using the transmittance system, as the spectra acquired from whole tubers contained a considerable amount of noise.

Vis/NIR hyperspectral imaging system

In this study, a hyperspectral imaging system was used to capture reflected, back-scattered light, under the reflectance mode, from whole tubers and sliced samples in the range of 400 to 1000 nm. The system consisted of a Hamamatsu dual mode cooled CCD camera (model No. C4880, Hamamatsu Photonics, Hamamatsu, Japan), an imaging spectrograph directly attached to the CCD camera (ImSpector V10, Spectral Imaging Ltd., Oulu, Finland) along with a power supply control (model No. 69931, Oriel Instruments Irvine, CA, USA), a digital exposure controller (model No. 68945, Oriel Instruments, Irvine, CA, USA), and a light source (model No. 66881, Oriel Instruments, Irvine, CA, USA) containing a 250 W quartz tungsten halogen lamp and having a lens material transmittance range of 350–2500 nm. A fiber optic cable coupled with a lens focusing assembly was used to deliver a broadband light beam of 1.5 mm diameter, arranged at a 15° angle away from the vertical axis, onto the middle of the sample which was 1.6 mm away from the scanning line (Fig. 2c). The imaging spectrograph acquired spectral information for each scanning line, which created a 2-D image containing 256 × 256 pixels with a spatial resolution of 0.2 mm/pixel and a spectral resolution of 2.35 nm. The sample holder could move vertically with a motorized stage so that images could be recorded at different locations on the samples. The distance between two successive scans was set at 1 mm, and a total number of 10 images (scans) were acquired for each tuber, covering a 9 mm longitudinal distance along the sample. The acquisition time for each image was 200 ms, so the total time to scan one slice (10 images) was 2 s [32].

Data analysis

Extraction of mean reflectance spectra (MRS) and wavelength selection

The average reflectance spectra for the hyperspectral system were calculated using all wavelengths in the range of 400–1000 nm. The arithmetic mean was calculated for the intensity values covering the spatial scattering distance of 16 mm [32]. The spectra were then normalized by the Teflon® reference and the mean reflectance spectra (MRS) were extracted for each sample [19]. The overall data analysis procedure applied in this study is shown in Fig. 3.

Fig. 3

A schematic representation of the data analysis procedure

Reducing the number of variables in multivariate analysis can overcome potential overfitting problems associated with relatively high dimensional data [33]. In the case of spectroscopic systems, wavelength selection therefore improves model robustness and computation time without a considerable decrease in performance [34, 35]. To extract the most effective wavelengths from the data, Sequential Forward Selection (SFS) was implemented. This method begins with an empty model; one new feature is added at a time and the contribution of the added feature is tested [36, 37]. If the added feature improves the classification performance, it is kept in the model; otherwise, the feature is discarded. In this study, the most influential wavelengths were selected by applying the K-nearest Neighbor (Knn) classifier, with the classification error as the selection criterion. SFS was implemented using the statistical and machine learning MATLAB® Toolbox.
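The greedy search described above can be sketched in a few lines of Python. This is only an illustration, not the MATLAB routine used in the study: the function names, the leave-one-out error estimate, the toy data, the choice of 3 neighbours for the tiny example (the study reports K = 5), and the stopping rule (stop as soon as no remaining feature lowers the error) are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training samples."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_error(X, y, features, k):
    """Leave-one-out misclassification rate using only the given feature columns."""
    sub = [[row[j] for j in features] for row in X]
    wrong = 0
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, sub[i], k) != y[i]:
            wrong += 1
    return wrong / len(sub)


def sequential_forward_selection(X, y, k):
    """Start from an empty set; keep adding the feature that lowers the error most."""
    selected, best_err = [], 1.0
    remaining = list(range(len(X[0])))
    while remaining:
        err, j = min((loo_error(X, y, selected + [j], k), j) for j in remaining)
        if err >= best_err:  # no candidate improves the criterion: stop
            break
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected, best_err


# Hypothetical toy data: column 0 separates the two classes, column 1 is noise.
X = [[0.10, 5.0], [0.20, 1.0], [0.15, 3.0], [0.90, 4.8], [1.00, 1.2], [0.95, 3.1]]
y = [0, 0, 0, 1, 1, 1]
selected, err = sequential_forward_selection(X, y, k=3)
print(selected, err)
```

On this toy set only the informative column is retained, which mirrors the intent of keeping a wavelength only when it improves the classification criterion.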

Classification of potato tubers based on primordial leaf count

Various machine learning techniques were applied in this study for the classification of potato tubers based on levels of primordial leaf count. K-nearest neighbor (Knn), Partial Least Squares Discriminant Analysis (PLS-DA), Artificial Neural Network (ANN), and Soft Independent Modeling of Class Analogy (SIMCA) were implemented to build the classification models for potato samples based on the primordial leaf counts. Knn is a nonparametric classification method that requires no model to be fitted in order to classify a point or sample. The distances between the test point and the training samples are calculated, and the sample is then assigned to the class that has the most data points among its K nearest neighbors [38, 39]. Moreover, the Knn classifier is simple to implement and suitable for multicategory classification tasks [33]. The K value selected in this study was 5. PLS-DA is a linear classification method based on Partial Least Squares Regression (PLSR), with the advantages of tolerance to collinear data, such as spectroscopic data, and noise reduction [34]. In the PLSR technique, the original data set is transformed into a new space using the information on both independent and dependent variables [33]. The advantage of PLSR is that it can handle collinear data, such as spectroscopy data, without the problem of overfitting [33]. The PLS-DA classifier included 20 latent variables with no data scaling. The ANN classifier implemented in this study is a feed-forward network, which is a nonlinear, supervised, adaptive classification technique [40]. The ANN classifier contained three layers: an input layer containing the features, a hidden layer with 50 neurons and a logarithmic transfer function, trained with a scaled conjugate gradient backpropagation function, and an output layer containing the classes. 
The configuration of the classification methods was based on preliminary analysis, such that the parameters yielding the lowest classification error for each technique were chosen. In the case of the SIMCA method, Principal Component Analysis (PCA) was conducted individually on the features or predictors of each class, and consequently each class might be described by a different number of Principal Components (PCs). A new object or sample was then assigned to the class whose PC model it was located closest to [41].

The spectral data were preprocessed to reduce noise resulting from various sources in each spectroscopic measurement system [42]. The preprocessing methods utilized in this study were mean centering, first derivative, second derivative, Standard Normal Variate (SNV), Multiplicative Scattering Correction (MSC), and Savitzky–Golay smoothing, in addition to the raw data without preprocessing [33]. In the case of the first or second derivative, numerical differentiation was conducted to obtain the desired derivative. Assuming Xij_pre and Xij_org represent the preprocessed and the original spectral values, respectively, located in the ith row and the jth column of the data matrix, the following equations can be used for each preprocessing method:

$$X_{{ij\_pre}} \left( {mean{-}centered} \right) = ~X_{{ij\_org}} - ~\overline{X}_{j}$$

where \(\overline{X}_{j}\) is the mean of the jth column.

$$X_{{ij\_pre}} \left( {SNV} \right) = \frac{{X_{{ij\_org}} - \overline{X} _{i} }}{{S_{i} }}$$

where \(\overline{X} _{i}\), and \({S}_{i}\) are the mean, and the standard deviation of the ith row, respectively.

$${X}_{ij\_pre}\left(MSC\right)= \frac{{X}_{ij\_org}- {a}_{i}}{{b}_{i}}$$

where \({a}_{i}\), and \({b}_{i}\) are constants obtained using Ordinary Least Squares (OLS) regression.
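The three corrections defined so far (mean centering, SNV, and MSC) can be illustrated with a short pure-Python sketch. It is not the MATLAB code used in the study. Two points are assumptions made for the example: the sample standard deviation (n − 1 denominator) is used for SNV, and the mean spectrum is used as the MSC reference against which each spectrum is regressed, which is the usual choice but is not stated in the text. The function names and toy matrix are hypothetical.

```python
from statistics import mean, stdev


def mean_center(X):
    """Subtract the column (wavelength) mean from every value in that column."""
    col_means = [mean(col) for col in zip(*X)]
    return [[v - m for v, m in zip(row, col_means)] for row in X]


def snv(X):
    """Standard Normal Variate: centre and scale each row (spectrum) by its own mean and SD."""
    out = []
    for row in X:
        m, s = mean(row), stdev(row)  # assumption: sample SD (n - 1)
        out.append([(v - m) / s for v in row])
    return out


def msc(X):
    """Multiplicative scatter correction against the mean spectrum (assumed reference).

    Each spectrum x_i is fitted by OLS as x_i ~ a_i + b_i * ref and then
    corrected as (x_i - a_i) / b_i.
    """
    ref = [mean(col) for col in zip(*X)]
    r_bar = mean(ref)
    s_rr = sum((r - r_bar) ** 2 for r in ref)
    out = []
    for row in X:
        x_bar = mean(row)
        b = sum((r - r_bar) * (x - x_bar) for r, x in zip(ref, row)) / s_rr
        a = x_bar - b * r_bar
        out.append([(x - a) / b for x in row])
    return out


# Hypothetical toy spectra: rows differ only by an offset and a scale factor.
X = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.5, 2.5, 3.5]]
X_mc, X_snv, X_msc = mean_center(X), snv(X), msc(X)
print(X_snv[0], X_msc[0])
```

Because the toy spectra differ only by additive and multiplicative effects, SNV and MSC each map them onto a single common curve, which is the behaviour these corrections are intended to produce.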

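$${X}_{ij\_pre}\left(Savitzky{-}Golay\right)= \frac{1}{N} \sum _{h= -m}^{m}{C}_{h} {X}_{i(j+h)\_org}$$

where N is the normalization factor, m is the half-width of the smoothing window (so that the window contains 2m + 1 points), k is the order of the polynomial equation used for fitting, and \({C}_{h}\) are coefficients that depend on the window size and on the order of the polynomial equation (i.e. k).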
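As an illustration of where the Savitzky–Golay coefficients come from, the sketch below derives the smoothing weights by least-squares fitting a polynomial over a symmetric window and evaluating it at the window centre. The weights returned already include the 1/N normalization. It uses exact rational arithmetic from the Python standard library; the function names, the handling of edge points (left unsmoothed), and the toy spectrum are assumptions for the example, and this is not the MATLAB implementation used in the study.

```python
from fractions import Fraction


def savgol_weights(half_window, order):
    """Smoothing weights for a window of 2*half_window + 1 points."""
    hs = range(-half_window, half_window + 1)
    n = order + 1
    # Normal equations (A^T A) c = e_0 with A[h][p] = h**p.
    ata = [[Fraction(sum(h ** (p + q) for h in hs)) for q in range(n)] for p in range(n)]
    rhs = [Fraction(1)] + [Fraction(0)] * order
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if ata[r][col] != 0)
        ata[col], ata[pivot] = ata[pivot], ata[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(n):
            if r != col:
                f = ata[r][col] / ata[col][col]
                ata[r] = [a - f * b for a, b in zip(ata[r], ata[col])]
                rhs[r] -= f * rhs[col]
    c = [rhs[p] / ata[p][p] for p in range(n)]
    # Weight for offset h is the fitted centre value's dependence on the point at h.
    return [sum(c[p] * h ** p for p in range(n)) for h in hs]


def savgol_smooth(spectrum, half_window, order):
    """Smooth interior points; points without a full window are left unchanged."""
    w = [float(v) for v in savgol_weights(half_window, order)]
    out = list(spectrum)
    for j in range(half_window, len(spectrum) - half_window):
        window = spectrum[j - half_window: j + half_window + 1]
        out[j] = sum(wi * x for wi, x in zip(w, window))
    return out


w5 = savgol_weights(2, 2)       # classic 5-point quadratic smoother
w25 = savgol_weights(12, 5)     # 25-point window, 5th-order polynomial
quadratic = [float(j * j) for j in range(9)]
smoothed = savgol_smooth(quadratic, 2, 2)
print([str(v) for v in w5])
```

A useful sanity check is that the weights sum to one and that a spectrum which is itself a polynomial of degree at most k passes through the filter unchanged.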

In the case of Savitzky–Golay, a 5th-order polynomial (k = 5) and a window size of 25 (i.e. the number of Ch coefficients is 25) were applied for smoothing. For the LC data, the Box–Cox transformation was applied to obtain a more symmetric, approximately normal distribution before the data were divided into the two classes described below and fed into the classification algorithms [33]. The data were divided into a training set (80%) and a testing set (20%). A 4-fold cross validation technique was applied to the training set to increase the robustness of the classification models and avoid overfitting. Samples were divided into two classes based on a cut-off primordial leaf count, which was chosen as the median of the concatenated LC values from both cultivars and was 10.33. This value was chosen to create two balanced classes; it also represented the relatively low LC values typically found in sprouts after harvest, and effectively served as a baseline [9]. Preprocessing methods were implemented in MATLAB® software, and the Knn and ANN classifiers were implemented using the statistical and machine learning MATLAB® Toolbox. PLS-DA and SIMCA were implemented using the classification toolbox for MATLAB® created by Davide Ballabio (Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy) [43]. Data from the spectroscopic and hyperspectral imaging systems were also fused in this study. The selected wavelengths from the systems being fused were concatenated for each sample, and the data in each wavelength (column) were normalized to the range (− 1, 1) by dividing each value in a column by the maximum value in the same column, to prevent any possible bias resulting from the different data scales.
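The class assignment and the fusion step can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions rather than the study's MATLAB code: the function names and toy values are hypothetical, samples with LC above the median are labelled as the high class, and each fused column is divided by its maximum absolute value so that negative values are handled as well.

```python
from statistics import median


def label_by_median(lc_values):
    """Return (threshold, labels): 1 if LC is above the median, else 0."""
    threshold = median(lc_values)
    labels = [1 if v > threshold else 0 for v in lc_values]
    return threshold, labels


def fuse_and_scale(*sensor_blocks):
    """Concatenate each sample's selected-wavelength values from several sensors,
    then divide every column by its maximum absolute value."""
    fused = [[v for row in sample for v in row] for sample in zip(*sensor_blocks)]
    col_max = [max(abs(v) for v in col) for col in zip(*fused)]
    return [[v / m if m else 0.0 for v, m in zip(row, col_max)] for row in fused]


# Hypothetical toy values: two samples, two sensors on very different scales.
hyperspectral = [[0.2, 0.4], [0.1, 0.8]]
interactance = [[10.0], [5.0]]
fused = fuse_and_scale(hyperspectral, interactance)

threshold, labels = label_by_median([4.0, 10.0, 12.0, 30.0])
print(fused, threshold, labels)
```

After scaling, every column has a maximum magnitude of one, so no sensor dominates a distance-based classifier such as Knn purely because of its measurement units.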

Results and discussion

Constituent distribution and spectra for whole tubers

The distribution of primordial leaf count for the combined data from both cultivars is shown as boxplots in Fig. 4. The minimum, median, and maximum LC values were 4.33, 10.33, and 57.67, respectively, with outlier values located above approximately 40. However, because the purpose of the study was to develop generic classification models, outlier values were not eliminated in subsequent analysis steps.

Fig. 4

Distribution of primordial leaf count using boxplots for cv. FL1879 and cv. R. Norkotah potato cultivars

The mean spectra obtained from the three optical sensors for whole tubers and sliced samples are shown in Fig. 5. Each subfigure shows two spectra: the first is the average of all spectra classified as high LC (LC > 10.33) and the second is the average of all spectra classified as low LC (LC ≤ 10.33). For the interactance spectra, only the features in the range of 501.6–1004.2 nm were considered in the statistical analysis, as signals outside that range had a very low signal to noise ratio. Therefore, the total number of features used for wavelength selection was 2701 for the interactance data. In the case of the hyperspectral MRS data, the difference between the mean spectra beyond 620 nm was more noticeable for sliced samples than for whole tubers, possibly because the skin on the whole tubers affects the signal attenuation. However, the interactance mode showed clearer separation of the two LC classes for whole tubers than for sliced samples. This could be attributed to signal saturation due to the detector being in contact with the sample. Such close contact is less likely to occur with a whole tuber, as it does not have a flat surface. It is worth stating that the spectra shown result from concatenating the spectral data for two cultivars that differ in skin color and thickness: the skin of R. Norkotah tends to be darker and thicker than that of FL1879. Thus, it is more beneficial to perform classification that includes both cultivars together, as this will result in a technique that is not limited to a single cultivar. The mean spectra of sliced samples obtained from the transmittance data displayed a trend similar to those acquired from the interactance instrument. An example of applying different preprocessing techniques to hyperspectral imaging data for sliced samples is shown in Fig. 6.

Fig. 5

Average spectra for the two classes of primordial leaf count for sliced samples and whole tubers resulting from a hyperspectral imaging, sliced samples, b hyperspectral imaging, whole tubers, c interactance, sliced samples, d interactance, whole tubers, and e transmittance, sliced samples

Fig. 6

An example of applying different preprocessing methods on hyperspectral imaging data for potato sliced samples

Results of selected wavelengths (variables)

The number of wavelengths selected via the sequential forward selection algorithm for the three different spectroscopic systems is shown in Fig. 7 (Fig. 7a for sliced samples and Fig. 7b for whole tubers). Additionally, Table 2 shows the values of the selected wavelengths for sliced samples and whole tubers. In the case of sliced samples, the wavelengths selected from the interactance mode (14) came more from the NIR region (11) than from the visible region (3) of the electromagnetic spectrum. Hyperspectral imaging showed an equal number of selected wavelengths (3) from both regions of the spectrum, which illustrates the efficiency of the hyperspectral imaging data in explaining the variation between samples with relatively few variables. The transmittance system resulted in a larger total number of selected wavelengths, 17, distributed mainly above 1360 nm in the NIR region of the spectrum. In the case of whole tubers, 4 wavelengths were selected from the interactance data, all around 900 nm. This was much smaller than the number selected from the hyperspectral imaging data (13), which were in the visible range (447–616 nm). Additionally, the selected wavelengths followed a trend similar to that shown in Fig. 5a–e: the larger the difference between the two classes, the fewer wavelengths were needed to explain the variation between the samples. This trend was clearly shown by the wavelengths selected from all sensors. It is generally preferable to find the smallest number of selected variables that results in classification performance similar to that of models which utilise all wavelengths. This reduces the computational time required to train and validate the models, and avoids the need for a large number of wavelengths, which would increase the cost of a sensor intended for inline or rapid monitoring of sprouting activity.

Fig. 7

Selected wavelengths for hyperspectral imaging, interactance spectroscopy, and transmittance spectroscopy for assessing primordial leaf counts for a sliced samples, and b whole tubers

Table 2 Selected wavelengths from several optical sensors for classifying sliced potato samples, and whole tubers based on primordial leaf count

Results of classification of potato tubers based on leaf primordial levels

The numbers of samples in class 1 (LC < threshold) and class 2 (LC ≥ threshold) were 294 and 106, respectively. For comparison purposes, the classification results based on all wavelengths are shown in Table 3. The best classification accuracy values obtained using selected wavelengths and the various classification techniques, for models serving both cultivars and based on the Vis/NIR interactance, Vis/NIR hyperspectral imaging, and NIR transmittance systems as well as fused data, are presented in Table 4 for sliced samples and whole tubers. In both tables, the best classification accuracy values are shaded. For the selected-wavelength classification models, in the case of sliced samples, classification accuracy values for the test group showed that the hyperspectral imaging system combined with the Knn method generally gave the best performance compared with the other systems and classification techniques. In this case, the classification accuracy was as high as 87.5%. For the interactance and transmittance systems, the highest classification accuracy was 82.5% for both systems, obtained using the PLS-DA and SIMCA techniques, respectively. Data fused from different sensors showed some classification improvement only for sliced samples. Fusing data from (1) hyperspectral and interactance, (2) hyperspectral and transmittance, (3) interactance and transmittance, and (4) all systems resulted in accuracy values of 92.5, 95, 82.5, and 95%, respectively, with all classification models obtained using PLS-DA. Fused data contributed to explaining the variation between sliced samples, especially when transmittance data were combined with hyperspectral or interactance data, which illustrates the benefit of using wavelength ranges in the NIR region. Classification accuracy increased by up to 7.5 percentage points when all systems were fused. 
Moreover, it is worth stating that among the preprocessing techniques applied, SNV and MSC generally had a positive effect on classification accuracy. SNV provides a baseline correction that reduces artefacts or noise in the spectra caused by surrounding or uncontrolled conditions, whereas MSC helps reduce the effect of light scattering associated with reflectance and transmittance spectra [33].

Table 3 Classification performance for data obtained from individual as well as fused systems using all wavelengths acquired from Vis/NIR interactance, Vis/NIR hyperspectral imaging, and NIR transmittance for classifying potatoes based on primordial leaf count for sliced samples and whole tubers (shaded cells refer to the optimal classification performance)
Table 4 Classification performance for data obtained from individual as well as fused systems using selected wavelengths acquired from Vis/NIR interactance, Vis/NIR hyperspectral imaging, and NIR transmittance for classifying potatoes based on primordial leaf count for sliced samples and whole tubers (shaded cells refer to the optimal classifier performance)

In the case of whole tubers, similar classification results to those for sliced samples were obtained. Hyperspectral imaging data yielded close accuracy for whole tubers reaching as high as 90% using the Knn technique and MSC. However, the models obtained from the interactance data did not perform as well, with the highest classification accuracy being 85% using ANN and MSC. Fusing data, however, was also effective in improving the efficiency of classification models. The highest classification accuracy obtained was 86.3% using the PLS-DA method and applying the mean centering for preprocessing. Based on such results, it can be concluded that measurements on whole tubers was more promising compared with those conducted on sliced samples despite the higher classification for the latter case. The reason for such conclusion is that the measurements conducted on whole are more suitable for those working in potato storage facilities as they are non-destructive and require less sample preparation. Thus, with the classification results obtained, the rapid or inline measurements for early sprouting signs could be possible using the hyperspectral imaging or the interactance spectroscopy with the advantage for the latter system of the considerable lower cost and shorter processing time. Knn and PLS-DA both yielded the best classification accuracy values. The advantage of PLS-DA comes from the fact that it can handle collinear data and works first on transferring the original data using PLSR latent variables then build the classification models based on the obtained dependent values (LC) [33]. The advantage of Knn is it works better with when the number of samples in each class are similar, as is the case in this work [41]. 
The ANN classifiers in general showed results close to those of the other classifiers, especially with only 2 classes and a large number of samples relative to the number of variables, which results in better training and more accurate weights between the different network layers [44]. It is clear that the classification results based on selected wavelengths are close to or even better than those obtained using all wavelengths. In general, classification models based on selected wavelengths for whole tubers (individual sensors) and sliced samples (fused sensors) showed better performance than those based on all wavelengths.
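To make the idea of classifying on a few selected wavelengths concrete, the sketch below combines a greedy sequential forward selection with a small Knn classifier scored by leave-one-out accuracy. It is a simplified illustration under stated assumptions: the function names, the value k = 3, the leave-one-out criterion, and the toy data are all hypothetical and do not reproduce the models or settings of this study.

```python
import math

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)

def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of Knn restricted to the given band indices."""
    Xs = [[row[j] for j in features] for row in X]
    hits = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, Xs[i], k) == y[i]
    return hits / len(Xs)

def forward_select(X, y, max_features=4, k=3):
    """Greedy forward selection: add the band that most improves accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [j], k), j) for j in remaining]
        score, j = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # no remaining band improves accuracy, so stop
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score

# Hypothetical toy data: 3 bands, two balanced classes of sprouting activity.
X = [[0.50, 0.10, 0.30], [0.20, 0.12, 0.31], [0.45, 0.11, 0.29], [0.25, 0.13, 0.30],
     [0.48, 0.90, 0.32], [0.22, 0.88, 0.31], [0.51, 0.91, 0.30], [0.21, 0.89, 0.29]]
y = ["low"] * 4 + ["high"] * 4
selected, score = forward_select(X, y)
print(selected, score)
```

In this toy set only the band at index 1 separates the two classes, so the selection stops after one band; with real spectra the stopping point and the selected bands depend on the data and on the accuracy criterion chosen.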

Results obtained in this work are comparable to those obtained in previous studies. The study conducted by Garnett [25] produced robust results in detecting sprouting using the spectral signature within the wavelength range 600–750 nm, associated with chlorophyll stimulation in the tuber skin. The current work also selected some wavelengths in this range from the interactance and hyperspectral imaging systems. However, the referenced study did not involve any classification models of tubers based on sprouting activity, and that work does not seem to be practically suitable for online applications. Our study showed that it is possible to develop a low-cost handheld technique for tracking sprouting activity during storage. Despite the high efficiency of identifying early sprouting reported in the referenced study, the tubers needed to be manually placed so that the eye faces the detector. Also, our work demonstrated statistically that sprouting activity can be monitored for whole tubers using a limited number of wavelengths. Additionally, the classification performance obtained in this study is in line with that reported by Ji et al. [26], where a classification accuracy of 90.9% was obtained for sprouting. However, it is important to state that our study aimed mainly to illustrate the importance of developing a single classification model that can serve more than one cultivar.

Conclusion

This study investigated the capability of three different spectroscopic systems (transmittance, interactance, and hyperspectral imaging) and machine learning methods for classifying high or low levels of sprouting activity in whole tubers and sliced samples of two potato cultivars. The results showed that applying sequential forward selection followed by Knn or PLS-DA to hyperspectral data resulted in a classification accuracy of 90% for whole tubers, with slightly lower values for the sliced samples (87.5%). This work has demonstrated the possibility of developing low-cost technologies that require only a small number of wavelengths to successfully predict sprouting activity in tubers. The interactance system, with only 4 wavelengths, also produced reasonable classification accuracy for whole tubers (85%) using ANN, which could be improved with more data. Although the interactance system resulted in lower classification accuracy than the hyperspectral system, the former is preferred for scanning whole tubers if used individually, given its lower cost, simplicity, and shorter processing time. However, current and future advances in data acquisition hardware should enable the setup of a low-cost and more powerful hyperspectral imaging device, or a multispectral imaging device based on the results obtained from the hyperspectral imaging system in this study, and thus offer greater potential for successful industrial application in agricultural grading and sorting systems. Results also showed that fused data enhanced the classification accuracy for sliced samples by up to 7.5%, whereas no improvement was achieved for whole tubers. This study is suggested as pioneering in electronically assessing the sprouting status of potato tubers during cold storage.
Moreover, the classification results obtained could be transferred to the potato industry by designing a handheld device to accurately and rapidly monitor the physiological status of tubers. This would support improved control of tuber sprouting activity by minimizing the application of chemical sprouting inhibitors and applying them at an appropriate physiological development stage. Consequently, the shelf life of tubers could be extended by preserving the desired quality status of tubers used either for table or processing purposes, leading to higher profit potential for growers and processors. It is worth stating that more cultivars need to be tested to confirm the reproducibility and robustness of the results. In addition, a complementary study could test the performance of several common sprouting inhibitors using the Vis/NIR interactance and/or Vis/NIR hyperspectral systems. Performing such studies would validate the impact of the spectral techniques on the timing of application of the chemical inhibitors.