A modified feature fusion method for distinguishing seed strains using hyperspectral data

Jingjing Liu; Simeng Liu; Tie Shi; Xiaonan Wang; Yizhou Chen; Fulong Liu; Hong Men

doi:10.1515/ijfe-2019-0362

Published by De Gruyter April 21, 2020

From the journal International Journal of Food Engineering

Abstract

Precise classification of seeds is important for agriculture. Because the physical and chemical differences between wheat varieties are slight and the bands of hyperspectral images are highly correlated, selecting characteristic bands from the spectral mean alone easily falls into a local optimum. To address this problem, this paper proposes a new variable fusion strategy that combines the successive projection algorithm and the variable importance in projection algorithm, applied to both the spectral mean and the spectral standard deviation, to obtain a comprehensive and representative set of feature variables for higher classification accuracy. The 25 feature bands obtained were classified by a support vector machine, and the classification accuracy reached 83.3%. This indicates that the new fusion strategy mines the effective features of hyperspectral data better and thereby improves the accuracy of the model, and that it can provide a theoretical basis for the hyperspectral classification of tiny kernels.

Keywords: feature selection; fusion strategy; hyperspectral data; seed purity; wheat classification

Corresponding authors: Jingjing Liu, College of Automation Engineering, Northeast Electric Power University, Jilin, 132012, China; and Department of Computer Science and Bioimaging Research Center, University of Georgia, Athens, 30602, GA, USA; and Biosensor National Special Laboratory, Key Laboratory for Biomedical Engineering of Education Ministry, Department of Biomedical Engineering, Zhejiang University, Hangzhou, 310027, China, E-mail: jingjing_liu@neepu.edu.cn; and Hong Men, College of Automation Engineering, Northeast Electric Power University, Jilin, 132012, China, E-mail: menhong@neepu.edu.cn

Funding source: National Natural Science Foundation of China

Award Identifier / Grant number: 31871882, 31772059, 31401569

Funding source: China Postdoctoral Science Foundation

Award Identifier / Grant number: 2018M642440

Funding source: Key Science and Technology Project of Jilin Province

Award Identifier / Grant number: 20170204004SF

Funding source: State Scholarship Fund of China Scholarship Council

Award Identifier / Grant number: 201808220037

Funding source: Project of Jilin Science and Technology Innovation and Development Plan

Award Identifier / Grant number: 201751206

Abbreviations and Nomenclature
VIS-NIR: the visible near-infrared hyperspectral imaging technique
SPA: successive projection algorithm
VIP: variable importance in projection algorithm
SVM: the support vector machine
EFF: efficiency value
SENS: sensitivity
SPEC: specificity
GA: genetic algorithm
SNR: the signal-to-noise ratio
RMSE: the root mean square error

Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work was supported by the National Natural Science Foundation of China (no. 31871882, no. 31772059, no. 31401569); the Key Science and Technology Project of Jilin Province (20170204004SF); the State Scholarship Fund of China Scholarship Council (201808220037); the China Postdoctoral Science Foundation (2018M642440); and the Project of Jilin Science and Technology Innovation and Development Plan (201751206). The experimental samples were provided by the College of Resource and Environment, Northeast Agricultural University.
Employment or leadership: None declared.
Honorarium: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix 1

Based on the structure of the hyperspectral imager, the hyperspectral images of the six kinds of wheat grains were acquired as follows:

Basic settings: Before acquisition, the light intensity and exposure time must be adjusted to ensure a clear image. After repeated debugging, the physical parameters of the experiment were: exposure time 10 ms, object distance 77.5 mm, line speed 0.34 mm/s, sampling interval 0.73 nm, image resolution 1344 × 1024 pixels, spectral range 380–1038 nm.

The specific image acquisition steps are as follows:

Confirm the model of the camera used and determine the wavelength range to be used for this experiment.
Place the whiteboard directly below the lens and adjust the angle of the condenser so that light is reflected to the lens (MAX DN).
Confirm the object distance: use white paper with pure black lines as a focusing aid, and adjust the object distance to a height at which the black and white in the collected image are clearly separated.
Determine the line speed: adjust the transmission speed (line speed) of the stepping-motor platform so that the captured image matches the actual shape of the object and is neither compressed nor stretched.
Determine the light intensity: remove the focusing aid, adjust the light intensity and exposure time of the light source so that the MAX DN value reaches 80% of the maximum, and collect the ref-white and ref-Dark images. Put the two image files in one folder for easy subsequent calls.
Confirm the exposure time: open the lens cover, remove the whiteboard, place the object under test directly below the lens, adjust the exposure time so that the MAX DN value reaches 80% of the maximum, and collect the sample-Dark image.
Use “start up” to observe the imaging results in real time. If the result is not satisfactory, modify and adjust the settings in time.
Correct the sample image information using the calibration procedure.
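
The calibration procedure itself is not given in this preview. As a minimal sketch, assuming the commonly used white/dark reference correction R = (raw − dark) / (white − dark), the last step could look like the following. The function name `calibrate` and the toy DN values are hypothetical, and plain Python lists stand in for the image arrays.

```python
def calibrate(raw, white, dark, eps=1e-12):
    """Convert raw DN values to relative reflectance, element by element.

    Assumed formula (not stated in the source): (raw - dark) / (white - dark).
    """
    reflectance = []
    for r, w, d in zip(raw, white, dark):
        span = w - d
        # Guard against a dead pixel where white and dark references coincide.
        reflectance.append((r - d) / span if abs(span) > eps else 0.0)
    return reflectance


# Toy values for three pixels in one band (hypothetical DN counts).
raw_dn = [1200.0, 2100.0, 3000.0]
white_dn = [3300.0, 3300.0, 3300.0]   # from the ref-white image
dark_dn = [300.0, 300.0, 300.0]       # from the ref-Dark image
reflectance = calibrate(raw_dn, white_dn, dark_dn)
```

On real data the same expression would be applied per pixel and per band to the whole image cube.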

Appendix 2

Table A1: SVM classification results for different feature subsets based on VIP scores.

Feature subset	Accuracy (%)	Samples identified correctly	Feature subset	Accuracy (%)	Samples identified correctly
#1	46.6667	56	#25	88.3333	106
#2	50	60	#26	85.8333	103
#3	55	66	#27	87.5	105
#4	65.8333	79	#28	86.6667	104
#5	65	78	#29	88.3333	106
#6	65.8333	79	#30	85.8333	103
#7	66.6667	80	#31	86.6667	104
#8	65.8333	79	#32	86.6667	104
#9	67.5	81	#33	85.8333	103
#10	66.6667	80	#34	86.6667	104
#11	70.8333	85	#35	82.5	99
#12	75	90	#36	83.3333	100
#13	80.8333	97	#37	84.1667	101
#14	80	96	#38	86.6667	104
#15	82.5	99	#39	86.6667	104
#16	80.8333	97	#40	85.8333	103
#17	83.3333	100	#41	84.1667	101
#18	85.8333	103	#42	86.6667	104
#19	84.1667	101	#43	86.6667	104
#20	86.6667	104	#44	85	102
#21	85.8333	103	#45	85	102
#22	87.5	105	#46	85	102
#23	85.8333	103	#47	85	102
#24	85.8333	103

Note: #1 is the feature subset containing the first feature variable, that is, #1 = {m₂₂}; #2 is the feature subset containing the first two feature variables, that is, #2 = {m₂₂, m₂₀}; similarly, #47 is the subset containing 47 variables.
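
The accuracy column appears to be the number of correctly identified samples divided by a 120-sample test set. That size is an inference (600 samples minus the 480 training samples mentioned in Appendix 4, and ratios such as 56/120 = 46.6667%), not a figure stated in this preview. A short sketch of the arithmetic for a few rows of Table A1:

```python
TEST_SET_SIZE = 120  # assumption: inferred from the table ratios, not stated in the source


def accuracy_percent(n_correct, n_total=TEST_SET_SIZE):
    """Accuracy in percent from the count of correctly identified samples."""
    return 100.0 * n_correct / n_total


# A few (subset number -> correct count) pairs copied from Table A1.
correct_counts = {1: 56, 12: 90, 25: 106, 29: 106, 47: 102}
accuracies = {k: round(accuracy_percent(v), 4) for k, v in correct_counts.items()}

# Smallest subset among these rows that reaches the highest count.
top = max(correct_counts.values())
best_subset = min(k for k, v in correct_counts.items() if v == top)
```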

Appendix 3

Mean: The spectral mean of the hyperspectral image is the average of the spectral reflectance over the region of interest (ROI; the wheat sample in this paper). First, the spectral reflection values of the pixels in the ROI in the ith band are arranged into a vector f of size 1×N, and then the mean of the vector f is defined as follows:

$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} f_{i,j}, \quad i = 1, 2, \cdots, M$$

μi: the spectral mean at the ith band;
fi,j: the spectral reflection value of the jth pixel in the ROI in the ith band;
N: the number of pixels in the ROI;
M: the number of bands.

Standard Deviation: The spectral standard deviation of the hyperspectral image is the standard deviation of the spectral reflectance over the region of interest (the wheat sample in this paper). First, the spectral reflection values of the pixels in the ROI in the ith band are arranged into a vector f of size 1×N, and then the standard deviation of the vector f is defined as follows:

$$\alpha_i = \sqrt{\frac{\sum_{j=1}^{N}\left(f_{i,j} - \mu_i\right)^2}{N}}, \quad i = 1, 2, \cdots, M$$

μi: spectral mean for the ith band;
fi,j: the spectral reflection value of the jth pixel in the ROI in the ith band;
αi: the spectral standard deviation at the ith band.
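
As an illustration of the two definitions above, the following sketch computes the per-band mean and the per-band population standard deviation (dividing by N) with Python's standard `statistics` module. The function name `band_features` and the toy ROI values are hypothetical; real ROIs would hold many pixels per band.

```python
from statistics import fmean, pstdev


def band_features(roi_spectra):
    """roi_spectra[i][j] is the reflectance of pixel j in band i.

    Returns (means, stds), one value per band.
    """
    means = [fmean(band) for band in roi_spectra]
    # pstdev divides by N, matching the definition of the standard deviation above.
    stds = [pstdev(band, mu=mu) for band, mu in zip(roi_spectra, means)]
    return means, stds


# Toy ROI: 2 bands, 3 pixels each (hypothetical reflectance values).
roi = [[0.2, 0.4, 0.6],
       [0.5, 0.5, 0.5]]
mu, alpha = band_features(roi)
```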

Appendix 4

1 K-S algorithm

The essence of the K-S algorithm is to select from the initial dataset a subset of samples that is spread evenly across the sample space and to use it as the training set. The selection is based on the Euclidean distance between sample points, given by:

$$d_x(\beta, \gamma) = \sqrt{\sum_{i=1}^{K}\left[x_\beta(i) - x_\gamma(i)\right]^2}, \quad \beta, \gamma \in [1, P] \tag{A1}$$

where xβ(i) and xγ(i) are the reflectance of sample β and sample γ at the ith wavelength, respectively, K is the number of wavelengths, P is the number of samples, and dx(β, γ) is the distance between β and γ. The algorithm first selects the sample pair (β, γ) with the largest dx(β, γ). For each remaining sample, it then calculates the distances to the already selected reference points and keeps the shortest one. Next, the sample with the largest of these shortest distances is selected as a new reference point. This process is repeated until the specified number of samples has been selected.
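
The selection procedure described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' implementation; the function name `kennard_stone` and the toy spectra are hypothetical.

```python
from math import dist


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the max-min distance rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Start with the pair of samples that are farthest apart.
    first, second = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: dist(samples[pair[0]], samples[pair[1]]),
    )
    selected = [first, second]
    # min_dist[i]: distance from sample i to its nearest selected sample.
    min_dist = [min(dist(s, samples[first]), dist(s, samples[second]))
                for s in samples]
    while len(selected) < n_select:
        candidates = [i for i in range(n) if i not in selected]
        # Pick the candidate whose nearest selected sample is farthest away.
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, dist(s, samples[nxt]))
                    for d, s in zip(min_dist, samples)]
    return selected


# Toy data: five samples measured at two wavelengths.
spectra = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0], [9.0, 0.0]]
train_idx = kennard_stone(spectra, 3)
```

In this toy case the two end points are picked first, followed by the sample midway between them.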

2 Successive projection algorithm (SPA)

Take the dataset X600×460 as an example, and let xj denote the jth column of the training-set spectral matrix X480×460. The set of wavelengths not yet selected is defined as S = {j, 1 ≤ j ≤ 460, j ∉ {k(0), ⋯, k(n−1)}}, where k(n−1) is the wavelength included in the wavelength combination in the nth iteration and N is the number of elements in each wavelength combination. A wavelength combination is generated as follows. First, initialization: set n = 1 and select any wavelength (column) of the training set X480×460 as the starting wavelength k(0) of the combination. Second, calculate the projections of all remaining wavelengths xj (j ∈ S) onto the subspace orthogonal to xk(n−1), find the wavelength with the largest projection, and incorporate it into the wavelength combination. Then set n = n + 1 and repeat the second step while n < N. Finally, a wavelength combination {k(n), n = 0, 1, …, N−1} is obtained.
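
A minimal sketch of this projection loop is given below, assuming the usual SPA formulation in which each unselected column is projected onto the subspace orthogonal to the most recently selected column and the column with the largest remaining norm is chosen. The function name `spa`, the zero-based indices and the toy matrix are hypothetical.

```python
def spa(columns, start, n_select):
    """Select n_select column indices; columns[j] is the vector of wavelength j."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    work = [list(c) for c in columns]  # working copies that get projected
    selected = [start]
    while len(selected) < n_select:
        ref = work[selected[-1]]
        ref_norm2 = dot(ref, ref)
        if ref_norm2 == 0.0:
            break  # no independent direction left to project against
        # Project every unselected column onto the subspace orthogonal to ref.
        for j in range(len(work)):
            if j not in selected:
                coef = dot(work[j], ref) / ref_norm2
                work[j] = [a - coef * b for a, b in zip(work[j], ref)]
        remaining = [j for j in range(len(work)) if j not in selected]
        # Keep the wavelength with the largest projection (squared norm).
        selected.append(max(remaining, key=lambda j: dot(work[j], work[j])))
    return selected


# Toy matrix: four wavelengths (columns), each measured on three samples.
toy_columns = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
chosen = spa(toy_columns, start=0, n_select=3)
```

Wavelength 1 is nearly collinear with the starting wavelength 0, so it is the last one the loop would pick.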

3 Support Vector Machine (SVM)

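The SVM classifier with the kernel function introduced is represented as follows:

$$\begin{cases}\max\limits_{\alpha}\ \displaystyle\sum_{i=1}^{P}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{P}\alpha_i\alpha_j y_i y_j \exp\left(-g\lVert x_i - x_j\rVert^2\right)\\[4pt] \displaystyle\sum_{i=1}^{P}\alpha_i y_i = 0\\[4pt] 0 \le \alpha_i \le C, \quad \forall i\end{cases} \tag{A2}$$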

The final decision function is

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{P}\alpha_i y_i\, k(x, x_i) + b^{*}\right) \tag{A3}$$
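
To make the decision function concrete, the sketch below evaluates it for a binary case with the exponential kernel exp(−g‖x − xi‖²). The multipliers, labels, bias and g are hypothetical toy values rather than a trained model, and sgn(0) is mapped to +1 by convention here.

```python
from math import exp


def rbf_kernel(x, z, g):
    """k(x, z) = exp(-g * ||x - z||^2)."""
    return exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decide(x, support_vectors, alphas, labels, b, g):
    """Sign of the kernel expansion plus bias, as in the decision function above."""
    score = sum(a * y * rbf_kernel(x, sv, g)
                for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy model: one support vector per class.
svs = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
labels = [1, -1]
near_first = svm_decide([0.2, 0.1], svs, alphas, labels, b=0.0, g=0.5)
near_second = svm_decide([1.9, 2.1], svs, alphas, labels, b=0.0, g=0.5)
```

A multi-class problem such as several wheat varieties would need a scheme on top of this binary rule, for example one-vs-one voting; the preview does not say which scheme was used.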

4 Specific calculation methods of SENS and SPEC

To evaluate the classification model, each classified instance is counted as one of the following four types.

True positives (TP): the number of positive cases correctly classified as positive;
False positives (FP): the number of negative cases incorrectly classified as positive;
False negatives (FN): the number of positive cases incorrectly classified as negative;
True negatives (TN): the number of negative cases correctly classified as negative.

Sensitivity (SENS) measures the classifier’s ability to identify positive cases, and specificity (SPEC) measures the classifier’s ability to identify negative cases in the test set. They can be expressed as follows:

$$\mathrm{SENS} = \frac{TP}{TP + FN} \tag{A4}$$

$$\mathrm{SPEC} = \frac{TN}{TN + FP} \tag{A5}$$
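
For a multi-class problem, these two measures are typically computed per class by treating that class as positive and all others as negative. A minimal sketch under that assumption follows; the function name `sens_spec` and the toy labels are hypothetical.

```python
def sens_spec(y_true, y_pred, positive):
    """Sensitivity and specificity for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # Assumes the test set holds at least one positive and one negative case.
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for three varieties (hypothetical).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
sens_a, spec_a = sens_spec(y_true, y_pred, "A")
sens_b, spec_b = sens_spec(y_true, y_pred, "B")
```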

Received: 2019-09-01

Accepted: 2020-03-13

Published Online: 2020-04-21
