Using noise reduction to enhance ranking based genomic selection

Banerjee, Rohan; Singh, Manish

doi:10.1007/s10722-021-01190-9

Research Article
Published: 27 April 2021

Volume 68, pages 3319–3331, (2021)

Genetic Resources and Crop Evolution

Abstract

Genomic Selection (GS) is a breeding technique that uses whole-genome markers to make trait predictions. The goal of GS is to identify the top candidates with the most desirable trait values. GS has usually been formulated as a regression problem in which marker data is used to predict phenotypic values. However, since the end goal of GS is to identify top candidates, ranking the individuals is the more natural formulation. Creating accurate ranking models poses three fundamental challenges: noise in the phenotypic data, the extremely high dimensionality of the genotypic data, and the small sample size of genomic datasets. To address these challenges, we present a novel two-phase approach that increases the noise tolerance of ranking-based approaches. The proposed algorithm uses pruning to perform noise filtering and leverages biclustering to improve model generalization. The approach is evaluated on both pointwise and pairwise ranking algorithms. Previous work on the Arabidopsis and CIMMYT wheat datasets yielded mean Normalized Discounted Cumulative Gain (NDCG) @10 scores of 0.883 and 0.748, respectively. The proposed approach outperforms these results on both datasets, yielding ranking accuracies of 0.965 and 0.865, respectively.
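For readers unfamiliar with the metric, the following is a minimal sketch of how an NDCG@10 score could be computed for a set of candidates. It is illustrative only: the abstract does not state the exact gain and discount used, so the sketch assumes the common form with the true trait value as gain and a log2 position discount. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical and not taken from the article.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k entries of an ordered gain list.

    Assumption: linear gain with a log2 position discount (position 1 is undiscounted).
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(true_values, predicted_scores, k=10):
    """NDCG@k of the ranking induced by predicted_scores.

    true_values are the observed trait values, assumed non-negative here
    so that the normalised score stays in [0, 1].
    """
    # Order candidates from highest to lowest predicted score.
    order = sorted(range(len(true_values)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_gains = [true_values[i] for i in order]
    # The ideal ranking orders candidates by their true values.
    ideal_gains = sorted(true_values, reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

Under these assumptions, a model that places the candidates with the best true trait values at the top scores 1.0, and mistakes near the top of the list cost more than mistakes further down. That weighting is why such a metric suits a task whose goal is identifying top candidates.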

Notes

Dataset available at http://publiclines.versailles.inra.fr/page/33.

Author information

Authors and Affiliations

Indian Institute of Technology Hyderabad, Hyderabad, Telangana, India
Rohan Banerjee & Manish Singh

Corresponding author

Correspondence to Rohan Banerjee.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Banerjee, R., Singh, M. Using noise reduction to enhance ranking based genomic selection. Genet Resour Crop Evol 68, 3319–3331 (2021). https://doi.org/10.1007/s10722-021-01190-9

Received: 25 July 2020
Accepted: 16 April 2021
Published: 27 April 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10722-021-01190-9
