Dynamic recursive tree-based partitioning for malignant melanoma identification in skin lesion dermoscopic images

  • Regular Article
Statistical Papers

Abstract

This paper defines multivalued data, that is, variables that take multiple values. Such data are typical when there is some intrinsic uncertainty in data production, for example as the result of imprecise measuring instruments, in image recognition, or in human judgments. So far, contributions in the symbolic data analysis literature have provided data preprocessing criteria that allow the use of standard methods such as factorial analysis, clustering, discriminant analysis and tree-based methods. As an alternative, this paper introduces a methodology for supervised classification, the so-called Dynamic CLASSification TREE (D-CLASS TREE), which deals simultaneously with both standard and multivalued data. To this end, an innovative partitioning criterion with a tree-growing algorithm is defined. The main result is a dynamic tree structure characterized by the simultaneous presence of binary and ternary partitions. A real-world case study shows the advantages of the proposed methodology and the main issues in interpreting the final results. A comparative study with other approaches dealing with the same types of data is also presented. The comparison highlights that, even though the results are quite similar in terms of error rates, the proposed D-CLASS TREE returns a more interpretable tree-based structure.
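To make the idea of binary and ternary partitions in the same tree concrete, the following is a minimal sketch and not the partitioning criterion defined in the paper, which the abstract does not spell out. It assumes that a multivalued observation can be represented as an interval (lower, upper), that a standard observation is a degenerate interval, and that a node sends an observation to a third "middle" branch when its interval straddles the split threshold. The function name `split_node` and the branch labels are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a multivalued observation is an interval (lower, upper);
# a standard (single-valued) observation x is the degenerate interval (x, x).
Interval = Tuple[float, float]


def split_node(values: List[Interval], threshold: float) -> Dict[str, List[int]]:
    """Assign each observation index to a branch (illustrative rule only).

    - 'left'   : the whole interval lies at or below the threshold
    - 'right'  : the whole interval lies above the threshold
    - 'middle' : the interval straddles the threshold
    Empty branches are dropped, so the split is binary when no
    observation straddles the threshold and ternary otherwise.
    """
    branches: Dict[str, List[int]] = {"left": [], "middle": [], "right": []}
    for i, (lower, upper) in enumerate(values):
        if lower > upper:
            raise ValueError(f"observation {i}: lower bound exceeds upper bound")
        if upper <= threshold:
            branches["left"].append(i)
        elif lower > threshold:
            branches["right"].append(i)
        else:
            branches["middle"].append(i)
    # Keep only the branches that actually received observations.
    return {name: idx for name, idx in branches.items() if idx}


# Standard data only: no interval can straddle, so the split is binary.
standard = [(1.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
binary_split = split_node(standard, threshold=3.0)

# Mixed data: the second observation straddles the threshold.
mixed = [(1.0, 2.0), (2.5, 4.0), (5.0, 6.0)]
ternary_split = split_node(mixed, threshold=3.0)
```

Under this assumed rule, the number of branches at a node depends on the data reaching it, which is one way a single tree could contain both binary and ternary partitions.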


Acknowledgements

The authors would like to thank Prof. A. Baroni of the Campania University “Luigi Vanvitelli” (Italy) for kindly providing the skin lesions data set. The authors would also like to thank two anonymous reviewers whose comments greatly helped to improve the quality of the manuscript.

Corresponding author

Correspondence to Antonio D’Ambrosio.


About this article


Cite this article

Aria, M., D’Ambrosio, A., Iorio, C. et al. Dynamic recursive tree-based partitioning for malignant melanoma identification in skin lesion dermoscopic images. Stat Papers 61, 1645–1661 (2020). https://doi.org/10.1007/s00362-018-0997-x


  • DOI: https://doi.org/10.1007/s00362-018-0997-x
