Review
Fair comparison of skin detection approaches on publicly available datasets

https://doi.org/10.1016/j.eswa.2020.113677

Highlights

  • Skin detection methods and applications are comprehensively reviewed.

  • A rough classification of the methods tested in this work is proposed.

  • Most common benchmarks and testing protocols are discussed and summarized.

  • Performances are evaluated on 10 datasets using 14 methods.

  • Insightful discussions and prospects for future work are given.

Abstract

Skin detection is the process of discriminating skin and non-skin regions in a digital image, and it is widely used in applications ranging from hand gesture analysis to body-part tracking and face detection. It is a challenging problem that has drawn extensive attention from the research community in the context of expert and intelligent systems; nevertheless, a fair comparison among approaches is very difficult due to the lack of a common benchmark and a unified testing protocol. Recently, the success of deep convolutional neural networks (CNNs) has strongly influenced the field of image segmentation and produced a variety of successful models. However, due to the lack of large ground-truth datasets for skin detection, only a few works have addressed the problem using CNN models. In this work, we investigate the most recent research in this field and propose a fair comparison among approaches using several different datasets.

The major contributions of this work are (i) an exhaustive literature review of skin color detection approaches and a comparison of approaches that can be useful to researchers and practitioners to select the most suitable method for their application, (ii) the collection and examination of many datasets with ground truth for skin detection that can be useful to produce a training set for CNN models, (iii) a framework to evaluate and combine different skin detector approaches, whose source code is made freely available for future research, and (iv) an extensive experimental comparison among several recent methods which have also been used to define an ensemble that works well in many different problems.

Experiments are carried out on 10 different datasets including more than 10,000 labelled images: the experimental results confirm that the best method proposed here obtains very good performance with respect to other stand-alone approaches, without requiring ad hoc parameter tuning.

A MATLAB version of the framework for testing and of the methods proposed in this paper will be freely available from https://github.com/LorisNanni.

Introduction

Skin texture and color are important cues that people use to infer a variety of culture-related aspects about each other, such as health, ethnicity, age, beauty and wealth. The presence of skin color in images and videos signals the presence of humans in such media. Therefore, over the last two decades, extensive research in the context of expert and intelligent systems has focused on skin detection in videos and images. Skin detection is the process of discriminating “skin” and “non-skin” regions in a digital image; it consists of performing a binary classification of pixels and executing a fine segmentation to define the boundaries of the skin regions. Currently, skin detection is a sophisticated process involving not only the training of models, but also numerous additional methods, including data pre-processing and post-processing.
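As a minimal illustration of the pixel-wise binary classification step described above, the following sketch labels each RGB pixel with a fixed-threshold rule. The function names, the threshold values and the toy image are assumptions chosen for illustration only; they are not taken from this work, and the segmentation and post-processing stages are omitted.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def is_skin(pixel: Pixel) -> bool:
    """Hypothetical fixed-threshold rule; all thresholds are illustrative."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20           # bright enough in every channel
        and max(r, g, b) - min(r, g, b) > 15   # not a gray-level pixel
        and abs(r - g) > 15                    # red clearly separated from green
        and r > g and r > b                    # red is the dominant channel
    )


def skin_mask(image: List[List[Pixel]]) -> List[List[int]]:
    """Classify every pixel independently: 1 = skin, 0 = non-skin."""
    return [[1 if is_skin(p) else 0 for p in row] for row in image]


# Toy 2x2 image: the left column holds skin-like tones, the right column does not.
image = [
    [(220, 170, 140), (30, 60, 200)],
    [(200, 150, 120), (10, 10, 10)],
]
mask = skin_mask(image)
```

A rule of this kind uses color alone and considers each pixel in isolation, which is why the paragraphs that follow stress its sensitivity to illumination and acquisition conditions.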

Skin detection is used in many application domains: as a preliminary step for face detection (Hsu, Abdel-Mottaleb, & Jain, 2002) and tracking (De-La-Torre, Granger, Radtke, Sabourin, & Gorodnichy, 2015), body tracking (Argyros & Lourakis, 2004), hand detection (Roy, Mohanty, & Sahay, 2017) and gesture recognition (Han, Award, Sutherland, & Wu, 2006), biometric authentication (e.g. palm print recognition) (Sang, Ma, & Huang, 2013), objectionable content filtering (Lee, Kuo, Chung, & Chen, 2007), and medical imaging. In this work, a comprehensive analysis is carried out of how different expert systems (including artificial intelligence, deep learning, and machine learning systems) are designed to deal with the skin detection problem.

A useful feature for the discrimination of skin and non-skin pixels is the pixel color; nevertheless, obtaining skin color consistency across variations in illumination, diverse ethnicities and different acquisition devices is a very challenging task. Moreover, skin detection, when used as a preliminary step of other applications, is required to be computationally efficient, invariant to geometric transformations, partial occlusions and changes of posture or facial expression, insensitive to complex or pseudo-skin backgrounds, and robust to the quality of the acquisition device. The factor that most adversely affects skin detection is the color constancy problem (Khan, Hanbury, Stottinger, & Bais, 2012), i.e. the dependency of pixel intensity on both reflection and illumination, which behave in a nonlinear and unpredictable way. To remain effective when the illumination conditions vary rapidly, some skin detection approaches use image preprocessing strategies based on color constancy (i.e. a color correction method based on an estimate of the illuminant color) and/or dynamic adaptation techniques (i.e. the transformation of a skin-color model according to the changing illumination conditions). Static skin color approaches that rely on image preprocessing can only partially solve this problem, and their performance degrades strongly in real-world applications. A possible solution is to consider additional data acquired outside the visible spectrum (e.g. infrared images (Kong, Heo, Abidi, Paik, & Abidi, 2005) or spectral imaging (Healey, Prasad, & Tromberg, 2003)); however, such sensors are not appropriate for all applications and entail higher acquisition costs, which limits their use to specific problems.
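To make the idea of color correction based on an estimate of the illuminant color more concrete, the sketch below applies the gray-world assumption, a common and very simple illuminant estimate. The text above does not name a specific algorithm, so the choice of gray-world, the function name and the toy image are assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def gray_world(image: List[List[Pixel]]) -> List[List[Pixel]]:
    """Rescale each channel so that its mean equals the overall mean intensity.

    Gray-world assumes the average scene color is achromatic, so any
    imbalance between the channel means is attributed to the illuminant.
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    # Per-channel gain; a channel with zero mean is left unchanged.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [
        [tuple(min(255, round(p[c] * gains[c])) for c in range(3)) for p in row]
        for row in image
    ]


# Toy image with a strong reddish cast (the red mean is twice the others).
cast = [[(200, 100, 100), (100, 50, 50)]]
corrected = gray_world(cast)
```

After correction the three channel means coincide, so a static color model sees a more stable input. As the paragraph above notes, preprocessing of this kind only partially addresses the problem.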

Skin detection is a challenging problem and has been extensively studied by the research community. Despite the large number of methods, there are only a few surveys on this topic: the works in (Kakumanu et al., 2007, Prema and Manimegalai, 2012) are quite old and cover only the methods proposed before 2005, while the surveys in (Chen et al., 2016, Mahmoodi and Sayedi, 2016, Naji et al., 2018) are more recent and contain a good investigation of methods, benchmarking datasets and performance over a period of about two decades. However, none of the above surveys offers a fair comparison among methods using the same testing protocols and datasets. The aim of the present work is not only to survey the most recent research in this field (which is now enriched with methods based on deep learning (Xu et al., 2015, Zuo et al., 2017, Kim et al., 2017a, Ma and Shih, 2018)), but also, and above all, to propose a framework for a fair comparison among approaches.

In this research, a novel framework is proposed that integrates different skin color classification approaches and compares their performance, individually and in combination, on several publicly available datasets. The major contributions of this research work are:

  • An exhaustive literature review of skin color detection approaches with a detailed description of methods freely available.

  • The collection and examination of almost all the datasets with ground truth for skin detection available in the literature. Such collection, which includes more than 10,000 labelled images, can be useful to produce a training set for CNN models.

  • A framework to evaluate and combine different skin detector approaches. The source code of the framework and many of the tested methods will be made freely available for future research and comparisons. The system can be tuned according to the target application: depending on the application requirements, the acceptance threshold can be raised to prune a large percentage of false accepts at the cost of a small reduction in genuine accepts or, vice versa, lowered to admit more false accepts and maximize the number of genuine accepts. The framework includes training and testing protocols for the most widely used benchmark datasets in this field.

  • A fair comparison among the most recent research and methods in the skin detection field, using the same testing protocols, benchmark datasets and performance indicators. The computation time of each method is also evaluated, so that the methods can be compared in terms of complexity as well. The discussion of performance can help researchers and practitioners identify the approaches best suited to their requirements in terms of computational complexity, memory requirements, detection rate and sensitivity.

  • Three different CNN architectures are trained for skin detection and the trained models are made available.
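The acceptance-threshold trade-off mentioned in the framework contribution above can be sketched as follows. This is a minimal illustration and not the code of the framework itself: the function name, the per-pixel scores and the ground-truth labels are hypothetical, and the sketch assumes that both classes occur in the labels.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (genuine accept rate, false accept rate) at a given threshold.

    A pixel is accepted as skin when its score is >= threshold.
    labels: 1 = skin, 0 = non-skin; both classes must be present.
    """
    tp = fp = pos = neg = 0
    for score, label in zip(scores, labels):
        accepted = int(score >= threshold)
        if label == 1:
            pos += 1
            tp += accepted
        else:
            neg += 1
            fp += accepted
    return tp / pos, fp / neg


# Hypothetical per-pixel skin scores with their ground-truth labels.
scores = [0.95, 0.80, 0.60, 0.55, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

strict = rates_at_threshold(scores, labels, 0.7)   # fewer false accepts
lenient = rates_at_threshold(scores, labels, 0.5)  # more genuine accepts
```

On this toy data the stricter threshold removes all false accepts but misses one of the three skin pixels, while the lenient threshold recovers every skin pixel at the price of one false accept out of three non-skin pixels.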

This paper is organized as follows. Section 2 presents related work in skin detection, including a discussion of the taxonomy of existing approaches and a detailed description of the approaches tested in this work. Section 3 addresses the evaluation problem: the best-known datasets used for performance evaluation are listed and commented on, and the testing protocols and performance indicators used in our experiments are discussed. Section 4 reports and discusses the experiments conducted using the proposed framework. Finally, Section 5 presents the conclusions and some future research directions.

Section snippets

Skin detection approaches

Several skin detection methods assume that skin color can be distinguished from background colors according to some clustering rule in a specific color space. Although this assumption can be valid in a constrained environment where both the ethnicity of the people and the background colors are known, skin detection remains a very challenging task in complex images captured under unconstrained conditions and when individuals show a large spectrum of human skin coloration (Kakumanu, Makrogiannis, & Bourbakis, 2007). There

Skin detection evaluation: Datasets and performance indicators

To assist research in the area of skin detection, there are some well-known color image datasets provided with ground truth. The use of a standard and representative benchmark is essential to execute a fair empirical evaluation of skin detection techniques.

A fair experimental comparison

A fair comparison among different approaches is very difficult due to the lack of a universal evaluation standard: most published works are tested on self-collected datasets, which are often not available for further comparison; in many cases the testing protocol is not clearly explained; and many datasets are not of high quality, with ground truth of questionable precision, since lips, mouth, rings and bracelets have sometimes been labelled as skin. In this section, we carry out a

Conclusion and future research directions

In this work, a new framework to evaluate and combine different skin detector approaches is presented, and an extensive evaluation of several approaches is carried out on 10 different datasets including more than 10,000 labelled images. A survey of the most recent approaches is carried out, three well-known deep learning models for data segmentation are trained and tested on this classification problem and four new ensembles based on the combination of nine methods (including 3 CNNs) are

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to acknowledge the support that NVIDIA provided us through the GPU Grant Program. We used a donated TitanX GPU to train CNNs used in this work.

References (66)

  • S.J. Schmugge et al.

    Objective evaluation of approaches of skin detection using ROC analysis

    Computer Vision and Image Understanding

    (2007)
  • A. Angelova et al.

    Pruning training sets for learning of object categories

    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

    (2005)
  • A.A. Argyros et al.

    Real-time tracking of multiple skin-colored objects with a possibly moving camera

    Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    (2004)
  • V. Badrinarayanan et al.

    SegNet: A deep convolutional encoder-decoder architecture for image segmentation

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2017)
  • J.P.B. Casati et al.

    SFA: A human skin image database based on FERET and AR facial images

    (2013)
  • W.C. Chen et al.

    Region-based and content adaptive skin detection in color images

    International Journal of Pattern Recognition and Artificial Intelligence

    (2007)
  • W. Chen et al.

    Skin color modeling for face detection and segmentation: A review and a new approach

    Multimedia Tools and Applications

    (2016)
  • L. Chen et al.

    A skin detector based on neural network

  • L.C. Chen et al.

    Encoder-decoder with atrous separable convolution for semantic image segmentation

    Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    (2018)
  • C.Ó. Conaire et al.

    Detector adaptation by maximising agreement between independent data sources

    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

    (2007)
  • M. De-La-Torre et al.

    Partially-supervised learning from facial trajectories for face recognition in video surveillance

    Information Fusion

    (2015)
  • Dourado, A., Guth, F., de Campos, T. E., & Li, W. (2019). Domain adaptation for holistic skin detection. CoRR,...
  • A. Gupta et al.

    Robust skin segmentation using color space switching

    Pattern Recognition and Image Analysis

    (2016)
  • J. Han et al.

    Automatic skin segmentation for gesture recognition combining region and support vector machine active learning

  • G. Healey et al.

    Face recognition in hyperspectral images

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2003)
  • R.L. Hsu et al.

    Face detection in color images

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2002)
  • L. Huang et al.

    Human skin detection in images by MSER analysis

  • N.B. Ibrahim et al.

    A dynamic skin detector based on face skin tone color

  • S. Jairath et al.

    Adaptive skin color model to improve video face detection

  • Z. Jiang et al.

    Skin Detection Using Color, Texture and Space Information

    Fourth International Conference on Fuzzy Systems and Knowledge Discovery

    (2007)
  • M.J. Jones et al.

    Statistical color models with application to skin detection

    International Journal of Computer Vision

    (2002)
  • M. Kawulok

    Fast propagation-based skin regions segmentation in color images

  • M. Kawulok et al.

    Self-adaptive algorithm for segmenting skin regions

    EURASIP Journal on Advances in Signal Processing

    (2014)