1 Introduction

The iris offers high accuracy for individual identification, as it is unique to each person and remains stable throughout life. Iris recognition has therefore been actively researched over the last decade in search of the most accurate and robust algorithms. It can be used in personal identification applications such as secure access to bank accounts at automatic teller machines, access to buildings, national border controls, passport control, etc. [1]. Furthermore, the iris can be used for gender estimation or ethnicity detection [25]. According to Daugman [6], an iris recognition system normally consists of four steps: iris segmentation, normalization, feature extraction and matching. In this work, we focus on iris segmentation.

Many authors have already attempted to implement an accurate segmentation algorithm. There are two possible types of input image for segmentation: near-infrared (NIR) images and visible wavelength (VW) images. Until recently, most efforts used NIR images as input, which require the user to stand 1–3 ft away and whose illumination can cause permanent damage to the human eye [7]. In contrast, VW images are less hazardous to human eyes and do not require constrained environments. In addition, as technology evolves, high-resolution CMOS/CCD cameras will be able to acquire images suitable for iris recognition at distances beyond 3 ft using visible illumination. For these reasons, we focus on VW images.

The main objective of iris segmentation is to isolate the iris texture from everything else (skin, pupil, reflections). Iris segmentation methods fall roughly into two groups: boundary-based and pixel-based. Boundary-based methods aim at locating the boundaries that isolate the iris region. Pixel-based methods directly classify pixels as iris or non-iris according to the appearance of their neighborhood. The integro-differential operator (IDO) [6], Hough transforms [8] and active contour models [9] are the classic boundary-based methods. The IDO exhaustively computes the derivative of the contour integral over the iris image and finds its maximum. Hough transforms apply a voting procedure over a binary edge map to find the optimal curve parameters. Active contour models use energy-minimizing splines defined by external constraints and influenced by image features such as lines and edges. A limitation of boundary-based methods is that they need high-contrast images due to their dependence on image gradients. However, iris images contain much more appearance information than that related to gradients and geometry. Pixel-based methods take advantage of colour, intensity and texture information: they first extract features from each pixel's neighborhood and then build classifiers that label the pixel as iris/non-iris.

Some works using pixel-based methods are briefly described below. Pundlik et al. [10] implemented graph-cut-based energy minimization to separate eyelashes, pupil, iris and background. However, their method is based on image intensity and relies on histogram peaks, so multimodal histogram distributions, which are the common case, can make the algorithm fail; furthermore, they work on NIR images and the computational cost is high. Proença [11] and Tan et al. [12] both developed iris/skin pixel classification algorithms that use the sclera to locate the iris. Proença fed a neural network with manually designed location and colour features and used a constrained polynomial fit to smooth the contours and remove possible segmentation errors; however, this can lead to over-simplified boundaries that do not exactly follow the iris shape. Tan et al., in turn, first classified the brightest pixels as skin and the darkest as iris, and then labelled the remaining pixels according to the point-to-point distance and validity between the unassigned pixels and the candidate region. Another work using the sclera for iris localization is that of Tan and Kumar [13], who fed Zernike moments and colour features to a neural network and a Support Vector Machine (SVM) to classify iris and sclera pixels. Other approaches include those of Li and Savvides [14], who classified pixels in normalized iris maps as iris or occlusion by means of Gabor filters and Gaussian mixture models (GMMs); Yahiaoui et al. [15], who applied a method based on Hidden Markov Chains to NIR iris images; and Radman et al. [16], who fed Histograms of Oriented Gradients (HOG) to an SVM and refined the result with a cellular automaton evolved via the GrowCut technique. There are also methods based on convolutional neural networks (CNNs), such as that of Liu et al. [17], who implemented two different network architectures for the segmentation of noisy iris images acquired at-a-distance and on-the-move. Zhao et al. [18] also leveraged CNNs to perform accurate iris recognition.

Regarding boundary-based methods, Li et al. [19] and Jeong et al. [20] used the AdaBoost algorithm to first detect the eye position and then applied edge detection algorithms to locate the iris. Chen et al. [21] instead used the sclera as a natural boundary to restrict and extract the iris area, then applied a horizontal Sobel edge detector to obtain an edge map and Hough transforms to detect the limbic and pupillary boundaries and segment the iris. Zhao and Kumar [22] combined pixel- and boundary-based algorithms: they first used the relative total variation (RTV) model, then applied the Hough transform to detect the limbic and pupillary boundaries, and finally performed local grey-level analysis for pixel identification to segment the iris. Hu et al. [23] employed HOG features and an SVM to pick the best among the outcomes of three different segmentation algorithms (a circle model and two ellipse models). Another approach is that of Ouabida et al. [24], who used Optical Correlation based Active Contours (OCAC) to detect the iris and pupil contours.

Nevertheless, all these methods share one characteristic: they work on eye images, whether NIR or VW, whereas our method works on full-face images. Among the few methods that also work with full-face images, Tan and Kumar [25] employed the AdaBoost eye detector to locate the iris within the face and extracted pixel features, which they then classified with both an SVM and a neural network; in another attempt [26], they used Otsu's method to binarize the iris region and Canny's edge detector to extract the limbic and pupillary boundaries. However, that article does not report accuracy metrics for the segmentation achieved, works only on high-resolution NIR partial-face images, and does not make clear how the eye region is extracted. In a later work [27], they employed the same approach as in [25] to locate the iris within the face, but then used a random walker to obtain a coarse iris mask and complex post-processing techniques to segment the iris. Another approach is that of [28], who also used the AdaBoost eye detector to locate the eye within the face and then applied two Circular Edge Detectors (CED) to find the pupillary and limbic boundaries; however, they used NIR partial-face images and provide no segmentation metrics.

Compared to the methods described above, ours implements a full-face approach which, in our opinion, has several advantages. First, it uses VW images. Second, instead of relying on local shape and structure properties to find the eye, our algorithm takes advantage of the whole facial structure to precisely locate the eye. We thereby ensure that the region being analysed contains only the eye, which reduces the search space and thus increases efficiency. We then use the IDO to coarsely detect the iris, which allows us to perform a series of morphological operations that endow the algorithm with robustness and precision. Furthermore, we conducted a study with four different ethnicities (Asian, Black, Latino and White), and our method achieves high accuracy for all of them.

1.1 Databases

Due to the popularity of the topic, exhaustive efforts have been made to develop a wide variety of iris image databases, such as CASIA [29], WVU [30], UBIRIS [31], MICHE DB [32] or FRGC [33]. However, all these databases either include only eye images, contain only NIR images, or are not classified by ethnicity, making it difficult to find an adequate database for our study. The two most common databases appropriate for our work would be CASIA-Iris-Distance v4 or FRGC, but the former only includes partial (not full) NIR face photographs, and neither has its images classified by ethnicity. For our purpose, we needed full-face VW photographs classified by ethnicity, as we aim at describing the differences in iris segmentation across ethnicities, so we chose the Chicago Face Database [34]. Nevertheless, we also adapt our framework and test it on the CASIA-Iris-Distance v4 dataset for completeness. A detailed description of the database is given in Section 4.1.

1.2 Our work

In this work, we present a complete framework for iris segmentation that takes full-face photographs as input. As outlined above, the framework combines facial key-point detection, the integro-differential operator and mathematical morphology. The method's core is the IDO proposed by Daugman, enhanced to improve the speed and accuracy of iris segmentation. First, a facial key-point detector determines where the eye is located in the image. This drastically reduces the search space for the IDO and therefore the time required to execute it. Once the eye location is available, the IDO is applied to obtain a first coarse iris mask. Finally, mathematical morphology operations (explained later in this paper) are applied to the coarse mask to remove every non-iris region, such as eyelids or eyelashes, and to detect the pupil. With this combination, the method achieves a high level of accuracy, speed and robustness across ethnicities, while removing the need for an iris-exclusive image.

Our main contribution is a ready-to-use iris segmentation framework that works on VW full-face images, gracefully coupling existing techniques which, to the best of our knowledge, have never before been combined for iris segmentation. Moreover, we introduce an algorithm based on mathematical morphology able to detect and correct eye reflections and eyelids, and a simple yet effective way of segmenting the pupil. In addition, we implemented the method in a publicly available, modular framework which allows rapid testing and modification while being completely functional; thanks to this modularity, any module can easily be replaced in an attempt to improve its behaviour. We also made the ground truth for iris segmentation on the CFD available which, to the best of our knowledge, makes it the first public database of full-face VW photographs with iris masks available as ground truth. Finally, we compared iris segmentation across four different ethnicities, showing that there are indeed differences among them.

The rest of this paper is organized as follows: Section 2 describes the material and methods used. Section 3 presents the proposed iris segmentation method. Section 4 describes the database, the evaluation method and the equipment used. In Section 5, the results obtained are discussed and compared with other available algorithms. Finally, Section 6 closes the paper with conclusions and future lines of work.

2 Material and methods

2.1 Morphological operators

Mathematical morphology is a theory for the analysis of spatial structures. It is based on set theory, integral geometry and lattice algebra, and constitutes a powerful image analysis technique [35].

Let f be a greyscale image, defined as a mapping f: E → T, where x ∈ E denotes the pixel position. In the case of discrete-valued images, T = {tmin, tmin+1, ..., tmax} is an ordered set of grey levels. Typically, in digital 8-bit images, tmin = 0 and tmax = 255. Furthermore, let B(x) be a subset of Z² called the structuring element (shape probe), centred at point x, whose shape is usually chosen according to some a priori knowledge about the geometry and size of the relevant and irrelevant image structures.

Erosion and dilation are the two most basic mathematical morphology operators:

$$\begin{array}{*{20}l} \text{Dilation}: \left[\delta_{B} (f)\right](\mathbf{x}) = \max_{\mathbf{b}\in B(\mathbf{x})} f (\mathbf{x} + \mathbf{b}) \\ \text{Erosion}: \left[\varepsilon_{B} (f)\right](\mathbf{x}) = \min_{\mathbf{b}\in B(\mathbf{x})} f (\mathbf{x} + \mathbf{b}). \end{array} $$
(1)

Their objective is to expand light or dark regions, respectively, according to the size and shape of the structuring element. These elementary operations can be combined to obtain a new set of operators or basic filters given by:

$$\begin{array}{*{20}l} \text{Opening}: \gamma_{B}(f)= \delta_{B} \left(\varepsilon_{B} (f)\right)\\ \text{Closing}: \varphi_{B}(f)= \varepsilon_{B} \left(\delta_{B} (f)\right). \end{array} $$
(2)

Opening and closing respectively filter out light or dark structures smaller than the chosen structuring element.
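For illustration, these four operators are available in standard image-processing libraries. The following minimal sketch uses scikit-image in Python (an illustrative stand-in, not the MATLAB implementation used in this work) with a disk-shaped structuring element:

```python
import numpy as np
from skimage.morphology import erosion, dilation, opening, closing, disk

f = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # toy 8-bit image
B = disk(3)  # disk-shaped structuring element of radius 3

eroded = erosion(f, B)    # Eq. (1): min over neighbourhood, expands dark regions
dilated = dilation(f, B)  # Eq. (1): max over neighbourhood, expands light regions
opened = opening(f, B)    # Eq. (2): erosion then dilation, removes light structures smaller than B
closed = closing(f, B)    # Eq. (2): dilation then erosion, removes dark structures smaller than B
```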

Another operator used in this work is the geodesic dilation: the iterative unit dilation (a dilation performed repeatedly with a structuring element consisting of a disk of radius 1) of a marker image f constrained to remain within a reference image g, the constraint being applied through the point-wise minimum (∧):

$$ \delta_{g}^{(n)}(f)=\delta_{g}^{(1)}\left[\delta_{g}^{(n-1)}(f)\right], \quad \text{with}\ \delta_{g}^{(1)}(f)=\delta_{B}(f)\wedge g. $$
(3)

In order to define the close-hole operator, we must first introduce the geodesic reconstruction by dilation, which performs successive geodesic dilations of f with respect to g until idempotence is reached,

$$ \gamma^{rec}(g,f)=\delta_{g}^{(i)}(f),\ \text{so that}\ \delta_{g}^{(i)}(f)=\delta_{g}^{(i+1)}(f). $$
(4)

We can now define the close-hole operator. Basically, this operator fills all holes in an image f that do not touch the image border, using the border image f∂ as marker:

$$ \psi^{ch}(f)=\left[\gamma^{rec}(f^{c},f_{\partial})\right]^{c}, $$
(5)

where fc is the complement image (i.e. the negative). In a greyscale image, a hole is any set of connected points surrounded by connected components of value strictly greater than that of the hole.
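For reference, Eq. (5) can be implemented through morphological reconstruction. The following is a minimal sketch with scikit-image (our illustrative Python version, assuming a greyscale numpy array; the actual implementation in this work is in MATLAB):

```python
import numpy as np
from skimage.morphology import reconstruction

def close_holes(f):
    """Eq. (5): fill all holes of f that do not touch the image border."""
    fc = f.max() - f                       # complement image f^c
    marker = np.full_like(fc, fc.min())    # marker: f^c on the border, min elsewhere
    marker[0, :], marker[-1, :], marker[:, 0], marker[:, -1] = \
        fc[0, :], fc[-1, :], fc[:, 0], fc[:, -1]
    rec = reconstruction(marker, fc, method='dilation')  # gamma^rec(f^c, f_border)
    return f.max() - rec                   # complement back
```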

The last algorithm used in this work is a hit-or-miss transform called thickening [35, 36]. In the case of hit-or-miss transformations, the structuring element is a set with two components, BFG(x) and BBG(x), placed so that both reference pixels are at the same position x, and disjoint (i.e. BFG(x) ∩ BBG(x) = ∅). BFG(x) defines the set of pixels that should match the foreground, while BBG(x) does the same with the background. The hit-or-miss transform of a set f can be written in terms of the intersection of two morphological erosions:

$$ f * \mathbf{B} = \varepsilon_{\mathbf{B}_{FG}}(f) \cap \varepsilon_{\mathbf{B}_{BG}}\left(f^{c}\right), $$
(6)

where fc is the complement set of f, that is, the negative.

The thickening of a binary image f by a structuring element B is denoted by f⊙B and defined as the union of f and the hit-or-miss transform of f by B:

$$ f\odot\mathbf{B}=f\cup(f*\mathbf{B}). $$
(7)
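Thickening follows directly from Eqs. (6) and (7); SciPy's binary hit-or-miss implements Eq. (6). A short sketch (the toy structuring elements below are chosen only for illustration):

```python
import numpy as np
from scipy.ndimage import binary_hit_or_miss

def thicken(f, b_fg, b_bg):
    """Eq. (7): union of f with its hit-or-miss transform (Eq. 6)."""
    hm = binary_hit_or_miss(f, structure1=b_fg, structure2=b_bg)
    return f | hm

# toy disjoint FG/BG probes (illustrative only)
b_fg = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=bool)
b_bg = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=bool)
```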

2.2 Integro-differential operator

In order to detect the iris, Daugman's operator is used [37]. Daugman uses an integro-differential operator to locate the circular iris and pupil regions as well as the arcs of the upper and lower eyelids. The integro-differential operator is defined as

$$ \max_{(r, x_{0}, y_{0})}\left| G_{\sigma}(r)\ast \frac{\partial}{\partial r} \oint_{r, x_{0}, y_{0}}\frac{I(x,y)}{2\pi r}\,ds\right| $$
(8)

where I(x,y) is an image containing an eye. The operator searches over the image domain (x,y) for the maximum of the blurred partial derivative, with respect to increasing radius r, of the normalized contour integral of I(x,y) along a circular arc ds of radius r and centre coordinates (x0,y0). The symbol ∗ denotes convolution and Gσ(r) is a smoothing function such as a Gaussian of scale σ. The complete operator behaves as a circular edge detector, blurred at a scale set by σ, searching iteratively for the maximal contour integral derivative at successively finer scales of analysis through the three-parameter space of centre coordinates and radius (x0,y0,r) defining the path of contour integration.
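To make the search concrete, the following brute-force sketch evaluates Eq. (8) over a grid of candidate centres. It is an illustrative Python re-implementation, not the MATLAB code used in this paper; all function and parameter names are ours:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def integro_differential(img, centers, r_min, r_max, sigma=2.0, n_samples=64):
    """Find the circle (x0, y0, r) maximizing the blurred radial derivative
    of the normalized contour integral (Eq. 8)."""
    radii = np.arange(r_min, r_max)
    theta = np.linspace(0, 2 * np.pi, n_samples, endpoint=False)
    best_val, best_circle = -np.inf, None
    for x0, y0 in centers:
        integrals = []
        for r in radii:
            xs = np.clip(np.round(x0 + r * np.cos(theta)).astype(int), 0, img.shape[1] - 1)
            ys = np.clip(np.round(y0 + r * np.sin(theta)).astype(int), 0, img.shape[0] - 1)
            integrals.append(img[ys, xs].mean())  # contour integral / (2*pi*r)
        # G_sigma(r) convolved with the derivative with respect to r
        response = np.abs(gaussian_filter1d(np.diff(integrals), sigma))
        k = int(np.argmax(response))
        if response[k] > best_val:
            best_val, best_circle = response[k], (x0, y0, int(radii[k + 1]))
    return best_circle
```

In practice the candidate centres would be restricted to the extracted eye region (Section 3.2), which is precisely what makes the full-face pipeline efficient.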

3 Proposed method

A diagram of the proposed algorithm is shown in Fig. 1. First, facial key-point detection is performed. Next, a mask with the eye landmarks is created in order to extract the eye region. Coarse iris segmentation is then carried out by means of Daugman's IDO to obtain a first coarse iris boundary (the iris circumference). Once the coarse segmentation is available, specular reflections are removed and precise iris segmentation is performed to remove the eyelids and thus obtain a more accurate mask. Finally, pupil detection is performed using the segmented iris mask.

Fig. 1 Block diagram of the proposed method

3.1 Facial key-point detection

Since the objective is to segment the iris from a face photograph, the first step must be to locate the eye in the image by means of a facial key-point detector. The following frameworks were tested: Chehra [38], CLM [39], the face detector by Zhu and Ramanan [40] and PO-CR [41]. Chehra was finally chosen due to its accuracy, robustness and speed. The output of the detector is a vector of 49 facial key-points distributed as shown in Fig. 2a.

Fig. 2 Detected landmarks and polygons of each feature: distribution of facial landmarks (a), polygons before (b) and after (c) thickening, and masks used to extract the features (d)

3.2 Eye extraction

Once the facial landmarks are available, masks for each facial feature are created. First, each feature's landmarks are identified and a polygon is created by joining its outer landmarks (Fig. 2a shows the detected landmarks and Fig. 2b the created polygons). Next, a binary mask is created in which all pixels inside these polygons are set to 1 and the rest to 0. This binary mask is then thickened [35], obtaining the polygons shown in Fig. 2c. The thickening factor was empirically set at 20% of the inter-ocular distance (IOD), calculated as the distance between the centroids of the polygons defined by the eye landmarks. The output of this step is a mask for each feature (i.e. both eyes are now located in the image and available for segmentation). Figure 2d shows a map with all the extracted coarse masks.
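A minimal sketch of this step is given below (our illustrative Python, assuming landmarks as (x, y) coordinates; binary dilation is used as a simple stand-in for the thickening operator of Eq. (7) used in the actual method):

```python
import numpy as np
from skimage.draw import polygon
from skimage.morphology import binary_dilation, disk

def eye_mask(img_shape, eye_landmarks, iod):
    """Binary mask of one eye from its outer landmarks, grown by ~20% of the IOD.
    eye_landmarks: (N, 2) array of (x, y) points."""
    mask = np.zeros(img_shape, dtype=bool)
    rr, cc = polygon(eye_landmarks[:, 1], eye_landmarks[:, 0], shape=img_shape)
    mask[rr, cc] = True
    return binary_dilation(mask, disk(int(round(0.2 * iod))))
```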

3.3 Coarse iris segmentation

First, the full-face image is converted to greyscale. Then, the eye mask obtained in the previous step (Fig. 2d) is used to crop the image, keeping only the eye region (Ieye, Fig. 3a). Next, Daugman's integro-differential operator is used to find the circumference on which the iris must be located (Fig. 3b). This operator works as described in Section 2.2: it takes rmin and rmax as inputs and outputs a coarse circular iris/sclera boundary (Biris) indicating the region where the iris should be located. Nevertheless, eyelashes or eyelids are likely to appear in this region.

Fig. 3 Coarse iris segmentation: original extracted eye image in greyscale (a) and iris/sclera boundary found by Daugman's algorithm (b)

3.4 Specular reflection removal

In this stage, specular reflections are removed from the iris to help the precise iris segmentation algorithm work better. The procedure followed is detailed in Algorithm 1. First, the iris boundary mask (Biris) is applied to the cropped greyscale eye image (Ieye). Next, the resulting image (Iiris, Fig. 4a) is filtered with a square Gaussian smoothing kernel with standard deviation σ and size equal to 2×ceil(2×σ)+1 pixels (obtaining IG, Fig. 4b). Then, the filtered image is subtracted from the iris greyscale image (Igd = Iiris − IG, Fig. 4c). To obtain a mask for removing reflections, Igd is binarized using Otsu's method together with the threshold parameter thsr (obtaining \(I_{th_{sr}}\), Fig. 4d). Next, the area, centroid position and eccentricity of the resulting blobs are analysed to distinguish reflections from everything else: blobs with eccentricity < 0.5, area between π(irisr/10)² and π(irisr/3)² (where irisr is the iris radius), and centroid located within 10% of the iris height around the vertical centre of the iris are considered reflections. Then, the regions of Ieye indicated by the reflection map (Irefs, Fig. 4e) are filled in with the mean colour of the iris, computed as the mean intensity of the iris circumference region excluding all areas corresponding to reflections. At this stage, iris specular reflections have been cleaned and the images are ready for the next stage (Iec, Fig. 4f).
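The following sketch illustrates the stage in Python (assuming a float iris image in [0, 1]; it applies plain Otsu binarisation, leaving out how thsr scales the threshold, and fills reflections with the mean of the remaining iris pixels as an approximation of the procedure above):

```python
import numpy as np
from skimage.filters import gaussian, threshold_otsu
from skimage.measure import label, regionprops

def remove_reflections(i_iris, iris_r, sigma=9.5):
    i_gd = i_iris - gaussian(i_iris, sigma=sigma)      # Gaussian difference (Fig. 4c)
    blobs = label(i_gd > threshold_otsu(i_gd))         # candidate reflection blobs
    refs = np.zeros_like(blobs, dtype=bool)
    h = i_iris.shape[0]
    for p in regionprops(blobs):
        area_ok = np.pi * (iris_r / 10) ** 2 < p.area < np.pi * (iris_r / 3) ** 2
        centered = abs(p.centroid[0] - h / 2) < 0.05 * h  # 10% band around vertical centre
        if p.eccentricity < 0.5 and area_ok and centered:
            refs |= blobs == p.label
    out = i_iris.copy()
    out[refs] = i_iris[~refs].mean()                   # fill with mean iris intensity
    return out
```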

Fig. 4 Specular reflection removal: iris boundary mask applied to the eye sub-image (a), Gaussian filtering (b), Gaussian difference (c), specular reflection thresholding (d), blobs corresponding to specular reflections (e) and eye after specular reflection removal (f)

3.5 Precise iris segmentation

Now, Iec (Fig. 5a) contains iris images without specular reflections. However, they may still include eyelids, eyelashes or even sclera, so several mathematical morphology operations are carried out in this stage to remove non-iris regions. The procedure followed to accurately segment the iris is detailed in Algorithm 2. A close-hole operator is applied to Iec, obtaining Ich. Then, the residue image (Ires, Fig. 5b) is obtained as the difference between the filled image and the original (Ich − Iec). This residue image is binarized by applying a threshold (thbin), and a series of morphological operations (closing, opening and hole filling) is applied to smooth it, obtaining Ismooth. The structuring elements used are two lines of length 20 at angles of 45° and 315° for the closing, and a disk of radius 5 for the opening. Since iris shapes do not usually have sharp peaks or valleys, the convex hull of the cleaned image is taken as the final mask; this way, possible artefacts due to eyelashes, eyelids or segmentation errors are cleaned out (Imask, Fig. 5c). At this point, the precise iris mask is available. The next step consists of checking whether the precise iris mask (Imask) is better than the coarse iris mask (Biris).
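A condensed sketch of this stage follows (our illustrative Python; i_ch is the output of the close-hole operator, e.g. the close_holes sketch in Section 2.1, and since the exact use of the tuned threshold thbin is not fully specified here, plain Otsu binarisation is used in its place):

```python
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.filters import threshold_otsu
from skimage.morphology import binary_closing, binary_opening, disk, convex_hull_image

def precise_iris_mask(i_ec, i_ch):
    i_res = i_ch - i_ec                        # residue image (Fig. 5b)
    mask = i_res > threshold_otsu(i_res)       # binarisation (thbin stand-in)
    se45 = np.eye(20, dtype=bool)              # 45-degree line of length 20
    mask = binary_closing(binary_closing(mask, se45), np.fliplr(se45))  # 45 and 315 deg
    mask = binary_opening(mask, disk(5))
    mask = binary_fill_holes(mask)
    return convex_hull_image(mask)             # irises have no sharp peaks or valleys
```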

Fig. 5 Precise iris segmentation: cropped greyscale eye image without specular reflections (a), residue of the close-hole operator (b), and Otsu's thresholding and binarisation after the smoothing process and convex hull (c)

3.6 Best mask search

As mentioned before, in this step the coarse and precise masks compete to determine which one is more accurate. Under normal conditions, Imask is always better; there is only one circumstance in which Biris might be more precise: when the precise iris segmentation algorithm fails. This usually happens when there is not enough contrast between the inferior part of the iris and its neighborhood. The top part of the iris does not have this problem, because the superior eyelid and eyelashes normally produce an occlusion and thus act as a natural "contrast barrier". In other words, going from the centre of the iris to the top, the order of elements found is eyelashes, eyelid and skin, while going towards the bottom the order is eyelid and then eyelashes. Since the eyelid is usually lighter than the eyelashes, the bottom region of the iris sometimes suffers from this problem when the iris is too light.

To solve this, the lower third of Imask is checked in the following manner: if the curve of the lower part of the iris in Imask is too spiky, an attempt to fix it is made by copying the bottom third of the coarse iris mask (Biris) into the precise iris mask (Imask). If this procedure is successful, the modified Imask is chosen as the best mask; otherwise, Biris is. At this point, the best iris mask is available, but some images might still have undesired regions incorrectly segmented as iris. The stage implemented to remove these regions is explained in the next section.

3.7 Eyelid removal

The objective of this stage is to remove the superior and inferior eyelids from the iris segmentation mask to increase its accuracy. It is divided into two sub-steps: superior eyelid removal and inferior eyelid removal. Both share the same algorithm, the only difference being the region of the image where it is applied, so the process is explained only for the superior eyelid. First, the image Igd (obtained in Section 3.4) is cropped before being binarized, keeping only the top third of the image (Athird). This helps in the detection of the eyelid because the pupil and most of the iris, regions whose intensities normally differ from the eyelid's, are removed. Since only the eyelid and some iris are then present in the cropped region, Otsu's method works better, as only two differentiated clusters of grey levels remain. In addition, to avoid possible mistakes in the case of multiple detections, the centre, area and eccentricity of the detected shapes (\(S_{det_{i}}\), with i = 1...ndetections) are analysed to choose the uppermost, non-circular one with an area within empirically chosen limits (\(0.1A_{\text{third}} < S_{det_{i}} < 0.66A_{\text{third}}\) pixels). In the last step, the detected eyelids are removed from the iris mask obtained in Section 3.6, thus improving the iris segmentation mask. Figure 6 shows the segmented iris before eyelid removal (a), the eyelid detections to be removed (b) and the segmented iris after eyelid removal (c).
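A sketch of the superior-eyelid case (our illustrative Python; in particular, testing non-circularity via eccentricity > 0.5 is our assumption, since the paper does not specify the criterion):

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def detect_superior_eyelid(i_gd):
    """Detect the superior eyelid in the top third of i_gd (A_third)."""
    top = i_gd[: i_gd.shape[0] // 3]
    blobs = label(top > threshold_otsu(top))
    a_third = top.size
    candidates = [p for p in regionprops(blobs)
                  if 0.1 * a_third < p.area < 0.66 * a_third
                  and p.eccentricity > 0.5]              # "non-circular" (assumption)
    if not candidates:
        return np.zeros_like(top, dtype=bool)
    uppermost = min(candidates, key=lambda p: p.centroid[0])  # smallest row = highest
    return blobs == uppermost.label
```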

Fig. 6 Eyelid removal: iris before eyelid removal (a), detected eyelids (b) and iris after eyelid removal (c)

3.8 Pupil segmentation

The last step of our algorithm is to detect the pupillary boundary. To do so, we apply the final iris segmentation mask (Fig. 6c, Section 3.7) to the eye image without reflections (Fig. 5a, Section 3.5). The result is an image containing only the iris region, free of reflections (Iireg). We then adjust the histogram of Iireg by linearly mapping the values from [0, 0.5] to [0, 1], which enhances the darkest structures (Idark, Fig. 7a). Next, we binarize Idark by means of Otsu's method, obtaining a coarse pupil mask (\(I_{p\_mask}\), Fig. 7b). Finally, to clean the coarse pupil mask, we first fill its holes and then prune its skeleton, resulting in the final pupil segmentation (Ipupil, Fig. 7c).
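In code, the core of this step is only a few lines (our sketch, assuming Iireg is a float image in [0, 1]; the skeleton-pruning clean-up of the paper is omitted here):

```python
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.filters import threshold_otsu

def segment_pupil(i_ireg):
    """Coarse pupil mask (Section 3.8)."""
    i_dark = np.clip(i_ireg / 0.5, 0.0, 1.0)   # map [0, 0.5] -> [0, 1], enhancing dark structures
    coarse = i_dark < threshold_otsu(i_dark)   # pupil is the darkest cluster
    return binary_fill_holes(coarse)
```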

Fig. 7 Pupil segmentation: dark structure enhancement (a), binarisation (b) and final pupil segmentation result (c)

4 Experiment setup

Our experimentation is as follows: first, we perform a grid search over a training subset to tune the parameters of our algorithm. Then, the tuned algorithm is tested on the test set and compared against two other iris segmentation methods. Finally, we assess the robustness of our algorithm by running it on the test set after rotation, blur and noise have been added.

4.1 Database and equipment

The experiments to evaluate the performance of the proposed framework used the Chicago Face Database 2.0 (CFD) [34], which is publicly available and includes 597 high-resolution photographs of male and female targets of varying ethnicity (Asian, Black, Latino and White) and age. Each target is represented with a neutral-expression photo. We chose this database because we needed full-face photographs and the possibility of comparing results among ethnicities to elucidate possible differences in iris segmentation tasks. The experiment was implemented in Matlab R2016a on a PC with an Intel(R) Core(TM) i7-4770S processor at 3.10 GHz and 16 GB of RAM.

4.2 Segmentation performance evaluation method

To evaluate the segmentation performance of our approach, ground truth for our database was first manually annotated. During evaluation, the iris masks created by the presented method are compared with the ground-truth masks using well-known measures: precision (P), recall (R) and Dice's index or F-measure (F), similarly to [42]. Pixels of the segmented iris mask are categorized as true positives (tp), the number of iris pixels correctly marked as iris; false positives (fp), the number of non-iris pixels incorrectly marked as iris; true negatives (tn), the number of non-iris pixels correctly unmarked; and false negatives (fn), the number of incorrectly unmarked iris pixels. Recall (R), precision (P) and F-measure (F) are then computed as:

$$ P = tp/(tp + fp) $$
(9)
$$ R = tp/(tp + fn) $$
(10)
$$ F = 2RP/(R + P) $$
(11)

In addition, a segmentation error rate (E) is computed as in [43]. The segmentation error rate (Ei) on image Ii is given by the proportion of disagreeing pixels (through the logical exclusive-or operator, ⊗) over the whole image:

$$ E_{i} = \frac{1}{c \times r}\sum_{c^{\prime}}\sum_{r^{\prime}} O(c^{\prime}, r^{\prime}) \otimes C(c^{\prime}, r^{\prime}) $$
(12)

where O and C have the same dimensions (c columns, r rows) and O(c′, r′) and C(c′, r′) are, respectively, pixels of the output and ground-truth images. The segmentation error rate (E) of an algorithm is given by the average of the errors on the input images (Eq. 13). E is bounded within the [0, 1] interval, where 1 and 0 are respectively the worst and optimal error values.

$$ E = \frac{1}{n}\sum_{i}E_{i} $$
(13)
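All four measures are straightforward to compute from binary masks; a minimal sketch (ours, using NumPy):

```python
import numpy as np

def segmentation_metrics(output, truth):
    """P, R, F (Eqs. 9-11) and error rate E_i (Eq. 12) for one mask pair;
    E (Eq. 13) is simply the mean of E_i over all test images."""
    output, truth = output.astype(bool), truth.astype(bool)
    tp = np.sum(output & truth)
    fp = np.sum(output & ~truth)
    fn = np.sum(~output & truth)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * r * p / (r + p)
    e_i = np.mean(output ^ truth)  # proportion of disagreeing pixels (XOR)
    return p, r, f, e_i
```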

For a comparative evaluation of segmentation accuracy, the proposed iris segmentation algorithm is compared with the well-known iris segmentation algorithms proposed by Daugman [44] and Zhao et al. [22]. Both algorithms are implemented in MATLAB and publicly available [45, 46].

4.3 Parameter tuning

The following algorithm parameters need to be tuned to obtain the maximum possible accuracy:

  • rmin: minimum radius of the iris (IDO),

  • rmax: maximum radius of the iris (IDO),

  • thbin: threshold used to segment the iris,

  • thsr: threshold used to remove the specular reflections,

  • σ: variance of the Gaussian filter.

In order to find the best combination of parameters, we randomly split the database into a training set (70%) and a test set (30%). Then, we used the training set to obtain the best parameter combination, defined as the one with the minimum error rate. Table 1 shows the ranges and best values of the parameters.

Table 1 Parameter values for the grid search

The best parameter values obtained were rmin=25, rmax=40, thbin=1.1, thsr=0.95 and σ=9.5.

The same procedure was followed to tune the parameters of the other two methods tested. For Daugman's method, a grid search was performed over the rmin and rmax parameters, obtaining the best results (those presented in the Results section) for rmin=25 and rmax=40. Table 2 presents the grid search performed for Zhao et al.'s method and the best parameters obtained.

Table 2 Parameter values for the grid search for Zhao et al.’s method

These parameters were optimized to obtain the best accuracy across ethnicities. However, during parameter tuning we found that our algorithm (and likely others as well) would perform better if the parameters were optimized independently for each ethnicity. The main reason is the choice of the thresholds for binarisation (thbin) and specular reflection removal (thsr): since the contrast of the iris affects the behaviour of the thresholding, dark and light irises would need different parameters, or even an adaptive algorithm. This mostly affected Black subjects; however, as there is normally no information about the subject's ethnicity when performing iris segmentation, we decided to stick to a single general set of parameters for all ethnicities.

5 Results and discussion

As mentioned previously, the proposed iris segmentation framework is tested on a database consisting of real face photographs of male and female subjects from four different ethnicities, namely Asian, Black, Latino and White. For this study, the parameter values used were those indicated in the previous section and were kept unchanged for every ethnicity.

The evaluation of the algorithm was performed on the test set. Results for each ethnicity and the overall performance are shown in Table 3. As can be observed, the ethnicities that performed best are "Asian", which achieved P=0.9066, R=0.9615 and E=0.0077, and "Latino", with P=0.9219, R=0.9502 and E=0.0078. The "White" ethnicity was the fastest (0.6266 s) and achieved P=0.9113, R=0.9359 and E=0.099. In contrast, the "Black" ethnicity was the slowest (0.6197 s) and the most difficult to segment, with P=0.8796, R=0.8639 and E=0.0134.

Table 3 Performance comparison of presented framework for each ethnicity

The difference in accuracy among ethnicities is negligible except for the Black ethnicity. The main reason it achieves the poorest results is the general tendency of Black subjects to have dark or very dark eyes. The parameters of the algorithm were optimized on a training set with four ethnicities, of which the Black ethnicity was the only one with very low contrast between eye skin and iris colour, and the thresholding-related parameters are highly affected by this fact. Therefore, the results obtained for this ethnicity could be improved with a set of independently optimized parameters.

Furthermore, performance is compared with the iris segmentation algorithms proposed by Daugman and by Zhao et al. In contrast to the presented framework, these two methods need an eye image as input and will not work with a complete face image. Thus, the comparison is done using an eye image as input for both aforementioned algorithms and a complete face image for the presented framework. To enable a fair speed comparison, the execution time of the presented framework is measured from the instant the eye has been extracted from the face until the segmented iris mask is obtained.

Table 4 shows that the fastest algorithm is the one proposed by Zhao et al. (0.6819 s/eye). It also has the highest precision (P=0.9464) but the lowest recall (R=0.5989). In contrast, Daugman's method is the slowest (2.6508 s/eye), achieving a precision of P=0.7482 and a recall of R=0.8139. The segmentation error is very similar for both, E=0.0442 for Daugman's and E=0.0418 for Zhao and Kumar's. Finally, the proposed method has a fairly fast execution time (0.9187 s/eye), the lowest segmentation error (E=0.0102) and the highest recall (R=0.9322), with a precision of P=0.8972. In addition, Dice's index shows it to be the most accurate. Figure 8 shows some examples of irises properly segmented by the presented framework, from the initial image to the final segmented iris, while Fig. 9 shows some improperly segmented irises. It can be observed that the contrast between the iris and the pupil in dark eyes is very low, which sometimes makes the algorithm fail. Most segmentation errors occur due to pupil boundary leakage, so future versions of this algorithm should improve this step to achieve greater accuracy.

Fig. 8 Examples of correct iris segmentation. Rows: Asian, Black, Latino and White ethnicity, respectively. Original face (a), right iris segmentation (b) and left iris segmentation (c)

Fig. 9 Examples of incorrect iris segmentation. Rows: Asian, Black, Latino and White ethnicity, respectively. Original face (a), right iris segmentation (b) and left iris segmentation (c)

Table 4 Performance comparison for each implementation

We also ran our algorithm on the well-known CASIA-Iris-Distance v4 [29] in order to compare with other methods. As the images in this dataset are not full-face images, we first extracted both eyes from the input image using an eye detector implemented in OpenCV [48] and then applied our iris segmentation to each of them. The parameters of the segmentation algorithm were the same as in our previous results. Results are shown in Table 5.

Table 5 Performance of our algorithm on CASIA-Iris-Distance v4 compared to Daugman [45] and Zhao et al. [46]

Finally, to assess the robustness of the method, we applied our algorithm to the test set after adding rotation, blur and noise. First, images were rotated anticlockwise by 5°, 15° and 30°. Then, Gaussian blur was applied to the images with σ of 3 and 6. Finally, different kinds of noise (Poisson, Gaussian and salt & pepper) were added to the images: Poisson noise was generated by drawing each output pixel from a Poisson distribution with mean equal to the input pixel value, Gaussian noise had zero mean and 0.01 variance, and "salt & pepper" noise had a density of 0.05. We chose these noises because they are typical in digital images [49]; indeed, we wanted to approximate real noise as closely as possible, which in most digitally acquired images is Poisson noise [50]. Figure 10 shows images with the aforementioned noises added: the columns correspond to the kind of noise added (Poisson, Gaussian and "salt & pepper", respectively) and the rows to the amount of blurring applied (Gaussian blur with σ of 0, 3 and 6, respectively).

Fig. 10 Examples of images with added noise: Gaussian noise (a), Poisson noise (b) and salt & pepper noise (c). Rows correspond to Gaussian blur of σ = 0, 3 and 6, respectively
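The degradations above are standard and easy to reproduce; a sketch of how such test images can be generated (our illustrative Python, assuming a float image in [0, 1], using SciPy and scikit-image rather than the original MATLAB code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate
from skimage.util import random_noise

def degrade(img, angle=0.0, blur_sigma=0.0, noise=None):
    """Produce a robustness-test image: rotation, then blur, then noise."""
    out = rotate(img, angle, reshape=False, mode='nearest')  # anticlockwise rotation
    if blur_sigma > 0:
        out = gaussian_filter(out, sigma=blur_sigma)         # Gaussian blur
    if noise == 'gaussian':
        out = random_noise(out, mode='gaussian', mean=0, var=0.01)
    elif noise == 'poisson':
        out = random_noise(out, mode='poisson')
    elif noise == 's&p':
        out = random_noise(out, mode='s&p', amount=0.05)     # density 0.05
    return out
```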

The results are divided into three tables according to the degree of rotation of the images: Table 6 shows the results for images without rotation, Table 7 for 5° of rotation and Table 8 for 15°. As can be observed, our framework is most affected by high levels of blur (σ=6) and by "salt & pepper" noise. This is due to an intrinsic characteristic of mathematical morphology operators with fixed structuring element sizes: if the spatial and colour structure of the image is severely changed, the chosen parameters are no longer optimal, so the algorithm is more prone to fail. On the other hand, low levels of rotation and Poisson noise are almost innocuous. In addition, our algorithm did not work with a rotation of 30°, for two reasons: first, the facial landmark detector is not able to locate the landmarks under such a degree of rotation; second, we employ linear structuring elements, which are not rotation invariant. Therefore, we can state that our method is robust against small rotations (≤15°), low levels of blurring (σ≤3) and Gaussian and Poisson noise, but suffers when given images with "salt & pepper" noise, high levels of blur (σ=6) or large rotations (≥30°).

Table 6 Performance comparison without rotation for each test carried out to assess the robustness
Table 7 Performance comparison with 5 degrees of rotation for each test carried out to assess the robustness
Table 8 Performance comparison with 15 degrees of rotation for each test carried out to assess the robustness

6 Conclusions

In this work, a complete framework for iris segmentation is presented and made public [47]. Accurate iris segmentation is still a significant barrier to achieving a robust iris recognition algorithm. In the literature, two broad approaches are distinguished: boundary-based and pixel-based methods. In this work, we employ a boundary-based method, the IDO, combined with facial landmark detection and mathematical morphology, to achieve a fast, accurate and robust iris segmentation algorithm working on full-face photographs. We tested our algorithm against two well-known iris segmentation algorithms and assessed its robustness against rotation, blur and noise. The comparison shows our algorithm to be fast and accurate, and the robustness tests prove it can work under a wide range of unfavourable conditions with very little decrease in accuracy. In addition, a comparison among ethnicities was performed, and the results show that there are differences in iris segmentation performance depending on the subject's ethnicity. Finally, the ground truth for iris segmentation on the CFD has been made publicly available, making it the first public full-face image database suitable for iris segmentation.

Nevertheless, our method has some drawbacks. First, it was impossible for us to find a dataset with full-face photographs and iris segmentation ground truth. This also prevented us from testing the algorithm's behaviour in the wild (i.e. with turned faces, occlusions or very high illumination contrasts), although we performed tests artificially adding such "wildness" (rotation, noise and blur). Furthermore, our algorithm requires frontal face images to work properly, so a head-pose correction method could be added in a future version to overcome this limitation. Another drawback is that segmentation results depend on the size of the image, as the method employs morphological operations with fixed structuring element sizes and size-dependent operations with fixed parameters. Finally, our algorithm depends heavily on the facial landmark detection algorithm: if it fails, the segmentation is likely to fail as well. This can also be seen as an advantage, because thanks to the framework's modularity anyone can replace the facial landmark detection stage (or any other stage, such as the coarse iris detection or the pupil segmentation) to improve accuracy.

As future work, we plan to apply the algorithm to the Face Recognition Grand Challenge (FRGC) database [33]. Moreover, the pupil segmentation algorithm must be improved, as some leakage occurs when there is low contrast between iris and pupil. Another possible improvement is to make the method auto-adaptive, giving it the ability to adjust its parameters according to the detected ethnicity of the subject, and to scale the structuring elements of the morphological operations and the parameters of size-dependent operations (e.g. the IDO) as a function of the input image size. Speed could also be improved through code optimization and by porting the implementation from MATLAB to C/C++. Finally, new methods leveraging the power of CNNs [17, 18] are emerging and are worth exploring, as they provide more accuracy and remove the need for hand-crafted feature engineering.