Dynamic distance learning for joint assessment of visual and semantic similarities within the framework of medical image retrieval

https://doi.org/10.1016/j.compbiomed.2020.103833

Highlights

  • Adequate modeling of similarity queries in unconstrained environments.

  • Assisting in radiological decision-making.

  • Investigation of multivariate texture features as well as fused ones.

  • Self-organization of the ground-truth distance metric dataset.

Abstract

The similarity measure is an essential part of medical image retrieval systems for assisting in radiological diagnosis. Attempts have been made to use distance metric learning approaches to improve retrieval performance while narrowing the semantic gap. However, existing approaches do not resolve the problem of dependency between images (e.g. normal and abnormal images are compared with the same distance), which affects both semantic and visual similarity. This work therefore aims at learning a distance metric that preserves both visual resemblance and semantic similarity, and at modeling this distance so that each query is treated independently. The proposed method comprises three stages: (1) low-level image feature extraction, (2) offline distance metric modeling, and (3) online retrieval. The first stage exploits transform-domain texture descriptors based on the local binary pattern histogram Fourier, shearlet, and curvelet transforms. The second stage is carried out using low-level features and machine learning. Given a query image, online retrieval evaluates the similarity between this image and each image within the dataset, using a distance that is dynamically defined according to the query. Experiments on the challenging Mammographic Image Analysis Society (MIAS) and Digital Database for Screening Mammography (DDSM) datasets demonstrate the effectiveness of the proposed method in dynamically determining the adequate distance and retrieving the most semantically similar images, for single low-level features as well as fused ones.

Introduction

In recent years, there have been concerted efforts to study medical image perception for assisting in radiological decision-making, while offering high interpretation accuracy and, accordingly, high efficiency of cancer diagnosis. Within this context, radiologists use Computer-Aided Diagnosis (CAD) in their clinical routine to detect specific types of abnormalities and to classify lesions as malignant or benign. CAD systems thus play the role of a second reader, adding the potential of a second opinion in order to reduce false positive and false negative rates [1]. However, the main problem with CAD systems is an opaque opinion-making process, which decreases radiologists' confidence in the CAD outputs and prevents their use in further discussion [2]. The radiologists' reluctance is thus mainly due to the black-box aspect of currently used CAD systems. Content-Based Medical Image Retrieval (CBMIR) systems have therefore been investigated over the last two decades to overcome these disadvantages. Indeed, CBMIR systems assist radiologists in clinical decision-making by retrieving, for a given case, a selection of similar annotated clinical cases from large medical image archives. CBMIR systems comprise two important processes: visual feature extraction and similarity measurement. The first process operates on medical image content, representing the visual information contained in image pixels/voxels in the form of low-level features; this representation encodes texture, color, and shape characteristics as well as the spatial layout of objects. The second process assesses the similarity between the query image features and those of each image within the studied dataset. Over the last two decades, similarity measures have been a vigorous and active research topic.
Each image is represented by a point in a feature space, and finding similar images amounts to finding the nearest neighbors in this space. In generic CBMIR, the nearest neighbor search requires the definition of a distance metric. Widely used similarity metrics include the Euclidean distance [3], [4], the Mahalanobis distance [5], and other norm-based measures. Despite their popularity and computational simplicity, most of these metrics require continuous features within a bounded range in order to correspond to human visual perception of similarity. Consequently, the choice of the distance metric depends on meaningful features as well as on specific knowledge (the semantics) of the domain, the application, and the data, which gives rise to a semantic gap [2]. In order to design a similarity measure that correlates well with human perception, there has been great interest in distance metric learning, which aims to construct an effective domain-specific distance metric. In addition, a distance metric learned via machine learning enables the prediction of the image similarity reported by expert/radiologist observers. Recently, some researchers have proposed distance metric learning methods that find meaningful low-dimensional manifolds. These methods define a convex space of semantically similar images by capturing the intrinsic structure of high-dimensional descriptors. Generally, distance metric learning methods fall into two main categories: unsupervised and supervised distance metric learning [6]. The former aims to construct a low-dimensional manifold while largely preserving the geometric relationships between data points, whereas the latter aims to identify the dimensions most informative for the example classes using class label information.
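To make the role of the metric concrete, the sketch below contrasts the Euclidean distance with a Mahalanobis-type distance in a toy nearest-neighbor search. The feature values and the diagonal weight matrix are illustrative, not taken from the paper.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Generalized distance d_M(x, y) = sqrt((x - y)^T M (x - y)).

    M is a positive semi-definite matrix; M = I recovers the Euclidean distance.
    """
    d = x - y
    return float(np.sqrt(d @ M @ d))

def nearest_neighbors(query, gallery, M, k=3):
    """Rank gallery feature vectors by their distance to the query."""
    dists = [mahalanobis(query, g, M) for g in gallery]
    return np.argsort(dists)[:k]

# Toy 2-D features: under the Euclidean distance image 0 is nearest to the
# query, but a learned weighting that emphasizes the second feature flips
# the ranking, illustrating how the metric shapes retrieval.
gallery = np.array([[0.1, 1.0], [1.0, 0.3]])
query = np.array([0.0, 0.0])

euclidean = np.eye(2)
learned = np.diag([0.1, 10.0])  # hypothetical learned weights

print(int(nearest_neighbors(query, gallery, euclidean, k=1)[0]))  # 0
print(int(nearest_neighbors(query, gallery, learned, k=1)[0]))    # 1
```

The same gallery thus yields different nearest neighbors depending on the metric, which is exactly why a data-driven, domain-specific metric matters.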

Overall, image similarity includes semantic similarity (also known as semantic relevance), which relies on the class of severity (i.e. two images are semantically similar if they are both malignant or both benign), and visual similarity (also referred to as visual resemblance), which refers to feature similarity and aims at bridging the semantic gap through the choice of the most discriminative features. As shown in Fig. 1, although the first pair of images is visually similar, it is semantically unrelated, which could lead experts towards an incorrect diagnosis. The second pair shows two images that are both benign according to the medical annotation but whose appearances differ (the second image does not visually resemble the third one); this could decrease experts' trust. In contrast to the work presented in [7], which takes the two terms of image similarity into account using two different distances, we use a dynamic, query-dependent distance that preserves both visual and semantic similarity. The main contributions of this work can be summarized as follows:

  • Automatic distance metric modeling via supervised learning, capable of adequately modeling similarity queries in unconstrained environments in order to capture perceptual similarity.

  • The investigation of multivariate texture features as well as features fused using Canonical Correlation Analysis (CCA).

  • The self-organization of the ground-truth distance metric dataset without using equivalence and inequivalence constraints between data objects.
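As an illustration of the CCA-based fusion mentioned above, the following minimal NumPy sketch projects two descriptor sets onto their canonical directions and concatenates the projections — one common CCA fusion rule. The synthetic data, dimensions, and fusion rule are assumptions for illustration; the paper's exact scheme is not reproduced here.

```python
import numpy as np

def cca_fuse(X, Y, k=2, reg=1e-6):
    """Fuse two feature sets of the same n images via Canonical Correlation Analysis.

    X: (n, p) and Y: (n, q) descriptors. Returns the concatenation of the
    first k canonical projections of each set.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance matrices.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # SVD of the whitened cross-covariance yields the canonical directions.
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]       # canonical weights for X
    Wy = Ky @ Vt.T[:, :k]    # canonical weights for Y
    return np.hstack([Xc @ Wx, Yc @ Wy])

# Two synthetic "views" driven by a shared latent structure, plus noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=(100, 2))
X = shared @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(100, 6))
Y = shared @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(100, 4))
fused = cca_fuse(X, Y, k=2)
print(fused.shape)  # (100, 4)
```

The first canonical projections of the two views are strongly correlated on such data, which is what makes the concatenated representation a compact, correlated fusion of the two descriptors.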

The remainder of this paper is organized as follows: the next two sections present a literature review of related distance metric learning methods and the proposed contribution, respectively. The fourth section is devoted to the experimental results. After discussing the results in Section 5, the conclusion and future scope of improvement are presented in the last section.

Section snippets

Related work

The common objective of CBMIR systems is to retrieve, for a query image, the most relevant images within an annotated dataset. To deal with this challenging issue, recent works have focused on improving the feature extraction process and/or the similarity assessment process. For real-world medical image retrieval, a few free engines are available, such as LIRE [8], GIFT [9], the ImageCLEF evaluation campaigns [10], and ParaDISE [11]. LIRE (Lucene Image REtrieval) is

Proposed method

Unlike existing distance learning-based CBMIR methods, which typically rely on a single data-driven metric for all queries, the proposed method revolves around a mathematical model that learns a specific distance metric for each input query. The suggested method consists of three main steps (Fig. 2): low-level medical image feature extraction, offline semantic distance metric modeling, and online retrieval. The first step is carried out offline (resp. online) for the medical
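The per-query structure of the pipeline can be sketched as follows. The selector below is a hypothetical threshold standing in for the offline-learned model, and the candidate metrics are illustrative diagonal weightings; only the overall flow (select the query's own metric, then rank the archive with it) mirrors the described method.

```python
import numpy as np

# Candidate metrics the offline stage can assign to a query (illustrative:
# two diagonal Mahalanobis-type weightings).
METRICS = {
    "metric_a": np.diag([1.0, 1.0, 1.0]),
    "metric_b": np.diag([4.0, 0.5, 0.5]),
}

def select_metric(query_feat):
    """Stand-in for the offline-trained model mapping a query to a metric.

    A hypothetical threshold on the first feature plays that role here; the
    paper trains a supervised learner on low-level features instead.
    """
    return "metric_b" if query_feat[0] > 0.5 else "metric_a"

def retrieve(query_feat, gallery_feats, k=3):
    """Online stage: rank the gallery with the query's own distance."""
    M = METRICS[select_metric(query_feat)]
    diffs = gallery_feats - query_feat
    dists = np.sqrt(np.einsum("ij,jk,ik->i", diffs, M, diffs))
    return np.argsort(dists)[:k]

gallery = np.array([[0.2, 0.1, 0.9], [0.8, 0.2, 0.1], [0.3, 0.7, 0.4]])
print(retrieve(np.array([0.9, 0.1, 0.2]), gallery, k=2))  # ranks image 1 first
```

Because the metric is chosen per query, two different queries can rank the same archive in different orders, which is the behavior the offline modeling stage is trained to produce.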

Experiments and results

This section addresses the feasibility of the proposed method for semantic retrieval of medical images based on a dynamic selection of the adequate distance according to the query image. The experiments are conducted on two public breast cancer datasets: MIAS (Mammographic Image Analysis Society) and DDSM (Digital Database for Screening Mammography). The first dataset is composed of 322 mammograms diagnosed by an experienced radiologist as follows: 51 as malignant, 62 as benign,

Discussion

In this research, we have proposed a dynamic distance for medical image similarity assessment. The proposed query-based distance ensures both visual and semantic similarity. Besides, it focuses on maximizing the top precision measure, which should increase experts' confidence in the CBMIR results. This confidence is achieved through the dependency of the similarity measure on each query rather than a single data-driven metric for all queries. Moreover, the proposed dynamic

Conclusion and future work

In this study we proposed a semantic CBMIR method that imitates the radiologist's interpretation according to the query's sense of similarity. The method is based on modeling a distance metric that enables CBMIR systems to provide a suitable distance for each query, independently of the others. Using the single and the CCA-based fused multivariate texture features, the distance metric modeling step applies a machine learning approach, using the random forest classifier, to define a
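The offline modeling step above can be sketched as a classification problem. The snippet below assumes scikit-learn's RandomForestClassifier and uses synthetic labels as a stand-in for the paper's self-organized ground truth; the actual features and labeling scheme differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Offline stage (sketch): learn a mapping from a query's low-level features
# to the identifier of the most suitable candidate distance. Labels here are
# synthetic stand-ins for the self-organized ground-truth dataset.
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 8))             # low-level descriptors
best_metric = (features[:, 0] > 0).astype(int)   # 0/1 = two candidate distances

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(features, best_metric)

# Online stage: each incoming query gets its own distance, predicted once
# before ranking the archive with it.
query = rng.normal(size=(1, 8))
metric_id = int(model.predict(query)[0])
print("use candidate distance", metric_id)
```

The random forest here only chooses among candidate distances; plugging the predicted distance into the retrieval stage yields the query-dependent ranking described in the paper.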

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (22)

  • Abhay R.P. et al., "A content based image information retrieval for medical MRI brain images based on hadoop and lucene (LIRe)", Int. J. Sci. Eng. Res. (2016)