Abstract

With the advancements in biomedical imaging applications, it becomes increasingly important to provide significant results when searching biomedical imaging data. During health emergencies such as tremors, efficient results are required at rapid speed for spatial queries over the Web. An efficient biomedical search engine can obtain the significant search intention and return additional important contents in which users have already indicated some interest. The development of biomedical search engines is still an open area of research. Recently, many researchers have utilized various deep-learning models to improve the performance of biomedical search engines. However, the existing deep-learning-based biomedical search engines suffer from overfitting and hyperparameter tuning problems. Therefore, in this paper, a nondominated-sorting-genetic-algorithm-III- (NSGA-III-) based deep-learning model is proposed for biomedical search engines. Initially, the hyperparameters of the proposed deep-learning model are obtained using the NSGA-III. Thereafter, the proposed deep-learning model is trained by using the tuned parameters. Finally, the proposed model is validated on the testing dataset. Comparative analysis reveals that the proposed model outperforms the competitive biomedical search engine models.

1. Introduction

The advancements in biomedical imaging applications lead to the challenge of providing significant results when searching biomedical imaging data. During health emergencies such as tremors, efficient results are required at rapid speed for spatial queries over the Web [1]. Search engines which allow obtaining specific medical contents along with complementary and different details would considerably help biomedical researchers [2]. An efficient biomedical search engine can obtain the significant search intention and return additional important contents in which users have already shown some interest [3, 4].

Numerous biomedical search engines have been implemented, such as the inverted index and Boolean retrieval [5–8]. However, index sizes are becoming exponentially large as the number of biomedical contents is increasing at a rapid rate [5]. Thus, the ranking of biomedical contents has been achieved using their respective retrieval frequency [8]. It has been found that computing the results when the biomedical queries have minimum similarity scores is still an open area of research [9–11].

Essie is a well-known biomedical search engine which provides services to various websites at the National Library of Medicine. It is a phrase-based search engine with concept- and term-based query expansion and probabilistic relevancy ranking. It has proven that a judicious combination of exploiting document structure, phrase searching, and concept-based query expansion is a beneficial method for information retrieval in the biomedical field [12].

Bidirectional Encoder Representations from Transformers (BERT) has shown good advancement in the field of biomedical search engines. In precision medicine, matching patients to appropriate investigational support or probable therapies is a difficult job which needs both biological and clinical information. To resolve it, BERT-based ranking models can provide fair comparisons [13]. A computer-based query recommendation model was designed which recommends semantically interchangeable terms based on an initial user-entered query [14, 15].

The typical architecture of a biomedical search engine is represented in Figure 1. Initially, potential features of biomedical images are extracted. Thereafter, the similarity score is computed. A prediction model is then utilized to compute the image index. Finally, the obtained class-wise results are returned to the users.
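This pipeline can be sketched in Python. The feature extractor, similarity measure, and function names below are illustrative stand-ins (an intensity histogram and cosine similarity), not the components used in the paper:

```python
import numpy as np

def extract_features(image):
    # Placeholder feature extractor: a real system would use a CNN;
    # here a flattened intensity histogram serves as a stand-in.
    hist, _ = np.histogram(image, bins=16, range=(0.0, 1.0), density=True)
    return hist

def similarity_score(query_features, index_features):
    # Cosine similarity between the query and an indexed image.
    num = float(np.dot(query_features, index_features))
    den = float(np.linalg.norm(query_features) * np.linalg.norm(index_features)) + 1e-12
    return num / den

def search(query_image, indexed_images, top_k=3):
    # Rank indexed images by similarity to the query and return the top results.
    q = extract_features(query_image)
    scores = [(name, similarity_score(q, extract_features(img)))
              for name, img in indexed_images.items()]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_k]

rng = np.random.default_rng(0)
index = {f"img{i}": rng.random((8, 8)) for i in range(5)}
results = search(rng.random((8, 8)), index, top_k=2)
print(results)
```

In a full system, the prediction model discussed above would replace the plain similarity ranking with learned class-wise scores.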

The primary contributions of this paper are as follows:
(1) An NSGA-III-based deep-learning model is proposed for biomedical search engines.
(2) The hyperparameters of the proposed deep-learning model are obtained using the NSGA-III.
(3) The proposed deep-learning model is trained by using the tuned parameters obtained from NSGA-III.
(4) Finally, the proposed model is validated on the testing dataset.

The remaining paper is organized as follows: The related work is presented in Section 2. Section 3 presents the proposed biomedical search engine. Comparative analysis is discussed in Section 4. Section 5 concludes the paper.

2. Related Work

Hsieh et al. [16] proposed a semantic similarity approach by utilizing the page counts of two biomedical contents obtained from the Google AJAX web search engine. The features were extracted in co-occurrence form by considering the two provided words. Support vector machines (SVMs) were utilized for the classification purpose. Mao and Tian [17] utilized TCMSearch as a semantic-based search engine for biomedical images. It has shown good results for biomedical contents. Wang et al. [3] designed an ontology-graph-based web search engine named G-Bean for evaluating biomedical contents from the MEDLINE database. A multithreading parallel approach was utilized to build the document index. Kohlschein et al. [18] showed that ViLiP can be efficiently utilized to search the contents in PubMed. ViLiP was further improved using an NLP-based semantic search engine for obtaining better drug-related contents within a query. Depending upon the linguistic annotations, significant drug names can be obtained.

Tsishkou et al. [19] designed the TTA10 approach, which stores biomedical data in a hierarchical fashion. Logarithmic complexity was utilized to retrieve data from a huge repository. AdaBoost was utilized to integrate independent search results to obtain efficient results. Mao et al. [1] designed a prototype model of a subject-oriented spatial-content-based search engine (SOSC) for critical public health hazards. It can obtain Web contents from the Internet, build the Web page database, and obtain spatial content from these Web pages during a pandemic. Boulos [20] designed a GeoNames-powered PubMed search which has the ability to handle these problems. The geographic ontology can utilize potential words to obtain significant results from PubMed. Al Zamil and Can [21] improved the contextual retrieval and ranking performance (CRRP) with minimal input from researchers. The performance was evaluated using the retrieval procedure in terms of topical ranking, precision, and recall. Grandhe et al. [22] designed an ascendable search engine (ASE) for biomedical images. Researchers can select a region of interest iteratively to evaluate the corresponding region from the images. An efficient cluster-based engine was designed to reduce the content retrieval time. Mishra and Tripathi [23] proposed a vector- and deep-learning-based (VDP) biomedical search engine model. The degree of similarity was computed by integrating the vector space and deep-learning models.

The implementation of efficient biomedical search engines is still a challenging issue [24]. Recently, many researchers have utilized various deep-learning models to improve the performance of biomedical search engines [25]. However, the existing deep-learning-based biomedical search engines suffer from the overfitting and hyperparameter tuning problems.

3. Proposed Model

This section discusses the proposed biomedical search engine. Initially, the deep-learning model is discussed. Thereafter, the tuning of the deep-learning model is achieved using the NSGA-III.

3.1. Deep Convolutional Neural Network

The deep convolutional neural network (CNN) is widely accepted for classification problems, and many researchers have utilized it in the field of search engines. Figure 2 shows the deep-learning-model-based biomedical search engine. It utilizes numerous convolution filters to extract the potential features.

We assume a single-channel input, which is mathematically computed as

$x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n$.

Here, $x_i \in \mathbb{R}^{k}$, where $k$ shows the dimension of every input factor, $n$ represents the number of images, and $\oplus$ denotes concatenation. During the convolution operation, a filter $w \in \mathbb{R}^{hk}$ is utilized to extract the potential features as

$c_i = f(w \cdot x_{i:i+h-1} + b)$,

where $b \in \mathbb{R}$ represents a bias, $x_{i:i+h-1}$ is the integration of $x_i, \ldots, x_{i+h-1}$, and $f(\cdot)$ shows an activation function. The filter $w$ approaches every possible window $\{x_{1:h}, x_{2:h+1}, \ldots, x_{n-h+1:n}\}$. Thus, the feature map can be computed as

$c = [c_1, c_2, \ldots, c_{n-h+1}]$.

Max pooling is applied on $c$ to compute the peak value as $\hat{c} = \max\{c\}$. It shows the final feature obtained using the filter $w$.

The proposed model evaluates numerous feature groups by utilizing various filters with different sizes. The computed feature groups return a vector as

$z = [\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_m]$,

where $m$ shows the number of filters. Softmax is utilized to evaluate the prediction probability as

$\hat{y} = \mathrm{softmax}(W_o z + b_o)$,

where $W_o$ and $b_o$ denote the weights and bias of the output layer.

We assume a training set $(s^{(j)}, y^{(j)})$, where $y^{(j)}$ shows the similarity label of the biomedical image query $s^{(j)}$ for the search engine, and the prediction probability of the proposed model is $\hat{y}^{(j)}_{l}$ for each label $l$. The computed error can be defined as

$E = -\sum_{j}\sum_{l \in L} \mathbb{1}\{y^{(j)} = l\} \log \hat{y}^{(j)}_{l}$.

Here, $L$ represents the set of labels, and $\mathbb{1}\{\cdot\}$ shows an indicator that equals $1$ if $y^{(j)} = l$ and $0$ otherwise. Gradient descent is then utilized to update the deep-learning variables.
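The convolution, max-pooling, softmax, and cross-entropy steps can be sketched in NumPy. The one-dimensional filters, sizes, and random inputs below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def conv_features(x, w, b):
    # c_i = f(w . x[i:i+h-1] + b), with ReLU as the activation f.
    h = len(w)
    c = np.array([np.dot(w, x[i:i + h]) + b for i in range(len(x) - h + 1)])
    return np.maximum(c, 0.0)          # feature map c

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(1)
x = rng.random(10)                          # one single-channel input
filters = [rng.random(3), rng.random(5)]    # filters with different sizes
# Max pooling keeps the peak value of each feature map, giving the vector z.
z = np.array([conv_features(x, w, 0.1).max() for w in filters])

W_o = rng.random((4, 2))                    # output layer: 4 labels, 2 filters
b_o = rng.random(4)
y_hat = softmax(W_o @ z + b_o)              # prediction probabilities
loss = -np.log(y_hat[2])                    # cross-entropy for true label 2
print(y_hat, loss)
```

The per-example loss here is the inner sum of the error $E$ above; summing it over the training set and differentiating gives the gradient-descent updates.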

3.2. Nondominated Sorting Genetic Algorithm-III

Nondominated sorting genetic algorithm-III (NSGA-III) [26] is widely accepted to solve numerous engineering applications. Recently, many researchers have utilized NSGA-III to solve hyperparameter tuning issues with deep-learning models [9, 11, 27].

The nomenclature of NSGA-III is demonstrated in Table 1. The generation of the initial population is represented in Algorithm 1. Initially, the random population is computed. The computed solutions are then encoded to the initial attributes of CNN.

Optimal CNNs.
Evaluate CNN on optimal CNNs.
while … do
    Consider the CNN with maximum performance
    if … then
        …
    else
        …
    end if
end while
Select a random set of solutions from … using a normal distribution
Obtain a group of random solutions
return …
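The encoding of random solutions as initial CNN attributes can be sketched as follows. The hyperparameter names and ranges in `SPACE` are hypothetical illustrations, not the search space used in the paper:

```python
import random

# Hypothetical hyperparameter search space for the CNN; names and
# ranges are illustrative assumptions only.
SPACE = {
    "learning_rate": (1e-4, 1e-1),
    "dropout": (0.0, 0.5),
    "num_filters": (8, 128),
    "kernel_size": (2, 7),
}

def random_individual(rng):
    # Encode one candidate solution as a dict of sampled hyperparameters:
    # integer ranges are sampled with randint, real ranges with uniform.
    ind = {}
    for name, (lo, hi) in SPACE.items():
        if isinstance(lo, int):
            ind[name] = rng.randint(lo, hi)
        else:
            ind[name] = rng.uniform(lo, hi)
    return ind

def initial_population(size, seed=0):
    rng = random.Random(seed)
    return [random_individual(rng) for _ in range(size)]

pop = initial_population(10)
print(pop[0])
```

Each such individual would then be decoded into a concrete CNN configuration before training and fitness evaluation.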

The NSGA-III- and CNN-based biomedical search engine is discussed in Algorithm 2. The random-population-based CNN models are trained on a chunk of the biomedical dataset. The fitness of the computed CNN models is then evaluated. Solutions are then divided into dominated and nondominated groups. Crossover and mutation operators are further employed to compute the children. Nondominated sorting is implemented to sort the nondominated solutions. Based upon the termination criteria, the tuned parameters of the CNN models are returned.

Select randomly … solutions from the given elite
for all … do
    Decode … as hyperparameters of the CNN
    for … to … do
        Compute a random-population-based CNN in …
        if … then
            …
        end if
    end for
    if … then
        …
    end if
    for … to … do
        Select randomly an …
        if … then
            …
        else
            …
        end if
        if … then
            …
        end if
    end for
end for
if … then
    Select solutions obtained from NSGA-III
end if

Each random individual is decoded into the initial parameters of the CNN model.
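The grouping of solutions into dominated and nondominated sets, on which NSGA-III relies, can be illustrated with a minimal two-objective sketch. The objective values below (validation error and model size) are made-up examples:

```python
def dominates(a, b):
    # a dominates b if it is no worse on every objective and strictly
    # better on at least one (objectives are minimized).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_front(objectives):
    # Return the indices of the nondominated solutions (the first front).
    front = []
    for i, fi in enumerate(objectives):
        if not any(dominates(fj, fi) for j, fj in enumerate(objectives) if j != i):
            front.append(i)
    return front

# Hypothetical objectives per candidate CNN: (validation error, model size).
objs = [(0.10, 5.0), (0.08, 9.0), (0.12, 4.0), (0.11, 9.5)]
print(first_front(objs))  # -> [0, 1, 2]; the last candidate is dominated
```

Full NSGA-III additionally sorts the remaining solutions into further fronts and uses reference points for diversity; this sketch only shows the dominance test at its core.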

4. Performance Analysis

The proposed biomedical search engine is implemented in MATLAB 2019a with the help of the deep-learning and image processing toolboxes. The proposed and the existing models are tested on the biomedical search engine dataset. The proposed model is compared with competitive models such as TCMSearch [17], SVM [16], G-Bean [3], TTA10 [19], ViLiP [18], SOSC [1], GeoNames [20], CRRP [21], ASE [22], and VDP [23]. To compute the performance of the NSGA-III-based CNN model, median and variation values (i.e., median ± variation) are computed. One portion of the biomedical dataset is used for building the model, another portion is used for validation purposes, and the remaining portion is used for testing purposes.

The training and validation loss analysis of the NSGA-III-based CNN model is represented in Figure 3. It clearly shows that the loss difference between training and validation is significantly small; therefore, the NSGA-III-based CNN model is least affected by the overfitting issue. Additionally, the loss approaches convergence as the epochs proceed. Thus, the proposed model is trained efficiently on the biomedical images.

Training and testing analyses of the NSGA-III-based CNN model are depicted in Tables 2 and 3. Specificity, area under the curve (AUC), sensitivity, f-measure, and accuracy metrics have been utilized to evaluate the performance of the NSGA-III-based CNN model against competitive models such as TCMSearch [17], SVM [16], G-Bean [3], TTA10 [19], ViLiP [18], SOSC [1], GeoNames [20], CRRP [21], ASE [22], and VDP [23]. It has been observed that the proposed model outperforms the competitive models. Bold values indicate the highest performance among the biomedical search engines. Comparative analysis reveals that the proposed model outperforms the competitive biomedical search engine models in terms of specificity, AUC, sensitivity, f-measure, and accuracy by , 1.8372, 1.8328, 1.4838, and 1.4828, respectively.
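The evaluation metrics above can be computed directly from confusion-matrix counts. The labels below are a toy binary example, not data from the paper (AUC is omitted since it requires continuous scores rather than hard predictions):

```python
def binary_metrics(y_true, y_pred):
    # Confusion-matrix counts for the positive class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)                 # true positive rate (recall)
    specificity = tn / (tn + fp)                 # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / len(y_true)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f_measure": f_measure}

m = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(m)
```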

The comparative analysis of the NSGA-III-based CNN model with the state-of-the-art approaches is depicted in Table 4. It has been observed that the NSGA-III-based CNN model achieves significantly better results than the existing web search engines.

5. Conclusions

This paper has proposed an efficient model for biomedical search engines. It has been found that deep-learning models can be used to improve the performance of biomedical search engines. However, the existing deep-learning-based biomedical search engines suffer from overfitting and hyperparameter tuning problems. Therefore, an NSGA-III-based CNN model was proposed for biomedical search engines. Initially, the hyperparameters of the proposed model were obtained using the NSGA-III. Thereafter, the proposed CNN model was trained by using the tuned parameters. Finally, the proposed model was validated on the testing dataset. Comparative analysis reveals that the proposed model outperforms the competitive biomedical search engine models in terms of specificity, AUC, sensitivity, f-measure, and accuracy by , 1.8372, 1.8328, 1.4838, and 1.4828, respectively.

Data Availability

The dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.