Introduction

The benefits of using ontologies have been empirically grounded in several studies, among the most recent being the ones by [1,2,3]. According to Cardoso [3], for instance, ontologies are mostly used to make domain assumptions explicit (70%), to enable reuse of domain knowledge (56%), or to share a common understanding of the structure of information among people or software agents (37%) [4]. In other words, ontologies have gained tremendous momentum due to their great potential for providing a new approach for managing, searching, retrieving, maintaining, sharing and viewing information. They offer a best solution for resolving the heterogeneity problem that occurs between two or more information systems, by providing a generic knowledge that can be shared and reused by different kind of domains such as artificial intelligence, semantic web services, knowledge engineering and computer science [5]. As ontologies tend to evolve rapidly over time and between different applications, there is an increasing need in recent years towards their construction approaches.

Generally stated, building ontology is an engineering activity and there are two main approaches for its construction—either from scratch, or by using ontology learning approaches. Building ontology from scratch or manually [6,7,8,9,10,11] is a very complicated and expensive task that usually requires a combination of the knowledge of domain experts and skills of ontology engineers. This task is difficult due to the unbelievable rate of knowledge development in the real world, which requires the ontology engineers to constantly update and revise the resulting ontologies with new concepts, terms and lexicons. Consequently, building ontology from scratch is non-intuitive, time-consuming, error-prone, and can be costly [12]. Due to these limitations, the term “ontology learning” has appeared, which captures an approach to discover ontological knowledge automatically or semi-automatically from various resources [13]. Ontology learning can solve the problems of knowledge acquisition and greatly facilitates the building of ontologies compared with the scratching methods.

Formally, using learning approaches, ontologies can be constructed from various sources of information including structured sources, such as a relational database, semi-structured sources, such as dictionaries, or unstructured sources, such as web pages [14]. The majority of the studies in the literature focus on relational database as a source of information for several reasons. Firstly, around 70% of data on the web is stored in relational databases [15]. Secondly, relational databases present full conceptual models [16]. Thirdly, they provide a full information resource [16]. Finally, they offer one of the best techniques for storing and manipulating data. However, relational databases suffer from the absence of semantic meaning, which is hinders the ability to achieve interoperability among information systems [17].

Despite the significant progress made during the last few years and the wide number of proposed approaches [18,19,20,21,22,23,24,25,26,27,28,29,30], there are still many issues that have not been sufficiently addressed. First, all the existing works [18,19,20,21,22,23,24,25,26,27,28,29,30] focus only on generating A-Box or T-Box [31] and ignore the integration process between these two components. Second, the majority of these studies [18,19,20,21,22,23,24,25,26,27,28,29,30] mainly focused on the process of building ontologies from relational database without covering the maximum semantics resided in the database [32]. Furthermore, all these studies focus only on describing the process of generating ontology from RDB, while they did not define a life cycle for describing the most common scenarios that arise during the creation of the ontology from RDB [33]. Broadly stated, there is a difference between a lifecycle and process. Indeed, the need to the life cycles increases dramatically with the need to resolve the data integration problems and evaluation constraints [33].

Finally, the availability of ontology for different domains on the web is gradually increasing. Therefore, the resulting ontology from RDBs must be evaluated from different perspectives to determine its quality before use or reuse. All the existing works in this topic did not take into consideration the measurement of the quality of the resulting ontology [34].

In this paper, we: (1) propose a new life cycle for ontology learning from RDBs based on the software engineering requirements; (2) describe a new process for building ontology from Relational database based on the predefined life cycle; (3) add three new semantics that can be extracted from RDB; (4) we suggest an evaluation process based on two categories of metrics: (i) Conceptual Ontology (T-Box) Evaluation metrics; (ii) Factual ontology(A-Box) evaluation metrics.

The rest of this paper is organized as follows. In “Related works” section, we present the related works, which describes the most popular studies about relational database into ontology conversion. In “Learning ontology from relational database (LOFRDB): life cycle” section, we introduce, the life cycle for learning ontologies from relational database. In “Proposed method” section, we introduce the proposed processes for generating ontologies from relational database. “Results and discussion” section is devoted to present the experimental results and discussions. Finally, we conclude the paper and suggests directions for future works.

Related works

Considerable amount of studies [18,19,20,21,22,23,24,25,26,27,28,29,30] have been conducted on building ontologies from RDBs using SQL-DDL [35]. While these studies share the common objective of converting RDB into Ontology, they differ in the process used as well as the metadata extracted and the mapping rules proposed. In fact, these studies fall roughly into one of the two categories. Firstly, approaches based on an analysis of relational schema. Secondly, approaches based on analysis relational data.

On one hand, all methods described in [18,19,20,21,22,23,24,25,26,27,28,29,30] take into account the mapping of: tables, columns, primary keys and foreign keys. However, the binary relationship is missed in [24, 25], and the ternary relationship is not manipulated in [21, 24, 25, 28, 30]. Only [22, 26, 29] covered the check constraint, Not Null constraint, and unique, while added the cardinality constraint. Moreover, Astrova [22] represents the only work that can handle the transitive and symmetric property, while [29] handled just the transitive property. In addition [29], presents the most reference work in the literature, because it consists of combining the existing studies and adds new rules for building ontology from RDB. Besides, Sequeda [29] covers all possible combinations of primary key and foreign keys as depicted in Table 2. Clearly, the two studies provided by Astrova [22] and Sequeda [29] represent the most relevant ones because they proposed many requirement that can act as best practices for building ontologies from RDBs. On other hand, building an ontology based on an analysis of relational data (Migration of the instances) is addressed in [21, 22, 28, 29].

However, all these studies ignore constraints that capture additional semantics in order to improve the quality of the resulting T-Box [31], such as owl: hasvalue constraint, data range restriction, and owl: all values from constraint [36]. In addition, all these works did not take into consideration the phase of integrating the A-Box with T-Box. In fact, the combination of TBox and ABox has two main benefits; (a) it facilitates the Semantic integration problem; (b) it allows to use a reasoning services for checking the consistency and satisfiablility of the resulting ontology [37]. To the extent of our knowledge, this is the first work that integrates the A-Box with T-Box in addition to use the reasoning capabilities for checking the consistency and satisfiablility [37]. These approaches allowed a mapping of RDB models into Ontology, they [18,19,20,21,22,23,24,25,26,27,28,29,30] focused only on describing the process of building the ontology, whilst they did not describe the life cycle. From software engineering perspective, the ontology development process identifies which activities are to be performed. However, it does not identify the order in which the activities should be performed. Whereas the life cycle identifies when the activities should be carried out, it determined the global stages through which the ontology moves during its life time and it describes what activities are to be performed in each stage and how the stages are related.

Eventually, the ontology evaluation becomes extremely important for developers to determine the fundamental characteristics of ontologies in order to improve the quality, estimate cost and reduce future maintenance [38]. To the best of our knowledge, there are only a few papers [22, 29] have appeared with the concern of evaluating not the resulting ontology but the mapping process. Astrova [22] proposed a method for measuring the quality of the mapping process RDB to Ontology based on retransforming the resulting ontology to a relational database and testing if the transformation is reversible using the lexical overlap measure. Sequeda [29] introduced an effective approach for validating the mapping process with regard to four properties. Nevertheless, those studies mainly focused only on the validation of the mapping process and not on the quality of the resulting ontology. In fact, learning ontologies from relational databases without considering the evaluation phase means that the resulting ontology does not cover the user or the domain needs [39].

Learning ontology from relational database (LOFRDB): life cycle

In this section, we present LOFRB lifecycle, which refers to the activities or phases that have to be performed for learning ontologies from relational databases. As depicted in Fig. 1, our proposed life cycle is based on four phases: Discovery, Preparation, Development, and evaluation. For most phases in the life cycle, the movement can be either forward or backward. This iterative depiction is intended to more closely portray a real project [40], in which aspects of the project move forward and may return to earlier stages as new information is uncovered and ontologist learns more about the domain of interest [5].

Fig. 1
figure 1

Life cycle of learning ontologies from relational database

Discovery

In this stage, the ontologist must define clearly the domain and scope of the ontology by answering the following questions [10]:

  • What is the domain that the ontology will cover?

  • For what we are going to use the ontology (Application)?

  • For what types of questions the information in the ontology should provide answers?

  • What are the ontology intended uses and who are the end-users (Stakeholders)?

  • What are the sources of RDBs used to build ontology?

  • Is it necessary to interviewing the domain expert?

The answers to these questions may change during the ontology development, but at any given time they help to limit the scope of the model. In this stage also the ontologist formulates some competency questions (CQ) that the ontology should be able to answer and that can be tested later [41]. The aim of the CQ is to check if the ontology includes sufficient information to answer these questions and if the answers require a particular level of detail or representation of a particular area. These CQs are just a sketch and do not need to be exhaustive [41].

As part of the discovery phase, the ontologist needs to assess the resources available to support the ontology development process. In this context, resources contain technology, tools, data, and people [40]. In addition, the ontologist can remove or add the data sources from this phase [40].

Preparation

The second phase of the LOFRDB involves data preparation, which includes the steps to explore and preprocess (conditioning) data prior. The data exploration consists of checking if the data sources contain enough semantics for generating ontology by checking if the RDB contains the complete space of relations and the maximum possible combinations of the primary keys and foreign keys [29]. The second sub-phase is data conditioning, which refers to the process of cleaning data and normalizing datasets. We can consider the RDB normalization as a part of the data conditioning phase.

Development

The Development (the building of the ontology) is tackled in two phases: the pre-development and post-development. The pre-development starts by the Data acquisition (ABox), which consists of extracting the instances from the relational database [42], and represent them based on the RDF triple form [43]. After the data acquisition, the schema acquisition (TBox) [22] will be started in order to generate the definition and the meaning of the extracting instances. Therefore, it is necessary to build a vocabulary of these terms for simplifying the development phase. The Development does not only include the data and schema acquisition, but provides also the phase for integrating these two components. The post-development encompasses several other tasks such as alignment, merging and integration, etc. [5].

Evaluation

After having built ontology from RDB, metrics for evaluating the resulting ontology must be presented [44]. Generally, the process of evaluation can be defined as the process of deciding on the quality of the ontology with respect to particular metrics [44]. For this purpose, two orthogonal dimensions to evaluate the quality of the resulting ontology are defined; (i) the first dimension is T-Box evaluation; (ii) the second dimension is A-Box evaluation. T-Box Evaluation postulates the design of the constructed T-Box. Although we cannot definitely know if the T-Box design correctly models the domain knowledge, metrics such as the richness, and inheritance indicate the quality of the T-Box created. The most significant metrics in this category are described in [45].

Proposed method

From the proposed lifecycle, many processes or models can be extracted and this is depends on the needs of the ontologist and the objectives of the project. In this work, we propose a method for ontology learning from RDB based on our proposed lifecycle. In this method, we consider that the data is already cleaned and conditioned. In addition, the resulting ontology needs neither alignment nor fusion with other ontologies.

As depicted in Fig. 2, after the discovery phase, which aims to identify the domain and scope of the ontology as well as take a first look at the data sources, the next phase is the data preparation. In this phase, some semantic characteristics are extracted and we use a novel metric to choose the RDB the most relevant. From this last one, we generate the ABox and the TBox [37] and then after we integrate the two component to get the final ontology. The last phase of our process is the validation of the resulting ontology that consists of evaluating the ABox and the TBox components by using some metrics and a reference ontology, and finally verify if the resulting ontology can response to the Competency Questions (CQ) [41]. If the validation [46] is failed, this means that the resulting ontology cannot be published on the web or used inside applications. In this case, it is necessary to return to the discovery phase.

Fig. 2
figure 2

The method of building the ontology from RDB

RDB exploration

The exploration phase consists in verifying if the input relational databases contain the complete space of metadata and semantic characteristics for generating ontology. In this context, some information can be extracted from the input RDBs like the number of: tables, columns, primary keys, foreign keys and instances. On the other hand, we consider the semantic characteristics summarized in Table 1 to choose the most relevant RDB.

Table 1 Summary of patterns to calculate NS

In this context, we suggest the number of semantics (NS) metric that represents the number of semantic characteristics of each input RDB. The range of this metric is from 0 to 17. Values close to 0 reflects a relational database that semantically poor, while large values, that are close to 17, represent a rich RDB. The NS metric is calculated by giving the value “1” to each characteristic existing in the RDB and “0” otherwise:

$$NS\,=\, NTDFK+NTFK+NTMTWOFK+NTEXTWOFK+\text{NAFKNNU}+NAFKNNNU+NAFKNNU+NANNUNPK+NA+NAN+NANFKU+NPK+NACHECK+NADEF+NTSAMEPK+NUnaryRel+NtrRel.$$

The RDB exploration needs also the human intervention for selecting the relevant relational database because the database that have high total number of semantics does not mean that it covers all the possible semantics [47].

Building the TBox (conceptual ontology)

The TBox introduces the vocabulary of an application domain. It represents the repository that contains the declarations of concept axioms or roles [48]. To generate the TBox from RDB, we use some transformation patterns that are defined in Table 2. Concisely, the main steps for generating conceptual ontology is depicted in Algorithm 1.

Table 2 The applied rules for generating conceptual ontology (TBox)

In this step, we propose 3 new transformation rules which allow to transform: the check constraint, the default constraint, and the constraint for improving inheritance relationship.

Transformation of the check constraint

As mentioned in [49], Check constraints are conditions that validates the data in a table. In this work, we propose a rule for transforming the CHECK constraint as data range restriction. For resolving this problem, we used the bounds facets, which are: xsd:minInclusive, xsd:minExclusive, xsd:maxInclusive, and xsd:maxExclusive [36] (see Fig. 3).

Fig. 3
figure 3

Check constraint for data range restriction

Transformation of the default constraint.

The DEFAULT constraint in RDB [50] is used to provide a default value for a column. In this respect, the owl: hasValue constraint describes a class of all individuals for which the property concerned has at least one value semantically equal to the default value. Consequently, owl: hasValue says regardless of how many values a class has for a particular property, at least one of them must be equal to the default value [36]. Figure 4 depicts the transformation of the default constraint to OWL.

Fig. 4
figure 4

Default Value constraint transformatio

Improvement of the inheritance relationship

It is important to realize that in OWL domains and ranges should not be viewed as constraints to be checked [36]. They are used as ‘axioms’ in reasoning. For instance, if the property hasProfessor has the range set as Professor and the domain set as Student, then we applied the hasProfessor property to Student (instances that are members of the class Student), this would generally not result in an error. Knowing that Student and Professor are subclasses of Person. In this context, it would infer that Student and Professor Classes can have instances in common. More precisely, it can be found that “Student hasProfessor Student”. As a result, we will use the owl: AllValuesFrom constraint [36] for avoiding such problem as depicted in Fig. 5.

Fig. 5
figure 5

Improvement of the inheritance relationship example

The generation of the TBox

The TBox introduces the terminology and the vocabulary of application domain. It represents the repository that contains the declaration of concept axioms or roles. A naïve approach would consider that the TBox corresponds to the schema of the Relational Database [31]. In this phase, we implement the rules that are identified in Table 2. Concisely, the main steps for generating the TBox is depicted in Algorithm 1.

figure a

The automated process of our algorithm receives as input the SQL DLL file [51] that contained the definition of the RDB and generates the OWL file as output. More precisely, the Algorithm 1 gets all RDB patterns depicted in Table 2 then it matches each RDB element with its corresponded element in OWL. It is important to mention that our algorithm is completely automatic. The implementation if this algorithm is uploaded into our GitHub repository.

The generation of the ABOX

The process of generating the A-Box is conducted using the R2RML language [52] that plays an important role for completing the data acquisition phase. Generally, the algorithm receives a SQL file that includes statement represented by SQL DDL. We then use the Database Metadata Extraction Engine (DMEE) that analyzes the SQL file and extracts automatically the metadata from it. The extracted metadata includes tables, columns, primary keys (PKs), and Foreign Keys (FKs). Thirdly, Mapping Generator Engine (MGE) exploits the extracted metadata and build a mapping file (R2RML file). Lastly, R2RML engine takes as input, the database model (Schema + Instances) and the generated mapping document that contains a set of rules representing the database schema, then provides an output represents the RDF dataset (triples) using r2rml-kit-master.Footnote 1 Concisely, the main steps for generating the A-Box is depicted in the following algorithm.Footnote 2 For convenience to the readers, the algorithms of generating the A-Box are deeply explained in [53]

figure b

.

The evaluation

The last step of our process involves validation of the resulting ontology. For this purpose, we propose to evaluate the ABox component and the TBox component separately by using some metrics. In this context, we have choose, the attributer richness, Inheritance Richness and Relationship Richness to evaluate the TBox component, and Class Richness as well as Average Population to evaluate the ABox [54].

The evaluation of TBox

Although we cannot really know whether the design of the T-Box correctly models the domain knowledge, metrics such as wealth, width, depth and heritage indicate the quality of the T-Box created. Therefore, the most important measures in this category are described below.

Attribute richness (AR)

AR represents the average number of attributes (slots) per class. Generally, we assume that more the attributes are generated from RDB more the knowledge conveys to the ontology [44].

Definition

The attribute richness is defined as the average number of attributes per class. It is calculated as the number of attributes for all classes (\(ATT\)) divided by the number of classes \((C)\).

$$AR=\frac{|ATT|}{|C|}$$
Inheritance richness (IR)

This metric represents the distribution of information across different levels of T-BOX and serves as an indicator of how well knowledge is grouped into different categories and subcategories in TBox. A TBox with a low IR indicates that the T-Box covers a specific domain in a detailed manner, while a T-Box with a high IR represent a general knowledge [44].

Definition

IR is defined as the average number of subclasses per class, where \(H\) is the sum of the number of inheritance relationships, and \(C\) is the total number of classes.

$$IR=\frac{|H|}{|C|}$$
Relationship richness (RR)

This metric reflects the diversity of the types of relations in the TBox such as. A TBox that contains only inheritance relationship usually conveys less information than a T-Box that contains a diverse set of relationships such as Transitive, symmetric, and reflexive relationship [45].

Definition

The RR of a T-Box is defined as the ratio of the number of non-inheritance relationships \((P),\) divided by the sum of inheritance relationships \((H)\) and non-inheritance relationships \((P)\).

$$RR=\frac{|P|}{|H|+|P|}$$

A-Box validation

A-Box evaluation metrics can be used to check how the data is placed inside the ontology. More specifically, A-Box evaluation refers to the instances metrics. In this respect, we used two predefined metrics: class richness and average population.

Class richness (CR)

CR is related to how instances are distributed across classes. The number of classes that have instances in the KB is compared with the total number of classes, giving a general idea of how well the KB utilizes the knowledge modeled by the T-Box. A-Box with low CR indicates that the A-Box does not have data that exemplifies all the class knowledge exist in the T-Box. On the other hand, A-Box with high CR proves that the data in A-Box covers most of the knowledge [44].

Definition

\(CR\)is defined as the ratio between the total number of classes that have instances \({c}^{\prime}\) divided by the total number of classes \((C)\).

$$CR=\frac{|{c}^{^{\prime}}|}{|C|}$$
Average population (AP)

This measure is an indication of the number of instances compared to the number of classes. It can be useful if the ontology developer is not sure if enough instances were extracted compared to the number of classes [44].

Definition

\(AP\)is defined as the number of instances in the A-Box \((I)\) divided by the number of classes defined in the ontology schema \((C)\).

$$AP=\frac{\left|I\right|}{\left|C\right|}$$

Results and discussion

To evaluate the efficiency and the solidity of the proposed process, we have started from 6 relational databases of the e-commerce domain. These databases cover several metadata used in the process of learning ontologies from relational database, such as tables, columns, foreign keys (FKs) and primary keys (PKs). The detailed information of these databases is summarized in Table 3. As proof of concept, our experimental simulations were conducted on a personal computer under windows 10, with Intel core i7 2.70 GHZ processor and 16 GB RAM.

Table 3 A list of metadata extracted from RDB

The discovery phase

In the discovery phase, we have to answer the following question: do we have enough information background to start building ontology? Table 1 shows the most relevant questions that we have covered. It may be possible to refer to an expert in the studied domain to resolve some problems concerning the gathered data such as the database conceptualization problems [47].

Unlike many traditional stage-gate processes, in which the process of building ontology from relational databases start without checking if some specific criteria are met. Therefore, the proposed lifecycle is intended to accommodate more ambiguity. As depicted in the Table 4, it is recommended to pass certain checkpoints as a way of gauging whether we are ready to move to the next phase of the LOFRDB lifecycle. Creating the perfect plan for learning ontology from RDB requires a clear understanding of the domain area, the problem to be solved, and scoping of the data sources to be used. Answering these questions clarify the problem definition and help us to select the appropriate database that can be used in later phases. The Table 5 exhibits a list of competency questions (CQs) that represent informal questions that the ontology must be able to answer [41]. We consider these to be natural language sentences that express patterns for types of question people want to be able to answer with the ontology.

Table 4 The list of questions the ontologist must answer before start building ontology
Table 5 A list of competency questions

As we know, ontology authors are usually domain experts but not necessarily proficient in ontology technologies, especially their logic underpinnings [41]. As a consequence, on the one hand it is difficult for human authors to express their requirements for the axiomatization of an ontology and, on the other hand, it is also difficult to know whether the requirements are fulfilled as a result of their ontology authoring actions. To address this issue, we introduce the methodology of Competency Question in order to help the authors of the ontology to check if the resulting ontology embedded all the necessary information. In fact, it is important to list these questions in the discovery phase in order to allow to the ontologist to take them into consideration during the process of development.

Additionally, in the discovery phase, we can build an initial look at the list of data that we have chosen in order to determine whether it contains a large number of necessary metadata. It can be clearly seen from Table 3, that, the relational database EcommerceDB and Iscommerce did not contain sufficient semantics to start building ontology. For instance, EcommerceDB database contains 3 tables, 20 columns, 4 PKs, 2 FKs, and 100 instances. Based on these measures, we can decide that the EcommerceDB database is semantically poor. In the same context, the Iscommerce database also provides a poor semantics. As a result, in the discovery phase, we can remove the EcommerceDB and Iscommerce databases. We eliminate these two databases based on the rule: RDB poor semantically implies ontology poor semantically [31].

The RDB exploration

Now, to choose the most relevant RDB among the remaining ones, we have to calculate the NS measure from the patterns depicted in Table 6.

Table 6 The set of patterns

As stated previously, the NS metric represents the number of semantic characteristics present in the relational database. Table 6 shows that the Sakila database covers all possible semantics that can be used to build a rich ontology from RDB. In this context, we compare also the total number of semantics per RDB as shown in Table 7. The first interesting observation is that the database having a high total number of semantics does not mean that it covers all the possible semantics as depicted in Table 7. For instance, the total number of semantics for the North database is 180, but the number of semantics is 10 (less than 17). Consequently, if we decided to build ontology from the North and e-commerce databases, the resulting ontology will not address the following semantics: inheritance, transitive, symmetric, value restriction, data range restriction, Functional and inverse Functional property. This leads to predict that the resulting ontology based on the North and E-commerce databases will be very poor semantically.

Table 7 The NS and the total number of semantics for each database

For the Northwind and Sakila database, the NS and the total number of semantics are (13,102) and (17,126) is 13 and its total number of semantics is 102. We can notice that, the number of instance of each database are respectively 2120 and 47,237 for Northwind and Sakila. As a result, the most appropriate relational database for building ontology is Sakila, because it covers the most important semantics and the large number of instances.

The ontology building evaluation

For the ontology building evaluation, we typically compared our resulting ontology against a gold-standard which is suitably designed for the domain of discourse [54]. This may in fact be an ontology considered to be well-constructed to serve as reference. As we aforementioned, the domain of discourse that we treat is E-commerce [55]. The ontology reference that represents the E-commerce domain is GoodRelations ontology [56]. It is a standardized vocabulary for product, price, and company data that can be (i) embedded into existing and dynamic web pages and (ii) processed by other computer. Generally, GoodRelations is used to facilitate creation of formal descriptions of product offering for electronic commerce. Table 8 shows the basic metrics of the GoodRelations Ontology versus the resulting ontology.

Table 8 The basic metrics of the GoodRelations versus the resulting ontology

The basic metrics of ontology provide the count number of classes, objects, axioms, properties and instances used in the ontology. Considering the result presented in Fig. 6, it is clearly seen that our resulting ontology covers more basic knowledge than the reference ontology with regard to the total number of classes, the total number of datatype property (TDP) and object Properties (TOP), the number of logical axioms (LAC), the number of axioms, and the number of instances (TINDV). For instance, the TINDV are 47,803 and 46 for the resulting ontology and reference ontology respectively. However, we cannot discuss the quality of the resulting ontology based on these metrics, because these metrics represent just the discriminative effect of the knowledge coverage [54] as shown in Fig. 6. In this respect, the two following subsections are well explained the metrics that we used to measure the quality of our ontology.

Fig. 6
figure 6

Discriminative effect of the knowledge coverage

The TBox evaluation

IR values close to zero indicate flat or horizontal ontology representing perhaps more general knowledge while large values represent vertical ontologies describing detailed knowledge of a domain. As depicted in Table 9, the IR for our ontology is 2357 while for GoodRelations is 0.5. This indicates that our resulting ontology describes the E-commerce domain better than the reference ontology. However, the relationships richness for the ontology reference is greater than the resulting ontology, which indicate that the reference ontology contains many relationships other than class-subclass relations, where our ontology is richer than a taxonomy with only class-subclass relationships. On other hand, the attribute richness for our resulting ontology is significantly greater than the AR of the reference ontology, which indicates that our ontology defined more knowledge than the reference ontology.

Table 9 The list of metrics for evaluating the resulting ontology

According to the result depicted in Table 10, we presented the TBox output of each surveyed approach using a specific OWL elements. In addition, the last column shows our mapping result. It is evident from this table that our ontology is greatly contained a high number of semantics compared to the other approaches.

Table 10 Ontological output of each mapping approach

The ABox evaluation

The first group of measures that we have considered for this validation is related to the knowledge distribution in the ontology. As we can see in Table 7, the average population (AP) for the resulting ontology is better than the ontology reference. Compared to the reference ontology, the value of the AP of our ontology, which is 2078.39, involves that our ontology offers a sufficient number of instances for describing the e-commerce domain. According the authors in [57], this metric is proposed to be used in conjunction with the class richness metric (CR). In this respect, we calculated the CR metric. The value of this metric confirms that our ontology’s classes are populated with a high number of instances with regard to GoodRelations ontology, and this is reflects the diversity of knowledge embedded in our A-Box.

The competency questions (CQs)

Now to validate the resulting ontology in its totality we have checked if it is able to answer the competency questions established previously. As depicted in Table 11, the positive Answer means that our ontology can provide the correct answer to the query, while Negative answer means that the ontology cannot answer the query. Therefore, our resulting ontology answered all the formulated queries with a positive feedback. These queries are formulated in SPARQL Query Language [58]. For a high-level description of each query, we refer the reader to our GitHub Link.Footnote 3

Table 11 A list of competency questions with answers

Eventually, we can conclude that our proposed life cycle shows sufficient exactitude to be used for selecting an appropriate database for building ontology and it is able to exhibit very accurate result. Note that the life cycle phases represents formal stages-gates; they save as criteria to help ontologist for answering a very important question: how to select a Relational database that provides a sharp and clear boundary between the relational model and ontological model. From this experiment, we can notice that, we start our experimental study with 6 databases, and during each phase in life cycle, we evaluated the outcome of this phase in order to check if we made enough progress to move to the next phase. As a result, instead of converting the six databases directly into ontology, we early removed some RDBs that are not contained the sufficient semantics for representing the ontological model.

Conclusion

To sum up, in this paper, we tried to gather the most important and contributing approaches in the subject of the mapping of the relational database to ontology. We attempted to provide the reader with concise overview of these approaches in terms of identifying the main drawbacks that the researchers in this field are faced as well as suggesting solutions. In addition, the biggest contributions within this paper are the following: (1) We propose a new life cycle for ontology learning from RDBs based on the software engineering requirements; (2) We describe a new method for building ontology from Relational database based on the predefined life cycle; (3) We add three new semantics that can be extracted from RDB; (4) we suggest an evaluation process based on two categories of metrics: (i) Conceptual Ontology (T-Box) Evaluation metrics; (ii) Factual ontology(A-Box) evaluation metrics. In future works, we aim to focus on the cleaning and conditioning the data embedded in the relational database in order to improve the quality of the resulting ontology. Also, we plane to focus on different structured sources of information such as Excel spreadsheet, comma-separated value (CSV), and SQL DDL files in order to integrate these diverse data Format. Finally, we plan to move toward the unstructured data sources for constructing ontologies.