Rule-based knowledge discovery of satellite imagery using evolutionary classification tree

https://doi.org/10.1016/j.jpdc.2020.09.003

Highlights

  • Few previous studies address the classification and clustering problems of data mining.

  • This study develops a novel self-organized classification tree, ECT, which is optimized by PBA.

  • Similar spectral reactions across surface classes make accurate surface classification in SI mining problematic.

  • Results show that ECT classifies SI more accurately than CT and SVM but less accurately than BPN.

  • ECT is the best model for users wanting to mine explicit rules and knowledge in practical applications.

Abstract

The classification tree (CT) may be used to establish explicit classification rules for Satellite Imagery (SI). However, the accuracy of the explicit classification rules attained by this method is poor. Back-propagation networks (BPN) and the support vector machine (SVM) may both be used to establish highly accurate models for predicting the classification of SI. However, neither is able to generate explicit rules. This study proposes the evolutionary classification tree (ECT) as a novel rule-mining method. Composed of the particle bee algorithm (PBA) and the classification tree (CT), the ECT automatically produces self-organized rules to predict the classification of SI. In ECT, CT serves as the architecture that represents explicit rules, and PBA acts as the optimization mechanism that fits the CT to the experimental data. A total of 600 experimental datasets were used to compare the accuracy and complexity of four model-building techniques: CT, BPN, SVM, and ECT. The results demonstrate the ability of ECT to produce rules that are more accurate than CT and SVM but less accurate than BPN. However, because BPN is a black-box model, the ability of ECT to generate explicit rules makes ECT the best model for users wanting to mine explicit rules and knowledge in practical applications.

Introduction

Economic development and shifting population demographics have significantly increased the pace of destruction of the natural environment and of land resources. The effective management of land resources is an essential step toward sustainable land use. Satellite Imagery (SI) technologies use space-based sensors to indirectly survey the planet or other object around which the satellite platform is orbiting [1], [14], [19]. Since 1972, the United States has maintained Landsat satellites in orbit around Earth. The sensors on these satellites record the solar electromagnetic radiation reflected by objects on the surface. Raw data from Landsat satellites are sent to Earth in numeric form. These data are the source of the environmental resources information used by researchers. SI mining is used in real-time surveys to cover extensive areas of land and has become an effective survey tool for building environmental resource databases.

The steps in SI mining are: (1) the satellite obtains image data by scanning the surface spectral reflectance intensity across the sensor's spectrum; (2) staff conduct on-site investigations to obtain the surface classification information; (3) the relationship between the surface spectral reflectance intensity data and the surface classification information is established using appropriate statistical methods; (4) the established relationship is applied directly to other surfaces, using only their spectral reflectance intensity data, to determine their surface classification. For those other surfaces, these steps obviate the need to conduct on-site investigations, which reduces manpower and funding requirements considerably. The ability of SI mining to provide a quick understanding of data across a wide geographic area makes it a valuable tool in applications related to land utilization, agriculture and forestry planning, environmental monitoring, disaster assessment, and scientific research. However, different surface classifications can produce similar spectral reactions, which currently makes it difficult to distinguish surface classifications accurately in SI mining. The purpose of this study is to propose an approach to resolving SI classification problems using an artificial intelligence (AI) data mining technique.

This satellite imagery map shows the Zengwen Reservoir in Tainan City, Taiwan. The Zengwen Reservoir is the largest reservoir and the largest man-made lake in Taiwan.

Over the past few years, artificial neural networks (ANN) have made significant contributions in the sciences. A significant body of literature [6], [10], [11], [17] has already been devoted to proposing complex nonlinear models able to predict the behavior of materials to a high degree of accuracy. However, the resulting "black box" models are unable to generate explicit formulas or rules that explain the essence of the models. Furthermore, significant research has been dedicated to SI classification. Examples include the nearest neighbor classifier (NNC) [5] and the inductive decision tree (IDT) [15]. However, those methods largely focus on the predictive accuracy of a classification model while ignoring the issue of its understandability.

Some researchers have used the genetic operation tree (GOT), a hybridization of genetic algorithms (GA) and an operation tree (OT), to build models that are able both to predict material behaviors accurately and to explain the substance of the model [12], [13], [18]. An operation tree is a tree structure that expresses a mathematical formula. Optimizing the operation tree produces a self-organized regression formula. In general, GOT-generated models are less accurate than those generated by neural networks but more accurate than those generated by regression analysis (RA) [12], [13], [18].

The strength of GA lies in its use of random-yet-directed search operators to locate global optima. Nevertheless, GAs can still become trapped in a local optimum [7] and thus risk returning a suboptimal solution. Another important disadvantage of GA is the long run-time necessary to deliver satisfactory results for large instances of complex design problems.

A hybrid swarm algorithm, the particle bee algorithm (PBA), was proposed as an alternative to GA. PBA imitates the intelligent swarming behavior of birds and honeybees and integrates their advantages [3], [8], [9]. By using particle swarm optimization (PSO) search to improve the neighborhood search of the bee algorithm (BA) [3], [8], [9], PBA, which is one paradigm of evolutionary computation, is able to solve discrete optimization problems. PSO search is based on natural evolution, derived from ideas on survival of the fittest, and has been successfully applied to many case studies [3], [8], [9]. Clear advantages of PBA include global optimization, local optimization, an exploration process, an exploitation process, flexibility, and parallelism [3], [8], [9].

However, previous studies [2], [4], [12], [13], [18] used tree models only to build self-organized regression formulas, and few addressed the classification and clustering problems of data mining. Therefore, this study was designed to develop a novel self-organized classification tree known as the evolutionary classification tree (ECT). The ECT uses PBA to optimize the structure of the tree's rules.
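The idea of letting a swarm search fit a tree to labeled data can be illustrated with a deliberately small sketch. It is not the authors' PBA or ECT: it assumes a plain particle swarm search as a stand-in, a tree with a single split, and hypothetical names (`stump_accuracy`, `swarm_optimize_threshold`) and toy data that do not come from the paper.

```python
import random

def stump_accuracy(threshold, samples):
    """Accuracy of a one-split tree: value >= threshold -> 'A', otherwise 'B'."""
    hits = sum(1 for value, label in samples
               if ("A" if value >= threshold else "B") == label)
    return hits / len(samples)

def swarm_optimize_threshold(samples, n_particles=10, n_iters=30, seed=0):
    """Search for the split threshold that maximizes accuracy (plain PSO, assumed)."""
    rng = random.Random(seed)
    lo = min(value for value, _ in samples)
    hi = max(value for value, _ in samples)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                                   # personal best positions
    best_fit = [stump_accuracy(p, samples) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g], best_fit[g]             # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Pull each particle toward its own best and the swarm's best.
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # keep inside data range
            fit = stump_accuracy(pos[i], samples)
            if fit > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i], fit
                if fit > g_fit:
                    g_pos, g_fit = pos[i], fit
    return g_pos, g_fit

# Toy data (hypothetical): one feature value per sample and its class label.
samples = [(10, "B"), (20, "B"), (30, "B"), (60, "A"), (70, "A"), (80, "A")]
threshold, accuracy = swarm_optimize_threshold(samples)
print(f"IF value >= {threshold:.2f} THEN A ELSE B  (accuracy {accuracy:.2f})")
```

The fitted threshold reads directly as an IF-THEN rule, which is the property that distinguishes a tree-based model from a black-box network. A full ECT would presumably also search over which variable to test at each node and over the tree layout, which this sketch omits.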

Six hundred experimental datasets [16] were used to compare the accuracy and complexity of four model-building techniques (CT, BPN, SVM, and ECT) and to evaluate the ability of ECT to generate accurate classification trees that comply with satellite imagery data mining rules while being simpler and easier to comprehend than the other three techniques.

Section snippets

Particle bee algorithm (PBA)

A hybrid swarm algorithm, the particle bee algorithm (PBA), was proposed as an alternative to GA. PBA imitates the intelligent swarming behavior of birds and honeybees and integrates their advantages [3], [8], [9]. By using particle swarm optimization (PSO) search to improve the neighborhood search of the bee algorithm (BA) [3], [8], [9], PBA, which is one paradigm of evolutionary computation, is able to solve discrete optimization problems. PSO search is based on natural evolution, derived from ideas on survival of the fittest, and has been

Experimental data

Angular second moment (ASM), contrast (CON), and entropy (ENT) are the 3 features common to all satellite imagery datasets. These 3 features are each surveyed by 4 sources: raw light, green light, infrared and red light. Thus, a satellite imagery dataset includes a total of 12 input variables. Dataset outputs address 6 different types of images: water, betel palm, building, cloud, orchard, and wood. Of the 600 experimental satellite imagery datasets collected, 200 were randomly selected as the
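The snippet does not state how the three features are computed. ASM, contrast, and entropy are commonly defined on a gray-level co-occurrence matrix (GLCM), so the following minimal sketch assumes that definition, a single horizontal pixel offset, and natural logarithms. The function names `glcm` and `texture_features` and the tiny example image are hypothetical.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs offset by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def texture_features(p):
    """Return (ASM, CON, ENT) for a normalized co-occurrence matrix p."""
    asm = sum(v * v for row in p for v in row)           # angular second moment
    con = sum((i - j) ** 2 * v                           # contrast
              for i, row in enumerate(p) for j, v in enumerate(row))
    ent = -sum(v * math.log(v) for row in p for v in row if v > 0)  # entropy
    return asm, con, ent

# Hypothetical 2x2 single-band image quantized to two gray levels.
band = [[0, 1],
        [1, 0]]
asm, con, ent = texture_features(glcm(band, levels=2))
print(asm, con, ent)
```

Under this assumption, applying `texture_features` to each of the 4 sources yields the 3 × 4 = 12 input variables described above.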

Evolutionary classification tree (ECT)

This study adopted PBA to optimize ECT operations and to produce the self-organized classification rules. Fig. 4 shows the result of ECT classification of satellite imagery data. Eleven rules were mined from the 600 sets of experimental satellite imagery data.

The ECT mined the following 11 classification rules:

Rule 1: IF G_SOU ≥ 46.87 AND I_SOU ≥ 100.58 THEN Cloud

Rule 2: IF G_SOU ≥ 46.87 AND I_SOU ≥ 77.51 AND I_SOU < 100.58 THEN Building

Rule 3: IF G_SOU ≥ 46.87 AND I_SOU <
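Explicit rules of this kind translate directly into executable conditions, which is what makes them easy to inspect. The sketch below encodes only Rules 1 and 2. It assumes that the comparison operators lost from the extracted text are "≥", and it treats G_SOU and I_SOU as plain numeric inputs. The function name `classify` is hypothetical.

```python
def classify(g_sou, i_sou):
    """Apply Rules 1 and 2 only; return None when neither rule fires."""
    # Assumption: the operators missing from the extracted rules are ">=".
    if g_sou >= 46.87 and i_sou >= 100.58:
        return "Cloud"        # Rule 1
    if g_sou >= 46.87 and 77.51 <= i_sou < 100.58:
        return "Building"     # Rule 2
    return None               # Rule 3 onward is truncated in this text

print(classify(50.0, 120.0))
print(classify(50.0, 80.0))
```

Because the remaining rules are cut off here, inputs outside Rules 1 and 2 are left unclassified rather than guessed.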

Conclusion

The results of this study demonstrate that ECT produces explicit rules that are more accurate than models produced by CT and SVM but less accurate than models produced by BPN. However, whereas BPN is a black box model, ECT is able to produce explicit rules, which is an important advantage when mining explicit rules and knowledge in practical applications.

Its ability to generate accurate self-organized classification models and rules makes ECT the preferred choice in cases where the user

CRediT authorship contribution statement

Li-Chuan Lien: Participated in the concept, design, analysis, writing, and revision of the manuscript. Unurjargal Dolgorsuren: Participated in the concept, design, analysis, writing, and revision of the manuscript.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Li-Chuan Lien received a Ph.D. from the Department of Construction and Civil Engineering, National Taiwan University of Science and Technology, Taiwan, in 2011. His research interests include construction and engineering management; construction automation and e-commerce; unmanned aerial vehicles and building information modeling; and artificial intelligence inference modeling and optimization.

Dolgorsuren Unurjargal is currently studying for a doctorate in Civil Engineering at Chung Yuan Christian University in Taiwan, with the aim of applying artificial intelligence to construction project management planning. Other interests include urban planning; replanning with optimization; construction automation; building information modeling; and inference modeling.
