Rule-based knowledge discovery of satellite imagery using evolutionary classification tree
Introduction
Economic development and shifting population demographics have significantly increased the pace of destruction of the natural environment and of land resources. The effective management of land resources is an essential step toward sustainable land use. Satellite imagery (SI) technologies use space-based sensors to indirectly survey the planet or object around which the satellite platform is orbiting [1], [14], [19]. Since 1972, the United States has maintained Landsat satellites in orbit around Earth. The sensors on these satellites record the solar electromagnetic radiation reflected by objects on the surface. Raw data from Landsat satellites are sent to Earth in numeric form and are a source of environmental resource information for researchers. SI mining is used in real-time surveys to cover extensive areas of land and has become an effective survey tool for building environmental resource databases.
The steps in SI mining are: (1) the satellite obtains image data by scanning the surface spectral reflectance intensity across the sensor’s spectrum; (2) staff conduct on-site investigations to obtain surface classification information; (3) the relationship between the surface spectral reflectance intensity data and the surface classification information is established using appropriate statistical methods; and (4) the established relationship is applied directly to other surfaces, using only their spectral reflectance intensity data, to determine their surface classification. Once the relationship has been established, further on-site investigations are unnecessary, which reduces manpower and funding requirements considerably. The ability of SI mining to provide a quick understanding of data across a wide geographic area makes it a valuable tool in applications related to land utilization, agriculture and forestry planning, environmental monitoring, disaster assessment, and scientific research. However, different surface classes can produce similar spectral responses, which currently makes it difficult to distinguish surface classifications accurately. The purpose of this study is to propose an approach to resolving SI classification problems using an artificial intelligence (AI) data mining technique. This satellite imagery map shows the Zengwen Reservoir in Tainan City, Taiwan. The Zengwen Reservoir is the largest reservoir and the largest manmade lake in Taiwan.
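Steps (3) and (4) of the SI mining procedure can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier as a stand-in for the "appropriate statistical methods" mentioned above; this is not the method used in the study, and the two-band reflectance values, labels, and function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Step 3: summarize each surveyed class by the mean of its feature vectors."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Step 4: label an unsurveyed surface by its closest class centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical two-band reflectance values with field-surveyed labels (steps 1 and 2).
surveyed = [(0.10, 0.05), (0.12, 0.07), (0.40, 0.60), (0.42, 0.58)]
labels = ["water", "water", "wood", "wood"]

centroids = fit_centroids(surveyed, labels)
print(classify(centroids, (0.11, 0.06)))
```

Once `fit_centroids` has been run on surveyed data, `classify` needs only the reflectance values of a new surface, which mirrors how the procedure avoids repeated on-site surveys.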
Over the past few years, artificial neural networks (ANN) have made significant contributions in the sciences. A significant body of literature [6], [10], [11], [17] has already been devoted to proposing complex nonlinear models able to predict the behavior of materials to a high degree of accuracy. However, the resulting “black box” models are unable to generate explicit formulas or rules that explain the essence of the models. Furthermore, significant research has been dedicated to SI classification. Examples include the nearest neighbor classifier (NNC) [5] and the inductive decision tree (IDT) [15]. However, those methods largely focus on the predictive accuracy of a classification model while ignoring the issue of model understandability.
Some researchers have used the genetic operation tree (GOT), a hybridization of genetic algorithms (GA) and an operation tree (OT), to build models that can both predict material behaviors accurately and explain the substance of the model [12], [13], [18]. An operation tree is a tree structure that expresses a mathematical formula. Optimizing the operation tree produces a self-organized regression formula. In general, GOT-generated models are less accurate than those generated by neural networks and more accurate than those generated by regression analysis (RA) [12], [13], [18].
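The idea that an operation tree expresses a mathematical formula can be sketched as follows. The nested-tuple encoding, the operator set, and the example formula are assumptions made for illustration; they are not taken from the GOT studies cited above, and the genetic optimization of the tree is not shown.

```python
import operator

# Internal nodes hold a binary operator; leaves hold a variable name or a constant.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(node, variables):
    """Recursively evaluate an operation tree for the given variable values."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, variables), evaluate(right, variables))
    if isinstance(node, str):
        return variables[node]
    return node  # numeric constant


def to_formula(node):
    """Render the tree as an explicit, human-readable formula."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({to_formula(left)} {op} {to_formula(right)})"
    return str(node)


# Hypothetical tree encoding y = (x1 * 2.0) + (x2 / x1).
tree = ("+", ("*", "x1", 2.0), ("/", "x2", "x1"))

print(to_formula(tree))
print(evaluate(tree, {"x1": 4.0, "x2": 8.0}))
```

Because the tree can be printed as a formula, a model of this kind remains readable, which is the property that distinguishes it from a black-box network.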
The strength of GA lies in its use of random-yet-directed search operators to locate global optima. Therefore, GAs are relatively less likely to become trapped within a local search [7] and thus are at lower risk of finding a suboptimal solution. However, an important disadvantage of GA is the long run-time necessary to deliver satisfactory results for large instances of complex design problems.
A hybrid swarm algorithm, the particle bee algorithm (PBA), was proposed as an alternative to GA. PBA imitates the intelligent swarming behavior of birds and honeybees and integrates their advantages [3], [8], [9]. By using particle swarm optimization (PSO) search to improve the neighborhood search of the bee algorithm (BA) [3], [8], [9], PBA is able to solve discrete optimization problems, which is one paradigm of evolutionary computation. PSO search is based on natural evolution, derived from ideas on survival of the fittest, and has been successfully applied to many case studies [3], [8], [9]. The clear advantages of PBA include global optimization, local optimization, exploration, exploitation, flexibility, and parallelism [3], [8], [9].
However, previous studies [2], [4], [12], [13], [18] used tree models only to build self-organized regression formulas, and few address the classification and clustering problems of data mining. Therefore, this study was designed to develop a novel self-organized classification tree known as the evolutionary classification tree (ECT). The ECT uses PBA to optimize the structure of the tree rules.
Six hundred experimental datasets [16] were used to compare the accuracy and complexity of five model-building techniques, including the classification tree (CT), back-propagation network (BPN), support vector machine (SVM), and ECT, and to evaluate the ability of ECT to generate accurate classification trees that comply with satellite imagery data mining rules while being simpler and easier to comprehend than the other four techniques.
Particle bee algorithm (PBA)
A hybrid swarm algorithm, the particle bee algorithm (PBA), was proposed as an alternative to GA. PBA imitates the intelligent swarming behavior of birds and honeybees and integrates their advantages [3], [8], [9]. By using particle swarm optimization (PSO) search to improve the neighborhood search of the bee algorithm (BA) [3], [8], [9], PBA is able to solve discrete optimization problems, which is one paradigm of evolutionary computation. PSO search is based on natural evolution, derived from ideas on survival of the fittest, and has been
Experimental data
Angular second moment (ASM), contrast (CON), and entropy (ENT) are the 3 features common to all satellite imagery datasets. These 3 features are each surveyed by 4 sources: raw light, green light, infrared and red light. Thus, a satellite imagery dataset includes a total of 12 input variables. Dataset outputs address 6 different types of images: water, betel palm, building, cloud, orchard, and wood. Of the 600 experimental satellite imagery datasets collected, 200 were randomly selected as the
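The count of 12 input variables follows from crossing the 3 texture features with the 4 light sources. A minimal sketch of that layout is given below; the `FEATURE_SOURCE` naming scheme is a hypothetical convention chosen for illustration and is not the variable naming used in the study.

```python
from itertools import product

# Feature, source, and class names as listed in the text.
FEATURES = ["ASM", "CON", "ENT"]
SOURCES = ["raw", "green", "infrared", "red"]
CLASSES = ["water", "betel palm", "building", "cloud", "orchard", "wood"]

# Hypothetical naming scheme: one input variable per (feature, source) pair.
input_variables = [f"{feature}_{source}" for feature, source in product(FEATURES, SOURCES)]

print(len(input_variables))
print(input_variables[:4])
```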
Evolutionary classification tree (ECT)
This study adopted PBA to optimize ECT operations and to produce the self-organized classification rules. Fig. 4 shows the result of ECT classification of satellite imagery data. Eleven rules were used in mining classification work on the 600 sets of experimental satellite imagery data.
The ECT mined the following 11 classification rules:
Rule 1: IF G_SOU 46.87 AND I_SOU 100.58 THEN Cloud
Rule 2: IF G_SOU 46.87 AND I_SOU 77.51 AND I_SOU 100.58 THEN Building
Rule 3: IF G_SOU 46.87 AND I_SOU
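Explicit IF-THEN rules of this form can be stored as data and evaluated in order. The sketch below is only an illustration of that representation: the variable names (`band_a`, `band_b`), thresholds, comparison operators, and class assignments are all hypothetical placeholders and do not reproduce the rules mined by the ECT.

```python
import operator

# Each rule pairs a list of (variable, comparison, threshold) conditions with a class.
# All names, comparisons, and thresholds here are hypothetical placeholders.
RULES = [
    ([("band_a", operator.gt, 50.0), ("band_b", operator.gt, 100.0)], "Cloud"),
    ([("band_a", operator.gt, 50.0), ("band_b", operator.le, 100.0)], "Building"),
    ([("band_a", operator.le, 50.0)], "Water"),
]


def classify(sample, rules, default=None):
    """Return the class of the first rule whose conditions all hold for the sample."""
    for conditions, label in rules:
        if all(compare(sample[name], threshold) for name, compare, threshold in conditions):
            return label
    return default


print(classify({"band_a": 60.0, "band_b": 120.0}, RULES))
print(classify({"band_a": 60.0, "band_b": 80.0}, RULES))
```

Keeping the rules as plain data means each classification can be traced back to the single rule that fired, which is the kind of explicitness that a black-box model lacks.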
Conclusion
The results of this study demonstrate that ECT produces explicit rules that are more accurate than models produced by CT and SVM but less accurate than models produced by BPN. However, whereas BPN is a black box model, ECT is able to produce explicit rules, which is an important advantage when mining explicit rules and knowledge in practical applications.
Its ability to generate accurate self-organized classification models and rules makes ECT the preferred choice in cases where the user
CRediT authorship contribution statement
Li-Chuan Lien: Participated in the concept, design, analysis, writing, and revision of the manuscript. Unurjargal Dolgorsuren: Participated in the concept, design, analysis, writing, and revision of the manuscript.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Li-Chuan Lien received a Ph.D. from the Department of Construction and Civil Engineering at National Taiwan University of Science and Technology, Taiwan, in 2011. His research interests include construction and engineering management; construction automation and e-commerce; unmanned aerial vehicles and building information modeling; and artificial intelligence inference modeling and optimization.
References (19)
- et al. Prediction of slump flow of high-performance concrete via parallel hyper-cubic gene-expression programming. Eng. Appl. Artif. Intell. (2014)
- et al. A fuzzy polynomial neural networks for approximation of the compressive strength of concrete. Appl. Soft Comput. (2008)
- et al. Genetic search for solving construction site-level unequal-area facility layout problems. Autom. Constr. (2000)
- et al. A hybrid swarm intelligence based particle-bee algorithm for construction site layout optimization. Expert Syst. Appl. (2012)
- et al. Particle bee algorithm for tower cranes layout with materials quantity supply and demand optimization. Autom. Constr. (2014)
- et al. Predicting the compressive strength and slump of high strength concrete using neural network. Constr. Build. Mater. (2006)
- et al. Knowledge discovery of concrete material using genetic operation trees. Expert Syst. Appl. (2009)
- Introduction To Geographic Information Systems (2004)
- et al. A hybrid AI-based particle bee algorithm (PBA) for benchmark functions and facility layout optimization. J. Comput. Civ. Eng. (2012)
Dolgorsuren Unurjargal is currently studying for a doctorate in Civil Engineering at Chung Yuan Christian University in Taiwan. Research plans center on construction project management with artificial intelligence. Other interests include urban planning; replanning with optimization; construction automation; building information modeling; and inference modeling.