Abstract

Forestry is an undoubtedly crucial part of today’s industry; thus, automation of certain visual tasks could lead to a significant increase in productivity and reduction of labor costs. Eye fatigue or lack of attention during manual visual inspection can lead to falsely categorized wood and thus to a major loss of earnings. These mistakes could be eliminated using automated vision inspection systems. This article compares researched methodologies related to wood type classification and wood defect detection/identification, giving readers who intend to build a similar vision-based system a summarized review to build upon.

1. Introduction

Wood recognition is a crucial technology in various areas of modern industry, covering, for example, the construction industry and various manufacturing processes. There are essentially two main ways to identify timber: either a trained professional inspects each individual log, or the whole batch is processed by computer vision. Both ways work very similarly. Various species of wood exhibit certain characteristics or features, which must be recognized to successfully sort each sample [1]. Different features can also significantly influence wood quality, thus influencing the quality of the final product [2]. Wood material inspection is necessary, as timber of a lower quality can only be used for certain purposes.

In construction, the choice of the right wood type and quality is crucial, as it determines, for example, the material used for a roof truss. A lower quality of the chosen wood might lead to instability of the whole roof, which could end in disaster [3]. Similarly, various wood products, such as furniture, require a certain quality of the used wood material. The right wood type can also influence the quality of manufactured paper [4].

Other than that, certain species of wood are endangered, and their export is banned. However, a significant number of these endangered trees are mixed in piles of ordinary wood (criminality further explained below) [5].

As mentioned before, trained human professionals are often used for wood classification. However, this process is time-consuming and tedious, so these inspectors tend to be bypassed or to have a high identification error rate, often caused by fatigue. Various computer vision techniques, discussed further below, can also be used for classification. Their error rate and reliability vary as well, but more advanced systems could completely replace the human factor, thus significantly increasing company yields, as different wood species have vastly different values. These partially automated systems can also be used in crime scene investigation, classification of ancient architecture, or ecological studies focused on relationships between various species.

One of the problems is the number of well-trained inspectors in the current market environment. Their number tends to stagnate or slowly decrease, which contradicts the ever-growing industry. That is why many companies started to develop computer vision alternatives, such as Kenalkayu, achieving a recognition rate of approximately 90% [6].

The article focuses on vision systems based on the visible spectrum; other alternative methods are presented in Alternative Data Acquisition Methods. Unfortunately, these alternative methods often offer lower accuracy than standard vision-based systems; therefore, this section was added only to complete the review.

The first part is primarily focused on the wood classification problem, the motivation behind it, and the intentions of researchers. Samples of some datasets are presented, along with a brief analysis of wood structures and types. Moreover, a summary of all used methods is presented along with a brief description of the results.

The second part is structured in a similar way but focuses on wood defect identification. In contrast to wood classification, this part mainly covers wood defect types. A significant portion is also devoted to the mathematical description of the most frequently used methods. The section is chronological; the newest methods are presented and described last.

In the final part (Conclusions), future evolution is discussed, along with potentially beneficial directions of research.

2. Wood Recognition and Quality

Several papers focused on wood quality, wood type recognition, and defect recognition tasks have been published, especially by institutes from countries with significant wood industrial potential. Since it is necessary to process a significant amount of wood in the shortest time possible, the emphasis is placed on speed and quality of assessment. The quality of wood can be assessed based on the following parameters:
(i) Wood type classification [7–22]
(ii) Wood defects:
(a) Knot detection [23–34]
(b) Crack detection [27, 28, 31–35]
(c) Wood grading [35]
(d) Holes [26, 33]
(e) Resin pockets [35]
(f) Discoloration [26]
(g) Joints [28]
(h) Stains [33]

The human eye and magnifying glass have been the most common historical instruments used to assess the quality and type of wood. According to Cao et al. [27], humans can achieve reliability of just around 70% and are error-prone in long-term wood quality assessment, as eye fatigue is an inseparable part of the inspection routine. In addition, demands on inspection speed are significantly increasing, leading to the lack of competent manpower.

Recognition is a significantly more challenging task in tropical countries, as they have a much higher diversity of tree species in comparison to temperate countries. Highly trained experts are needed to adequately assess wood in these areas, as characteristics of individual trees vary greatly. Over a thousand tree species with unique characteristics which must be cross-referenced by microscopes are currently assessed by trained workers [9]. The process of training requires talent and hard work, which can lead to mastery of regional tree species. Nowadays, the biggest problems lie with the limited manpower and speed of each assessment [12].

Some countries made significant investments in specific areas of the timber industry, focusing on their internal issues and interests. Almost every referred article published by Malaysian, Indonesian, or Brazilian institutes [7–10, 12, 15, 16, 18, 20] is focused on wood type assessment because of illegal trade within their territory. Thousands of tree species are present in their forest areas (described by [36]), and proper inspection and classification of processed pieces in higher volumes are practically impossible, as these countries lack any convenient technology. This leads to smuggling of endangered and rare tree species, or even underpricing at corrupt customs, which leads to a significant loss of money. Moreover, Paula et al. and Carpentier et al. [11, 14] mentioned in their articles that even a trained expert cannot guarantee the reliability of his inspections when some particular features, like the shape of leaves or needle distribution, are not present during truck loading (Figure 1). In the case of observation with a magnifying glass, even a subtle difference in the wood structure can serve as a key dividing element, thus increasing the error rate of visual inspections.

Scandinavian institutes focus their attention on sawmill automation [13, 30, 35]. Their natural resources consist of European standard coniferous trees. In general, their datasets include fewer tree species in comparison to the Malaysian datasets mentioned above, but the structure in cross-section is similar because of natural adaptations to Nordic climatic conditions.

China contributed significant research as well [23, 25, 27–29, 32, 34], focusing on defect detection. The Chinese timber industry is mainly built on plywood slab production, which provides fundamental material for the manufacturing of furniture, decorations, packaging, and construction parts. Defects negatively impact mechanical properties, which subsequently influence the price and overall quality of products. According to Yang et al. [32], the annual output of plywood in China reached up to 170 million cubic meters in 2016.

The identification of hardwood has significant importance as well. The article presented by Tou et al. [12] pays close attention to structural differences which are crucial in civil engineering and could cause fatal failures in case of incorrect building material selection. For instance, the wrong choice of wood type for roof pillars can lead to subsequent collapse of the building.

3. Illegal Logging and Trading

Illegal forest activities and illegal logging are a significant problem in today’s society. Illegal forest activities were defined by Tacconi et al. [38] as all illegal acts that relate to forest ecosystems, forest-related industries, and timber and nontimber forest products. This definition however excludes activities such as processing of illegal timber (or illegal processing of timber if the manufacturer does not have appropriate licenses), trading of illegal timber, illegal expropriation of customary forest lands, or even illegal conversion of forest land [39]. Many bigger markets, including the US, the EU, and Australia, adopted laws prohibiting illegally harvested timber products from entering their markets. However, when China introduced a domestic logging ban in 1998, it became the world’s largest importer of tropical timber [5, 40]. It is also a key processing country, manufacturing 40% of global furniture, which is then imported to the US and Europe [41]. Unlike other mentioned countries, China does not have dedicated legislation concerning curbing of illegal timber imports [40].

Illegal logging affects many timber species, which are often rare and endangered, but also highly valuable for export. Many of these endangered species have higher economic values due to their unique physical and chemical properties (including color, texture, odor, and hardness of wood) or even cultural value [42, 43]. Higher value in turn increases rarity/scarcity of harvested trees, intensifying their threatened status or even driving them to extinction. Among these rare species are mahogany, rosewood, and ebony wood. Each of these species has its own characteristics. These wood species are generally used in specific markets (high-value products) such as parquets, furniture, boats, or musical instruments [44, 45].

During illegal tree harvesting, criminal loggers face serious obstacles, in the form of local residents, inspectors, and law enforcers. Among these people are also forgers of logging permits and timber certifications. People who can arrange the shift from “illegality” to “legality” are necessary intermediaries between both worlds. Their “pay” may vary greatly, according to quality, type, and amount of wood. For example, in the case of wood trafficking from Indonesia to Malaysia, Malaysian businessmen were paid around 10–20 euros for one cubic meter of meranti wood, while its price on the European market was approximately 200 euros [46].

A wood classification system based on computer vision could at least partially bypass a significant number of these illegal “businessmen,” especially the trained wood inspectors who sort each individual log. However, such a system needs a high degree of security, as criminal loggers often employ hackers who “legalize” wood in the internal systems of the affected countries.

Figure 2 represents one of the often used trafficking techniques. Two wood producers, apart from supplying the local production, sell their logs to a third party processing plant. However, one of the producers takes part in illegal logging. The third company basically mixes legally and illegally logged wood together, which leads to wood laundering. The final processed product is branded as manufactured from legal wood, while it might contain parts from illegal activities.

4. Wood Type Classification Dataset

A dataset of Malaysian macroscopic images of tree cross-section acquired by the Centre for Artificial Intelligence and Robotics (CAIRO) is often used as a starting point of case validation [8, 9, 12, 16, 21, 22, 47]. This dataset consists of over 100 species of tropical trees. Each species has fifty images for training and fifty images for testing. Images were taken by a grayscale Picolo CCD camera with 10x magnification and can be seen in Figure 3. It is also used for real-world applications, such as Kenalkayu, which is a state-of-the-art pattern recognition technology developed by the earlier mentioned CAIRO. Unfortunately, this dataset is not publicly available.

On the other hand, a more extensive macroscopic dataset was acquired and released by Brazilian University UFPR (Federal University of Parana) in 2010. Figure 4 shows different samples from the UFPR dataset. The database consists of colored, JPG, high-resolution photos () without compression, counting 41 species of the Brazilian flora. Each species includes at least 50 pictures, and the dataset itself has 2942 pictures in total. It might seem insufficient for neural network application, but since high-resolution photos were obtained, each picture can be sliced into multiple different ones. This database, collected by SONY DSC T20 with macro function, was already used by several Brazilian researchers [10, 11, 48]. The dataset is publicly accessible.

In 2011, UFPR released a microscopic database, containing 112 different catalogues of forest species [7]. All images were acquired by an Olympus Cx40 microscope with 100x zoom. The database itself consists of 2240 microscopic PNG images, with a resolution of pixels, which were labeled by experts in wood anatomy. Of the 112 available species, 37 are softwoods and 75 are hardwoods. This database cannot be used for color-based recognition, since its hue depends on the current used to produce contrast in the microscopic images. All images can be freely converted to grayscale for further work. This database was presented by university staff and successfully tested on gray level cooccurrence matrix (GLCM) and Local Binary Pattern (LBP) methods described further below [7].

The only thesis devoted to the classification of nontropical wood was published by Shustrov [13]. He worked with a self-made database of 3 coniferous tree types (spruce, fir, and pine) covering 1115 board samples divided into 242938 high-quality image patches. Figure 5 shows different samples from Dmitrii Shustrov’s database.

The Brodatz texture dataset is also one of the currently available datasets. This dataset covers 112 grayscaled textures, which are segmented into 16 disjoint images with a size of . The whole dataset is somewhat outdated, since it was created in 1999 [49], and the resolution is unfortunately pretty low. However, it can be used for GLCM, a texture classification method, further described below [50].

A novel idea of autonomous forest inventory gathering using drones was presented by Carpentier et al. [14]. Their solution is built on tree type recognition based on bark images. This approach is advantageous in numerous ways: despite seasonal changes, the bark is always present, and even if logs are cut and stored in a lumber yard, the bark remains. Figure 6 shows different samples of bark.

5. Wood Type Classification Results

The various results are summarized in Table 1, and the crucial settings with the best performance are highlighted. Since multiple methods, including either different feature extractors or various classifiers, were mostly tested within each article, all used approaches are mentioned under the “methods” column, so every compared method is listed. A blank field means that only one approach was used, and it is listed under the column “features and classifiers,” where in the case of multiple methods, only the most successful one is listed. The dataset column shows the total number of pictures used, mentioning in brackets the image ratio used for training, validation, and testing. The last value is missing in some cases, signaling that no separate dataset was used for the final phase of testing. The only exception can be noticed in [14], where 5-fold cross-validation was used. It means that the whole dataset was split into 5 folds, each serving as a validation set in one run, while the others are used for training. The results of all 5 validations are averaged for the final accuracy value. Even though this technique is computationally expensive, it provides an objective conclusion. Dataset expansion methods were found to be very useful as well, as the system accuracy significantly increases with bigger datasets. Two types of dataset expansion are listed: subimages and augmentation. The first one is based on splitting the original high-resolution picture into multiple low-resolution ones (see the sketch below). For instance, Shustrov [13] expanded his dataset of 1115 pictures into 255724 patches, and that is one of the main reasons his system performed so well. The other method, called augmentation, is based on artificial enhancement of the original picture. Algorithms like flipping, rotation, color jittering, scaling, and cropping are very popular in this area. For example, Tou et al. [16] enlarged their dataset from the original 12 pictures to 600 by employing rotational augmentation.
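As an illustration of the two expansion strategies, the following minimal sketch extracts non-overlapping patches from a high-resolution photo and applies simple rotational augmentation; the patch size, angles, and file handling are assumptions rather than the settings used in [13, 16].

```python
from pathlib import Path
from PIL import Image

PATCH = 224                  # assumed patch size in pixels
ANGLES = (0, 90, 180, 270)   # simple rotational augmentation

def expand_image(path: Path, out_dir: Path) -> int:
    """Split one high-resolution photo into patches and rotate each one."""
    out_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(path).convert("RGB")
    w, h = img.size
    count = 0
    for top in range(0, h - PATCH + 1, PATCH):
        for left in range(0, w - PATCH + 1, PATCH):
            patch = img.crop((left, top, left + PATCH, top + PATCH))
            for angle in ANGLES:
                patch.rotate(angle).save(out_dir / f"{path.stem}_{count}.png")
                count += 1
    return count
```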

Since different datasets and conditions were presented by the individual authors, it is not entirely objective to compare their system performance just by the achieved accuracy. With an increasing number of categories, the task becomes much more challenging. The general rule says that when the system is trained on bigger datasets, it will perform better on newly acquired data.

Promising results were accomplished by convolutional neural networks [13, 14], which have recently become popular. They are computationally expensive; therefore, significant resources are needed for real-time applications. However, as their computation can be accelerated by a Graphics Processing Unit (GPU), satisfactory speed can be obtained. Classical neural networks in the form of classifiers have proved to be useful, as their accuracy is greater than 95% [9, 21]. Unfortunately, their performance is highly dependent on the choice and setting of a feature extractor. Notable performance was also achieved by the nonneural machine learning classifiers Linear Discriminant Analysis (LDA) and support vector machine (SVM) in [10, 18].

A successful methodology was proposed by Hafemann et al. [48], who introduced in-house designed convolutional neural network architecture. Original images were split into smaller patches leading to an increased training dataset (approx. 3 million images). A lightweight model in combination with the region split voting method (described in Region Split) achieved 97.32% accuracy.

6. Wood Defect Description

Many types of defects were introduced and inspected within the published articles. Some of them refer to identification or localization of basic elements in general [25, 29], like knots, splits, and cracks (Figure 7), but other cases required a more specific classification of individual defect types with respect to various mechanical properties [23, 26, 30–35]. Specific mechanical differences between individual defects are described by Berthellemy [53], who focuses on timber bridge construction, where not only must the exact type of defect be recognized, but also its orientation and age, which play an important role as quality indicators.

Basic differentiation between types of knots was described by Gu et al. [30]. They stated that for most applications, sound knots (Figures 7(a) and 7(b)) along with pin knots (Figure 7(d)) are considered harmless and do not influence mechanical properties to any significant extent. On the contrary, black knots (Figure 7(c)), knot holes (Figure 7(f)), stripes (Figure 7(h)), splits (Figure 7(i)), and wanes (Figure 7(j)) are described as quality reducers. In addition, quality indicators vary according to individual standards and fields of application. For instance, the solution of Yang et al. [32] required just a basic differentiation into 4 categories. The opposite case was introduced by Ruz et al. [33], who managed to distinguish 10 types of defects.

7. Wood Defect Classification Results

It might seem that every listed author achieved good results. However, with respect to the number of sorted categories along with the extent of the presented dataset, some articles [26, 29] did not present a universal and sufficiently robust solution. On the other hand, there are some [30, 33] which achieved a promising score, both based on SVM classifiers. Two exceptions [27, 34], based on Near-Infrared Spectroscopy (NIRS) instead of conventional cameras, accomplished competitive results as well, but since the method is based on a different technology, it cannot be analyzed as an equal.

Not all researched papers are included in Table 2, as their methods or results were not comparable on any scale [23, 28, 35]. These were focused more on research and contained no presentable results of the described methods. Even so, they could serve as useful inspiration, especially in the case of Kauppinen’s dissertation [35], covering a description of multiple color-based machine vision methods.

Some articles provide additional information [23, 29, 32, 33], because they cover not only defect recognition but also defect localization during preprocessing.

Two interesting CNN solutions were introduced in recent years. The first one [55] employs the method of splitting images into multiple sections, which are evaluated, and faulty segments are identified using a lightweight CNN model. Even though the used architecture, DeCAF [56], is not as computationally intensive as its newer successors, the patches managed to cover areas only up to pixels. However, in real-time deployment, the splitting and evaluation of high-resolution images would lead to a significant increase in computational complexity. The authors managed to implement defect segmentation as well (described further in the next subsection).

The second one, a Faster R-CNN-based solution [57], covers multiple wood veneer defect detection using a  mm region of interest. The proposed system managed to distinguish between 4 types of defects. With respect to accuracy, the most successful model, based on ResNet152, reached up to 96.1% when differentiating between faulty and nonfaulty regions and up to 80.6% when distinguishing between types of defects. On the other hand, the ResNet152 inference time (48.01 ms) was 7 times longer than that of the computationally economical AlexNet architecture, which was in terms of accuracy just 0.6% behind ResNet152.
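For orientation, a minimal sketch of building a Faster R-CNN defect detector with a recent torchvision release is given below; the ResNet-50 FPN backbone and the class count are illustrative assumptions and do not reproduce the exact ResNet152-based configuration from [57].

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# 4 defect classes + background; the class count and backbone are assumptions.
NUM_CLASSES = 5

def build_defect_detector() -> torch.nn.Module:
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Replace the box predictor head so the detector outputs our defect classes.
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
    return model

# Inference on one RGB veneer image tensor scaled to [0, 1], shape (3, H, W).
model = build_defect_detector().eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 512, 512)])
print(predictions[0]["boxes"].shape, predictions[0]["labels"].shape)
```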

Although the algorithm was released in 2016, Faster R-CNN is the newest proposed method among the researched papers focusing on wood defect detection [58]. However, it certainly cannot be considered a state-of-the-art solution. Promising results were achieved by the latest YOLO-V5 CNN architecture, which was released in June 2020. A comparison of YOLO and Faster R-CNN was carried out by Dwivedi [59], and all results favored YOLO-V5. Unfortunately, no official paper has yet been published by the YOLO-V5 authors [60]; therefore, further development of the algorithm is expected. Another promising state-of-the-art model, EfficientDet [61], could provide significant improvements as well, since the EfficientDet model currently leads the COCO dataset-based ranking [62].

8. Defect Segmentation

Defect localization has become an attractive topic in the machine vision community, although for most tasks, bounding boxes are a sufficient solution. Nevertheless, in some specific cases, precise localization of the defect position with pixel resolution can be beneficial.

Unfortunately, only a limited number of publications expressed an effort to locate the exact positions of classified defects.

In the first researched publication, Zhang et al. [29] introduced a method built on the assumption that every defect has an easily distinguishable edge. In the first phase, the image is preprocessed by a Canny edge detector. The process is supposed to subtract the foreground with a potentially defective area and consequently classify the segmented image. Even though the stated results showed potential, in the case of edge detection of a knot with hardly noticeable edges, the subtracted mask does not seem to differentiate the faulty region from the nonfaulty one (see Figure 8). As only 3 examples were presented and none of them included a picture without a defect, the method cannot be objectively evaluated.
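A minimal sketch of the general Canny-plus-morphology idea follows; it is not the exact pipeline from [29], and the blur size, thresholds, and closing kernel are arbitrary assumptions.

```python
import cv2
import numpy as np

def defect_candidate_mask(gray: np.ndarray) -> np.ndarray:
    """Return a binary mask of regions with strong edges (potential defects)."""
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)                 # arbitrary thresholds
    kernel = np.ones((15, 15), np.uint8)
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(gray)
    cv2.drawContours(mask, contours, -1, 255, thickness=cv2.FILLED)
    return mask

# Usage: img = cv2.imread("board.png", cv2.IMREAD_GRAYSCALE)
#        mask = defect_candidate_mask(img)
```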

A different approach was chosen by the Northeast Forestry University research team [34], who evaluated wood quality using near-infrared spectroscopy. Their tested segmentation method was based on the hit-or-miss transformation (HTM), followed by PCA-based classification of the defect. Similarly to Zhang et al. [29], only 3 samples were presented in the publication, and none of them included a defect-free scene.

The latest deep learning approach was introduced by Ren et al. [55]. The proposed method was based on the lightweight convolutional neural network architecture DeCAF, which was originally introduced in 2013 [56]. A pretrained model was used as a feature extractor; thus, an extensive dataset was not required for the initial training. Initially, the image was split into multiple regions, which were assessed individually. In the case of a positive defect classification, the Class Activation Map (CAM) [63] method was used to identify the discriminative region.
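Since the CAM computation itself is simple, the following sketch illustrates it on a classified patch; it uses a torchvision ResNet-18 as a stand-in backbone (the cited work builds on DeCAF features), so the model choice, input size, and preprocessing are assumptions rather than the authors’ setup.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Stand-in backbone; older torchvision versions use pretrained=True instead.
model = models.resnet18(weights="IMAGENET1K_V1").eval()
features = {}
model.layer4.register_forward_hook(lambda m, i, o: features.update(maps=o))

def class_activation_map(x: torch.Tensor) -> torch.Tensor:
    """x: one preprocessed patch, shape (1, 3, 224, 224); returns a CAM heatmap."""
    with torch.no_grad():
        logits = model(x)
    cls = logits.argmax(dim=1).item()
    fmaps = features["maps"][0]                  # (512, 7, 7) conv feature maps
    weights = model.fc.weight[cls]               # FC weights of the predicted class
    cam = F.relu(torch.einsum("c,chw->hw", weights, fmaps))
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return F.interpolate(cam[None, None], size=x.shape[-2:],
                         mode="bilinear", align_corners=False)[0, 0]
```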

9. Alternative Data Acquisition Methods

Even though this research is focused on machine vision recognition systems related to the visible spectrum, some alternative approaches with satisfying results have been published as well. One such approach is the analysis of the spectral radiation reflected from the surface of the wood using a special radiation source. For example, some publications proposed the deployment of Near-Infrared (NIR) [27, 64] or Mid-IR [65, 66] spectroscopy. Another example is fluorescence spectroscopy technology [67]. Such systems consist of a spectrometer, a laser source, and an optical filter. In [68], a system for wood species recognition was developed where ultrasonic signals were used as input features. Different wood types have different elastic reactions, which are caused by their cellular structure. The signal that has passed through the radial, tangential, and longitudinal surfaces of the wood is used as input for the classification system.

Some alternative approaches based on nonconventional cameras still rely on image processing of the acquired data in the final testing/recognition phase.

Specific beneficial properties were observed in X-ray sensing. X-rays can be used to localize rotten knots or hollow hearts, as proposed by Mu et al. [25], or even for tree-ring detection, as mentioned in Piuri and Scotti’s article [67].

A unique method was tested by Jordan et al. [68], who obtained images of the bark of cedar and cypress trees using terrestrial lidar (Figure 9) and managed to achieve an accuracy of nearly 90% by using them in a convolutional neural network. However, as Carpentier et al. [14] proposed a more accurate and robust solution using standard images, this method was not expanded further.

10. Features and Classifiers

Researchers tend to use various methods in their approaches. This section covers the basics of each individual approach and theoretically describes the methods which had the best results according to Tables 1 and 2. Only the works with the highest performance with respect to the size of the dataset and the number of categories were selected. These methods have a high possibility of deployment in newly developed systems.

Since all image pixel values cannot be fed to a conventional classifier straight away, as it would lead to poor performance, important features, represented by a vector, must be extracted from the image. Multiple methods have been proposed for solving this issue, including machine learning-based categorization systems related to machine vision, but no single universal solution has been discovered. The accuracy of the system still partially depends on the designer’s intuition, as different types of scenes include different recognizable features. These features can be obtained from various mathematical and physical points of view. Some tasks are well recognized with respect to their color, and others could perform better with the distribution of cooccurring pixel values. Furthermore, some articles proved that the combination of different feature extractors leads to enhanced overall performance [10, 18, 19, 33].

Most of the researched solutions are based on the model illustrated in Figure 10. Some exceptional experimental approaches were published, where the authors chose to create their own features or classifiers after observing some similarities or differences between classes. This approach was presented in [12], where thresholding was used for classification. In [30, 33], one set of features was represented by extracted geometrical dimensions of defects. Color-based self-made features can be found in [30]. In [11], the authors unusually chose to extract particular color channels from different color spaces.

11. Feature Extraction Methods

11.1. Gray Level Cooccurrence Matrix

In order to capture essential information about the structural arrangement of the surface, two types of texture features can be used: first-order statistical features based on the histogram of an image and second-order statistical features derived from the GLCM.

The gray level cooccurrence matrix, also called gray level dependency matrix, is therefore defined as a two-dimensional gray level histogram for pixel pairs, which are separated by a fixed distance along a specific direction (usually horizontal, vertical, diagonal, or antidiagonal). Figure 11 illustrates the process of formation of the GLCM for the horizontal direction with a step of 1 [69].

The elements of the GLCM can be considered as probabilities $p(i,j)$ of finding the relationship between gray level $i$ and gray level $j$. With respect to those, we can calculate one of the selected features (standard definitions).

Energy feature:

$$E = \sum_{i}\sum_{j} p(i,j)^{2}$$

Entropy feature:

$$H = -\sum_{i}\sum_{j} p(i,j)\,\log p(i,j)$$

Contrast feature:

$$C = \sum_{i}\sum_{j} (i-j)^{2}\, p(i,j)$$

Homogeneity feature:

$$G = \sum_{i}\sum_{j} \frac{p(i,j)}{1+(i-j)^{2}}$$
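These features can be extracted, for example, with scikit-image, as the following minimal sketch shows; the distance, angles, and 256-level quantization are illustrative assumptions, and entropy is computed manually because it is not provided by graycoprops (older scikit-image releases spell the functions greycomatrix/greycoprops).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray: np.ndarray) -> np.ndarray:
    """Energy, entropy, contrast, and homogeneity for one uint8 grayscale patch."""
    glcm = graycomatrix(gray, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    energy = graycoprops(glcm, "ASM").mean()        # ASM = sum p^2, as defined above
    contrast = graycoprops(glcm, "contrast").mean()
    homogeneity = graycoprops(glcm, "homogeneity").mean()
    p = glcm.mean(axis=(2, 3))                      # average over distances/angles
    entropy = -np.sum(p * np.log(p + 1e-12))        # entropy is not in graycoprops
    return np.array([energy, entropy, contrast, homogeneity])

# Usage: feed glcm_features(patch) vectors of many patches into a classifier.
```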

GLCM was successfully used by Khalid et al. at Universiti Teknologi Malaysia in 2008 [9]. At that time, this research was groundbreaking, as they were the first to achieve an accuracy greater than 95% on a dataset consisting of 2100 images in 20 categories. The feature vector was categorized by a shallow neural network-based classifier. Even though the dataset contained macroscopic monochromatic pictures of tree cross-sections, this solution could potentially be used in a real-time automation system as well.

Khalid et al.’s article was cited multiple times and served as an inspiration for research in this field in subsequent years. Publications [18, 20], which took a similar approach, managed to improve performance and differentiated even more categories.

11.2. Gray Level Aura Matrix and Basic Gray Level Aura Matrix

One of the approaches to finding a feature inside an image is to look at neighboring pixels. These methods work with a so-called structural element, typically a small matrix (in some rare cases, it can even be a different object), which defines a pattern inside an image. One of the commonly used image operations in this case is based on a so-called aura, where the aura measure describes how two subsets $A$ and $B$ of the image sites mix with respect to a neighborhood system $\mathcal{N} = \{\mathcal{N}_s\}$ and is denoted $m(A, B)$ [70]:

$$m(A, B) = \sum_{s \in A} \left| \mathcal{N}_s \cap B \right|$$

The aura of $A$ with respect to $B$ characterizes how the subset $B$ is present in the neighborhood of $A$. A grayscale algorithm uses only the grayscale variants of the original images, with intensities in the range of 0–255, to reduce the computational complexity in comparison to fully colored images. The GLAM is represented by the following equation:

$$A = \left[ m(S_i, S_j) \right]$$

where $S_i$ is the gray level set corresponding to the $i$th level of intensity and $m$ is the aura measure between two gray level sets given by the equation above. In essence, the Gray Level Aura Matrix (GLAM) of an image characterizes the probability distribution of each gray level in the neighborhood of each other gray level, generalizing the GLCM [71].

Basic GLAM (BGLAM) is a GLAM computed from only a single-site neighboring pixel system. This approach was very successful at measuring image similarities for textures and texture image retrieval, and it was also used to classify tree species with an improved result compared to GLCM [72, 73].
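The following numpy sketch illustrates the definitions above for one single-site neighbor; it is an assumed illustrative implementation, not code from the cited works, and a full BGLAM descriptor would concatenate the normalized matrices of all four nearest-neighbor offsets (usually after requantizing the image to fewer gray levels).

```python
import numpy as np

def basic_glam(img: np.ndarray, offset=(0, 1), levels=256) -> np.ndarray:
    """Basic GLAM for the single-site neighbor given by `offset` (right neighbor
    by default).  Entry (i, j) is the aura measure m(S_i, S_j): how often a pixel
    with gray level i has a neighbor with gray level j at that offset."""
    dr, dc = offset
    h, w = img.shape
    src = img[max(0, -dr):h - max(0, dr), max(0, -dc):w - max(0, dc)]
    dst = img[max(0, dr):h - max(0, -dr), max(0, dc):w - max(0, -dc)]
    glam = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(glam, (src.ravel(), dst.ravel()), 1)
    return glam

def bglam_features(img: np.ndarray) -> np.ndarray:
    # Concatenate the normalized BGLAMs of the four nearest-neighbor offsets.
    offsets = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    mats = [basic_glam(img, o).astype(float) for o in offsets]
    return np.concatenate([m.ravel() / m.sum() for m in mats])
```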

The first paper which provided a promising result by deploying the BGLAM feature extractor (along with SPPD and GLB) on a grayscale wood dataset, consisting of 5200 images from 52 categories, was presented by Yusof et al. [18] from Universiti Teknologi Malaysia in 2013. Yusof et al. greatly enhanced the function of the used algorithm by employing a genetic algorithm which reduced the dimensionality of the feature vector. It led to a reduction of computing resources and a significant improvement of accuracy. Even though the deployment of a genetic algorithm was beneficial, it was not used in any other paper. It can be considered a promising method for performance enhancement in tasks which are based on a combination of a feature vector extractor and a classifier. Classification in Yusof et al.’s solution was performed by the LDA classifier (described in Linear Discriminant Analysis) and reached an accuracy of 98.69%.

Yusof et al.’s research was subsequently followed by Ruz et al. [33] in 2016, who used the same dataset and increased the accuracy to 99.84% by exchanging the LDA classifier for SVM. Even though it has the best results stated in this paper, it cannot be considered a universal solution, since it was not used on the newer popular extensive datasets (which include thousands of images [1, 2]) and objectively compared with the performance of neural networks in general.

Both mentioned articles [18, 20] stated that their results were compared to GLCM and found GLCM less accurate than BGLAM, especially with an increasing number of categories. Their solutions are in general advantageous for pattern recognition where rotation invariance is needed and only a limited dataset is provided for training. These methods could possibly be implemented in an embedded system with limited computational resources.

11.3. Local Binary Patterns and Completed Local Binary Patterns

LBP is based on the combination of GLCM with the previously mentioned thresholding. Just as in GLCM, the LBP describes each pixel by the relative gray levels of its neighboring pixels. The descriptor expresses the result over the neighborhood as a binary number (binary pattern) [74]:

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\,2^{p}, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

where $g_c$ is the gray level of the center pixel and $g_p$ are the gray levels of its $P$ neighbors on a circle of radius $R$.
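A minimal sketch of extracting an LBP histogram feature with scikit-image follows; the neighborhood parameters and the “uniform” mapping are illustrative assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray: np.ndarray, P: int = 8, R: float = 1.0) -> np.ndarray:
    """Uniform LBP codes pooled into a normalized histogram feature vector."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2                                   # uniform patterns + "other"
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()
```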

In order to better encode the local image structure, Guo et al. proposed the Completed LBP (CLBP) pattern for texture classification. CLBP has three components: CLBP-S indicates the sign (positive or negative) of the difference between the center pixel and a local pixel, CLBP-M indicates the magnitude of that difference, and CLBP-C codes the center pixel value by thresholding it against the average gray level of the image. CLBP-S basically corresponds to the conventional LBP [75].

LBP, along with PLS, Gabor filters, fractals, GLCM, and a self-proposed color-based algorithm, was used in the research of Filho et al. [10]. Filho et al. examined the performance of each individual feature extractor and subsequently the impact of their various combinations. Interestingly enough, CLBP, LBP, and the self-proposed color-based algorithm were present in every successful combination.

The performance of Filho et al.’s system would not have exceeded its predecessors without the so-called “divide and conquer” technique, which is described in Region Split. Filho et al. described the dependency between the number of splits and the performance of the whole platform. They stated that image splitting is beneficial only up to a certain number of patches. This approach proved to be beneficial in every solution it was deployed in. A self-made dataset consisting of 2942 pictures and 41 categories was used for training and validation.

12. Classification Methods

12.1. Support Vector Machine

SVM is one of the most popular supervised classification algorithms, with the ability to handle noisy and high-dimensional data. In simple terms, SVM classification tries to find an optimal line or hyperplane capable of separating objects that have different class memberships. The SVM method then tries to separate the given samples by a hyperplane in such a way that the separation between the two classes, denoted as the margin, is as wide as possible (Figure 12) [76].

In mathematical formulation, it is necessary to minimize the norm of the weight vector defining the separating hyperplane, which is equivalent to maximizing the margin. This is a nonlinear optimization task solved by the Karush-Kuhn-Tucker conditions in combination with Lagrange multipliers. Improvements can be made using a least squares formulation together with the support vector machine algorithm, creating a combination called LS-SVM. This idea was used to classify woods together with NIR spectrometry [77] and also solves most disadvantages of SVM [78].
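As an illustration, the following minimal scikit-learn sketch trains an SVM on precomputed texture feature vectors; the file names, RBF kernel, and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_samples, n_features) texture features (e.g., GLCM or LBP vectors),
# y: integer species/defect labels -- both assumed to be precomputed.
X, y = np.load("features.npy"), np.load("labels.npy")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```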

12.2. Linear Discriminant Analysis

In multiple cases, more than just one or two features are used for classification, which makes it hard to apply the k-Nearest Neighbor (KNN) algorithm or other multispace classification algorithms directly. One of the suggested solutions applies statistical methods. LDA takes the multidimensional feature space, creates a new space, and projects the data onto it in a way that maximizes the separation of the categories, reducing the multidimensional problem to a 1D one (or at least to fewer dimensions). This is carried out by calculating means and variances; the criterion itself is the ratio of the variance between the classes to the variance within the classes. Modifications of LDA can be used to successfully classify whole images, instead of vector spaces only [79]:

$$J(\mathbf{w}) = \frac{\mathbf{w}^{T} S_{B} \mathbf{w}}{\mathbf{w}^{T} S_{W} \mathbf{w}}$$

where $S_{B}$ and $S_{W}$ are the between-class and within-class scatter matrices and $\mathbf{w}$ is the projection vector. The equation illustrates the multidimensional problem, which in this case is a 2-dimensional problem representing 2 features calculated with the methods mentioned above. The problem is the same for all classification methods.
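Analogously, a minimal scikit-learn sketch of LDA used both for projection and for classification is shown below; the feature files and the number of discriminant components are assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# X: precomputed feature vectors, y: class labels (assumed available).
X, y = np.load("features.npy"), np.load("labels.npy")
# n_components must not exceed min(n_classes - 1, n_features).
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)                   # dimensionality reduction
print("projected shape:", X_proj.shape)
print("training accuracy:", lda.score(X, y))       # LDA also acts as a classifier
```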

12.3. Artificial Neural Network

The Artificial Neural Network (ANN) is a classification and machine learning system which attempts to process information in the same way a biological neural network does (Figure 13). Biological neural networks consist of an enormous number of organized nerve cells, called neurons, which interact with each other in parallel.

The logical principle of the neural network is finding a set of weights which solves the required problem with the least mean square error. This is mostly achieved through a process called backpropagation, where the results of the forward pass of the neural network are compared with a desired output value (the known solution for the required inputs). After evaluation, the given error is backpropagated, and based on gradient computation, a set of new, adjusted weights is configured in the system [80]:

$$w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}}$$

where $E$ is the error function and $\eta$ is the learning rate.

12.4. Multilayer Perceptron

One of the possible topologies for arranging layers of neurons is the multilayer perceptron. A multilayer perceptron (MLP) consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. Some MLPs are constructed with a single hidden layer, and in some cases, each neuron is connected to every single neuron in the previous and also in the next layer of the neural network. An example of this topology was used to classify wood, with 20 hidden neurons, 180 neurons in the output layer, and a matrix of input values [81].
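A minimal scikit-learn sketch of an MLP classifier on precomputed feature vectors follows; apart from mirroring the single hidden layer of 20 neurons mentioned above, the remaining settings are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: feature vectors (e.g., GLCM statistics), y: wood species labels (assumed).
X, y = np.load("features.npy"), np.load("labels.npy")
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(20,),   # single hidden layer of 20 neurons
                  activation="relu", max_iter=2000, random_state=0),
)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))
```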

A multilayer perceptron (shallow neural network) was used as a classifier in multiple papers. Even though Tou et al. [8, 50] had a limited dataset and the results of their methods were relatively poor in comparison to other solutions, their work showed the potential of a neural network (NN), as it clearly outperformed KNN.

On the contrary, Ruz et al. [33] in 2009 designed a system where the pairwise version of SVM outperformed MLP by 7.51%, reaching 91.39%. The research included methods of defect segmentation based on the fuzzy min–max function (FMMI). Only features based on geometrical and color properties of the segmented defect were used by the classifier. Cao et al. [27] and Yu et al. [34] presented their solutions for defect detection in 2017. They distinguished 4 categories with about 400 samples by using alternative NIR technology for defect classification. The overall performance showed promising results, exceeding 92% accuracy in both cases. When analyzing the same 1D dataset, the NN classification method performed 9% better than PCA-PLS [34], 4.17% better than SVM [27], and 10.83% better than PLS [27].

13. Convolutional Neural Networks

The convolutional neural network (CNN) is a modification of the ANN where, instead of standard perceptron layers, convolutional layers (typically combined with max pooling layers) are used, leading to fully connected layers and the output. There are many different types of architectures, but the main difference is that the input to this type of neural network is a 2D array. The convolution operation exploits the spatial (frequency) properties of an image and, together with max pooling, leads to a dimensionality reduction in the transition from layer to layer. This is a valuable property for classification, because a huge number of inputs can be reduced to a small number of outputs:

$$(I * K)(i, j) = \sum_{m}\sum_{n} I(i + m,\, j + n)\, K(m, n)$$

where $I$ is the input image and $K$ is the convolution kernel.

One of the characteristics of modern CNNs is a massive number of layers and parameters. These factors affect the training time, and training a CNN from scratch takes a lot of time even when computational resources such as GPUs are used. This problem can be addressed by the concept of transfer learning [82].
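A minimal transfer learning sketch in PyTorch is shown below: an ImageNet-pretrained backbone is frozen and only a new classification head is trained on wood classes; the ResNet-18 backbone, class count, and training-loop details are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_SPECIES = 41                       # e.g., the 41 classes of the UFPR dataset

model = models.resnet18(weights="IMAGENET1K_V1")
for param in model.parameters():       # freeze the pretrained feature extractor
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)  # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch of preprocessed image patches."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```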

Ever since a CNN won the ImageNet competition in 2012 [83], the popularity of CNNs has increased rapidly. Nowadays, even greater challenges have arisen, and with sufficient computational resources available, convolutional neural networks represent a revolutionary approach (Figure 14). Moreover, an impressive result was achieved by the You Only Look Once (YOLO) algorithm [84], which is able to detect objects in an image and sort them into 9000 categories in real time.

Unfortunately, only a few articles related to wood inspection have been published. Two of them were researched [13, 14], and both have outstanding outcomes. In [13], 4 basic CNN architectures were described: AlexNet, VGG-16, GoogLeNet, and ResNet-50. These architectures were consequently tested and compared to each other. Expectedly, GoogLeNet provided the best performance but was computationally the most expensive in comparison to the other tested methods. Furthermore, Shustrov [13] was able to process one board patch in 0.02 s (a whole board of 25 patches in 0.47 s), but his experiments were performed on a server with two NVIDIA GeForce GTX TITAN Black GPUs, an Intel Xeon CPU E5-2680, and 128 gigabytes of random access memory.

The convolutional neural network was highly successful in the ImageNet competition in 2012 and proved to be very effective on extended datasets. CNNs shifted the focus of researchers to a model comprising both the feature extractor and the classifier. Unfortunately, despite the significant number of papers dedicated to identification and localization of defects on various materials, these algorithms have been tested on wood only scarcely.

The most promising results were presented in the master’s thesis of Dmitrii Shustrov, who managed to extend his dataset from 1115 to 242938 images. This was achieved by splitting high-resolution pictures into low-resolution patches. Even though only 3 wood categories were present in the dataset, they had hardly noticeable differences in wood structure, which could barely be distinguished by human inspectors. This research presents a performance comparison of the popular CNN models AlexNet, VGG-16, GoogLeNet, and ResNet-50. AlexNet managed to reach an accuracy of 98.9% in the first evaluation, in comparison to GoogLeNet’s 99.4% in the second.

Carpentier et al. [14] managed to achieve promising results by differentiating 23 tree categories according to bark images. The training dataset was used for testing of the ResNet18 and ResNet34 CNN architectures. The influence of various batch sizes (8, 16, 32, and 64) on system accuracy was examined and described. Results varied across different settings, so it is difficult to decisively select one solution. The best achieved accuracy (97.81%) was further improved by 3.93% by deploying the “divide and conquer” method (described in Region Split).

CNNs are highly dependent on hardware resources. Long training processes require a large amount of random access memory (RAM), and the computation itself should be carried out by a GPU. Therefore, CNNs can only be used on High-End Desktops (HEDT) or high-performance embedded systems.

13.1. Region Split

Positive accuracy enhancement was accomplished in multiple articles [10, 13, 14, 20] by using the “divide and conquer” method. The whole image is split into multiple parts, and each subimage is evaluated individually. The most frequently recognized category then represents the resulting one according to majority voting (Figure 15), as sketched below.
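A minimal sketch of the voting step follows; the patch size and the classify_patch callable (any per-patch classifier) are assumptions.

```python
from collections import Counter
import numpy as np

def predict_with_voting(image: np.ndarray, classify_patch, patch: int = 224) -> int:
    """Split the image into non-overlapping patches, classify each one with the
    given `classify_patch` callable, and return the majority-voted class label."""
    h, w = image.shape[:2]
    votes = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            votes.append(classify_patch(image[top:top + patch, left:left + patch]))
    return Counter(votes).most_common(1)[0][0]
```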

Some publications [10, 13] explored the dependency between the number of patches and the overall accuracy of the system. According to the graphs in Figure 16, it is obvious that, for the majority of methods, image splitting is beneficial, but once a certain recognition rate is exceeded, it no longer depends on the number of patches.

Advantageous properties of majority voting fusion logic such as robustness or invariance were confirmed in a study by [85] and experiments carried out in a manuscript by [86].

14. Discussion

In the previous chapters, comparisons of recent algorithms and methods for classification of wood species or defects in wood were presented. The trend is shifting from manually “hard-coded” algorithms to a much more modern artificial intelligence approach, especially to neural networks, which offer at least the same or even significantly better performance. Deep convolutional neural networks are nowadays capable of achieving record-breaking results on highly challenging datasets while using purely supervised learning [83].

Improvements to deep neural networks, to the mechanics behind the individual layers [87], or the deployment of deep residual learning [88] will likewise lead to significant improvements in the performance of wood species classification and error detection. An evaluation of networks of increasing depth based on an architecture with very small (3×3) convolution filters shows that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16–19 weight layers [89]. The actual problem in the deployment of such deep neural networks lies in the available datasets. As mentioned in the previous chapters, wood datasets usually contain around 2000 images; however, datasets for training a deep neural network can consist of more than 60000 images [90].

Another way to work around this “gap” is to shift attention from passive learning algorithms, where learning is done by the computer itself, to the role of a teacher. While machine learning focuses on creating new algorithms and improving the accuracy of “learners,” the machine teaching discipline focuses on the efficacy of the “teachers” [91]. The research carried out by Microsoft can improve the performance of wood error detection and classification by teaching a neural network based on new breakthrough methods.

Neural network complexity reduction along with optimization [92] seems to be a very promising approach as well. Nowadays, the mobile device market is on the rise, and developers are pushing for compact architectures and effective algorithms. Numerous economical models were introduced in recent years, among them MobileNet [93], ShuffleNet [94], and ANTNets [95], which gained great popularity. In comparison to the best performing architectures, mobile-oriented models use more than 10 times fewer parameters, while offering only slightly worse performance than the original ones.

To obtain suitable learning data, multiple images can be fed into CNN-Recurrent Neural Networks (CNN-RNN). Some of the valuable features can be acquired with X-ray [96], mechanical or chemical principle-based wood testers, or even a 3D camera [97]. In the case of evaluation of “image data,” the most valuable source can be found in X-ray wood probing as well as in 3D scanning of the wood. This can also be used to reliably differentiate between the types of knots described in the previous chapters.

The fusion of multiple different data sources has a significant impact on the performance of neural networks and other machine learning algorithms. This method, also called multimodal deep learning, shows an improvement in comparison to standard deep learning and machine learning implementations [98, 99]. Since multimodal deep learning depends on input data from different sensors, normalization of the data is practically a necessary step. Normalization can be either manual or based on self-normalizing neural networks [100, 101]. This process is often used for multilateral prediction of moving objects and can be used for wood species classification and error detection, as long as there is a reliable model for data fusion [102]. An example of the proposed method can be seen in Figure 17.
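As an illustration of feature-level fusion, the following PyTorch sketch concatenates the outputs of an image branch and a 1D sensor branch before a common classification head; the encoder sizes, modalities, and concatenation-based fusion are illustrative assumptions, not the model referenced in [98–102].

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Concatenate image features with a normalized 1D sensor vector (e.g., NIR)."""
    def __init__(self, num_classes: int = 4, spectral_dim: int = 64):
        super().__init__()
        self.image_encoder = nn.Sequential(          # tiny CNN branch for patches
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.spectral_encoder = nn.Sequential(       # MLP branch for 1D spectra
            nn.Linear(spectral_dim, 32), nn.ReLU(),
        )
        self.head = nn.Linear(32 + 32, num_classes)

    def forward(self, image: torch.Tensor, spectrum: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.image_encoder(image),
                           self.spectral_encoder(spectrum)], dim=1)
        return self.head(fused)

logits = FusionClassifier()(torch.rand(2, 3, 64, 64), torch.rand(2, 64))
print(logits.shape)   # (2, 4)
```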

The proposed model can then be pushed to a higher hierarchical level by decentralization of the system, at the cost of issues with spatial and temporal alignment of the information [103]. Solving this issue might lead to a mesh-type system, which can work with different information sources and provide a reliable classifier of defects or wood types. However, the framework for wood classification is still not standardized, and formalizations with a comparison of multiple DL architectures have not yet been done. Also, comparing the naive accuracy of different models is not sufficient to declare that a certain approach is a strict and precise solution [104].

As different types of convolutional neural networks are efficient in various tasks, architecture fusion or combination is advantageous in specific use cases. For instance, autoencoder-based segmentation of the faulty area with pixel-precise localization could be applied only to a specific selected area, premarked by more suitable object detection models like YOLO or Faster R-CNN. Moreover, fusion of CNNs with nonconvolutional types of networks could be beneficial as well. One example is systems combined with noncamera-based sensors. Even though none of the researched wood-related articles covered such a solution, the methodology was successfully applied in other fields like mobile traffic classification [104] or gearbox fault diagnosis [105].

15. Conclusions

Multiple industrial sectors are dependent on reliable wood type classification, as it provides analytical information regarding characteristics and features of manufactured products (mechanical properties, value, etc.). This approach is typical in the furniture industry or wood panel production. This research summarizes worldwide efforts in wood recognition and quality inspection systems.

Nowadays, the analysis is mostly performed by trained humans. In addition to being slow, it also has nonuniform accuracy, as the experience and attention of workers can vary significantly. Automation of wood classification is crucial for further expansion of the furniture industry and wood panel production. Different kinds of wood have different quality aspects, properties, or value. Correct wood type classification is critical, as it influences the price and features of the final products. For example, in wood panel production, the quantity of used glue is directly influenced by the used wood type, as the manufacturer has to guarantee prescribed mechanical properties. However, the amount of glue also influences the final price and has an impact on the environment. Another important example is the paper industry, where the wood type influences the quantity of cellulose in manufactured paper, thus also influencing its quality.

Human visual inspection is often slow and influenced by workers’ fatigue. On the other hand, chemical tests are expensive and can only be carried out on a small sample. Besides these two methods, the wood type can also be identified from the spectrum emitted by the wood. Unfortunately, the interpretation of the wood fluorescence spectrum is not an easily achievable task. In the case of wood type identification, even a small difference in the unique set of spectral peaks is meaningful. Therefore, human identification of wood types from the spectrum is neither accurate nor repeatable. Nowadays, a large quantity of recycled wood is used as a basic material in the wood panel industry. Classification of a large quantity of chopped or mixed wood pieces is not suitable for human assessment of material quality. A custom system based on accurate automatic identification of the continuous flow of input material on the feeding line of the manufacturing plant is practically a necessity. That is why it is crucial to pursue new sophisticated approaches in the field of wood quality classification.

Interest in the research and development of automatic wood defect detection and classification methods is rapidly increasing. Investments in this field are significant, mainly in wood-rich regions such as Scandinavia. Nowadays, trained experts are still used for wooden plank inspections (including freckles, bark, pitch pockets, wane, split, stain, or various knot types) and wood quality assessment. Long-term wood inspection might lead to eye fatigue, leading to low efficiency and accuracy. New imaging techniques, wood image databases, and sophisticated computer vision techniques have made automatic wood inspection an approachable goal, offering performance exceeding that of human workers.

Many techniques are currently developed and expanded; however, many challenges are still present and remain to be solved.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This article was supported by the Ministry of Education of the Czech Republic (Project No. SP2020/151). This work was supported by the European Regional Development Fund in A Research Platform focused on Industry 4.0 and Robotics in Ostrava project, CZ.02.1.01/0.0/0.0/17_049/0008425 within the Operational Programme Research, Development and Education.