Abstract

Texture image decomposition of porcelain fragments based on a convolutional neural network is a functional algorithm built on energy minimization. It maps the image into a suitable space and can effectively separate the image into structure, texture, and noise. This paper systematically studies image decomposition based on the variational method and compressed sensing reconstruction with a convolutional neural network. A layered variational image decomposition method splits the image into structural and texture components, and a compressed sensing algorithm based on a hybrid basis reconstructs these large-scale components. In compressed sensing, to further sparsify each feature component, a tight-frame wavelet-based shearlet transform is constructed and combined with wave atoms as a joint sparse dictionary for big data. At the same sampling rate, this algorithm retains more image texture detail than the comparison algorithm. Producing big data that matches the characteristics of the base text is essentially an image-based normalization method, which is insensitive to the relative position, density, spacing, and thickness of the text. A super-resolution model targeted at particular texture features can improve the restoration of such texture images. The dataset extracted by the classification method used in this paper accounts for 20% of the total dataset, while the PSNR value improves by 0.1 on average. Therefore, considering the requirements of future big data experimental training, this paper mainly uses segmented, standardized jpg/csv database datasets. This dataset minimizes the difference between base texts of the same type from the same period, laying the foundation for reliable big data recognition in the future.

1. Introduction

Since the beginning of the 21st century, with the country's steady progress in big data, Internet+, and intelligent manufacturing, technologies across the ceramic light industry have developed vigorously, and the ancient ceramic industry is no exception [1]. Ming and Qing blue and white porcelains have always been the mainstream varieties of ancient porcelain; studying this period of history and examining blue and white porcelain samples from it can yield a wealth of historical and ceramic-technology information [2]. The growth of the domestic economy also means that the market for spiritual and cultural goods will become increasingly active, and the soaring market prices of Ming and Qing blue and white porcelain have repeatedly confirmed this view [3]. But an active market also brings problems; for example, consumers suffer price fraud because they lack background knowledge of ancient ceramics [4]. Therefore, integrating modern science and technology to develop a portable device that lets the general public quickly identify ancient ceramics and perform a basic evaluation is not only a topic of public concern but also a research subject in the industry [5].

With the rapid development of image processing technology, requirements for the resolution of big data images keep rising. When an image is acquired, interference from the surrounding environment, as well as the sensor's inherent sampling frequency, defocus, atmospheric turbulence, and noise, blurs the image [6]. In applications with higher resolution requirements, the most direct way to obtain clearer images is to increase the density of the acquisition units in CCD or CMOS sensors, but high-precision optical equipment is very expensive [7], and hardware manufacturing technology has reached a bottleneck: further improvement is difficult, the resolution gains are not obvious, and the sensor may also become larger [8]. Hardware improvement is therefore severely limited in both technology and cost [9]. In addition, during image transmission and storage, factors such as undersampling and introduced noise can also reduce image resolution. These degradations not only worsen subjective perception but also complicate subsequent processing, failing to meet practical needs [10].

This paper mainly studies the coupled deep autoencoder and proposes an improved convolutional neural network structure. In the codec (encoder-decoder) network, skip connections are added so that the encoder can be made deep; in the mapping layer, an NIN (Network in Network) structure is added to strengthen the mapping ability; and to shorten training, the network outputs the residual image instead of the high-resolution image. A pretraining strategy is adopted. First, the low-resolution codec network and the residual codec network are trained separately on pairs of high- and low-resolution image blocks, so that each reproduces its own input (a low-resolution image or a residual image). Then, the internal codes are used to train the mapping layer network so that it accurately maps the code of a low-resolution image to the code of the corresponding residual image. Next, the encoder of the pretrained low-resolution codec network, the mapping layer, and the decoder of the residual codec network are combined into a complete super-resolution network. Finally, the coefficients are fine-tuned, and the output residual image is added to the bicubic-interpolated low-resolution image to obtain the improved convolutional neural network super-resolution reconstruction.
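As a concrete illustration, the following is a minimal TensorFlow/Keras sketch of the assembled network: the encoder of the pretrained low-resolution codec, an NIN-style mapping layer built from 1 × 1 convolutions, the residual decoder, and the final addition of the predicted residual to the bicubic-interpolated input. Block size, channel counts, and layer depths here are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the coupled-autoencoder super-resolution network,
# assuming TensorFlow/Keras; the actual coefficients would come from the
# pretrained LRAE/HRAE and mapping-layer stages described above.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_sr_network(block=33, code_dim=256):
    lr_in = layers.Input(shape=(block, block, 1))          # bicubic-upsampled LR block
    # Encoder part of the pretrained low-resolution autoencoder (LRAE)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(lr_in)
    skip = x                                               # skip connection keeps a deep encoder trainable
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Add()([x, skip])
    # Mapping layer with an NIN-style (1x1 convolution) structure
    x = layers.Conv2D(code_dim, 1, activation="relu")(x)
    x = layers.Conv2D(64, 1, activation="relu")(x)
    # Decoder part of the pretrained residual autoencoder
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    residual = layers.Conv2D(1, 3, padding="same")(x)      # network predicts the residual image
    # Final SR block = interpolated LR input + estimated residual
    sr_out = layers.Add()([lr_in, residual])
    return Model(lr_in, sr_out)

model = build_sr_network()
model.compile(optimizer="adam", loss="mse")                # fine-tuning stage
```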

The rest of the article is organized as follows. Section 2 reviews related work on the identification and analysis of ancient ceramics. Section 3 constructs the classification model of porcelain fragment texture images based on convolutional neural networks. Section 4 presents the application and analysis of the classification model. Section 5 summarizes the work of this article.

2. Related Work

Rika et al. [11] applied SIMCA software and pattern recognition methods to ancient ceramics from Mesopotamia, performing a focused analysis of 24 elements in the identified samples to deduce the samples' origins. In early domestic research, Biasotti et al. [12] examined several famous southern and northern white porcelains, including those of the Xing kiln, Gongxian, the Ding kiln, Jingdezhen (whose blue and white porcelain evolved from white porcelain), and Dehua. They analyzed compositional data of white porcelain across dynasties with computer pattern recognition methods to locate the "target area" of each type of white porcelain body and glaze. They also used an inverse mapping method to obtain inverse-mapped points similar to the sample points, simulated the corresponding process under real conditions, and finally obtained results fairly consistent with ancient white porcelain. This approach to selecting process parameters offers a new and effective idea for imitating ancient ceramics. Huang and Guan [13] used an expert system to appraise Ming and Qing blue and white porcelain, bringing the research method into the scope of modern appraisal. In their system, the program poses manually designed questions (covering the glaze, texture, shape, and base of the piece), the user answers based on visual inspection and touch, and the simulated expert system then reasons over a knowledge base organized according to experts' thinking to deliver an authenticity and age determination. Methodologically, however, this research did not directly use the image itself as the object of recognition.

He et al. [14] have also worked in this area. Using image slicing, they wrote a program in the Delphi programming environment to analyze the shape features of ceramics. The program scans images of ancient ceramics, performs preprocessing (such as bmp storage, decontamination, and true-size restoration), analyzes and extracts image data (processing equal-height slices algorithmically), and finally obtains the height, maximum diameter, bottom diameter, mouth diameter, and other measurements. These data are stored in a database together with material data, allowing verifiers to query and analyze quickly and supporting more efficient and accurate dating. In the same year, Aoulalay et al. [15] applied statistical learning theory and support vector machines to determine the origin of ancient ceramics from limited sample sets of trace elements and achieved good results. Wang et al. [16] took the main chemical composition of the porcelain body as the classification basis, applied gray relational analysis to porcelain classification, and reported the classification results for two types of porcelain. In addition, research on network retrieval of ancient ceramics has appeared in recent literature: some scholars retrieved ancient ceramics images based on text or content, or first processed the images to obtain basic features and then used similarity matching to find information about ancient ceramics quickly.

At this stage, domestic work mainly involves the comprehensive application of the CNN method to Chinese characters. For example, some scholars combine principal component analysis (PCA) with CNN: PCA first reconstructs the image, a CNN then trains on the character feature images, and a good result is obtained (the error rate is kept within 10%) [17]. Other researchers use an improved LCNN for handwritten Chinese characters; this method proposes a category accumulation scheme that partially solves the slow convergence of large deep networks [18]. In the past two years, recurrent neural networks (RNNs), another major deep learning method, have developed very rapidly, as have improved variants such as LSTM. The RNN (or LSTM) + CNN model is increasingly used in character recognition: the CNN extracts deep features, while the RNN classifies the feature sequences of characters, and their seamless combination yields classification performance that greatly exceeds that of a plain CNN [19]. In addition, recent transfer learning methods are an effective way to adapt traditional CNN models and training, and they have been applied in theoretical analysis, algorithm design, and many experiments [20–24].

3. Construction of a Classification Model of Porcelain Fragment Texture Images Based on Convolutional Neural Networks

3.1. Convolutional Neural Network Hierarchy

Among learning-based methods, the convolutional neural network is a typical deep learning model. Thanks to local receptive fields and weight sharing, the model has strong learning ability and does not require artificial prior knowledge [25–27]. Figure 1 shows the hierarchical structure of a convolutional neural network. Because weights are shared across the network, the number of parameters to optimize is small, which makes the model very suitable for processing images.

Artificial neural networks (ANNs), abbreviated as neural networks, have gone through the stages of the M-P model, Hebbian learning, the perceptron algorithm, the BP algorithm, and deep learning [28, 29]. The structure of a neural network mainly comprises three parts: neurons, connection weights, and the propagation function:

$$y = f\left(\sum_{i=1}^{n} w_i x_i\right).$$

Here, $x_i$ are the inputs, $w_i$ the weights on the corresponding connections, and $f$ the activation function. A single neuron produces its output by passing the weighted sum of all inputs through the activation function. At present, the most commonly used activation functions are Sigmoid, ReLU, and tanh:

$$\sigma(z) = \frac{1}{1+e^{-z}}, \qquad \mathrm{ReLU}(z) = \max(0, z), \qquad \tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}.$$

The activation function is generally differentiable; it makes the neuron's operation nonlinear, thereby increasing the expressive ability of the network. To further increase the network's ability to fit other functions, a neuron generally also has a bias value $b$, and the propagation function becomes

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right).$$
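A worked instance of this propagation function follows: the weighted sum of the inputs plus the bias $b$, passed through an activation $f$. The numeric values are arbitrary illustrations.

```python
# Single-neuron forward pass: y = f(sum_i w_i * x_i + b)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.8, 0.1, -0.4])   # connection weights
print(neuron(x, w, b=0.2))       # sigmoid output in (0, 1)
print(neuron(x, w, b=0.2, f=lambda z: np.maximum(0.0, z)))  # ReLU variant
```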

A neural network is composed of many connected neurons, but the connections are not random: they imitate the connections between human brain nerve cells. The multilayer perceptron (MLP) is a simple three-layer neural network. Data flows in from the input layer, passes through the hidden layer, and finally flows out from the output layer:

$$\mathbf{h} = f\left(W^{(1)}\mathbf{x} + \mathbf{b}^{(1)}\right), \qquad \mathbf{y} = f\left(W^{(2)}\mathbf{h} + \mathbf{b}^{(2)}\right).$$

Each connection between neurons carries a corresponding weight, and the propagation function of a single neuron is as described above. The neurons in each layer are connected only to the neurons in the previous and following layers; there are no connections within a layer, and a neural network generally contains no structures such as rings. When every neuron is connected to all the neurons in the previous and following layers, as in the figure, the network is fully connected. The training of the CDA used later in this paper is divided into four stages: LRAE training, HRAE training, mapping-layer training, and training of the entire network.

3.2. Image Classification Processing Algorithm

Ideally, an image semantic segmentation model should produce a probability distribution over classes for each pixel in the image. When the input image passes through the convolutional layers of AlexNet, only rough features can be extracted.

The purpose of image semantic segmentation is to interpolate these rough features and reconstruct the pixel-wise classification probability distribution for the input image. Figure 2 shows the image classification processing flow. This can be achieved with a deconvolution layer, which performs the inverse operation of convolution: it regards the input feature map as the output of a convolution layer and recovers the feature map that would produce that output. The convolutional layer is the most basic layer of CNNs. Its purpose is to convolve the input image with a learnable filter to extract features:

$$(I * K)(i, j) = \sum_{m}\sum_{n} I(i+m,\, j+n)\,K(m, n).$$
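The upsampling step can be sketched as follows with transposed-convolution layers; the feature map size, channel counts, and strides are illustrative assumptions rather than the exact AlexNet-based model.

```python
# Minimal sketch of upsampling rough features with deconvolution
# (transposed convolution) layers to recover per-pixel class probabilities.
import tensorflow as tf
from tensorflow.keras import layers

coarse = layers.Input(shape=(13, 13, 256))                # rough feature map
x = layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                           activation="relu")(coarse)     # 13x13 -> 26x26
x = layers.Conv2DTranspose(32, 4, strides=2, padding="same",
                           activation="relu")(x)          # 26x26 -> 52x52
logits = layers.Conv2D(9, 1)(x)                           # 9 classes, assumed here
probs = layers.Softmax(axis=-1)(logits)                   # per-pixel distribution
```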

Each filter generates one feature map. When CNNs are applied to RGB images, the image is a 3D matrix, and the same is true for each layer. Each filter is convolved with the corresponding part of the input image and slides across the entire image (where $h$ and $w$ are the spatial dimensions and $i$ is the number of channels):

$$F(x, y) = \sum_{c=1}^{i}\sum_{m}\sum_{n} I_c(x+m,\, y+n)\,K_c(m, n).$$

Convolution refers to the element-wise multiplication of the neurons in each filter with the corresponding values of the input layer. Therefore, if the input layer of a CNN is an image, convolving the image with each filter of each layer yields a two-dimensional output. The size of the output is determined by the stride and padding parameters, and the output is called a feature map or activation map:

$$O = \left\lfloor \frac{W - F + 2P}{S} \right\rfloor + 1,$$

where $W$ is the input width, $F$ the filter size, $P$ the padding, and $S$ the stride.
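The output-size relation is easy to check numerically; the following helper and the example values are illustrative.

```python
# Standard output-size relation: input width W, filter size F, padding P, stride S.
def conv_output_size(W, F, P=0, S=1):
    return (W - F + 2 * P) // S + 1

print(conv_output_size(39, 11))            # 29: valid convolution, stride 1
print(conv_output_size(224, 7, P=3, S=2))  # 112: strided convolution with padding
```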

In each convolutional layer of a CNN, $N$ filters are used, and each filter generates one feature map; stacking these feature maps gives the output of the convolutional layer:

$$\mathbf{O} \in \mathbb{R}^{H' \times W' \times N}.$$

The input image is divided into different (usually nonoverlapping) subregions, and each subregion is downsampled by a nonlinear pooling function. The most commonly used are the maximum and average pooling functions: maximum pooling returns the maximum value of each subregion, and average pooling returns its average. Pooling makes the network robust by reducing sensitivity to small translations of features. In addition, pooling removes unnecessary and redundant features, reduces the computational cost of the network, and improves its efficiency. A single neuron in a filter can be traced back to the connected neurons in all previous layers; this is called the neuron's effective receptive field. It is easy to see that convolution gives neurons in lower layers (closer to the input) smaller receptive fields than neurons in higher layers. Figure 3 shows a comparison histogram of algorithm index evaluation. Lower layers learn to characterize small areas of the input, while higher layers learn more specific semantics because they respond to larger subdivisions of the input image.
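A small numeric example of both pooling functions over nonoverlapping 2 × 2 subregions; the 4 × 4 input is arbitrary.

```python
# Max and average pooling over non-overlapping 2x2 subregions.
import numpy as np
import tensorflow as tf

x = tf.constant(np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1))
max_pool = tf.keras.layers.MaxPooling2D(pool_size=2)(x)      # max of each 2x2 block
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2)(x)  # mean of each 2x2 block
print(tf.squeeze(max_pool).numpy())   # [[ 5.  7.] [13. 15.]]
print(tf.squeeze(avg_pool).numpy())   # [[ 2.5  4.5] [10.5 12.5]]
```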

In this paper, deep features are used for multi-image segmentation: the high-level semantic features learned by deep networks improve the accuracy and consistency of multi-image segmentation. Three algorithms are compared: a super-resolution reconstruction algorithm based on interpolation, one based on reconstruction, and the algorithm of this paper. First, we improve the PSPNet-50 network model by fusing the high-resolution details of the shallow layers, reducing the effect that the loss of spatial information with network depth has on segmentation edge details. Then, a small number of segmentation priors are merged into the new model, and foreground/background ambiguity and the segmentation of multiview images are resolved by retraining the network. Finally, the edges of the segmentation target are further refined by constructing a fully connected conditional random field model.

3.3. Optimization of Network Model Weights

By adjusting learning parameters such as the filter weights and biases, the class output of the network can be optimized. The uncertainty in choosing a set of learning parameters is quantified by a loss function, so training can be expressed as a loss-function optimization problem. Weight sharing has another advantage: it greatly reduces the number of weight parameters to be trained. Consider a one-dimensional example of feature map weight sharing. In layer $k$, each unit has a receptive field of 3 units, produced by convolution with a filter of size 3. The filter is shared across the units in the feature map, so connections of the same color in the figure carry the same weight. We configure the CNN model with $n_{in}$ neurons in the input layer and $n_{out}$ neurons in the output layer; there are $m$ training samples, where $x$ is the input vector of dimension $n_{in}$, $y$ is the output vector of dimension $n_{out}$, plus some hidden layers containing several neurons. Figure 4 shows the process of network model weight optimization. In this example, only 3 weight parameters need to be trained once the weights are shared, whereas without sharing 9 weight parameters would have to be trained, a substantial reduction (see the sketch below).
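The parameter count for the 1D example can be verified directly; the length-5 input and the specific weight values are illustrative.

```python
# With a length-5 input and 3 output units, each with a receptive field of 3,
# a shared filter needs 3 weights, while unshared local connections need 3x3 = 9.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Shared weights: one filter of size 3 slides over the input (3 parameters).
w_shared = np.array([0.2, 0.5, 0.3])
conv_out = np.array([w_shared @ x[i:i + 3] for i in range(3)])

# Unshared weights: each output unit has its own 3 weights (9 parameters).
w_local = np.random.randn(3, 3)
local_out = np.array([w_local[i] @ x[i:i + 3] for i in range(3)])

print(w_shared.size, w_local.size)  # 3 vs 9 trainable weights
```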

Optimization algorithms are used to find the direction of gradient descent. Simply applying gradient descent to a deep network may not work well, because it can fall into a local optimum. This can be corrected by adding a momentum term that helps the gradient updates reach a better optimum (a sketch follows below). If an image contains multiple objects, the desired result cannot be obtained, because a classification network such as AlexNet is trained on more than one million images that each contain only a single object. In addition, a classification network cannot locate the target in the image, because it was never trained for that task. To adjust the parameters of the two autoencoders and the mapping layer, the CDA algorithm first trains the LRAE and HRAE simultaneously and then trains the mapping layer on the resulting high- and low-resolution internal representations; the process so far can be regarded as parameter initialization. Instead of estimating a single probability distribution for the entire image, the model divides the image into several blocks, each assigned its own probability distribution. Gray values starting from 0 indicate the different pixel classes, and RGB index colors are added for easy observation.
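A minimal sketch of gradient descent with a momentum term; the quadratic objective is an illustrative stand-in for the network loss, and the learning rate and momentum values are arbitrary.

```python
# Gradient descent with momentum on f(w) = 0.5 * ||w||^2.
import numpy as np

def grad(w):                      # gradient of f(w) = 0.5 * ||w||^2
    return w

w = np.array([4.0, -3.0])
v = np.zeros_like(w)              # velocity accumulates past gradients
lr, momentum = 0.1, 0.9
for _ in range(100):
    v = momentum * v - lr * grad(w)
    w = w + v
print(w)                          # converges toward the optimum at the origin
```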

4. Application and Analysis of Porcelain Fragment Texture Image Classification Model Based on Convolutional Neural Network

4.1. Texture Image Preprocessing

For Ming and Qing ancient ceramics, the reign mark on the base of blue and white porcelain is an important basis for judging authenticity in this paper. The reign mark is judged mainly by the content, writing method, and position of the inscription. Insofar as machine learning imitates human visual recognition, extracting features from calligraphic images of the mark characters is both reliable and convenient. Figure 5 shows a schematic diagram of image training of the convolutional neural network model.

The algorithm in this section is implemented in Matlab 2014b. The image block size is set to 7 × 7, the number of similar blocks to 60, and the search window for similar blocks to 30 × 30. To reduce computation, adjacent blocks are spaced 2 pixels apart. The parameter p in the nonconvex low-rank model is set to 0.2. When solving the subproblem, the maximum number of iterations is 30 and the iteration step is 1. The reconstruction result of the WSGSR algorithm is used for comparison: its sparse representation uses a 3-layer wavelet transform, the observation matrix is Gaussian, and reconstruction uses the iterative shrinkage-thresholding algorithm. For algorithm evaluation, this article uses both subjective visual effects and objective parameters to evaluate the reconstructed image; combining the two compares algorithm performance more comprehensively and fairly. The model design effectively improves expressive ability, and the residual learning method avoids relearning the redundant part of the multi-sample data, improving learning efficiency; finally, pretraining initializes the network coefficients to more appropriate values. The objective evaluation parameters are PSNR and SSIM, which are commonly used in the compressed sensing field.
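The objective evaluation can be sketched as follows, assuming scikit-image is available; the random test data here are a placeholder for the actual reconstructed images.

```python
# PSNR and SSIM between a reconstructed image and the original.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

original = np.random.rand(256, 256)          # placeholder for the test image
reconstructed = original + 0.01 * np.random.randn(256, 256)
reconstructed = np.clip(reconstructed, 0.0, 1.0)

psnr = peak_signal_noise_ratio(original, reconstructed, data_range=1.0)
ssim = structural_similarity(original, reconstructed, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```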

4.2. Neural Network Feature Extraction

To meet the data requirements of different schemes, the basic text data used in this paper are input in two formats (reading jpg directly, and converting jpg to csv). In addition, to obtain a larger amount of sample data, the experiments first apply several types of data enhancement before training and then train the model with four different CNN methods, as sketched below. Figure 6 shows the image texture sensitivity versus pixel point dependence curve. For the final transfer learning stage, additional data enhancement is unnecessary.
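The enhancement step can be sketched with the Keras ImageDataGenerator; the augmentation parameters and the directory path are illustrative assumptions, not the paper's exact settings.

```python
# Real-time data enhancement for base-text images, assuming Keras.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,       # small rotations preserve inscription legibility
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    rescale=1.0 / 255.0,
)
train_gen = datagen.flow_from_directory(
    "data/base_text/train",  # hypothetical directory of jpg inscription images
    target_size=(64, 64),
    color_mode="grayscale",
    batch_size=32,
    class_mode="categorical",
)
```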

Before the model is generated, because the input images of ancient ceramic base text differ in size from the ImageNet images, the experiment removes the fully connected layers at the back of VGG16 and loads only the front layers of the model during training. There are two ways to construct the model: directly download the No-Top version of the model, or set custom weight parameters layer by layer and train again. In this scheme, after retraining, the model acquires multiple fully connected layers with stronger classification ability, sufficient for the nine classes of Ming and Qing base text targeted in this article.
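A minimal sketch of the No-Top transfer-learning scheme follows: the VGG16 convolutional base is loaded without its fully connected layers, and a new head outputs the nine base-text classes. The input size and head width are assumptions for illustration.

```python
# VGG16 No-Top base with a retrained classification head for 9 classes.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(64, 64, 3))
base.trainable = False                       # freeze the pretrained front layers

x = layers.Flatten()(base.output)
x = layers.Dense(256, activation="relu")(x)  # new fully connected layer
out = layers.Dense(9, activation="softmax")(x)

model = Model(base.input, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```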

Figure 7 compares model accuracy before and after applying the algorithm. As the number of sample groups increases, the accuracy of the algorithm shows a decreasing trend; nevertheless, compared with the model before optimization, the optimized model performs excellently on the objective evaluation parameters of the reconstructed image: PSNR improves by about 2.5 dB–10.6 dB, and SSIM improves by about 0.05–0.3. The reconstruction result of the WSGSR algorithm shows obvious noise and artifacts. In comparison, the compressed sensing reconstruction algorithm based on the shape-adaptive nonconvex low-rank model proposed here performs well on both the smoother image and the image with rich texture detail used in the experiment. The results also show that at a low sampling rate a gap remains between the proposed method's reconstruction and the original image, with a small number of artifacts in the reconstructed image, but the result is still better overall than that of the WSGSR algorithm.

4.3. Example Application and Analysis

With the help of TensorFlow, this sequential layer-by-layer model structure can be established quickly, which makes it easy to stop and resume training in subsequent work. After the model structure is designed, other development tool libraries can also be introduced. At the end of training, the structure diagram of the model is output and saved for future use. XGBoost is a very good model for learning from structured data. Figure 8 shows a matchstick graph of image smoothness varying with pixel points. The experiment chooses XGBoost as the baseline algorithm against which to compare the image learning performance of the deep learning CNN model.

The quality of the reconstructed image improves gradually as the number of iterations increases, and especially rapidly when the iteration count is small. When the number of iterations exceeds 40, the change in visual effect is no longer obvious, consistent with the changes in the objective parameters. Figure 9 shows the two-dimensional scatter distribution of image training deviation under the convolutional neural network. Weighing the number of iterations and the time complexity required for different test images to reach the optimal reconstruction, this section sets the maximum number of iterations of the algorithm to 50.

To achieve data diversity, the experiment uses an image generator to produce a character image dataset with a unified batch size. Because the generator supports real-time data augmentation, it produces data continuously during training until the specified number of training epochs is reached. Considering the experimental effect and time complexity, the CNN model designed in this paper consists of 4 convolutional layers with ReLU activation functions. The convolution kernel sizes are 11 × 11, 9 × 9, 7 × 7, and 3 × 3, with 128, 32, 16, and 1 filters, respectively. The input image block size for network training is 39 × 39, and the label is the corresponding clear image block of size 13 × 13.
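The described architecture can be written as a Keras sketch: with valid (unpadded) convolutions, the stated kernel sizes shrink a 39 × 39 input block to exactly the 13 × 13 label size (39 → 29 → 21 → 15 → 13). Leaving the last layer linear is an assumption, since a 1-filter output layer typically regresses pixel values.

```python
# The four-layer CNN described above: 11x11, 9x9, 7x7, 3x3 kernels with
# 128/32/16/1 filters, valid padding, mapping 39x39 inputs to 13x13 outputs.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(39, 39, 1)),
    layers.Conv2D(128, 11, activation="relu"),  # 39x39 -> 29x29
    layers.Conv2D(32, 9, activation="relu"),    # 29x29 -> 21x21
    layers.Conv2D(16, 7, activation="relu"),    # 21x21 -> 15x15
    layers.Conv2D(1, 3),                        # 15x15 -> 13x13 output block
])
model.compile(optimizer="adam", loss="mse")
```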

Figure 10 shows the statistical distribution of different test groups after neural network training. The traditional CNN method exploits the characteristics of saturated-area images and uses singular values to implement the kernel decomposition when training the model, obtaining a better deblurring effect; its PSNR value is higher than that of the other methods, indicating that the CNN approach is also suitable for deblurring images with saturated regions. Neither the MLP method nor the traditional CNN method takes the high-frequency information of the image into account, however, which results in relatively low PSNR values.

5. Conclusion

Aiming at compressed sensing reconstructed images, this paper constructs a quality improvement method based on a convolutional neural network to rapidly improve both the subjective and objective quality of the reconstruction. The quality improvement network takes the reconstruction result of a traditional compressed sensing algorithm as input and outputs the quality-enhanced image. First, three sets of convolution kernels of different sizes extract features of the input image at different scales and combine them, mining and fusing the multiscale information of the reconstructed image; second, a multilayer convolutional neural network transforms the extracted multiscale features, reconstructing the multiscale, multilevel feature information into a residual estimate; finally, the initial reconstructed image is added to the estimated residual image to obtain the final result. Experiments on commonly used test images show that the proposed network improves the quality of compressed sensing reconstructed images. Compared with random initialization in ordinary training, the pretrained and fine-tuned network coefficients reach better final values, and the super-resolution reconstruction results improve significantly. While suppressing noise and artifacts in the reconstructed image, the method restores part of the image detail, and the proposed quality improvement method is fast in the test phase. Saliency detection is one of the most basic and key research directions in the image field; this article reviews the development of saliency detection and some of its more influential models and finds that the current trend of gradual improvement is driven mainly by contrast feature factors. With the success of convolutional neural networks in many other computer vision tasks, the extraction of feature factors leaves more room for exploration. Finally, this article looks forward to open problems in the industrial and academic study of ancient ceramics. Later work can increase the amount of data through cropping, rotation, and similar augmentation and, in network model design, use a Siamese convolutional neural network to fully extract the global and local features of the image.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The study was supported by Pingdingshan University.