Abstract

As the application fields of virtual reality expand and application content grows more complex, the demand for real-time rendering of realistic graphics has increased sharply. This research discusses an intelligent mosaic method for virtual reality panoramas of Lingnan cultural heritage based on automatic machine learning. To compensate for the effect of an imperfect collection process on the quality of the final panoramic image, irregular rotation of the camera must be minimized, and images must be collected with an overlapping area of appropriate size between adjacent images. To give the panoramic images better visual effects, the images are preprocessed before registration and fusion; preprocessing mainly includes image denoising and image projection transformation. In this study, cylindrical projection is used to construct the panorama of Lingnan cultural heritage. For each Lingnan cultural heritage training image, we first perform image segmentation to obtain multiple regions and extract the visual features of each region. We train automatic machine learning models on the visual feature set and use the bagging method to generate different training subsets, one for each component classifier. We determine the overlap area of two images from the matched SIFT feature points and determine the best stitching line during stitching. The number of pixels in the first row of the overlapping area determines the candidate stitching line columns, and the best stitching line position is chosen by considering the smallest color difference in the stitching area and the most similar texture on both sides. A Java Applet-based approach is used to realize virtual roaming of the panoramic images of Lingnan cultural heritage in the IE browser. The highest accuracy of SIFT is 82.22%, and the lowest recognition time is 0.01 s. This research will promote the development of Lingnan cultural heritage.

1. Introduction

Image registration is the core step of image splicing, and its quality and efficiency are the key to determining the effect of image splicing. Although the commonly used image registration algorithms represented by the SIFT algorithm are relatively mature and widely used, they need to be further optimized and improved due to the large amount of calculation, the relatively long operation time, and the relatively general accuracy of the algorithm.

Lingnan culture refers to the culture of the Lingnan region of China, covering academic research, literature, painting, calligraphy, music, opera, crafts, architecture, gardens, folklore, religion, food, language, overseas Chinese culture, and many other contents. As far as image stitching technology is concerned, it is a complete process from image collection to stitching, but the current academic research is mostly aimed at a specific step in the process, and there is no standardized and efficient complete solution. Therefore, when working on the research of Lingnan cultural heritage panoramic display technology based on image mosaic, this paper needs to explore and give a reasonable solution by itself.

The emergence of panoramic stitching technology makes image-based VR technology more interactive. Virtual reality immersion (VRI) is an advanced computer-generated technology that reduces subjective reports of pain in procedural medical treatment. Vera et al. argue that VRI reduces pain-related brain activity as measured by functional magnetic resonance imaging. They randomly assigned 24 patients with chronic itching (16 patients with psoriasis vulgaris caused by dermatitis and 8 patients with chronic itching caused by psoriasis vulgaris) to an interactive computer game presented through special goggles or on a computer. The patients self-assessed the intensity of itching on a visual analog scale (ranging from 0 to 10) before exposure, during exposure, and 10 minutes after exposure. However, with only twenty-four patients with chronic pruritus, the sample size of the study was too small [1]. Bastug et al. argue that the success of immersive VR experiences depends on solving numerous challenges across multiple disciplines. Their study draws on storage/memory, fog/edge computing, computer vision, artificial intelligence, and related fields. They describe the main requirements of wirelessly interconnected VR and then introduce some key elements. Although they examined three VR case studies and provided numerical results under various storage, computation, and network configurations, their research method is not logical [2]. Mental health issues are inseparable from the environment. Freeman et al. argue that, with virtual reality (VR) and computer-generated interactive environments, individuals can repeatedly experience their problematic situations and learn how to overcome difficulties through evidence-based psychotherapy. They conducted a systematic review of empirical research and identified 285 studies, of which 86 involved evaluation, 45 theoretical development, and 154 treatment. The main diseases studied are anxiety, schizophrenia, substance-related diseases, and eating disorders. Although many treatment methods were identified in their research, their research methods are unreasonable [3]. Freeman et al. also argue that the use of virtual reality can promote new learning. They evaluated the delusions and pain of 30 patients with compulsive delusions. The patients were then randomly assigned to virtual reality cognitive therapy, and both were performed in a hierarchical virtual reality social environment for 30 minutes. Although they reassessed the patients' delusional beliefs and real-world troubles, their experimental data are insufficient [4].

This research discusses an intelligent mosaic method for virtual reality panoramas of Lingnan cultural heritage based on automatic machine learning. To compensate for the effect of an imperfect collection process on the quality of the final panoramic image, irregular rotation of the camera must be minimized, and images must be collected with an overlapping area of appropriate size between adjacent images. To give the panoramic images better visual effects, the images are preprocessed, mainly by image denoising and image projection transformation. In this study, cylindrical projection is used to construct the panorama of Lingnan cultural heritage. The number of pixels in the first row of the overlapping area determines the candidate stitching line columns, and the best stitching line position is chosen by considering the smallest color difference in the stitching area and the most similar texture on both sides. A Java Applet-based approach is used to realize virtual roaming of the Lingnan cultural heritage panoramic images in the IE browser.

2. Panorama of Lingnan Cultural Heritage

2.1. Virtual Reality

As the application fields of virtual reality expand and application content grows more complex, and especially with the rapid development of network graphics technology in the past two years, the demand for real-time rendering of realistic graphics has increased sharply. We have therefore studied and adopted some efficient graphics rendering algorithms that can run on existing general-purpose computer platforms, in order to further accelerate the rendering of complex scene models and to address the increasingly prominent trade-off among rendering speed, quality, and scene complexity [5]. The correlation function between and can be expressed as [6]

In the formula, . The requirements for panning are [7]

Then, they correspond to the Fourier transform satisfying [8]

The phase difference between them can be expressed as the cross power spectrum phase [9]:
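The translation estimate outlined above, in which a shift between two images appears as a phase difference between their Fourier transforms, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `phase_correlation_shift`, the use of NumPy, the sign convention, and the assumption of a purely circular integer shift are all choices made here for illustration.

```python
import numpy as np

def phase_correlation_shift(f1, f2):
    """Estimate the integer shift (dy, dx) such that f2 is f1 moved by (dy, dx).

    Assumes a circular (wrap-around) translation between two equally sized
    grayscale images; real photographs only approximate this.
    """
    F1 = np.fft.fft2(f1)
    F2 = np.fft.fft2(f2)
    # Normalized cross power spectrum: only the phase difference is kept.
    cross = np.conj(F1) * F2
    cross /= np.maximum(np.abs(cross), 1e-12)
    # Its inverse transform peaks at the displacement.
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = f1.shape
    # Map peak indices in the upper half of the range to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Keeping only the phase makes the correlation peak sharp and largely insensitive to overall brightness differences, which is why this form is commonly used for a coarse alignment before feature-based registration.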

Repeated filtering produces a series of low-pass filtered images and a series of band-pass filtered images. The pixels of each level image are defined as follows [10]:

The level of the filtered area image is obtained by calculating the difference between the images after the Gaussian filtering process [11].

2.2. Automatic Machine Learning

On the one hand, automatic machine learning makes machine learning easier to apply: users no longer need extensive background knowledge to obtain a well-performing machine learning solution. On the other hand, automatic machine learning can cover a large number of data processing and classification algorithms, more than a single domain expert or scientific research team could master. This allows automatic machine learning to exploit the potential of many algorithms for a given task rather than relying on only a few. In addition, for professionals, automatic machine learning can provide an excellent reference on which to build further work. Automatic machine learning (AutoML) is a recently proposed concept that refers to the automatic design of feature selection, feature transformation, and other preprocessing, classification, and recognition processes, together with automatic hyperparameter tuning. Apart from providing the input data, no manual intervention is required. The general idea is to use intelligent search and optimization algorithms in place of humans to find the data processing and recognition algorithms and schemes suited to the task at hand. Once proposed, this concept drew a strong response in the field of machine learning, and in application it may be able to outperform most experts [12].

Automatic machine learning uses statistics and deep learning together with relevant expert knowledge to complete this time-consuming work automatically. It also makes data mining, which previously had to be expert-driven, easier. To a certain extent, automatic machine learning can be regarded as an intelligent tool of artificial intelligence, with which users can easily develop their own machine learning models. In the near future, more automatic machine learning platforms are likely to emerge and to be integrated more deeply with business scenarios [13]. Auto-sklearn uses the random forest-based SMAC tool to optimize the hyperparameters of the algorithms in its data processing and classification algorithm library. Because a meta-learning method is used to determine the learning algorithm framework quickly, the search space for the subsequent SMAC optimization is greatly reduced. In addition, Auto-sklearn does not simply choose the single best model as the output model but automatically builds a model ensemble, which avoids discarding models that perform relatively well. This automatic ensemble makes the model more robust and less prone to overfitting [14].

The plane width is , and the camera focal length is f; then [15, 16],

Bilinear interpolation is calculated using 4 pixels adjacent to (x, y) [17]:
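As a minimal sketch of the bilinear interpolation step, the following function weights the 4 pixels surrounding a non-integer position (x, y). The function name `bilinear` and the list-of-rows image layout are assumptions made for this example, not part of the original method description.

```python
import math

def bilinear(img, x, y):
    """Interpolate img at the real-valued position (x, y).

    img is a list of rows (img[row][col]); x is the column coordinate and
    y is the row coordinate. The image must be at least 2 x 2 pixels.
    """
    h, w = len(img), len(img[0])
    # Top-left pixel of the 2 x 2 neighbourhood, clamped to stay inside the image.
    x0 = min(max(int(math.floor(x)), 0), w - 2)
    y0 = min(max(int(math.floor(y)), 0), h - 2)
    a = x - x0  # horizontal fraction
    b = y - y0  # vertical fraction
    return ((1 - a) * (1 - b) * img[y0][x0]
            + a * (1 - b) * img[y0][x0 + 1]
            + (1 - a) * b * img[y0 + 1][x0]
            + a * b * img[y0 + 1][x0 + 1])
```

Projection usually maps output pixels back to non-integer source positions, so a lookup of this kind is applied once per output pixel.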

2.3. Image Stitching

Affected by factors such as the performance of the hardware equipment and the changes in the brightness of the light during the collection process, the collected images usually have a certain degree of noise. Because the existence of noise will affect the accuracy and quality of the image registration and fusion link in the later stage, how to effectively eliminate the image noise is a common problem at present. In the image projection part of image mosaic, the current commonly used cylindrical projection algorithm will produce obvious jagged image edge, which will seriously affect the image effect, and it will also increase the difficulty of image registration and fusion in the later stage. However, there is no effective method to eliminate edge jaggedness. Therefore, the whole process is divided into image acquisition, image matching, and image fusion [18, 19]. The image splicing process is shown in Figure 1.

The transformation model can be expressed in homogeneous coordinates as [20]

Among them [21, 22],where is the transformation matrix. The distance between any two pixels of the image after the rigid body transformation is the same as the original image [23].

The corresponding relationship before and after the nonlinear transformation is [24, 25]where is the point after nonlinear transformation. The SIFT chromaticity normalization algorithm is simple and easy to implement [26].

Let ; then [27],where r is the desired mapping function.

3. Intelligent Splicing Experiment of Panorama Lingnan Cultural Heritage

3.1. Image Acquisition

Image acquisition is the prerequisite for image stitching. At least two Lingnan cultural heritage images are needed for stitching, and generating a wide-view, high-resolution panoramic image of Lingnan cultural heritage requires several or even dozens of original images. To ensure the quality of the subsequent image registration, adjacent images must share a certain proportion of overlapping area when they are collected. The images can be collected with common devices such as a digital camera, a mobile phone camera, or a computer camera. With everyday applications in mind, the image acquisition device selected in this work is a handheld camera.

Considering the advantages and disadvantages of the above-mentioned shooting methods, this research adopts the method of handheld camera shooting. In order to effectively make up for the impact of the insufficiency of the collection process on the quality of the final panoramic image of Lingnan cultural heritage, it is necessary to minimize the irregular rotation of the camera and collect images according to the overlapping area between adjacent images of appropriate size.

3.2. Image Preprocessing

Due to the influence of factors such as equipment and scene illumination in the collection process, the collected image is not an ideal image. Generally, there will be certain noise and deformation, which is not suitable for direct splicing. Therefore, in order to make Lingnan cultural heritage panoramic images have better visual effects, it is necessary to preprocess the images before image registration and fusion.

3.2.1. Image Denoising

Image noise refers to the interference information formed by the limitations of the device itself and some irremovable influences from the outside when using cameras and other equipment to collect images. The presence of noise will affect the visual effect of the image and will also have a negative impact on the registration and fusion in the image stitching. Therefore, the image denoising must be performed before the stitching.

Commonly used denoising methods include the mean filter, the median filter, and nonlocal means (NLM).
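Of the filters named above, the median filter is the simplest to illustrate. The sketch below is an assumption-laden example rather than the denoising routine used in this study: the function name `median_filter3`, the fixed 3 x 3 window, the edge-replication border handling, and the use of NumPy are all choices made here.

```python
import numpy as np

def median_filter3(img):
    """Apply a 3 x 3 median filter to a 2D grayscale image."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Replicate border pixels so every pixel has a full 3 x 3 neighbourhood.
    padded = np.pad(img, 1, mode="edge")
    # Stack the 9 shifted views; the median over the stack is the filter output.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    return np.median(windows, axis=0)
```

A median filter removes isolated impulse noise while preserving edges better than a mean filter of the same size, which matters here because blurred edges weaken the feature points used later for registration.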

3.2.2. Image Projection

Each image coordinate system in the image sequence taken by the camera is different. Only when all the images in the sequence are transformed to a unified coordinate system, can the next image registration be performed. The process of transforming the images in the sequence to the unified coordinate system is the image projection transformation. Corresponding to different collection methods and scenes, the panoramic images of Lingnan cultural heritage obtained by different projections will be very different in construction methods and image quality. There are four types of commonly used image projection transformations, specifically planar projection, spherical projection, cube projection, and cylindrical projection.

Cylindrical projection projects the image sequence onto the surface of a cylinder; the observation point is at the center of the cylinder, and the radius of the cylinder is the focal length. Compared with the other three projections, the vertical viewing angle of the cylindrical projection is less than 180°. However, in most conventional scenes the sky and the ground contain little information of interest, and with current techniques good image quality can still be obtained by filling in with interpolation. In addition, unlike the spherical projection, the cylindrical image can be unrolled into a plane image, which makes it convenient to store and access on a computer. Cylindrical projection also places few demands on the image acquisition process: an image sequence acquired by a photographer rotating in place around a fixed axis can be projected using the cylindrical model. Based on this analysis and comparison of commonly used projections, and given the collection method of this research, cylindrical projection is used to construct the panorama of Lingnan cultural heritage.
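The geometry described above can be sketched with the commonly used cylindrical mapping, in which the cylinder radius equals the focal length f. The function names `to_cylinder` and `from_cylinder`, and the convention of measuring coordinates from the top-left corner, are assumptions for this example; the paper's own formulas may differ in notation.

```python
import math

def to_cylinder(x, y, width, height, f):
    """Map planar pixel (x, y) of a width x height image onto the cylinder."""
    xc, yc = x - width / 2.0, y - height / 2.0
    # Half of the unrolled cylinder width, so the left image edge maps to 0.
    half_span = f * math.atan(width / (2.0 * f))
    x_cyl = f * math.atan(xc / f) + half_span
    y_cyl = f * yc / math.hypot(xc, f) + height / 2.0
    return x_cyl, y_cyl

def from_cylinder(x_cyl, y_cyl, width, height, f):
    """Inverse mapping, used to look up source pixels for each output pixel."""
    half_span = f * math.atan(width / (2.0 * f))
    theta = (x_cyl - half_span) / f
    xc = f * math.tan(theta)
    yc = (y_cyl - height / 2.0) * math.hypot(xc, f) / f
    return xc + width / 2.0, yc + height / 2.0
```

In practice the inverse mapping is applied to every output pixel and the resulting non-integer source position is sampled by interpolation, which avoids holes in the projected image.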

3.3. Image Registration

As the core step of image stitching technology, the quality of image registration directly determines the quality of the panoramic images of Lingnan cultural heritage spliced together. Therefore, only by realizing the accurate registration of adjacent images in the image sequence can a good stitching effect be ensured.

3.3.1. Image Transformation

This study applies the affine transformation model for registration. A geometric transformation is called an affine transformation if any straight line in the image remains a straight line after the transformation and any two parallel lines remain parallel. Affine transformation is suitable for translation, rotation, scaling, and flipping (mirroring) motion.
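A small sketch can make the affine model concrete. For brevity it is restricted to rotation, uniform scaling, and translation, which is a special case of the six-parameter affine transformation; the function names `affine_matrix` and `apply_affine` are hypothetical.

```python
import math

def affine_matrix(angle_rad, scale, tx, ty):
    """Build a 3 x 3 homogeneous matrix for rotation, uniform scale, translation."""
    c = math.cos(angle_rad) * scale
    s = math.sin(angle_rad) * scale
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]

def apply_affine(m, point):
    """Transform a 2D point (x, y) using the homogeneous matrix m."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Because the transformation is linear apart from the translation, the direction vectors of two parallel lines are mapped to vectors that are again parallel, which is the defining property stated above.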

3.3.2. Image Registration

Feature-based registration: the feature-based registration method has received widespread attention recently and has been studied by many researchers, forming the characteristics of a wide variety of algorithms and high efficiency.
(1) Registration based on point features: feature points are often points with sudden changes in pixel values, which are different from other pixels and contain a lot of information, which can uniquely identify the location information of a certain place in the image. The idea of point feature-based registration is to identify feature points, describe each feature point through the detection of point features, and then match the feature points of different images through certain operation rules according to the description to complete the registration.
(2) Registration based on edge features: edge features are used to describe the contours of scene objects in the image, and they are also very distinguishable. However, there is still no feasible method to reasonably store the edge information, so the algorithm is not widely used.
(3) Registration based on regional features: simply put, a regional feature can be understood as an enlarged feature point, which is a collection of pixel neighborhoods. The regional feature is the overall presentation of a certain characteristic, which can uniquely identify the location characteristic of a certain place in the image. Regional feature-based registration is done by detecting the feature area in the image and marking the area, so as to perform comparison and pairing to complete the registration.

3.4. Image Fusion

(i) Pixel-level fusion: the original image information does not go through image preprocessing, and the data association and fusion process is carried out directly on the image data layer.
(ii) Feature-level fusion: feature extraction is first performed on the collected original image information, and the extracted feature information is then comprehensively analyzed and processed.
(iii) Decision-level fusion: decision-level fusion is the highest level of image fusion and involves logical or statistical reasoning on the information from multiple images. It proceeds in steps: each image first undergoes basic processing, and the fusion decision is then made through association processing.

The comparison of the three types of image fusion is shown in Table 1.

3.5. Multi-Image Panoramic Stitching Based on SIFT Algorithm

3.5.1. Chromaticity Standardization of Multiple Color Images

In this paper, the histograms of the three channels of each image are standardized so that the images share the same tone. Generally, the three-channel histogram of the image with the best visual effect among the multiple images is used as the reference standard, and the three-channel histograms of the other images are adjusted toward it, which unifies the chromaticity of the multiple images.
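One common way to adjust a channel histogram toward a reference is cumulative-histogram matching. The sketch below assumes that technique; the paper does not state which exact mapping it uses, and the function names `match_channel` and `match_rgb`, the 8-bit value range, and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def match_channel(src, ref):
    """Remap one uint8 channel so its histogram approximates that of ref."""
    src_hist = np.bincount(src.ravel(), minlength=256)
    ref_hist = np.bincount(ref.ravel(), minlength=256)
    src_cdf = np.cumsum(src_hist) / src.size
    ref_cdf = np.cumsum(ref_hist) / ref.size
    # For each source level, pick the first reference level whose cumulative
    # frequency reaches the source level's cumulative frequency.
    lut = np.searchsorted(ref_cdf, src_cdf, side="left")
    lut = lut.clip(0, 255).astype(np.uint8)
    return lut[src]

def match_rgb(src, ref):
    """Apply the channel-wise matching to an H x W x 3 uint8 image."""
    channels = [match_channel(src[..., c], ref[..., c]) for c in range(3)]
    return np.stack(channels, axis=-1)
```

Matching each channel independently is simple and fast, although it can shift hues slightly when the two images differ in content as well as in lighting.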

3.5.2. Feature Point Matching

Feature point matching computes, for each feature point, the smallest and second-smallest Euclidean distances to the feature points of the other image. A threshold is set and compared with the ratio of the smallest distance to the second-smallest distance. When the ratio is less than the threshold, the match is accepted as correct.
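The distance ratio rule described above can be sketched as follows. The function name `ratio_test_matches` and the default threshold of 0.8 are assumptions for this example (the paper does not state its threshold), and short tuples stand in for the 128-dimensional SIFT descriptors.

```python
import math

def ratio_test_matches(desc_a, desc_b, threshold=0.8):
    """Return index pairs (i, j) whose nearest/second-nearest ratio is below threshold."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i to every descriptor of the other image.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio needs at least two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        # Accept only when the best match is clearly better than the runner-up.
        if d2 > 0 and d1 / d2 < threshold:
            matches.append((i, j1))
    return matches
```

A lower threshold keeps fewer but more reliable matches; ambiguous points, such as those on repeated textures, are rejected because their two nearest neighbours are almost equally close.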

3.5.3. Image Feature Transfer

For each Lingnan cultural heritage training image, image segmentation is first performed to obtain multiple regions, and the visual features of each region are extracted. Automatic machine learning models are trained on the visual feature sets, and the bagging method is used to generate different training subsets, one for each component classifier. The final classification result is decided by simple majority voting, which yields the labeling model. Noise regions are filtered out using image information entropy, and the features of each region are extracted after noise removal.
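The bagging and majority-voting scheme described above can be sketched in a few lines. A nearest-centroid classifier is used here purely as a stand-in for the component classifiers, since the paper does not specify them at this level of detail; all function names, the number of estimators, and the random seed are assumptions for illustration.

```python
import random
from collections import Counter

def train_centroids(samples):
    """Stand-in component classifier: one mean feature vector per label."""
    sums, counts = {}, Counter()
    for x, label in samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroid(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda lb: sum((a - b) ** 2 for a, b in zip(x, model[lb])))

def bagging_fit(samples, n_estimators=11, seed=0):
    """Train each component classifier on a bootstrap subset of the samples."""
    rng = random.Random(seed)
    return [train_centroids([rng.choice(samples) for _ in samples])
            for _ in range(n_estimators)]

def bagging_predict(models, x):
    """Decide the final label by simple majority voting."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Each bootstrap subset is drawn with replacement and has the same size as the original set, so the component classifiers differ slightly and their vote is less sensitive to individual noisy regions.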

3.5.4. Determination of the Best Suture

The overlap area of the two images is determined from the well-matched SIFT feature points, and the best stitching line must be determined during stitching. In this article, the number of pixels in the first row of the overlapping area determines the candidate stitching line columns, and the best stitching line position is chosen by considering the smallest color difference in the stitching area and the most similar texture on both sides.
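One possible reading of this criterion is sketched below: every column of the overlap is a candidate, and the chosen column minimizes a sum of a color-difference term and a texture-difference term. The cost definition, the use of horizontal gradients as a texture measure, the `texture_weight` parameter, and the function name `best_seam_column` are all assumptions for this example rather than the paper's exact formulation.

```python
import numpy as np

def best_seam_column(overlap_a, overlap_b, texture_weight=1.0):
    """Pick the overlap column with the smallest color plus texture difference.

    overlap_a and overlap_b are the same grayscale overlap region taken from
    the two images, each of shape (rows, columns) with at least 2 columns.
    """
    a = np.asarray(overlap_a, dtype=float)
    b = np.asarray(overlap_b, dtype=float)
    # Color term: mean absolute intensity difference in each column.
    color_cost = np.abs(a - b).mean(axis=0)
    # Texture term: difference of horizontal gradient magnitudes per column.
    grad_a = np.abs(np.gradient(a, axis=1))
    grad_b = np.abs(np.gradient(b, axis=1))
    texture_cost = np.abs(grad_a - grad_b).mean(axis=0)
    cost = color_cost + texture_weight * texture_cost
    return int(np.argmin(cost)), cost
```

A straight vertical seam is the simplest choice; when moving objects cross the overlap, a seam that varies from row to row may be needed instead.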

3.5.5. Panoramic Image Display

A Java Applet is written in the Java language, so it is platform-independent. Users can view it directly by opening it in a browser, without downloading plug-ins. This method has several drawbacks: the display format is relatively small, the image resolution is low, it occupies more system resources, the display speed is slower, and it is prone to jitter when browsing. However, the Java Applet-based method played a vital role in the early development of Lingnan cultural heritage panoramic images, and it is still one of the most popular display methods. On Windows XP and Windows 2000 systems, a Java virtual machine must be installed before the panoramic view of Lingnan cultural heritage displayed by a Java Applet can be browsed. This article adopts this method to realize virtual roaming of the panoramic images of Lingnan cultural heritage in the IE browser. The Lingnan cultural heritage panoramic image file, the web page file, and the compiled Java class file are placed in the same directory. The code embedded in the page specifies the .class file, the location of the image file, and the width and height of the observation window.

4. Results and Discussion

As a preprocessing step for Lingnan cultural heritage panoramic image stitching, the chromaticity and brightness of Lingnan cultural architectural images taken under different lighting conditions must be standardized and unified. To verify the normalization algorithm proposed in this article, the experiment is arranged as follows: two images of the same scene are taken from different perspectives; the leftmost is the reference image, and the middle is the image to be normalized. The lighting conditions of the two differ significantly: the input image is overexposed and needs to be normalized to the tonal state of the reference image. The third image in the figure is the output of the algorithm in this paper, and visually the output image is close to the chroma state of the reference image. The different images generated are shown in Figure 2.

Next, the normalization process of the histograms of the red, green, and blue (RGB) channels of the three pictures is shown. The first row is the original RGB three-channel histogram of the input image, and the second row is the RGB three-channel histogram of the target reference image. The result of the output image after histogram normalization is shown in Figure 3.

The histogram similarity between the input image and the reference image R channel is shown in Table 2. It can be seen from Table 2 that the histogram similarity of the input image and the reference image R channel increased from 0.63 to 0.87; the histogram similarity of the G channel increased from 0.55 to 0.91; and the histogram similarity of the B channel increased from 0.50 to 0.86. The significant increase in the similarity of these histogram data proves the effectiveness of the histogram normalization algorithm proposed in this paper.

Matching time comparison: in order to compare the computational efficiency of the two matching strategies of BBF and KD-tree in calculating the matching feature points, this paper conducts running time statistics on the feature point matching process of the above three complex texture images. The experimental hardware environment is Intel® Core™ i7-4510U CPU @ 2.00 GHz, 8 GB RAM; 64-bit Windows 7 system, VC++ running environment under VS2010 development environment. The experimental results are shown in Table 3.

After the running environment is successfully configured, the stitching between the two images can be carried out. Based on the SIFT algorithm, the two images to be processed must be read and detected first. Of course, the two images must have overlapping image areas. For this, two groups of simple texture images were selected in this design. The image with simple texture is shown in Figure 4.

Feature matching with two different methods on simple texture images and complex texture images shows that the traditional distance ratio method produces many mismatches, whereas the RANSAC algorithm refines the matching points and avoids false matching points, which provides good conditions for the subsequent image fusion. After the feature points of the two images are found, the next step is to match them. This is the most critical step in image stitching. In this design, we first use the advantages of the algorithm to match the feature points. The feature matching of the simple texture image obtained by the distance ratio method is shown in Figure 5.
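The way RANSAC separates correct matches from false ones can be sketched with a deliberately simplified motion model. The example below estimates a pure translation, whereas the registration in this study uses an affine model; the function name `ransac_translation`, the pixel tolerance, the iteration count, and the seed are assumptions for illustration.

```python
import random

def ransac_translation(pairs, tol=2.0, iterations=100, seed=0):
    """Estimate a translation from matched point pairs while rejecting outliers.

    pairs is a list of ((x1, y1), (x2, y2)) matches between two images.
    Returns the refined (dx, dy) and the list of inlier pairs.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # One pair is enough to hypothesize a translation.
        (x1, y1), (x2, y2) = rng.choice(pairs)
        dx, dy = x2 - x1, y2 - y1
        # Count the matches that agree with this hypothesis.
        inliers = [(p, q) for p, q in pairs
                   if abs(q[0] - p[0] - dx) <= tol and abs(q[1] - p[1] - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the estimate by averaging over the largest consensus set.
    n = len(best_inliers)
    dx = sum(q[0] - p[0] for p, q in best_inliers) / n
    dy = sum(q[1] - p[1] for p, q in best_inliers) / n
    return (dx, dy), best_inliers
```

For an affine model the same loop applies, except that each hypothesis is fitted from three non-collinear pairs instead of one.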

The weighted average fusion algorithm is used for image fusion, and the SIFT coefficients in the weighted average fusion are fused in a progressive manner; that is, the fusion ratio is set according to the distance-based weight within the overlapping part. Unlike the above algorithm, this method leaves no visible seams in the overlapping area and achieves a smooth transition across it. The result of image fusion with relatively simple texture is shown in Figure 6.
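The progressive weighting can be sketched as a linear ramp across the overlap, so that each image's contribution fades with distance from its own side. The linear form of the ramp, the function name `feather_blend`, and the use of NumPy are assumptions for this example.

```python
import numpy as np

def feather_blend(left_overlap, right_overlap):
    """Blend two aligned overlap regions with distance-based linear weights.

    Both inputs have shape (rows, columns) or (rows, columns, channels) and
    cover the same overlap area; columns run from the left image's side to
    the right image's side.
    """
    left = np.asarray(left_overlap, dtype=float)
    right = np.asarray(right_overlap, dtype=float)
    # Weight of the left image falls from 1 to 0 across the overlap.
    w = np.linspace(1.0, 0.0, left.shape[1])
    w = w.reshape((1, -1) + (1,) * (left.ndim - 2))
    return w * left + (1.0 - w) * right
```

At the left edge of the overlap the result equals the left image and at the right edge it equals the right image, so the blended region joins both images without a visible step.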

To analyze the effectiveness of the function forced correction algorithm more clearly, the SIFT spliced image is compared with the directly spliced image and with the image spliced by the forced correction method in terms of how the gray value changes across the seam. Taking the 200th row of the images as an example, the gray values on the left and right sides of the stitching seam are extracted, and the gray value curves of that row are drawn, as shown in Figure 7.

In the experiment, based on the five sets of data in the Corel image library, the DT algorithm is used to compare the classification accuracy and classification time with other machine learning algorithms such as SVM, NB, and KNN. Five discrete data sets are used for testing, and the classification accuracy and classification time results of various classification algorithms in the final experiment are shown in Table 4. The highest accuracy of SIFT is 82.22%, and the lowest recognition time is 0.01 s.

There are 5 different sets of data in Table 4: 180C_TCT_SP represents a total of 180 training samples, and the training samples include 8 classes (blue sky, bird, plane, firework, flower, grass, horse, mountain), each of which takes 20 training samples; images are trained using C, T, CT, and SP features. 225C.T_CT_SP represents a total of 225 training samples, including 9 classes in the training samples; each class takes 25 training samples; and the images are trained using C, T, CT, and SP features. 270CT_CT__SP represents a total of 270 training samples, including 9 classes in the training samples; each class takes 30 training samples; and the image is trained with C, T, CT, and SP features. 380C_TCT_SP represents a total of 380 training samples. The training samples include 19 classes, and each class has 20 training samples. The images are trained with C, T, CT, and SP features. The comparison of classification accuracy and classification time is shown in Figure 8.

To make the experimental results clearer, we averaged the results of the first four sets of data and calculated the average classification accuracy (%) and average classification time (s) for each algorithm, as shown in Table 5.

The comparison of the average classification accuracy and average classification time of various algorithms is shown in Figure 9. It can be seen that the difference in the comparison result is also different.

5. Conclusion

With the continuous development of virtual reality technology, image-based graphics rendering, one of its main technologies, has made great progress. Panoramic views of Lingnan cultural heritage have become a research hotspot in recent years and are widely used in fields such as virtual reality and computer vision. This research discusses an intelligent mosaic method for virtual reality panoramas of Lingnan cultural heritage based on automatic machine learning. To compensate for the effect of an imperfect collection process on the quality of the final panoramic image, irregular rotation of the camera must be minimized, and images must be collected with an overlapping area of appropriate size between adjacent images. To give the panoramic images better visual effects, the images must be preprocessed before registration and fusion. Panoramic image stitching has a very wide range of applications, and different application scenarios place different specific requirements on stitching. During research and testing, this article stipulated selection criteria for the shooting scenes, which artificially limits the application scope of the system. Achieving universal applicability to all environments still requires many issues to be explored.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by “The Mixed Reality Applied in Digital Heritage Research of Lingnan Intangible Cultural Heritage” on the Project of Guangdong Philosophy and Social Sciences “The 13th Five-Year” Plan Subject Co-Construction Project in 2018 (GD18XYS13), “The Mixed Reality Applied in the Digital Heritage Research of Guangdong-Hong Kong-Macao Greater Bay Area Canton Intangible Cultural Heritage” on the Project of Guangzhou Philosophy and Social Sciences Development “The 13th Five-Year” Plan Subject Co-Construction Project in 2019 (2019GZGJ150), and “The Mixed Reality Applied in Digital Heritage Research of Lingnan Intangible Cultural Heritage” on the Project of Guangdong General College Key Scientific Research Platform and Research Project in 2018 (2018GWTSCX033).