Abstract

Tobacco in plateau mountain areas is characterized by fragmented plots, uneven growth, and mixed planting or intercropping with other crops, which makes it difficult for object-oriented image analysis methods to extract effective features and accurately delineate tobacco planting areas. Relying on the feature self-learning advantage of deep learning, this paper proposes a method for accurately extracting tobacco planting areas from unmanned aerial vehicle (UAV) remote sensing images of plateau mountains based on deep semantic segmentation models. First, a tobacco semantic segmentation dataset is built with Labelme. Four deep semantic segmentation models, DeeplabV3+, PSPNet, SegNet, and U-Net, are then trained on the samples in the dataset; to reduce model training time, the lightweight MobileNet series networks replace the original backbone networks of the four models. Finally, the trained networks semantically segment the prediction images, and the mean Intersection over Union (mIoU) is used to evaluate accuracy. The experimental results show that when DeeplabV3+, PSPNet, SegNet, and U-Net semantically segment the 71 prediction images, the mIoU values obtained are 0.9436, 0.9118, 0.9392, and 0.9473, respectively, indicating high segmentation accuracy. The feasibility of extracting tobacco planting areas from UAV remote sensing images with deep semantic segmentation is thus verified, and the method can serve as a reference for subsequent automatic extraction of tobacco planting areas.

1. Introduction

Tobacco is a crop of high economic value that plays a significant role in national revenue and in parts of local economic development. In China, Yunnan Province is the main concentrated tobacco-producing area: in 2018, its planted area and total output of tobacco accounted for 38.97% and 37.69% of the national totals, respectively, making it the largest tobacco production base in China [1]. At the same time, however, tobacco planting is a high-risk process, vulnerable to natural disasters and pests. Therefore, timely knowledge of tobacco spatial distribution, planting area, growth, yield, disaster losses, and other information is of great significance for accurate tobacco management, accurate production estimation, and support of government decision-making. Among these, rapid and accurate extraction of the tobacco planting area is an important prerequisite for fine tobacco management.

Because tobacco is planted over wide and widely distributed areas, manual surveys are inefficient and susceptible to human error [2]. The emergence and development of remote sensing technology have made up for the shortcomings of manual surveys, and remote sensing has become the main technical means of monitoring tobacco planting areas. The technology and methods for monitoring tobacco planting areas from remote sensing images have made great progress over the past ten years: data sources range from medium and low spatial resolution satellite images (such as Landsat and HJ-1) to high spatial resolution satellite images (such as SPOT-5, China-Brazil Earth Resources Satellite 02B, ZY-1 02C, and ZY-3) [3–7], and from optical images to synthetic aperture radar [8]; platforms range from high-altitude satellite remote sensing to low-altitude UAV remote sensing [9–11]; monitoring methods range from statistical methods based on pixel features [5] to object-oriented methods [6, 7, 9]; and monitoring content ranges from planted area to individual tobacco plants [12, 13].

Because Yunnan Province lies in a low-latitude plateau area, tobacco plots are small, their spatial distribution is scattered, and tobacco is generally mixed or intercropped with other crops [3, 4]. It is therefore difficult to accurately extract the planting area from low and medium spatial resolution remote sensing images, and small plots are easily missed. With high spatial resolution satellite remote sensing, on the other hand, it is difficult to guarantee that images of a specific region can be acquired at a specific phenophase. UAV remote sensing has thus become the main means of monitoring tobacco planting areas because of its flexibility and high spatial resolution. Object-oriented image analysis and deep learning are the main classification methods for high spatial resolution remote sensing images, but image segmentation and feature extraction restrict the development of object-oriented image analysis. At present, the deep semantic segmentation method has been widely used in agriculture and has achieved gratifying results. For example, the literature [14] proposed a large-scale crop mapping method using multitemporal dual-polarization SAR data, with U-Net used to predict the different crop types; the literature [15] used a convolutional neural network (CNN) and the Hough transform to detect crop rows in images taken by a UAV; and the literature [16] used the deep learning framework TensorFlow to build a platform for sampling, training, testing, and classification to extract and map crop areas based on DeeplabV3+. To realize the accurate extraction of tobacco planting areas in the plateau of Yunnan Province, this paper extracts the tobacco planting area using four deep semantic segmentation models: DeeplabV3+ [17], PSPNet [18], SegNet [19], and U-Net [20]. At the same time, to reduce the training cost, the MobileNet series networks [21] are used to replace the backbone networks of the four deep networks.

2. Data and Methods

2.1. Overview of the Study Area

The study area is Xiyang Yi Nationality Township, Jinning District, Kunming City (24°23′N∼24°33′N, 102°11′E∼102°22′E), as shown in Figure 1. The township covers an area of 160.32 km2, has complex terrain with a difference of 1223 m between its highest and lowest elevations, and has a distinctive three-dimensional climate. Tobacco planting is the pillar industry of the township.

2.2. Data Acquisition and Preprocessing

The low-altitude remote sensing platform used for data acquisition is a Phantom 4 RTK UAV with an FC6310R camera (8.8 mm focal length, 5472 × 3648 pixels). To obtain the local tobacco data, several flight strips were designed, and one of them was used as case data for processing and analysis. The case data cover Lvxi Village, Xiyang Yi Nationality Township; the aerial photography date is July 29, 2020. The route planning is shown in Figure 2, and a thumbnail of the data is shown in Figure 3. The spatial resolution of the image is 0.027 m, the coverage area is 0.1984 km2, and the coordinate system is WGS 84 / UTM zone 48N (Northern Hemisphere transverse Mercator).

The original UAV remote sensing images are processed with PIE-UAV V6.0 for image matching, image alignment, camera optimization, orthophoto correction, image blending, and mosaicking to generate a digital orthophoto map (DOM). The image size is 19439 × 22081 pixels.

2.3. Production of Tobacco Semantic Segmentation Dataset

In this paper, the original DOM image was cut into 1280 × 720 pixel tiles in batches, and the tiles without tobacco cover were deleted, leaving 238 images containing tobacco. The image annotation tool Labelme was then used to manually label the single tobacco class, as shown in Figure 4.
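As a sketch of this tiling-and-filtering step (the function name and the mask convention, tobacco pixels = 1, are illustrative assumptions, not the authors' exact pipeline), assuming the DOM and its label mask are loaded as NumPy arrays:

```python
import numpy as np

def tile_with_tobacco(image, mask, tile_w=1280, tile_h=720):
    """Cut an (H, W, C) image and its (H, W) label mask into fixed-size
    tiles, keeping only tiles that contain at least one tobacco pixel
    (mask value 1), mirroring the deletion of tobacco-free tiles."""
    kept = []
    h, w = mask.shape
    for y in range(0, h - tile_h + 1, tile_h):
        for x in range(0, w - tile_w + 1, tile_w):
            m = mask[y:y + tile_h, x:x + tile_w]
            if m.any():  # tobacco-free tiles fail this test and are dropped
                kept.append((image[y:y + tile_h, x:x + tile_w], m))
    return kept
```

Each kept pair (tile, mask tile) then corresponds to one Labelme-annotated training sample.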

2.4. Semantic Segmentation of Tobacco

Because tobacco differs greatly in growth status (Figure 5(a)), planting area (Figure 5(b)), planting density (Figure 5(c)), and planting environment (Figures 5(d)–5(f)), it is difficult to find ideal features for high-precision extraction of tobacco from UAV remote sensing images using an object-oriented method. Thanks to its ability to learn features automatically, deep learning can learn not only simple features but also more abstract ones. Therefore, this paper uses deep semantic segmentation to extract tobacco from UAV remote sensing images.

At present, there are many network models for deep semantic segmentation, including fully supervised and weakly supervised image semantic segmentation methods. However, the performance of most weakly supervised methods still lags behind that of fully supervised methods [22], so this paper adopts fully supervised image semantic segmentation. Four network models, DeeplabV3+, PSPNet, SegNet, and U-Net, are used to semantically segment tobacco in UAV remote sensing images. To greatly reduce network training time without affecting prediction accuracy, the lightweight MobileNet series models [21] replace the original backbone networks of the four network models: the DeeplabV3+ network adopts MobileNetV2, and the other three networks use MobileNetV1. The structures of the four network models are shown in Figure 6.
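The reason a MobileNet backbone cuts training cost is that MobileNet builds on depthwise separable convolutions, which replace each standard convolution with a depthwise step plus a 1 × 1 pointwise step. A minimal parameter-count comparison (bias terms omitted; the layer sizes are illustrative, not taken from the paper's networks):

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution layer."""
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution, as in MobileNetV1."""
    return k * k * c_in + c_in * c_out

std = conv_params(256, 256, 3)            # 589,824 weights
sep = separable_conv_params(256, 256, 3)  # 67,840 weights
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For a 3 × 3 kernel the saving approaches a factor of k² = 9, which is why swapping the backbone shrinks both the parameter count and the training time.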

2.4.1. Network Training

In order to verify the training efficiency and accuracy of the lightweight backbone network models, a mid-range hardware platform is selected. The specific configuration is as follows: Intel Core i7-8700 six-core processor, NVIDIA GTX 1070 with 8 GB GDDR5 video memory, and 16 GB DDR4 memory. The training environment for DeeplabV3+, PSPNet, SegNet, and U-Net is TensorFlow-GPU 1.13.1 and Keras 2.1.5. Because the dataset in this paper is small, the ratio of training data (including validation data) to prediction data is 7 : 3, i.e., 167 training images (151 for training and 16 for validation) and 71 prediction images. The training steps and techniques are as follows:

(1) Setting parameters: the number of classes (NCLASSES), learning rate (LR), Batch Size (BS), and Epoch are defined. The images are divided into tobacco and nontobacco, so NCLASSES = 2. To assess the impact of the other parameters on time efficiency and accuracy, and based on comprehensive consideration of the experimental platform and dataset, the U-Net network is taken as an example: BS is set to 2, 4, 6, and 8; LR to 1 × 10−2, 1 × 10−3, and 1 × 10−4; and Epoch to 40, 50, and 60. Table 1 shows the time efficiency and accuracy under the different values of the three parameters. According to Table 1, LR = 1 × 10−2, BS = 4, and Epoch = 50 are selected in this paper.

(2) Downloading the weight files: the MobileNetV1 and MobileNetV2 weight files are downloaded from https://github.com/fchollet/deep-learning-models/releases.

(3) Shuffling the training data randomly: although the training data are fixed, the minibatch training mechanism allows the training set to be shuffled randomly before each Epoch. This not only increases the rate of model convergence but can also slightly improve the model's prediction results on the test set.

(4) Batch Normalization (BN) [23]: to speed up convergence during training, the network activations are normalized per minibatch so that the result has a mean of 0 and a variance of 1.

(5) Selecting the optimizer: to reduce training time and computing resources, an optimization algorithm that makes the model converge faster is needed. The Adam optimizer [24] is selected in this paper. Adam dynamically adjusts the LR of each parameter using first- and second-moment estimates of the gradient, so parameter updates are more stable.
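Step (5) above selects Adam; a minimal NumPy sketch of the Adam update rule from [24] (first- and second-moment estimates with bias correction), applied here to a toy quadratic rather than a network loss so the mechanics are visible:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter theta at step t (t >= 1)."""
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2 (gradient 2*theta) from theta = 5
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(f"theta after 2000 Adam steps: {theta:.4f}")
```

The per-parameter scaling by the second-moment estimate is what makes the updates stable across parameters with very different gradient magnitudes, which is the property cited above.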

Figure 7 shows the training accuracy and test accuracy of the four networks. It can be seen from Figure 7 that the training and test accuracies of the four networks differ only slightly, indicating good semantic segmentation performance.

3. Results and Discussion

The parameters obtained by network training are used in the prediction function to semantically segment the tobacco planting areas in the 71 prediction images. To verify the accuracy of the semantic segmentation, mIoU [25] and mPA [25] are used to evaluate the overall accuracy over the 71 images, while Precision [26], Recall [26], F1 [27], IoU [25], and PA [25] are used as indicators to quantitatively evaluate the semantic segmentation accuracy of each image.

Assume that there are k + 1 classes (0, …, k) in the dataset, where class 0 usually represents the background. The calculation formulas of the indicators are as follows:
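The formulas themselves are missing from this copy of the text; a reconstruction using the standard definitions consistent with the TP/FP/FN/TN notation explained below (the numbering order (1)∼(7) is an assumption) is:

```latex
\mathrm{Precision}_i = \frac{TP_i}{TP_i + FP_i} \quad (1)
\qquad
\mathrm{Recall}_i = \frac{TP_i}{TP_i + FN_i} \quad (2)

\mathrm{F1}_i = \frac{2 \cdot \mathrm{Precision}_i \cdot \mathrm{Recall}_i}
                     {\mathrm{Precision}_i + \mathrm{Recall}_i} \quad (3)
\qquad
\mathrm{IoU}_i = \frac{TP_i}{TP_i + FP_i + FN_i} \quad (4)

\mathrm{PA} = \frac{TP + TN}{TP + FP + FN + TN} \quad (5)

\mathrm{mIoU} = \frac{1}{k+1} \sum_{i=0}^{k} \mathrm{IoU}_i \quad (6)
\qquad
\mathrm{mPA} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{TP_i}{TP_i + FN_i} \quad (7)
```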

In formulas (1)∼(7), i denotes the i-th class; TP, FP, FN, and TN denote the numbers of True Positive, False Positive, False Negative, and True Negative pixels, respectively.
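The per-image indicators can be computed directly from a predicted mask and its label via these TP/FP/FN/TN counts; a minimal NumPy sketch for the binary tobacco/nontobacco case (the function name is illustrative; zero-division guards are omitted for brevity):

```python
import numpy as np

def segmentation_metrics(pred, label):
    """pred, label: boolean arrays where True marks tobacco pixels.
    Returns (Precision, Recall, F1, IoU, PA) for the tobacco class."""
    tp = np.sum(pred & label)    # tobacco predicted as tobacco
    fp = np.sum(pred & ~label)   # background predicted as tobacco
    fn = np.sum(~pred & label)   # tobacco predicted as background
    tn = np.sum(~pred & ~label)  # background predicted as background
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    pa = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, iou, pa
```

Averaging the IoU of the tobacco class and the background class over all images then gives the mIoU reported in Table 2.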

DeeplabV3+, PSPNet, SegNet, and U-Net are used to semantically segment the 71 prediction images; the resulting mIoU and mPA are shown in Table 2. As Table 2 shows, all four networks achieve better than 90% on both mIoU and mPA, indicating that deep learning performs very well for semantic segmentation of tobacco planting areas. Among them, the U-Net network has the highest mIoU and mPA and the best overall segmentation performance; the PSPNet network has the lowest overall prediction accuracy; and the overall prediction accuracies of U-Net, DeeplabV3+, and SegNet differ only slightly.

Because the number of prediction images is large, they cannot all be displayed; six images are selected as example data, as shown in Figures 8(a)–8(c) and 9(a)–9(c). Among them, the tobacco in Figure 8(a) grows well and is densely planted; in Figure 8(b), some tobacco areas are overexposed and there are crops spectrally similar to tobacco; in Figure 8(c), the planting density is sparse and the growth is uneven; in Figure 9(a), there are roads, tobacco, and weeds; in Figure 9(b), there are buildings, roads, tobacco, and weeds, and some tobacco planting areas are small; in Figure 9(c), there are large areas of crops spectrally similar to tobacco.

As can be seen from Figures 8 and 9 and Table 3, all four networks obtain good semantic segmentation results on the example 1 data. Among them, the U-Net network scores highest on all five evaluation indicators, although some ridges are misdetected as tobacco planting areas; this misdetection is particularly obvious in the PSPNet network, whose IoU is also the lowest. On the example 2 data, no network scores highest on all five indicators; the PSPNet scores are relatively high and balanced, and PSPNet also performs well in scenes with uneven exposure. Visual comparison shows false detections in the DeeplabV3+, SegNet, and U-Net networks, where some crops spectrally close to tobacco are misdetected as tobacco; PSPNet avoids this problem but misses some tobacco. On the example 3 data, no network achieves the highest score on all five indicators, and all four networks miss some areas of sparse tobacco growth, so semantic segmentation of unevenly growing areas needs further strengthening. On the example 4 data, the segmentation results of all four networks are good: U-Net scores highest on the five evaluation indicators, while DeeplabV3+ and PSPNet miss the small tobacco planting area in the lower right corner. On the example 5 data, U-Net again scores highest on the five evaluation indicators, but its Precision, IoU, and PA are lower than in the first four scenes; all four networks show missed detections to different degrees, with U-Net missing the smallest area. On the example 6 data, the DeeplabV3+ network scores highest on the three indicators of Recall, F1, and IoU. Figure 9(i) also shows that the DeeplabV3+ result is the most consistent with the labeled image. PSPNet marks some ridges as tobacco planting areas, and SegNet and U-Net mark some other crops as tobacco, but the false detection areas are small.

Combining Figures 8 and 9 with Table 3, it can be seen that all four deep learning networks obtain good semantic segmentation results for tobacco planting areas in different scenes, though the networks still differ in performance across scenarios. Across the 6 example images and 5 evaluation indicators there are 30 top scores in total, of which U-Net takes 18, DeeplabV3+ 6, SegNet 4, and PSPNet 2. Therefore, U-Net and DeeplabV3+ outperform SegNet and PSPNet on this small-sample tobacco planting area dataset. Moreover, the U-Net network depends less on hardware than the DeeplabV3+ network and runs more efficiently: as Table 4 shows, both the prediction time and especially the training time of U-Net are shorter than those of DeeplabV3+.

4. Conclusions and Discussion

4.1. Conclusions

This paper mainly discusses the application potential of deep semantic segmentation for the automatic extraction of tobacco planting areas in plateau mountains. Using the four deep semantic segmentation methods DeeplabV3+, PSPNet, SegNet, and U-Net, 151 images are used for training, 16 for validation, and 71 for prediction. The experimental results show that, compared with the traditional object-oriented image analysis method, the deep semantic segmentation method requires no feature selection and optimization and offers higher automation and better generality. Among the four networks, U-Net performs best at tobacco semantic segmentation on this small sample set, and its hardware requirements are modest, which favors the promotion of deep semantic segmentation for tobacco planting area extraction.

4.2. Discussion

The advantages of deep learning in tobacco planting area extraction have been effectively verified, but several problems still deserve further study:

(1) The results show that the ridges between tobacco planting areas are mistakenly detected. Whether extraction accuracy can be improved by adding field boundary information is worth further study.

(2) Weeds and crops with similar spectral characteristics are easily misdetected. Using joint "shape-spectrum" features to distinguish objects with the same spectrum but different identities (and vice versa) is worth further attempts.

(3) Semantic segmentation of multiple crops needs to be verified. The images are divided only into tobacco and nontobacco in this paper, which weakens the problem of sample imbalance; if several crops are to be semantically segmented at the same time, the applicability of the proposed method needs to be verified.

(4) Planting areas with poor growth or no harvest are easily missed. Morphological operations are an effective way to deal with holes, so using morphological processing to handle the missed detections caused by poorly growing areas is a feasible choice.

(5) Deep learning involves many hyperparameter settings, and different values affect time efficiency and accuracy; choosing the best parameters remains a relatively difficult task.

(6) The extraction of tobacco planting areas at different spatial resolutions needs further verification, because the flying height of the UAV determines the spatial resolution of the image. In areas with large elevation differences, tobacco features obtained at the same flying height may differ, and the effect of terrain relief on the semantic segmentation of tobacco needs further research.

Data Availability

The original data have not been made publicly available but may be used for scientific research; other researchers may email the first author to request them.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was funded by the National Natural Science Foundation of China, Grants nos. 41961039 and 41961053, and the Applied Basic Research Programs of Science and Technology Department of Yunnan Province, Grant no. 2018FB078.