Article

Fast Anchor Point Matching for Emergency UAV Image Stitching Using Position and Pose Information

Department of Cognitive Communication, College of Electronic Science and Technology, National University of Defense Technology, Changsha 410000, Hunan, China
* Author to whom correspondence should be addressed.
Sensors 2020, 20(7), 2007; https://doi.org/10.3390/s20072007
Submission received: 6 March 2020 / Revised: 25 March 2020 / Accepted: 1 April 2020 / Published: 3 April 2020
(This article belongs to the Section Remote Sensors)

Abstract

With the development of unmanned aerial vehicle (UAV) techniques, UAV images are becoming more widely used. However, stitching, an essential step in most UAV image applications, remains computationally intensive, especially for emergency applications. To address this issue, we propose a novel approach, called FUIS (fast UAV image stitching), that uses the position and pose information of UAV images to speed up the stitching process. Like traditional approaches, FUIS stitches images via feature points; unlike them, it rapidly finds a few anchor matches instead of a large number of feature matches. Firstly, from a large number of feature points, we design a method to select a small number that are most helpful for stitching as anchor points. Then, a method is proposed to match these anchor points more quickly and accurately using position and pose information. Experiments show that our method significantly reduces the time consumption compared with state-of-the-art approaches while guaranteeing accuracy.

1. Introduction

With the development of unmanned aerial vehicle (UAV) techniques, aerial images are becoming cheaper, more easily accessible, and higher in resolution. In many fields, such as surveying, mapping, resource exploration, and disaster monitoring, remote-sensing images from UAVs are widely used.
In some emergency situations, such as disaster rescue, image-stitching results need to be obtained rapidly. If the computational complexity of the stitching algorithm is low enough, a portable computing device (for example, a laptop or tablet PC) can generate real-time stitched images of the target area on site once the UAV is deployed for investigation. Such a process can support first-aid teams in rescue actions after disasters such as earthquakes, floods, mudslides, or avalanches. Therefore, reducing the computational cost of UAV image stitching is of great practical value.
The literature has proposed various approaches to stitch UAV images; however, most of them focus on improving precision. Some approaches [1,2,3] aim to improve the stitching speed, but their stitching time is still too long to meet the needs of emergency applications.
Utilizing the Global Positioning System (GPS) and inertial measurement unit (IMU) carried by the UAV, the position and pose information of the UAV image can be obtained easily. The position and pose information can determine the approximate position of the UAV image. However, limited by the accuracy of the instrument, the accuracy of the position information cannot meet the requirements for stitching.
As described above, conventional feature-based stitching methods suffer from slow speed, while the position and pose information recorded by the UAV is not accurate enough for stitching on its own. To address this problem, this paper proposes a stitching approach that uses position and pose information to simplify the stitching computation. The approach stitches images by finding several anchor matches instead of a large number of feature matches, thereby reducing both the area from which features must be extracted and the number of feature points that must be matched; this is why it is faster.
The main contributions of this paper are as follows:
  • A novel UAV image-stitching approach based on feature point matching is proposed. Compared to mainstream methods [4,5], it uses position and pose information effectively and reduces the computation time significantly. The main purpose of our approach is to stitch UAV images quickly in emergency situations, as reflected in its name, FUIS (fast UAV image stitching).
  • A novel anchor point selection approach is designed, which selects a small number of anchor points from a large number of feature points to accelerate the stitching process while guaranteeing accuracy.
  • To validate the proposed approach, we conducted experiments comparing FUIS to existing approaches. The results show that FUIS is faster than the existing approaches while achieving comparable accuracy.
The remainder of this paper is organized as follows. Section 2 introduces related work on UAV image stitching. Section 3 defines the problem of UAV image stitching. Section 4 introduces FUIS, first as an overview and then in detail. Section 5 presents the experiments and discussion. Finally, conclusions are drawn in Section 6.

2. Related Works

UAV image stitching has much in common with general image stitching. It refers to the technique of merging a sequence of two or more UAV images into one wide-field image by finding an appropriate image transformation model [6]. Various methods have been proposed to find this model, such as region-based methods, represented by phase correlation [7]; however, feature-point-based methods are more widely used [8].
The feature-point-based methods align images by finding the common feature points of the two images. They mainly consist of three steps: feature point extraction and description, feature point matching, and image transformation. For feature extraction, the mainstream methods are SIFT (scale-invariant feature transform) [9], SURF (speeded up robust features) [10,11], KAZE [12], and ORB (oriented FAST and rotated BRIEF) [13]. In addition to these hand-crafted feature points, deep learning methods, such as LIFT (learned invariant feature transform) [14], have also been applied in this field. These feature algorithms are widely used in various UAV image-stitching applications [3,5,15,16,17]. For feature matching, algorithms such as RANSAC (random sample consensus) [18] and GMS (grid-based motion statistics) [19,20] improve the accuracy of feature point matching and avoid the adverse effect of error-matched points on alignment, so some UAV image-stitching approaches adopt them for better performance [19,21]. The matched feature pairs are then used to calculate the transformation model that aligns the images. In this step, the homography matrix is widely used because it is simple and effective, as UAV image stitching mainly deals with approximately planar scenes [22]. However, to obtain more accurate results, several studies [17,23,24,25,26] have proposed more complex models that effectively remove parallax.
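A minimal sketch of this conventional whole-image pipeline, using OpenCV's SURF (which lives in the opencv-contrib package) with a ratio test and RANSAC; the parameter values here are illustrative rather than taken from any of the cited works:

```python
# Conventional whole-image pipeline: SURF extraction, brute-force matching with
# a ratio test, then RANSAC homography estimation. Parameter values are illustrative.
import cv2
import numpy as np

def baseline_homography(img1, img2):
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=100)
    kp1, des1 = surf.detectAndCompute(img1, None)
    kp2, des2 = surf.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    # Lowe's ratio test keeps only sufficiently distinctive matches.
    good = [m for m, n in knn if m.distance < 0.7 * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC filters out the error-matches that survived the ratio test.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
    return H
```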
After finding the appropriate image transformation model, image fusion is usually performed. The main purpose of fusion is to enhance the visual effect and remove the seam-line effect. Currently, image-blending methods [27,28] and seam-line extraction methods [29,30,31] are commonly used because of their excellent performance.
In addition to traditional image-stitching methods, some methods designed specifically for UAV image stitching have been proposed. Compared to general image stitching, UAV image stitching has unique characteristics. First, as aerial images, the scene is the ground, which is approximately planar, and the subject distance, i.e., the flying height, does not change drastically. Second, the position and pose of the UAV at the moment each image was taken are available, but because a UAV platform is small, the stability and accuracy of this information are low. Using these characteristics, the literature has proposed several optimization approaches to improve the accuracy and speed of stitching. Camera position information is used by [1] to streamline the image set by removing unnecessary frames and to reduce cumulative error by finding the lateral relative positional relationship of images and matching them accordingly. Li et al. [2] assume that the height of the UAV is constant and reduce the computation by removing some octaves from SIFT. GPS is used by [32] to predict the positions of the feature points to be extracted in the next key frame, thereby reducing the time consumption of stitching; however, this approach does not use pose information and still spends considerable time processing a large number of feature points. These approaches reduce the computational cost to some extent, but they do not make full use of the position and pose information recorded by the UAV and are still too time-consuming for some emergency applications. That is the problem we address.

3. Problem Definition

The premise of the problem is as follows. The UAV images of the target area are denoted as $\{I_1, I_2, \ldots, I_n\}$. These images overlap one another at a certain overlap rate. The position information (altitude, latitude, and longitude) and the pose information (the pose angles of the camera $\omega$, $\varphi$, and $\kappa$) of the UAV at the moment each image was taken are available, denoted as $\{camera_i\},\ i = 1, 2, \ldots, n$, with $camera_i = (alt_i, lat_i, lng_i, \omega_i, \varphi_i, \kappa_i)$. The focal length is $f$. The goal is to find transform functions within the homography domain, $\{F_1, F_2, \ldots, F_n\}$, that stitch the images together.
Using the premise above, we define the emergency UAV image-stitching problem as follows:
Given $\{camera_i\}$ and the UAV images $\{I_i\}$, emergency UAV image stitching is to find appropriate transformation functions $\{F_1, F_2, \ldots, F_n\}$ in the least amount of time, such that $\{F_1, F_2, \ldots, F_n\}$ can stitch $\{I_1, I_2, \ldots, I_n\}$ together into a combined image $I$ with a certain degree of accuracy.
Ideally, a set of absolutely accurate transformation functions $\{F_1, F_2, \ldots, F_n\}$ should map the pixels of the same feature in different images $\{I_1, I_2, \ldots, I_n\}$ to the same location in $I$. Assume there are $k$ features $\{p_1, p_2, \ldots, p_k\}$ and each feature appears in $m_i\ (i = 1, 2, \ldots, k)$ images; then we have to find transformation functions $\{F_1, F_2, \ldots, F_n\}$ that satisfy:
$$F_1\big(x_1^{(i)}\big) = F_2\big(x_2^{(i)}\big) = \cdots = F_{m_i}\big(x_{m_i}^{(i)}\big), \quad i = 1, 2, \ldots, k \qquad (1)$$
where $x_m^{(i)}$ is the pixel coordinate of $p_i$ in $I_m$. In practice, Equation (1) only needs to hold approximately.
When finding the transformation functions, we can use the position and pose information to simplify the calculation or use a parallel computation strategy to accelerate the stitching process.

4. Methodology

4.1. Overview

To address the problem defined above, we propose a novel stitching approach. The overview of our approach (FUIS) is shown in Figure 1.
Rough stitching is the preparation for optimized stitching. The principle is as follows.
According to photogrammetry theory [33], the projection relation between the world coordinate system and the camera coordinate system can be written as:
$$Z_c D \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \qquad (2)$$
where $Z_c$ is the $Z$ value in the camera coordinate system, $D$ is the pixel pitch of the imaging sensor, and $(x, y)$ is the pixel coordinate on the UAV image; $R$, $T$, and $K$ represent the rotation matrix, the offset vector, and the projection matrix, respectively.
Using the position, pose, and camera parameters, these quantities can be determined. The image coordinates $(x_1, y_1)$ in image $I_1$ are mapped to world coordinates $(X_w, Y_w, Z_w)$ and then mapped to the image coordinates $(x_2, y_2)$ in image $I_2$. In this way, the mapping relationship between the two images, $(x_2, y_2) = f(x_1, y_1)$, is obtained, producing the rough registration between the pictures.
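The rough registration can be realized, for instance, by back-projecting a pixel of $I_1$ onto the ground plane and re-projecting it into $I_2$. The sketch below makes several simplifying assumptions that are ours, not the paper's: a flat ground plane at $Z_w = 0$, camera positions already converted to a local metric frame, square pixels, and a particular rotation-angle convention.

```python
# Rough registration sketch under assumed conventions (flat ground, local metric
# frame, X-Y-Z rotation order for omega/phi/kappa). Not the paper's exact formulation.
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """World-to-camera rotation built from the three pose angles (radians)."""
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    Rx = np.array([[1, 0, 0], [0, co, -so], [0, so, co]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def rough_map(px1, cam1, cam2, f, D, cx, cy):
    """Map a pixel of I1 to its approximate position in I2 through the ground plane.
    cam = (X_w, Y_w, altitude, omega, phi, kappa); f is the focal length, D the pixel pitch."""
    K = np.array([[f / D, 0, cx], [0, f / D, cy], [0, 0, 1.0]])
    R1, C1 = rotation_matrix(*cam1[3:]), np.asarray(cam1[:3], dtype=float)
    R2, C2 = rotation_matrix(*cam2[3:]), np.asarray(cam2[:3], dtype=float)
    # Back-project the pixel into a viewing ray expressed in world coordinates.
    ray_cam = np.linalg.inv(K) @ np.array([px1[0], px1[1], 1.0])
    ray_world = R1.T @ ray_cam
    # Intersect the ray with the ground plane Z_w = 0.
    s = -C1[2] / ray_world[2]
    ground = C1 + s * ray_world
    # Project the ground point into the second camera.
    p2 = K @ (R2 @ (ground - C2))
    return p2[:2] / p2[2]
```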
The three steps of selecting the anchor points, matching them to obtain anchor point pairs, and calculating the transformation matrix are the core of image stitching and the focus of our optimization. Feature point matching inevitably has errors, including error-matches and matching deviations. Traditional methods guarantee robustness against these errors by increasing the number of feature matches: they extract and match many more feature points than needed so that incorrect matches are a minority, and then filter out the incorrect matches with methods such as RANSAC. However, including a large number of feature points in the first place also requires extensive calculation. To overcome this difficulty, we select anchor points from the feature points and propose a method to improve the matching accuracy. These methods are detailed in Section 4.2, Section 4.3 and Section 4.4.
For a sequence of images $\{I_0, I_1, I_2, \ldots, I_n\}$ taken by a UAV, adjacent image frames are also adjacent in position. Under this condition, we use the following method to stitch the image sequence. Firstly, according to the above steps, the transformation matrices between each pair of adjacent images in the sequence are determined, denoted as $\{H_1, H_2, \ldots, H_n\}$. Then, the first image is used as the reference image and the other images are transformed into its coordinate frame; that is, the $\{H_i\}$ are multiplied in series to obtain the transformation matrices $\{H_0^*, H_1^*, \ldots, H_n^*\}$ that map each image into the stitching result:
$$H_i^* = \begin{cases} I, & i = 0 \\ H_i H_{i-1}^*, & i = 1, 2, \ldots, n \end{cases} \qquad (3)$$
The advantage of this method is that the matching task is easy to divide and, therefore, easy to implement in parallel, as demonstrated in the experimental section.
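For example, the serial accumulation of Equation (3) is a one-line loop once the pairwise matrices are known; the pairwise estimation is the part that can be distributed over workers, while only this cheap accumulation remains serial. A sketch:

```python
import numpy as np

def chain_homographies(pairwise_H):
    """Accumulate pairwise matrices {H_i} into {H_i*} as in Equation (3):
    H_0* = I and H_i* = H_i @ H_{i-1}*, so every image is mapped into the
    coordinate frame of the first (reference) image."""
    H_star = [np.eye(3)]
    for H in pairwise_H:
        H_star.append(H @ H_star[-1])
    return H_star
```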
The seam-line technique selects the best seam line in the overlap area, aiming to eliminate the stitching traces caused by geometric misalignment and to improve the visual effect. The Voronoi graph method [30,31] is one of the seam-line extraction methods. Its principle is to let the seam line pass through the center of the overlapping area rather than its edges, because the deviation at the center of the overlapping area is generally smaller than that at the edges. This paper applies the Voronoi graph method to extract the seam line, since it requires few calculations and noticeably improves the stitching result.
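One simple way to approximate this center-of-overlap behaviour (not necessarily the exact construction of [30,31]) is to assign each overlap pixel to whichever image's exclusive region is nearer, using distance transforms:

```python
# Voronoi-style seam sketch: every overlap pixel goes to the image whose exclusive
# area is closer, which pushes the seam towards the centre of the overlap region.
import cv2
import numpy as np

def voronoi_style_seam(mask1, mask2):
    """mask1/mask2: uint8 footprints of the two warped images in the mosaic frame.
    Returns a boolean mask that is True where the mosaic should take image 1."""
    overlap = (mask1 > 0) & (mask2 > 0)
    only1 = (mask1 > 0) & ~overlap
    only2 = (mask2 > 0) & ~overlap
    # distanceTransform gives the distance to the nearest zero pixel, so the
    # exclusive region we measure distances to is set to zero.
    d1 = cv2.distanceTransform(np.where(only1, 0, 1).astype(np.uint8), cv2.DIST_L2, 3)
    d2 = cv2.distanceTransform(np.where(only2, 0, 1).astype(np.uint8), cv2.DIST_L2, 3)
    take1 = mask1 > 0
    take1[overlap] = d1[overlap] <= d2[overlap]
    return take1
```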

4.2. Feature Extraction and Anchor Point Selection

In this section, we extract feature points and select the several that are most helpful for stitching to be matched in the next step; we call these points anchor points.

4.2.1. Find Feature Points Inside the Overlapped Area

According to the rough registration results, we use the Sutherland–Hodgman algorithm [34] to find the approximate overlapped area of adjacent images. In the matching process, only the feature points inside the overlapped area are useful, so we extract feature points only inside this area. Compared with traditional whole-image extraction, extracting fewer points simplifies the process.
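For reference, the Sutherland–Hodgman clipping itself is only a few lines; the sketch below clips the frame of $I_1$ against the roughly-registered outline of $I_2$ to obtain the overlap polygon (convex polygons and counter-clockwise vertex order are assumed; names are our own):

```python
# Sutherland-Hodgman clipping: clip polygon `subject` (frame of I1) against convex
# polygon `clip` (rough-registered outline of I2); both are lists of (x, y) corners.
def sutherland_hodgman(subject, clip):
    def inside(p, a, b):
        # p lies on the left of (or on) the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p1, p2, a, b):
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):              # each clip edge
        inputs, output = output, []
        for p1, p2 in zip(inputs, inputs[1:] + inputs[:1]):  # each subject edge
            if inside(p2, a, b):
                if not inside(p1, a, b):
                    output.append(intersect(p1, p2, a, b))
                output.append(p2)
            elif inside(p1, a, b):
                output.append(intersect(p1, p2, a, b))
    return output  # vertices of the overlap polygon; its bounding box gives the MER
```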

4.2.2. Use Speeded up Robust Features (SURF) as Feature Extractor and Descriptor

There are many excellent feature-extraction algorithms to choose from, including ORB, SURF, and SIFT. Considering both stitching speed and accuracy, we select SURF as the feature extractor and descriptor. According to our experiments, SURF yields higher-quality feature points than ORB, which means higher matching accuracy, while it is faster than SIFT [10,11] owing to its use of box filters and Haar wavelet filters. The parameters of SURF are discussed further in Section 5.1.

4.2.3. Select Anchor Points

To select the feature points most helpful for stitching as anchor points, we designed the selection method in accordance with the following principles.
a)
Give priority to feature points with large response
The response is the output of the feature extractor at a point in the image. Feature points with larger responses are more likely to be remarkable and discernible, and thus matches between such feature points are more likely to be correct. When we select anchor points, priority is given to the feature points with the largest responses.
b)
Select an appropriate number of anchor points
It is obvious that the fewer the anchor points, the faster our algorithm. Since the transformation matrix has 8 degrees of freedom, four pairs of matches are theoretically enough to solve it. However, with few anchor pairs, the stitching error becomes more sensitive to the error of each point. We find the best tradeoff through the experiments discussed in Section 5.4.1.
c)
Make the distance between the anchor points as large as possible
In this part, we will discuss what principles of spatial distribution should be followed when we select anchor points.
Consider a set of anchor point matches $S = \{(p_n, q_n)\}$, where $p_n$ and $q_n$ are the locations of the matching anchor points in the adjacent images $I_1$ and $I_2$, respectively. Due to various matching errors, the positions of the points carry noise $\delta_{p_n}$ and $\delta_{q_n}$; the accurate positions are $(p_n^*, q_n^*) = (p_n - \delta_{p_n}, q_n - \delta_{q_n})$. Our goal is to determine an anchor point spatial distribution that yields the transformation between the two images with the least error when the number of matches and $\delta_{p_n}, \delta_{q_n}$ are fixed.
To make the problem more intuitive and less complex, we decompose the image transformation into translation, rotation, and scaling, and analyze them separately. Among them, the translation is unrelated to the distribution of anchor points. Thus, we focus on the rotation and scaling.
Assume we have two matches $(p_1, q_1)$ and $(p_2, q_2)$, with corresponding noise $(\delta_{p_1}, \delta_{q_1})$ and $(\delta_{p_2}, \delta_{q_2})$ and accurate positions $(p_1^*, q_1^*)$ and $(p_2^*, q_2^*)$. Denote $l_1 = p_1 - p_2$, $l_2 = q_1 - q_2$ and $\delta_1 = \delta_{p_1} - \delta_{p_2}$, $\delta_2 = \delta_{q_1} - \delta_{q_2}$. Using the anchor point matches to estimate the rotation angle $\theta$, we have:
$$\cos\hat{\theta} = \frac{l_1 \cdot l_2}{|l_1|\,|l_2|} \qquad (4)$$
while the true $\theta$ satisfies:
$$\cos\theta = \frac{(l_1 - \delta_1)\cdot(l_2 - \delta_2)}{|l_1 - \delta_1|\,|l_2 - \delta_2|} \qquad (5)$$
The error of the estimated angle is:
$$\Delta\cos\hat{\theta} = \cos\hat{\theta} - \cos\theta = \frac{l_1 \cdot l_2}{|l_1|\,|l_2|} - \frac{(l_1 - \delta_1)\cdot(l_2 - \delta_2)}{|l_1 - \delta_1|\,|l_2 - \delta_2|} \approx \frac{l_1}{|l_1|}\cdot\frac{\delta_2}{|l_2|} + \frac{l_2}{|l_2|}\cdot\frac{\delta_1}{|l_1|} \qquad (6)$$
As can be seen from the above equation, for given $\delta_1$ and $\delta_2$, making the error of the estimated angle $\Delta\cos\hat{\theta}$ smaller requires selecting anchor points with a greater distance between each other, that is, larger $|l_1|$ or $|l_2|$.
Similarly, when we estimate the scale factor $s$, we have:
$$\hat{s} = \frac{|l_1|}{|l_2|} \qquad (7)$$
while the true $s$ is:
$$s = \frac{|l_1 - \delta_1|}{|l_2 - \delta_2|} \qquad (8)$$
The relative error of the estimated $s$ is:
$$\frac{\Delta s}{\hat{s}} = \frac{\hat{s} - s}{\hat{s}} = 1 - \frac{|l_1 - \delta_1|\,|l_2|}{|l_2 - \delta_2|\,|l_1|} \approx 1 - \sqrt{1 - \frac{2\, l_2 \cdot \delta_2}{|l_2|^2}} \qquad (9)$$
Similar to the angle estimation, to minimize the error of the estimated scale factor $\Delta s$, we should still increase the distance between feature points, $|l_1|$ or $|l_2|$. It should be noted that when calculating the transformation model, we directly solve the transformation matrix $H$ instead of calculating the individual transformation factors, such as $\theta$ and $s$, separately. However, the conclusion that a greater distance between anchor points yields a smaller error still holds.
From the theoretical derivation above, we find that, for a given match error and a limited number of anchor points, obtaining a transformation model with as little stitching error as possible requires making the distances between anchor points ($|l_1|$ and $|l_2|$) as large as possible. Intuitively, if the anchor points are distributed uniformly throughout the image, there will be no situation in which areas with dense anchor points have small error while areas with sparse or no anchor points have large error.
According to all the principles above, we designed a grid method to select the anchor points so that the distances between them are large and they are evenly distributed across the whole image. The specific process is as follows. First, use the S-H algorithm [34] to find the overlapped area of two adjacent images $I_1$ and $I_2$; choose $I_1$ as the reference image and determine the minimum enclosing rectangle (MER) of the overlapped area in $I_1$. Second, extract SURF feature points in this MER. Third, divide the MER into $n \times n$ grids (the selection of $n$ is discussed further in Section 5.4.1), select the feature point with the largest SURF response in each grid as an anchor point, and then find the feature point in $I_2$ that matches it. The method for finding the matching feature point is described in Section 4.3.
With this grid selection method, we guarantee that at most one anchor point is selected in each grid and, thus, the distance between two anchor points, $|l_1|$ and $|l_2|$, is no less than the distance between the grids in which they are located. Therefore, according to Equations (6) and (9), we obtain a transformation model with less error.
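A sketch of this grid selection, choosing one highest-response keypoint per cell; the function and parameter names are our own, and the fallback to the next-best point when a match fails (Section 4.3.3) is omitted here:

```python
# Grid-based anchor selection: at most one keypoint per cell of an n x n grid over
# the MER of the overlap, keeping the keypoint with the largest SURF response.
def select_grid_anchors(keypoints, mer, n=4):
    """keypoints: cv2.KeyPoint objects inside the MER; mer: (x0, y0, width, height)."""
    x0, y0, w, h = mer
    best = {}
    for kp in keypoints:
        col = int((kp.pt[0] - x0) / w * n)
        row = int((kp.pt[1] - y0) / h * n)
        if 0 <= col < n and 0 <= row < n:
            cell = (row, col)
            if cell not in best or kp.response > best[cell].response:
                best[cell] = kp
    return list(best.values())  # up to n*n anchors, spread over the whole overlap
```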

4.3. Find Matching Feature Points in the Neighborhood Window

In this section, we find the feature points in $I_2$ that match the anchor points and determine whether they match correctly. If a match is correct, it is used as an anchor point pair.
The traditional stitching method matches the feature points of the whole images, which has two drawbacks. First, too many feature points participate in the matching, which greatly increases the amount of calculation. Second, the more feature points there are, the more similar points there are, which increases the probability of error-matches. To address these two problems, we propose the following matching method: firstly, we use a neighborhood window to increase accuracy and reduce computation; secondly, we use a feature-matching threshold to filter out error matches.

4.3.1. Neighborhood Window

Generally, the point in $I_2$ that matches an anchor point in $I_1$ is located near the corresponding position determined by rough registration. Therefore, we only have to search nearby. Compared with whole-picture matching, this method has two advantages. First, the amount of required calculation is reduced, improving stitching speed. Second, only feature points that are both consistent in their descriptors and reasonable in position can be matched; feature points whose descriptors are similar but which lie far from their corresponding positions cannot. This is equivalent to adding a position constraint to the feature matches in addition to the descriptor, so some of the error matches that may occur in whole-image matching are avoided.
To facilitate the use of the existing SURF feature point extraction algorithm, the neighborhood adopted in this paper is a square window centered at the corresponding position, with $2 \times margin$ as the side length. Assuming the rough registration error caused by GPS error is $n \sim N(0, \sigma)$ and the SURF template size of the feature is $t$, the window size $margin$ is set to:
$$margin = 3\sigma + t \qquad (10)$$

4.3.2. The Scale of the Feature

Since the height of the UAV does not change drastically when taking adjacent images, the scale of the same feature should be similar. Therefore, when we find matches, we only consider those with similar feature scales.

4.3.3. Feature Match Threshold

We use the Euclidean distance $\Delta$ between the two descriptors to match the features and to judge whether a match is correct:
$$\Delta = \sqrt{\sum_{i = 0}^{63} \left( v_p^i - v_q^i \right)^2} \qquad (11)$$
where $v_p^i$ and $v_q^i$ are the $i$-th dimensions of the two feature descriptors; for the SURF descriptor, the number of dimensions is 64.
If $\Delta$ is less than a threshold (discussed further in Section 5.4.2), the match is successful; otherwise it fails. Failure may occur when the GPS error exceeds our expectation or when the terrain undulation is excessive. In this case, the anchor point is discarded, and the feature point with the next-largest response is tried as the anchor point, and so on, until a match that meets the threshold is found.
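The window search, scale check, and threshold test of this subsection might be combined roughly as follows; the 0.5-2.0 scale ratio and all names are our illustrative choices, while `sigma` and `threshold` correspond to Equation (10) and Section 5.4.2:

```python
# Neighbourhood-window matching sketch: extract SURF points only inside the window
# around the roughly-registered position, keep similar scales, apply the threshold.
import cv2
import numpy as np

def match_in_window(img2, anchor_kp, anchor_desc, predicted_xy, sigma, surf, threshold):
    margin = int(3 * sigma + anchor_kp.size)
    u, v = int(predicted_xy[0]), int(predicted_xy[1])
    x0, y0 = max(u - margin, 0), max(v - margin, 0)
    roi = img2[y0:v + margin, x0:u + margin]          # square neighbourhood window

    kps, descs = surf.detectAndCompute(roi, None)
    if descs is None:
        return None
    best, best_dist = None, np.inf
    for kp, desc in zip(kps, descs):
        if not 0.5 < kp.size / anchor_kp.size < 2.0:  # keep only similar feature scales
            continue
        dist = np.linalg.norm(anchor_desc - desc)     # Euclidean descriptor distance
        if dist < best_dist:
            best, best_dist = kp, dist
    if best is None or best_dist >= threshold:
        return None                                   # discard this anchor, try the next one
    return (best.pt[0] + x0, best.pt[1] + y0), best_dist
```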
Figure 2 shows the result of feature point matching in two 4000 × 3000 aerial images, $I_1$ (left) and $I_2$ (right). The squares in $I_2$ are the neighborhood windows around the corresponding positions of the anchor points. The lines connecting the two images represent anchor point pairs: the black points in $I_1$ are the anchor points selected by the method of Section 4.2, and the right endpoints of the lines are the feature points that match them.

4.4. Calculate the Transform Matrix with Added Constraints

We define the homography matrix as:
$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \qquad (12)$$
Then we substitute the anchor point matches into the following formula to solve for $H$:
$$w \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (13)$$
As explained in Section 4.1, when stitching a sequence of images, the $\{H_i\}$ are multiplied in series to obtain $\{H_i^*\}$. This method is direct and easy to parallelize, but it also poses a potential problem: if an error occurs when calculating $H_i$, all of the subsequent $\{H_i^*, H_{i+1}^*, \ldots, H_n^*\}$ are affected, which can destroy the stitching result. We propose a method to prevent this.
A change of perspective generally causes four kinds of image transformation: displacement transformation $T$, scale transformation $K$, rotation transformation $R$, and perspective transformation $P$. The transformation matrix can be written as a combination of these four transformations:
$$H = TKRP = \begin{bmatrix} 1 & 0 & t_1 \\ 0 & 1 & t_2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ p_1 & p_2 & 1 \end{bmatrix} \qquad (14)$$
$$H = \begin{bmatrix} k\cos\theta & -k\sin\theta & t_1 \\ k\sin\theta & k\cos\theta & t_2 \\ p_1 & p_2 & 1 \end{bmatrix} \qquad (15)$$
Denote the set $\{H \mid H \text{ satisfies Equation (12)}\}$ as $\mathbb{H}_1$ and the set $\{H \mid H \text{ satisfies Equation (15)}\}$ as $\mathbb{H}_2$. We find that $\mathbb{H}_2 \subset \mathbb{H}_1$, because $H$ in $\mathbb{H}_1$ has 8 degrees of freedom while $H$ in $\mathbb{H}_2$ has 6. That means some of the $H$ found according to assumption (12) and Equation (13) are not reasonable. Based on this conclusion, we add two constraints to ensure that $H$ is reasonable, that is:
$$\begin{cases} h_{11} = h_{22} \\ h_{12} = -h_{21} \end{cases} \qquad (16)$$
In practice, when we obtain a transformation matrix $H$, we test it against these constraints. If they are approximately satisfied, that is,
$$\begin{cases} |h_{11} - h_{22}| < \varepsilon \\ |h_{12} + h_{21}| < \varepsilon \end{cases} \qquad (17)$$
where $\varepsilon$ is a threshold, we judge $H$ to be reasonable. Otherwise, we discard this $H$ and use the result of rough registration as the positional relationship between the two pictures.
By using this method, we can generally prevent overall stitching failure due to a single stitching error between two images in the image sequence.
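A sketch of this sanity check; the value of eps below is a placeholder, not the threshold used in the paper:

```python
import numpy as np

def homography_is_reasonable(H, eps=0.1):
    """Check the constraints of Equation (17): for a translation/scale/rotation/
    perspective composition, h11 should approximately equal h22 and h12 should
    approximately equal -h21. eps is an illustrative placeholder value."""
    H = H / H[2, 2]  # normalise so that h33 = 1
    return abs(H[0, 0] - H[1, 1]) < eps and abs(H[0, 1] + H[1, 0]) < eps

# If the check fails, FUIS falls back to the rough-registration transform
# for that image pair instead of the estimated H.
```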

4.5. The Specific Process of Stitching Two Adjacent Images

According to Section 4.2, Section 4.3 and Section 4.4, the process of stitching two adjacent images $I_1$ and $I_2$ is as follows. First, an anchor point is selected from the feature points in $I_1$ according to the principles described in Section 4.2. Second, this anchor point is matched according to the method described in Section 4.3. If the feature point pair meets the requirements of Section 4.3.3, it is recorded as an anchor point pair; otherwise, the anchor point is discarded and the next anchor point is tried. These two steps are repeated until one anchor pair has been found in each grid described in Section 4.2.3, part c). Finally, the transform matrix is calculated from the anchor point pairs by the method described in Section 4.4. The process is detailed in Algorithm 1.
Algorithm 1: The Pseudo-Code of Stitching Two Adjacent Images
Input: Adjacent images I_1, I_2;
the camera parameters of the images camera_1, camera_2
1. Use camera_1, camera_2 to obtain the rough registration f between the two images
2. Use the S-H algorithm [34] and the rough registration to obtain the overlapped area of the two images
3. Use SURF to extract feature points {p_i} (i = 1...m) in the overlapped area of I_1
4. Divide the overlapped area into n × n grids, and set AreaFlag_i ← FALSE (i = 1, ..., n^2)
5. Sort the feature points so that p_i.response > p_{i+1}.response (i = 1, ..., m)
6. AnchorKeypointPairs ← { }
7. for p_i in {p_i}:
8.   if (p_i is in the k-th grid) and (AreaFlag_k = FALSE):
9.     use p_i as an anchor point; its corresponding position: (u, v) ← f(p_i.x, p_i.y)
10.    let region R be the neighborhood window around (u, v) on I_2 with margin = 3σ + p_i.FeatureSize
11.    extract SURF feature points in R: {q_j} (j = 1...l)
12.    for q_j in {q_j}:
13.      Δ_j = |p_i.descriptor − q_j.descriptor|
14.    if Δ_min < threshold:
15.      push (p_i, q_min) into the set AnchorKeypointPairs
16.      AreaFlag_k ← TRUE
17.    if all AreaFlag = TRUE:
18.      break
19. Solve the transform matrix H with AnchorKeypointPairs
20. if |h_11 − h_22| > ε or |h_12 + h_21| > ε (h_ij is an entry of H):
21.    H ← the transform matrix calculated from the rough registration f(x, y)
22. Use H to stitch the images and obtain the stitched image
Output: a stitched image I

4.6. Theoretical Analysis of Computational Complexity

To analyze the reduction in computation achieved by the optimization strategies of Section 4.2 and Section 4.3, assume the size of $I_1$ and $I_2$ is $M \times N$, the average density of feature points is one feature point per $P$ pixels, the overlap rate is $r$, the size of the neighborhood window is $2m \times 2m$, the number of grids is $n \times n$, and the probability that a found match does not meet the threshold of Section 4.3.3 is $p_\Delta$. Then, in traditional whole-image matching, the number of feature points extracted and described from the two images $I_1$ and $I_2$ is:
$$p_1 = p_2 = \frac{M \times N}{P} \qquad (18)$$
while in FUIS, the number of feature points extracted from $I_1$ is:
$$p_1' = \frac{r \, M \times N}{P} \qquad (19)$$
and the number of feature points extracted from $I_2$ is:
$$p_2' = n \times n \times \frac{1}{1 - p_\Delta} \times \frac{2m \times 2m}{P} \qquad (20)$$
In the traditional method, the number of feature-match computations required is:
$$q = p_1 \, p_2 \qquad (21)$$
while in our method, we only need to match the feature points in $I_1$ with the points in their corresponding neighborhood windows in $I_2$. The number of matches is:
$$q' = p_2' = n \times n \times \frac{1}{1 - p_\Delta} \times \frac{2m \times 2m}{P} \qquad (22)$$
In conclusion, using our strategies, the amount of feature extraction and description is reduced to $\frac{r}{2} + \frac{4 n^2 m^2}{M N (1 - p_\Delta)}$ of that of the traditional approach, and the amount of feature matching is reduced to $\frac{4 n^2 m^2}{M^2 N^2 (1 - p_\Delta)}$ of that of the traditional approach.
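As a quick sanity check of these factors with illustrative numbers (only the image size comes from data set 1; the values of r, n, m and p_Δ below are our assumptions):

```python
# Back-of-the-envelope evaluation of the two reduction factors.
M, N = 4000, 3000      # image size in pixels (data set 1)
r = 0.85               # assumed overlap rate
n = 4                  # assumed 4 x 4 anchor grid
m = 100                # assumed half side length of the neighbourhood window (pixels)
p_delta = 0.2          # assumed probability that a candidate match fails the threshold

extract_ratio = r / 2 + 4 * n**2 * m**2 / (M * N * (1 - p_delta))
match_ratio = 4 * n**2 * m**2 / (M**2 * N**2 * (1 - p_delta))
print(f"extraction/description reduced to {extract_ratio:.1%} of the traditional cost")
print(f"matching reduced to {match_ratio:.1e} of the traditional cost")
```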

5. Experiments and Analysis

5.1. Experimental Settings

To verify the performance of the proposed FUIS approach, several experiments were carried out on two image data sets:
Data set 1. A sequence of 20 UAV images of Xishuangbanna, Yunnan Province, China, taken by a DJI PHANTOM 3 at 4000 × 3000 pixels each. The ground features are mainly houses, so feature points are relatively easy to extract; however, the houses may also introduce parallax into the stitching results. The average overlap rate of adjacent images is 85.1075%.
Data set 2. A sequence of 15 UAV images of Changde, Hunan Province, China, provided by Beijing Yingce Space Information Technology Co., Ltd., at 7952 × 5304 pixels each. The ground features are mainly farmland; the ground is flat, but significant features are difficult to extract. The average overlap rate of adjacent images is 71.9428%.
Figure 3 displays these two data sets. The images use EXIF (exchangeable image file format) metadata to record their position and attitude information.
The following experiments were run on a laptop with an Intel Core i7-8750H CPU at 2.20 GHz and 16 GB of RAM. The programming tools and development platform are Visual Studio 2019 and OpenCV 2.4.1.
In this paper, we select SURF [11], ORB [13], and GMS [19] as baselines. SURF and ORB are mainstream methods for feature extraction and description.
SURF (speeded up robust features) [11] applies a scale-space pyramid and orientation assignment, thus providing a large degree of rotation and scale invariance. It is similar to SIFT but faster because it utilizes acceleration methods, namely box filters and Haar wavelet filters [5]. Therefore, SURF is often used in UAV image stitching [3,5]. The parameters of SURF are set as follows: the threshold for the Hessian keypoint detector is 100; 64-element descriptors are used rather than 128-element; the number of pyramid octaves is 4; and the number of layers within each octave is 3.
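For reference, the corresponding OpenCV configuration might look as follows (the module lives in opencv-contrib and may be disabled in some builds; the file name is a placeholder):

```python
import cv2

# SURF configured with the parameters listed above.
surf = cv2.xfeatures2d.SURF_create(
    hessianThreshold=100,  # threshold for the Hessian keypoint detector
    nOctaves=4,            # number of pyramid octaves
    nOctaveLayers=3,       # layers within each octave
    extended=False,        # 64-element descriptors instead of 128-element
    upright=False,         # keep orientation assignment
)
gray = cv2.imread("uav_frame.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file name
keypoints, descriptors = surf.detectAndCompute(gray, None)
```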
ORB (oriented FAST and rotated BRIEF) [13] combines two excellent feature-point algorithms, the FAST feature point extractor and the BRIEF feature point descriptor, and achieves excellent results in UAV image stitching [4]. Using these two kinds of feature points with the existing whole-image matching pipeline serves as our baseline. When extracting ORB feature points, too many feature points lead to excessive computation, while too few lead to inaccurate stitching; based on our experiments, we set the maximum number of features to 100,000.
GMS (grid-based motion statistics) [19] is a novel approach that uses ORB as the feature descriptor and selects correct matches by considering the relative positional relationship between feature point pairs, significantly improving the accuracy of feature point matching. We use the default parameters when applying GMS.
The baselines and some steps of FUIS, including feature extraction and description, transformation matrix calculation, and seam extraction, are partially based on the implementation of OpenCV [35].

5.2. Experimental Analysis of Computational Complexity

To verify the computational analysis of Section 4.6, we selected a pair of adjacent images from each of the two image sequences described in Section 5.1. We stitched these two image pairs with FUIS, SURF, ORB, and GMS, respectively, and analyzed the computational complexity. The result is recorded in Table 1.
In Table 1, in addition to feature point extraction and matching, the total time also includes the time required to calculate $H$ and other computations. The matching time of FUIS includes the time spent extracting feature points in the neighborhood windows as well as the feature point matching itself.
As can be seen from the table, and as previously analyzed, the feature extraction time is reduced to less than half of that of traditional SURF. OpenCV optimizes its matching algorithm by building indexes and applying other methods; however, due to the large number of feature points, the matching time remains long, while FUIS reduces it significantly. This advantage is more obvious when the image size and the number of feature points are larger. These experimental results confirm the theoretical analysis of Section 4.6: FUIS greatly reduces the calculation required for both extraction and matching.

5.3. Results Comparison

We apply these approaches to stitch the UAV images and evaluate their performance based on computing time and accuracy.

5.3.1. Computing Time

We stitched the image sequences with FUIS, SURF, ORB, and GMS, respectively, and recorded the stitching results and the time used. The times are listed in Table 2, and Figure 4 presents the stitching results of FUIS.
The experimental results above indicate that FUIS is faster than the mainstream approaches. Compared to traditional SURF, the time consumed by FUIS on the two data sets was reduced by 69.63% and 72.74%, respectively. When parallel processing was applied, the reduction in time did not reach the expected speed-up, because the SURF implementation of OpenCV [35] that we used is already optimized with parallel processing; even without our parallel processing method, the computing power of our device was almost fully utilized.

5.3.2. Accuracy

The purpose of the accuracy experiment is to find out to what extent FUIS improves the accuracy of rough registration, and whether FUIS can achieve an accuracy comparable to that of the mainstream non-simplified approaches. To this end, we picked two adjacent images from each of the two data sets and compared the stitching results of the rough registration, FUIS, and the baselines mentioned above.
To quantify the error, following the definition of the stitching problem in Section 3, we manually selected a set of corner point pairs in the two adjacent images, as displayed in Figure 5. The points are denoted as $P = \{p_1, p_2, \ldots, p_n\}$ and $P' = \{p_1', p_2', \ldots, p_n'\}$. We then calculate the mean deviation of the corner points after stitching. Figure 6 shows the stitching result of FUIS with the corner points marked on it.
$$b_f = \frac{1}{n} \sum_{i=1}^{n} \left| f(p_i) - p_i' \right|, \quad p_i \in P,\ p_i' \in P'$$
where f is the image transformation function to be tested.
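The metric can be computed directly from a homography with OpenCV; a sketch, assuming the two corner lists are given in matching order:

```python
import cv2
import numpy as np

def mean_corner_deviation(H, pts_P, pts_P_prime):
    """Mean deviation b_f: average distance between the manually picked corners of
    the second image and the corners of the first image mapped through H."""
    src = np.float32(pts_P).reshape(-1, 1, 2)
    mapped = cv2.perspectiveTransform(src, H).reshape(-1, 2)
    return float(np.mean(np.linalg.norm(mapped - np.float32(pts_P_prime), axis=1)))
```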
As can be seen from Table 3, the error of FUIS is comparable to that of the mainstream approaches, whereas the error of rough registration is much larger than that of all the other approaches. This indicates that FUIS substantially reduces the error of rough registration and achieves the accuracy of mainstream homography-based stitching models.

5.4. The Influence of the Parameters in FUIS

5.4.1. Grid Density

To find a proper grid density for feature reduction and anchor point selection, we ran the algorithm with different numbers of grid rows and columns to stitch two adjacent images, recorded the time used, and evaluated the stitching accuracy with the mean deviation defined in Section 5.3.2.
Figure 7 indicates that when the grid density is less than 4 × 4, the stitching accuracy decreases significantly. When the grid is larger than 4 × 4, the error no longer drops dramatically, but the time used rises correspondingly. Therefore, FUIS uses a 4 × 4 grid.

5.4.2. The Threshold of Δ

To ensure that the feature matches are correct, we conducted an experiment to find an appropriate threshold for $\Delta$ (the Euclidean distance between feature descriptors defined in Section 4.3.3).
Using the data sets described in Section 5.1, we obtained 260 feature matches picked by FUIS, manually judged whether each match was correct, and recorded its $\Delta$. Among them, 107 matches were incorrect and 153 were correct. We labeled incorrect matches as "positive" and drew a receiver operating characteristic (ROC) curve based on these data, shown in Figure 8.
The AUC (area under the curve) is 0.9483, which indicates that using $\Delta$ to judge correctness is reasonable. According to the ROC curve, we set the threshold at 0.042516, where the TPR (the probability of correctly rejecting a match when it is an incorrect match) is 95%. Note that this threshold is only applicable to FUIS, because an error match here is not an arbitrary error match but one that has passed the filters described in Section 4.3: it lies in the neighborhood window and has a similar feature scale.
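A sketch of how such a threshold could be picked from manually labelled matches with scikit-learn; the labels and distances are assumed inputs, and this does not claim to reproduce the exact value above:

```python
# Threshold selection from labelled matches. `labels` marks an incorrect match as 1
# ("positive") and `distances` holds the corresponding values of Delta.
import numpy as np
from sklearn.metrics import roc_curve, auc

def pick_threshold(labels, distances, target_tpr=0.95):
    # Larger Delta should indicate an incorrect match, so Delta itself is the score.
    fpr, tpr, thresholds = roc_curve(labels, distances)
    print("AUC =", auc(fpr, tpr))
    idx = int(np.argmax(tpr >= target_tpr))  # first operating point reaching the target TPR
    return thresholds[idx]                   # reject matches whose Delta is at or above this value
```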

6. Conclusions

To meet the requirement of emergency stitching of UAV images, this paper proposes FUIS, which significantly decreases the computational cost of stitching by using position and pose information, thereby meeting emergency requirements efficiently. This benefit results from the proposed anchor point selection and matching methods. According to the experiments, the stitching accuracy of FUIS is not lower than that of the mainstream methods, while its speed is greatly improved.
Since, in order to improve speed, we use image transformation rather than orthomosaic generation to stitch the images, our method may produce errors when stitching areas with large terrain undulations.

Author Contributions

Conceptualization, R.S. and J.L.; Methodology, R.S. and C.D.; Software, R.S.; Validation, R.S., C.D., and H.C.; Data Curation, H.C.; Writing-Original Draft Preparation, R.S.; Writing-Review & Editing, C.D. and J.L.; Supervision, H.C.; Project Administration, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Chinese National Natural Science Foundation [Grant number 61806211].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lu, J.; Bai, Y. Research on Low Altitude Aerial Image Stitching. In Proceedings of the 37th China Control Conference (CCC2018), Wuhan, China, 25–27 July 2018; p. 5. [Google Scholar]
  2. Li, M.; Li, D.; Fan, D. A study on automatic UAV image mosaic method for paroxysmal disaster. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, XXXIX–B6, 123–128. [Google Scholar] [CrossRef] [Green Version]
  3. Zhang, W.; Li, X.; Yu, J.; Kumar, M.; Mao, Y. Remote sensing image mosaic technology based on SURF algorithm in agriculture. EURASIP J. Image Video Process. 2018, 2018, 85. [Google Scholar] [CrossRef]
  4. Guiqin, Y.; Chang, X.; Jiang, Z. A Fast Aerial Images Mosaic Method Based on ORB Feature and Homography Matrix. In Proceedings of the 2019 International Conference on Computer, Information and Telecommunication Systems (CITS), Beijing, China, 28–31 August 2019; pp. 1–5. [Google Scholar]
  5. Yuan, M.; Liu, X.; Lei, T.; Li, S. Fast image stitching of unmanned aerial vehicle remote sensing image based on SURF algorithm. In Proceedings of the Eleventh International Conference on Digital Image Processing (ICDIP 2019), Guangzhou, China, 10–13 May 2019. [Google Scholar]
  6. Ghosh, D.; Kaabouch, N. A survey on image mosaicing techniques. J. Vis. Commun. Image Represent. 2016, 34, 1–11. [Google Scholar] [CrossRef]
  7. Douini, Y.; Riffi, J.; Mahraz, A.M.; Tairi, H. An image registration algorithm based on phase correlation and the classical Lucas–Kanade technique. Signal Image Video Process. 2017, 11, 1321–1328. [Google Scholar] [CrossRef]
  8. Mou, W.; Wang, H.; Seet, G.; Zhou, L. Robust homography estimation based on non-linear least squares optimization. In Proceedings of the 2013 IEEE International Conference on Robotics and Biomimetics (ROBIO), Shenzhen, China, 12–14 December 2013; pp. 372–377. [Google Scholar]
  9. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  10. Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417. [Google Scholar]
  11. Bay, H.; Ess, A.; Tuytelaars, T.; Gool, L.V. Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
  12. Alcantarilla, P.F.; Bartoli, A.; Davison, A.J. KAZE features. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 214–227. [Google Scholar]
  13. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011. [Google Scholar]
  14. Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. LIFT: Learned Invariant Feature Transform. arXiv 2016, arXiv:1603.09114. [Google Scholar]
  15. Cui, H.; Li, Y.; Zhang, K. A Fast UAV Aerial Image Mosaic Method Based on Improved KAZE. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 2427–2432. [Google Scholar]
  16. Zhao, J.; Zhang, X.; Gao, C.; Qiu, X.; Tian, Y.; Zhu, Y.; Cao, W. Rapid mosaicking of unmanned aerial vehicle (UAV) images for crop growth monitoring using the SIFT algorithm. Remote Sens. 2019, 11, 1226. [Google Scholar] [CrossRef] [Green Version]
  17. Wang, Y.; Lei, T.; Yang, H.; Zhang, J.; Wang, J.; Zhao, C.; Li, X. The application of UAV remote sensing in natural disasters emergency monitoring and assessment. In Proceedings of the Eleventh International Conference on Digital Image Processing (ICDIP 2019), Guangzhou, China, 10–13 May 2019. [Google Scholar]
  18. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  19. Bian, J.; Lin, W.Y.; Matsushita, Y.; Yeung, S.K.; Cheng, M.M. GMS: Grid-Based Motion Statistics for Fast, Ultra-Robust Feature Correspondence. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  20. Li, H.; Ding, W.; Wang, Y. Three-step registration and multi-thread processing based image mosaic for unmanned aerial vehicle applications. Int. J. Smart Sens. Intell. Syst. 2016, 9, 1091–1109. [Google Scholar] [CrossRef] [Green Version]
  21. Yan, K.; Han, M. Aerial image stitching algorithm based on improved GMS. In Proceedings of the 2018 Eighth International Conference on Information Science and Technology (ICIST), Cordoba, Spain, 30 June–6 July 2018; pp. 351–357. [Google Scholar]
  22. Xu, Y.; Ou, J.; He, H.; Zhang, X.; Mills, J. Mosaicking of unmanned aerial vehicle imagery in the absence of camera poses. Remote Sens. 2016, 8, 204. [Google Scholar] [CrossRef]
  23. Chen, J.; Xu, Q.; Luo, L.; Wang, Y.; Wang, S. A robust method for automatic panoramic UAV image mosaic. Sensors 2019, 19, 1898. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. He, Q.; Wu, H.; Wang, X.; Li, N. Stitching video streams captured by multi-UAVs with stabilization. In Proceedings of the Tenth International Conference on Graphics and Image Processing (ICGIP 2018), Chengdu, China, 12–14 December 2018. International Society for Optics and Photonics. [Google Scholar]
  25. Luo, L.; Xu, Q.; Chen, J.; Lu, T.; Wang, Y. Uav Image Mosaic Based on Non-Rigid Matching and Bundle Adjustment. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 9117–9120. [Google Scholar]
  26. Zaragoza, J.; Chin, T.-J.; Brown, M.S.; Suter, D. As-projective-as-possible image stitching with moving DLT. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2339–2346. [Google Scholar]
  27. Allene, C.; Pons, J.; Keriven, R. Seamless image-based texture atlases using multi-band blending. In Proceedings of the International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008; pp. 1–4. [Google Scholar]
  28. Abdelfatah, H.R.; Omer, H. Automatic Seamless of Image Stitching. Int. Knowl. Sharing Platf. 2013, 4, 7–13. [Google Scholar]
  29. Rubinstein, M.; Shamir, A.; Avidan, S. Improved seam carving for video retargeting. ACM Trans. Graph. (TOG) 2008, 27, 1–9. [Google Scholar] [CrossRef] [Green Version]
  30. Pan, J.; Wang, M.; Li, D. Generation method of seam line network based on overlapping Voronoi diagram. Wuhan Univ. J. Inf. Sci. Ed. 2009, 34, 518–521. [Google Scholar]
  31. Song, M.; Ji, Z.; Huang, S.; Fu, J. Mosaicking UAV orthoimages using bounded Voronoi diagrams and watersheds. Int. J. Remote Sens. 2018, 39, 4960–4979. [Google Scholar] [CrossRef]
  32. Zhang, T.; Zhu, M. GPS-assisted Aerial Image Stitching Based on optimization Algorithm. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 3485–3490. [Google Scholar]
  33. Eisenbeiß, H. UAV Photogrammetry; ETH: Zurich, Switzerland, 2009. [Google Scholar]
  34. Foley, J.D.; Van, F.D.; Van Dam, A.; Feiner, S.K.; Hughes, J.F.; Angel, E.; Hughes, J. Computer Graphics: Principles and Practice; Addison-Wesley Professional: Boston, MA, USA, 1996. [Google Scholar]
  35. OpenCV. Available online: https://opencv.org/ (accessed on 3 April 2020).
Figure 1. Flow diagram of fast unmanned aerial vehicle (UAV) image-stitching (FUIS) program. The right side is the process of finding the transformation matrix between two adjacent images, and the left side is the process of stitching an image sequence.
Figure 2. Find the matching feature points in the neighborhood.
Figure 3. Two image sequences (only four of which are shown here; the left one is data set 1, the right one is data set 2).
Figure 4. Stitching result of image sequences (the left one is the result of data set 1; the right one is data set 2).
Figure 5. Manually selected feature points in two images.
Figure 6. The stitching result of FUIS with corner points on it.
Figure 7. The influence of grid number. 0 grids represents the result of rough registration.
Figure 8. The receiver operating characteristic (ROC) figure for Δ. The + sign denotes the location of the selected threshold.
Table 1. Computational complexity of different approaches.

4000 × 3000 images    Total Time (s)   Extracting (s)   Matching (s)   Number of Feature Points
FUIS                  1.763            1.448            0.305          checked 2010 anchor points
SURF                  5.148            3.917            1.214          58,264 × 58,745
ORB                   33.733           2.110            32.255         91,391 × 91,491
GMS                   44.543

7952 × 5304 images    Total Time (s)   Extracting (s)   Matching (s)   Number of Feature Points
FUIS                  12.457           6.697            5.746          checked 593 anchor points
SURF                  58.047           20.046           37.969         351,873 × 334,461
ORB                   49.161           3.384            38.436         100 K × 100 K
GMS                   65.457
Table 2. Time cost by the methods to stitch the image sequence (second).

              FUIS      FUIS Using Parallel Processing   SURF      ORB       GMS
Data set 1    74.234    62.113                           244.392   447.952   448.307
Data set 2    176.964   146.747                          649.242   685.805   861.097
Table 3. Mean deviation of different methods' stitching result (pixel).

              Rough Registration   ORB        GMS        SURF       FUIS
Data set 1    47.73561             17.86671   17.74659   21.07692   22.04122
Data set 2    243.3556             12.992     15.85734   4.270076   6.284166
