1 Introduction

Computer vision (CV) is the technology used to obtain information from digital images and videos, and it has been increasingly applied in vibration tests of civil engineering structures. CV technology places no strict requirements on the image or video source. Both consumer-grade cameras (e.g. single-lens reflex camera [1], action camera [2]) and professional cameras (e.g. high-speed camera [3]) have been used for static or dynamic deflection tests. According to previous research, most applications are long-span bridge tests, since the structural displacements in these situations are large enough to be captured by cameras [4,5,6]. In addition, some lab tests with relatively large displacements can be performed using CV technology [5, 7,8,9]. Although the measuring performance of CV technology is not as good as that of conventional vibration data acquisition methods such as inertial accelerometers, CV technologies remain an active research topic because of their convenience and the large amount of information that can be extracted from videos. For example, CV technology can be used to estimate the excitation caused by vehicles [10] or pedestrians, as well as to measure structural response [11].

Telecom structures can be high guyed masts, lattice towers, monopoles, and so on. A large number of high guyed masts were constructed in the UK between 1960 and 1980 for long-range analog television and radio broadcast, and they are now beyond their original design life [12]. As broadcast and communications technologies evolve, many telecom structures need to be assessed before new types of antenna are installed, and a good knowledge of the structural vibrations helps to assess their condition, for example, fatigue damage, mainly in the connections. Accelerometers [12] and some non-contact vibration sensors [13] have been used to investigate the vibrations in these kinds of structures. Some structural assessment methods are based on the global structural vibration characteristics [14], while local vibrations such as guy cable galloping and antenna vortex shedding can influence the main structure and should be considered separately or regarded as an excitation. For these reasons, it is worth developing new methods for measuring high guyed mast cable galloping and antenna vortex shedding.

Smartphones are a powerful tool in people’s daily lives, integrating several kinds of sensors (camera, barometer, GPS module, gyroscope, accelerometer, compass, and so on). Smartphone accelerometers have been used for structural vibration [15] and vibration serviceability [16] research. There is some research on using smartphone cameras for dynamic tests, but because of the low resolution and high distortion of the smartphone cameras available at the time, these tests are mostly lab trials [17,18,19,20,21,22]. Nevertheless, digital images obtained with a smartphone carry additional information compared with those from other cameras, e.g., the GPS location. Smartphones are also more convenient to use than other devices and are improving quickly under consumer demand, so they have great potential for field tests. As hardware improves and more functions are integrated into newer models, resolution and distortion will become less of a problem.

Cable galloping and antenna vibrations induced by vortex shedding can unexpectedly occur in low wind speed conditions. During routine inspections, these phenomena were observed by inspectors and videos were captured using their smartphones. These videos were processed to provide the vibration data of different high guyed mast components, i.e., guy cable, guy cable anchor, and antenna. The purpose of this paper is to identify proper video processing procedures to extract vibration information from videos obtained in the field with smartphone cameras, and provide fast, convenient, and efficient ways for vibration investigation for high guyed masts and similar structures.

2 Subject and methodology

2.1 Telecom structures

Videos used in this paper were collected from several sites: three from high guyed masts and one from a high lattice tower. Among telecom structures, high guyed masts are the most complex structure type. Figure 1 shows a typical high guyed mast. It should be noted that the videos used in this paper are not all related to the structure shown in Fig. 1, so the structural details and corresponding vibration characteristics can be different.

Fig. 1 Typical high guyed mast

The heights of high guyed masts are mostly in the 200–400 m range. Cables in different directions are used to stay the mast. High guyed masts are vibration-sensitive structures. Vibrations occur not only in the main structure but also in its components, and this can cause serviceability and fatigue problems. Cable galloping and antenna vortex shedding are two vibration problems that occur frequently in high guyed mast structures, and they have not been rigorously studied. Once galloping or vortex shedding occurs, the corresponding vibration amplitudes are high and not easy to measure with contact sensors: these kinds of vibrations depend on the section shapes of structural components, and contact sensors can change the section shape and mass.

2.2 CV algorithms used for dynamic tests

Valid algorithms for structural motion capture include correlation-based matching, optical flow, and the scale-invariant feature transform [5, 23, 24], for target or non-target situations. In this work, image binarisation and correlation-based template matching are used as the basic algorithms for video processing, and they are introduced later in this section. Some other processing stages are also performed to improve the quality of the results; these are explained in the corresponding subsections.

Some CV-based methods have been applied to cable vibration measurement. For example, Ji et al. [25] and Kim et al. [26] used correlation-based methods. As mentioned before, these methods are for structures with a small deflection. Duan et al. [27] used the digital image correlation method for bridge cable vibration tests in the lab and obtained high-accuracy measurements compared with the displacement meter measurement, yet this method relied on artificial targets. Xu et al. [28] developed a method based on edge detection and Hough Transform line fitting, which can be used for bridge cable vibration field tests when the cable is taut. Zhu et al. [24] used the optical flow method, which is suitable for most motion capture, but the process is time-consuming. Chu et al. [29] used the Scale Invariant Feature Transform (SIFT) to capture the motion of a cable, but artificial targets are needed when the background is complex. During cable galloping, the motion amplitude is high and the cable does not keep the same shape, so in this work image binarisation, which is not the most precise method but is fast and easy to apply, is chosen for cable galloping measurement. A correlation-based template matching method is selected for cable anchor and antenna vibration measurement.

2.2.1 Image binarisation

Telecom mast cables should stay taut and vibrate with a small amplitude, but galloping can occur in a low wind speed situation. The large vibration amplitude of cable galloping makes it possible to record with a smartphone camera; however, the large cable deformation will cause the shape-based CV detection algorithms to fail.

Image binarisation is the most widely used segmentation method in image processing. It sets every pixel in a grayscale image as black or white based on a threshold [30], as shown in Eq. 1, where \(\mathrm{src}\left(x,y\right)\) is the pixel in the original image and \(\mathrm{dst}(x,y)\) is the pixel in the processed image. In this way, the image volume can be reduced and some of the objects will stand out more than in the original image.

$$\mathrm{dst}\left(x,y\right) = \begin{cases} \mathrm{maxval} & \text{if } \mathrm{src}\left(x,y\right) > \mathrm{thresh} \\ 0 & \text{otherwise} \end{cases}$$
(1)

In this paper, different videos corresponding to guy cable vibration (for a single cable vibration and multiple cables vibration, respectively) are processed with the image binarisation method.
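As a minimal sketch of Eq. 1, the following standard-library Python applies a fixed threshold to a toy grayscale frame and then takes the mean column of the black pixels in each row as the cable location, which is how the binary images are used in Sect. 3.1. The function names, the toy frame, and the threshold value of 120 are illustrative assumptions and are not taken from the original processing.

```python
def binarise(gray, thresh, maxval=255):
    """Apply Eq. 1 to a grayscale image given as a list of rows."""
    return [[maxval if px > thresh else 0 for px in row] for row in gray]


def mean_dark_column(binary_row):
    """Mean column index of the black (0) pixels in one row, or None."""
    cols = [x for x, px in enumerate(binary_row) if px == 0]
    return sum(cols) / len(cols) if cols else None


# Toy 3 x 8 frame (made up): bright sky near 200 crossed by a dark cable near 40.
frame = [
    [200, 198, 40, 42, 201, 199, 200, 202],
    [201, 200, 199, 41, 39, 200, 198, 200],
    [199, 201, 200, 200, 43, 38, 201, 199],
]
binary = binarise(frame, thresh=120)
cable_columns = [mean_dark_column(row) for row in binary]
print(cable_columns)
```

Tracking `mean_dark_column` for one fixed row over all frames of a video would give a pixel-level time history of that cable section.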

2.2.2 Template matching

The template matching method establishes the movement of a target by selecting a certain area of one frame of a video as a template and then comparing the template with the following frames to find its location in each of them. The similarity between the template and a new frame can be calculated with the zero-normalised cross-correlation (ZNCC), as shown in Eq. 2, and the location of the peak value is the location of the template in the new frame [31]. In Eq. 2, \(f\left({x}_{i},{y}_{j}\right)\) and \(g\left({x}_{i}^{\prime},{y}_{j}^{\prime}\right)\) are the image intensity values at each pixel, \({f}_{m}\) and \({g}_{m}\) are the mean image intensity values, and \(\Delta f\) and \(\Delta g\) are the image intensity standard deviations, of the template and the new frame, respectively.

$$C_{\text{ZNCC}} = \sum_{i=-M}^{M} \sum_{j=-N}^{N} \frac{\left(f\left(x_{i},y_{j}\right) - f_{m}\right)\left(g\left(x_{i}^{\prime},y_{j}^{\prime}\right) - g_{m}\right)}{\Delta f\,\Delta g}$$
(2)

The template matching method can be used for objects with no (or small) deformation. The precision and accuracy of template matching are often higher than simple image binarisation and, therefore, it can be used for more detailed measurements. In this paper, cable anchor and antenna vibration videos are processed using the template matching method.
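The search described above can be sketched in plain Python as an exhaustive scan of the template over a frame, scoring each position with a ZNCC of the form of Eq. 2. This is an illustration under stated assumptions and not the processing code used for the videos: the function names and the toy data are hypothetical, and the score is normalised by the root sum of squared deviations so that it lies between -1 and 1. For a fixed template size this differs from a standard-deviation normalisation only by a constant factor, so the peak location is the same.

```python
import math


def zncc(template, patch):
    """Zero-normalised cross-correlation of two equally sized 2-D patches."""
    f = [v for row in template for v in row]
    g = [v for row in patch for v in row]
    f_m, g_m = sum(f) / len(f), sum(g) / len(g)
    num = sum((a - f_m) * (b - g_m) for a, b in zip(f, g))
    # Root sum of squared deviations plays the role of delta_f and delta_g.
    d_f = math.sqrt(sum((a - f_m) ** 2 for a in f))
    d_g = math.sqrt(sum((b - g_m) ** 2 for b in g))
    if d_f == 0 or d_g == 0:
        return 0.0  # flat patch: correlation undefined, treated as no match
    return num / (d_f * d_g)


def match_template(frame, template):
    """Return ((row, col), score) of the best ZNCC match in the frame."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -2.0
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = zncc(template, patch)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Toy data (made up): the template reappears at row 2, column 3 of the frame
# with halved contrast and a brightness offset, which ZNCC should ignore.
template = [[10, 200], [200, 10]]
frame = [[50] * 6 for _ in range(5)]
frame[2][3], frame[2][4] = 25, 120
frame[3][3], frame[3][4] = 120, 25
position, score = match_template(frame, template)
print(position, round(score, 3))
```

Repeating `match_template` on every frame and differencing the returned positions gives the pixel displacement history of the target.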

3 Case study

3.1 Single cable galloping measuring

Figure 2 shows a frame from the first cable vibration video. During this video, one of the cables (indicated by a blue arrow in Fig. 2) was galloping with a large amplitude while the other cables and the main structure had no visible movement. The sampling frequency of this video is 30 fps, the original size of each frame is 1920 × 1080 pixels, and the duration of the video is 16.8 s. The image binarisation method is used to process the video (also shown in Fig. 2), and the mean location of the black pixels is regarded as the location of the cable.

Fig. 2 Comparison of original image and binary image (single cable vibration case)

To get the cable movement in a real-world unit, the known cable diameter is taken as a reference: Fig. 3 shows a section of the cable. The resolution of the image in Fig. 3 is extended with cubic interpolation between pixels to obtain a reliable measurement of the cable diameter in pixels. The displacement (in real-world units) of the section concerned in the normal direction \(D\) can be calculated with Eq. (3).

$$D = \frac{d \cdot D_{yp}}{d_{yp}}$$
(3)

in which \({D}_{yp}\) is the displacement along the image axis in pixels (\({D}_{y}\) is \({D}_{yp}\) in real-world units), obtained with CV processing; \(d\) is the diameter of the cable, 44.45 mm in this case; and \({d}_{yp}\) is the width of the cable along the image axis in pixels (\({d}_{y}\) is \({d}_{yp}\) in real-world units), counted from Fig. 3. In this case, \({d}_{yp}\) is 4.0 pixels. Theoretically, \({d}_{yp}\) is resolved at the 0.1-pixel level, since interpolation increases the number of image pixels to 10 times the original, but in practice the precision is lower because the cable edge cannot be perfectly detected.

Fig. 3 Binary image of a cable section and the geometrical relationship between image and real-world displacement
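Eq. 3 amounts to a single scale factor between pixels and millimetres. The short sketch below evaluates it with the values quoted for the single cable case; the function name is a hypothetical choice made for illustration.

```python
def pixels_to_mm(disp_px, cable_diameter_mm, cable_width_px):
    """Eq. 3: scale a pixel displacement by the known cable diameter."""
    if cable_width_px <= 0:
        raise ValueError("cable width in pixels must be positive")
    return cable_diameter_mm * disp_px / cable_width_px


# Values quoted in the text: d = 44.45 mm and d_yp = 4.0 pixels.
mm_per_pixel = pixels_to_mm(1.0, 44.45, 4.0)
print(mm_per_pixel)  # millimetres represented by one pixel at this section
```

With these numbers one pixel corresponds to roughly 11 mm, so an error of a few tenths of a pixel in \({d}_{yp}\) changes the scale by several percent.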

Figure 4a shows the movement data of one section of the cable (the 700th row of each frame), and the resolution is at the pixel level. The time history contains a drift (data trend), since the smartphone was not fixed on a tripod but held by the investigator. Least-squares polynomial fitting is used to calculate the drift. In this case, the data trend is relatively small, and a fixed row of the image is regarded as a fixed section of the cable. Figure 4b shows the detrended data in real-world units (mm); the resolution is at the 10 mm level according to the mm/pixel ratio (how many mm one pixel represents at a given section), which is 44.45 mm / 4.0 pixels ≈ 11 mm/pixel here. According to Fig. 4b, the vibration time history of the cable is approximately harmonic. Figure 4c gives the Power Spectral Density (PSD) plots of both the original data and the detrended data. To get the highest frequency resolution from short records, the PSDs were calculated without sectioning or averaging, and the corresponding resolution is \(\frac{1}{T}\), where \(T\) is the data length in seconds. The same PSD calculation is also used in the following cases. In Fig. 4c, detrending removes only the ultra-low frequency part of the original data, and the main frequency of the single cable vibration is 0.96 Hz.

Fig. 4 a Original vibration history, b detrended data, and c PSD of a section of the high guyed mast cable
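The detrending and PSD steps used for Fig. 4 can be sketched with the standard library alone, as below. This is an illustration under stated assumptions: the polynomial order, the synthetic 1 Hz record, and the drift terms are made up, and the periodogram uses a plain discrete Fourier transform without windowing or one-sided scaling. It is meant to show the least-squares trend removal and the \(1/T\) frequency resolution, not to reproduce the published PSD values.

```python
import cmath
import math


def polyfit_trend(t, y, order=2):
    """Least-squares polynomial trend of y(t), evaluated at every t."""
    # Rescale time to [-1, 1] so the normal equations stay well conditioned.
    t0, t1 = min(t), max(t)
    u = [2.0 * (ti - t0) / (t1 - t0) - 1.0 for ti in t]
    n = order + 1
    # Normal equations A c = b for the monomial basis 1, u, u^2, ...
    A = [[sum(ui ** (r + c) for ui in u) for c in range(n)] for r in range(n)]
    b = [sum(yi * ui ** r for ui, yi in zip(u, y)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            m = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= m * A[k][c]
            b[r] -= m * b[k]
    coef = [0.0] * n
    for k in reversed(range(n)):
        s = sum(A[k][c] * coef[c] for c in range(k + 1, n))
        coef[k] = (b[k] - s) / A[k][k]
    return [sum(cf * ui ** p for p, cf in enumerate(coef)) for ui in u]


def periodogram(y, fs):
    """Periodogram at non-negative frequencies, with no segment averaging."""
    n = len(y)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        X = sum(y[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        freqs.append(k * fs / n)  # bin spacing is fs / n = 1 / T
        psd.append(abs(X) ** 2 / (fs * n))
    return freqs, psd


fs = 30.0  # frame rate quoted in the text (fps)
n = 300    # 10 s of synthetic samples
t = [i / fs for i in range(n)]
# Synthetic record (made up): 1 Hz oscillation plus a slow hand-held drift.
y = [5.0 * math.sin(2 * math.pi * ti) + 0.5 * ti + 0.02 * ti ** 2 for ti in t]

trend = polyfit_trend(t, y, order=2)
detrended = [yi - tr for yi, tr in zip(y, trend)]
freqs, psd = periodogram(detrended, fs)
peak = max(range(1, len(psd)), key=lambda k: psd[k])
print(freqs[1], freqs[peak])  # frequency resolution 1/T and dominant frequency
```

For the 16.8 s video of this case the same bin spacing would be about 0.06 Hz, which is the resolution limit behind the reported 0.96 Hz.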

3.2 Multiple cable galloping measurement

Figure 5 shows a single frame of the second cable vibration video. The sampling frequency and the frame size of this video are the same as in the first one (30 fps and 1920 × 1080, respectively), and the duration is 8.6 s. During this video, 3 cables in the same direction (pointed out with orange arrows as Cable 1, 2, and 3, respectively, in Fig. 5) were galloping with large amplitudes.

Fig. 5 Comparison of original image, binary image, and Gaussian adaptive threshold binary image (multiple cable vibration case)

The classic binary algorithm failed to segment Cable 3, as shown in Fig. 5b, and part of the background is shaded black. To obtain a better division of the image, an adaptive threshold binary algorithm is used here to get a clear map of all the 3 cables (Fig. 5c). Compared with the original threshold binary algorithm, the adaptive threshold is a weighted sum of the surrounding pixels rather than a constant; therefore, the adaptive threshold binary algorithm will be less influenced by the background.

Here, the adaptive threshold with a Gaussian window is used. The adaptive threshold \(T\left(x,y\right)\) can be calculated using Eq. 4, where \(p\left(x+i,y+j\right)\) is the greyscale value of a pixel in the block around \(\left(x,y\right)\) (in this case, the calculating blocks are \(9\times 9\) pixels, so \(i\) and \(j\) run from \(-4\) to \(4\)), \(G\left(i,j\right)\) is the Gaussian window, \(G\left(i,j\right)=\frac{1}{2\pi {\sigma }^{2}}\mathrm{exp}\left(-\left({i}^{2}+{j}^{2}\right)/2{\sigma }^{2}\right)\), and \(C\) is a constant, set to 2 in this case.

$$T\left(x,y\right) = \sum_{i} \sum_{j} p\left(x+i,y+j\right) G\left(i,j\right) - C$$
(4)
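A minimal standard-library sketch of a Gaussian adaptive threshold is given below to show why it copes with an uneven background where a single global threshold does not. Several choices are assumptions made for illustration rather than details from the paper: the window weights are normalised to sum to one, \(\sigma = 1.7\) is an arbitrary value, border pixels are replicated, and the toy image (a brightness ramp crossed by one dark cable column) is made up.

```python
import math


def gaussian_window(size, sigma):
    """size x size Gaussian weights, normalised to sum to one."""
    half = size // 2
    w = [[math.exp(-(i * i + j * j) / (2.0 * sigma ** 2))
          for j in range(-half, half + 1)] for i in range(-half, half + 1)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def adaptive_binarise(gray, size=9, sigma=1.7, C=2, maxval=255):
    """White where a pixel exceeds its Gaussian-weighted local mean minus C."""
    h, w = len(gray), len(gray[0])
    half = size // 2
    G = gaussian_window(size, sigma)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            T = -C
            for i in range(-half, half + 1):
                for j in range(-half, half + 1):
                    # Replicate border pixels when the window leaves the image.
                    yy = min(max(y + i, 0), h - 1)
                    xx = min(max(x + j, 0), w - 1)
                    T += gray[yy][xx] * G[i + half][j + half]
            out[y][x] = maxval if gray[y][x] > T else 0
    return out


# Toy image (made up): background brightens from left to right and a cable
# that is 40 grey levels darker than its surroundings sits in column 20.
W, H = 24, 12
gray = [[60 + 2 * x - (40 if x == 20 else 0) for x in range(W)] for _ in range(H)]

global_bw = [[255 if px > 60 else 0 for px in row] for row in gray]
adaptive_bw = adaptive_binarise(gray)
print([x for x, px in enumerate(global_bw[0]) if px == 0])
print([x for x, px in enumerate(adaptive_bw[0]) if px == 0])
```

In this toy case the global threshold that catches the cable also turns the darkest background column black, whereas the adaptive version marks only the cable column.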

For multiple object vibration measuring, if a multi-channel data acquisition system is used, the data of each object can be obtained separately. With a camera, however, the data sets obtained from different cables are mixed, since the cables appear in the same image (as shown in Fig. 7a), just as different analog/digital signals can appear in the same channel. People can separate these data sets by eye, but this is difficult to achieve with a computer. Therefore, how to separate the data corresponding to different cables from the preliminarily processed data is also worth studying. In this case, the distances between the cables and the camera are different, and therefore the cables have different widths (numbers of pixels) in the image. This principle is used here to separate the pixels corresponding to different cables. Figure 6 shows how the pixels are classified. The separated data and corresponding PSDs are shown in Fig. 7b, c. The same process as in the single cable galloping case is used here to obtain the displacement in real-world units. The real-world diameters of the cables are all 44.45 mm, and the \({d}_{yp}\) values are 9.0, 5.5, and 2.0 pixels for Cable 1, Cable 2, and Cable 3, respectively. Generally, a higher \({d}_{yp}\) value leads to higher displacement precision.
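One possible reading of the width-based separation is sketched below: the black pixels of a binary image row are grouped into contiguous runs, and each run is assigned to the cable whose expected pixel width is closest. The details of the classification in Fig. 6 are not recoverable from the text, so the function names, the nearest-width rule, and the toy row are all assumptions. The sketch would also fail where two cables overlap in the same row.

```python
def dark_runs(binary_row):
    """Return (start, width) for each run of black (0) pixels in a row."""
    runs, start = [], None
    for x, px in enumerate(binary_row):
        if px == 0 and start is None:
            start = x
        elif px != 0 and start is not None:
            runs.append((start, x - start))
            start = None
    if start is not None:
        runs.append((start, len(binary_row) - start))
    return runs


def classify_runs(binary_row, expected_widths):
    """Assign each run to the cable whose expected pixel width is closest."""
    positions = {}
    for start, width in dark_runs(binary_row):
        cable = min(expected_widths, key=lambda k: abs(expected_widths[k] - width))
        positions[cable] = start + (width - 1) / 2.0  # run centre in pixels
    return positions


# Expected widths are the d_yp values quoted in the text for the three cables.
expected = {"Cable 1": 9.0, "Cable 2": 5.5, "Cable 3": 2.0}

# Toy binary row (made up): runs of 9, 6, and 2 black pixels on white.
row = [255] * 40
for start, width in [(3, 9), (18, 6), (30, 2)]:
    for x in range(start, start + width):
        row[x] = 0

print(classify_runs(row, expected))
```

Applying this to the same row in every frame would yield one position history per cable instead of the mixed data of Fig. 7a.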

Fig. 6 The sketch of the pixel classification method used in this paper

Fig. 7 a Raw data obtained with binarisation, b separated data, and c PSD of multiple cable galloping

According to the PSD plot of the cable vibration data (Fig. 7c), the main frequencies of Cable 1 and Cable 2 are 0.47 and 1.18 Hz, respectively. The main frequency of Cable 3 is not recognizable in the PSD plot, since the video is not long enough, but it can be estimated as 0.3 Hz from the time history in Fig. 7b. Compared with the single cable vibration situation (Fig. 4), when multiple cables vibrate, the time history is not harmonic but contains several frequencies, and the vibration type is more complex.

3.3 Cable anchor vibration

Vibration of high guyed mast cables can also induce joint fatigue. Figure 8 shows a frame from the cable anchor vibration video. The sampling frequency of this video is 30 fps, the original size of each frame is 1920 × 1080, and the duration is 17.2 s. Since the shapes of all the components generally stayed the same in this case, the template matching algorithm introduced in Sect. 2.2.2 was used to obtain the vibration measurement of the cable anchor.

Fig. 8 A frame of high guyed mast cable anchor vibration and the template part (green rectangle)

The joint of the cable and the cable anchor (green rectangle in Fig. 8) was selected as the template, since its shape stays the same and it has a low probability of matching any other part of the same image. Figure 9 shows the template matching result. The cable diameter, which is also 44.45 mm, is used to transform the displacement into real-world units, and the corresponding \({d}_{yp}\) here is 6.0 pixels. Only one cable is vibrating during this video, and the result is generally harmonic with little drift, which coincides with the situation shown in Fig. 8. The frequency calculated from the anchor vibration measurement is 0.94 Hz, which is very close to the one obtained from the cable vibration measurement (0.96 Hz, as shown in Fig. 4), and the vibration amplitude is much lower since the target is close to the fixed end. The precision in this case should be higher than in the former two cases, since the precision of template matching is higher than that of binarisation.

Fig. 9 a Original data and data trend, b detrended data, c PSD of original data and detrended data

3.4 Antenna vibration

Figure 10 shows a single frame from the antenna vibration video. Although this antenna is located at the top of a free-standing lattice tower, cantilever antennae are also found at the top of high guyed masts. They are critical elements, as they are more efficient than wrap-around antennae further down a high guyed mast and are used as the primary transmitter. The sampling frequency of this video is 30 fps, the original size of each frame is 640 × 352, and the duration is 13.6 s. During this video, the antenna vibrates with a large amplitude while no movement of the main mast structure can be detected.

Fig. 10 A frame of high guyed mast antenna vibration, the target part (white rectangle) and the reference part (green rectangle)

The resolution of this video is lower than that of the videos in the former cases, and the vibration amplitude of the antenna is also smaller than that of the galloping cables. To obtain an adequate resolution for the antenna vibration measurement, the video is refined with cubic interpolation between pixels. A comparison of the data obtained with the original video and the pixel-interpolated video is shown in Fig. 11. Setting aside the drift of the video, the vibration of the antenna tip is approximately one pixel, so pixel interpolation is necessary to obtain sufficient vibration information in this case.
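The effect of pixel interpolation can be illustrated in one dimension. The sketch below upsamples an intensity profile across an edge by a factor of 10 with Catmull-Rom cubic interpolation, so that the edge can be located in 0.1-pixel steps. The choice of the Catmull-Rom kernel, the replicated end values, the toy profile, and the threshold level are assumptions; the text only states that cubic interpolation was used, and image libraries may use a different cubic kernel.

```python
def cubic_upsample(profile, factor=10):
    """Upsample a 1-D intensity profile with Catmull-Rom cubic interpolation."""
    n = len(profile)

    def sample(i):
        # Replicate the end values outside the profile.
        return profile[min(max(i, 0), n - 1)]

    out = []
    for k in range((n - 1) * factor + 1):
        i, rem = divmod(k, factor)
        t = rem / factor
        p0, p1, p2, p3 = sample(i - 1), sample(i), sample(i + 1), sample(i + 2)
        out.append(0.5 * (2 * p1 + (p2 - p0) * t
                          + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                          + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3))
    return out


# Toy profile (made up): bright background, one partly covered pixel, dark object.
profile = [200, 200, 200, 150, 40, 40, 40]
fine = cubic_upsample(profile, factor=10)

# First position where the interpolated intensity drops below a chosen level.
level = 120
edge = next(k for k, v in enumerate(fine) if v < level) / 10
print(edge)  # edge location in original pixel units, in 0.1-pixel steps
```

At pixel resolution the same edge could only be reported as lying at pixel 4, which is why a vibration of about one pixel needs interpolation to be resolved.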

Fig. 11 Comparison of data obtained with the original video and pixel-interpolated video

Figure 12 shows the antenna tip and base movement data obtained with the template matching algorithm. Figure 12a shows that the movement trends of the antenna tip and base are quite close during the first 10 s, while there is a slight difference between the movements at 10–14 s. This means that the angle of the smartphone altered slightly at about the 10th second. The dominant camera movement is translation, and the rotation of the camera can be neglected. Figure 12b shows the PSD plots of the original antenna tip and base vibration data. The peak in Fig. 12b is submerged and inconspicuous; therefore, the data need to be further refined by reducing the drift of the video with one of two methods. One is to align the frames of the whole video; the other is to use the antenna base, which is relatively stable, as a reference point to adjust the data of the antenna tip.

Fig. 12 Original vibration data of the antenna tip and base

The drift can be eliminated with image alignment. Here, maximisation of the Enhanced Correlation Coefficient (ECC) [32] is used to align the frames. The principle of ECC is to transform one frame so that it achieves the maximum correlation coefficient with the reference frame, and it can be used with translation, Euclidean, affine, and homography transforms. In this case, the Euclidean transform is chosen to eliminate the drift caused by both translation and rotation of the camera. The ECC can be used here since the lattice takes up most of each video frame, whereas the method is not suitable for the cable vibration cases because the main structures take up relatively small proportions of the corresponding video frames.

After the frames are aligned, the template matching method is used to obtain vibration measurements from the antenna tip (not from the base, which is used as the reference in the reference target method). The diameter of the antenna, which is 425 mm, is used to transform the displacement into real-world units, and the corresponding \({d}_{yp}\) is 14 pixels. In this case, the color of the antenna is close to that of the background, and thus the precision of \({d}_{yp}\) is at the pixel level. The template matching result is shown in Fig. 13a. According to Fig. 13a, no obvious drift remains, but after 10 s frame alignment fails and abnormal peaks occur. The reason can be found in the original video: the camera lost focus after 10 s. Figure 14 compares a normal frame with an out-of-focus frame; the out-of-focus frame has a lower effective resolution, which can cause frame alignment to fail. The vibration measurement for the first 10 s and the corresponding PSD are shown in Fig. 13b, c. According to the PSD plot, a vibration frequency of 2.8 Hz is obtained, which is the same as that obtained with the reference target method.

Fig. 13 Vibration data of antenna tip (frame aligned): a full video length, b first 10 s, c PSD

Fig. 14 A comparison of a normal frame and an out-of-focus frame

As mentioned before, the main structure is relatively stable compared with the antenna; thus, the base of the antenna (the tip of the mast) can be taken as a reference point to refine the measurement of the antenna tip. To reduce the drift in the antenna tip data, the antenna base data are simply subtracted from them, and the result is shown in Fig. 15.
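The reference target correction is a sample-by-sample subtraction of the base track from the tip track, which removes any camera motion common to both. The sketch below uses made-up tracks (a linear drift shared by both targets and a 2.8 Hz tip oscillation of one pixel amplitude) and a hypothetical function name. It assumes that camera motion is a pure translation, in line with the observation that camera rotation can be neglected.

```python
import math


def remove_camera_motion(tip_px, base_px):
    """Subtract the base (reference) track from the tip track, frame by frame."""
    if len(tip_px) != len(base_px):
        raise ValueError("tracks must have the same number of frames")
    return [t - b for t, b in zip(tip_px, base_px)]


fs = 30.0          # frame rate quoted in the text (fps)
frames = range(90)  # 3 s of synthetic data
# Made-up camera drift seen by both targets, in pixels.
drift = [0.05 * k for k in frames]
# Made-up tip oscillation relative to the base, in pixels.
sway = [math.sin(2 * math.pi * 2.8 * k / fs) for k in frames]

base_px = drift
tip_px = [d + s for d, s in zip(drift, sway)]
relative = remove_camera_motion(tip_px, base_px)
print(max(abs(r - s) for r, s in zip(relative, sway)))  # residual after correction
```

Any motion of the base itself is also subtracted, so the result is the tip movement relative to the mast top rather than an absolute displacement.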

Fig. 15 Vibration data of antenna tip: a adjusted data, b detrended data, c PSD

A comparison of vibration measurements obtained with reference target and frame alignment methods can be found in Fig. 16. The time histories obtained with different methods are close to each other. That indicates that both reference target and frame alignment methods can be used to reduce the frame drift when processing videos. When the video quality is stable, the frame alignment method should be taken as the first choice since it can eliminate the drift.

Fig. 16 Vibration data of antenna tip obtained with different methods

4 Discussion and conclusion

In this paper, the vibration time histories of high guyed mast cables (in both single and multiple situations), cable anchor, and antenna are obtained with different computer vision approaches.

Image binarisation and template matching algorithms are used to extract vibration information from videos, and of the two, the template matching method has a higher resolution. With some pre-processing, such as pixel interpolation, the resolution can be further improved.

When only one of the mast cables vibrates, the vibration is harmonic, while when multiple cables vibrate, the vibrations of some of the cables are no longer harmonic, but contain several frequencies.

Videos captured by smartphone cameras without a tripod can contain a large-amplitude drift. Taking a fixed object as a reference point can reduce the influence of the drift. The ECC can be used for frame alignment, thus eliminating the drift, but it could fail if the video loses focus. Of these methods, the reference target method is shown to be more robust, while the frame alignment method performs better in reducing drift.