1 Introduction

At present, domestic ports still rely on backward manual methods to detect foreign objects in transport corridors. Staff drive an inspection vehicle along the corridor and search visually at close range. This method is inefficient and unreliable, and it also occupies valuable corridor time and reduces the number of transport runs. Computer network technology and image processing technology have gradually matured, and this has driven the development of video detection technology. This study combines foreign object detection in transport corridors with video surveillance and develops a foreign object detection and recognition system for port corridors based on a vehicle-mounted camera, which is a simple and effective technical approach. A patrol vehicle serves as the carrier and automatically locates foreign objects on the transport path during routine inspection runs [1].

In recent years, machine vision has developed rapidly and monitoring technology has become increasingly mature. Video surveillance is used in many industries, including the safety monitoring of transport corridors [2]. In the late 1970s, Marr proposed a relatively complete framework for a visual system from the perspective of information processing, and he integrated image processing into a new computational theory of stereo vision. Since the 1990s, stereo vision has developed into an interdisciplinary field that has attracted great attention from academia and industry. It is widely used in industrial image inspection, robot vision, medical image analysis, space remote sensing, military navigation, and traffic management, and its range of applications continues to grow [3]. In 1999, David Lowe of the University of British Columbia proposed the SIFT (Scale-Invariant Feature Transform) algorithm, and he presented it in complete form in 2004. SIFT features are invariant to image rotation, scaling, and brightness changes. They also remain reasonably stable under viewpoint changes, partial occlusion, and image noise, and the algorithm has been applied successfully to image matching [4].

Frank Scherer et al. in Germany used multi-sensor fusion to achieve on-board foreign object detection with a train-mounted camera (a passive sensor) and a laser radar (an active sensor). The system can detect foreign objects within 400 m ahead of a train traveling at 120 km/h. The sensors only need to be installed on the train, so installation is simple and cost is low, but the detection range is limited. The system covers only 400 m, whereas the braking distance of a train at 120 km/h is 800 m [5]. Even when a foreign object is detected, it is therefore difficult to take effective emergency measures in time. Fernando J. Alvarez et al. at the University of Extremadura, Spain, used ultrasonic detectors mounted on both sides of the track to monitor foreign objects falling onto the track. The University of Alcalá used infrared detectors installed on both sides of the track to form an infrared barrier that detects foreign objects larger than 0.5 × 0.5 × 0.5 m, and used data fusion to estimate the size and position of the objects. This approach is sensitive and easy to implement, but installation is complicated, coverage is small, cost is high, and performance is strongly affected by the environment [6]. Arvind HadNarayanan et al. at University College London, UK, used a 24.225 GHz MIMO radar installed at the trackside to detect foreign objects across a cross-section of the line. The system covers a range of 30 m and can detect foreign objects larger than 0.5 × 0.5 × 0.5 m. It is highly reliable, but its coverage is small and its cost is high, so it is suitable only for special sections such as level crossings [7]. Sehchan Oh et al. in South Korea implemented foreign object detection at a station using image processing.
Their method separates foreground from background by image differencing and distinguishes vehicles from pedestrians by the size and shape of the detected objects. It is easy to implement and inexpensive, but its reliability is low. It has difficulty separating stationary foreign objects from the background and cannot locate foreign objects accurately [8]. Giovanni Garibotto, Marco Corvi et al. proposed using binocular stereo vision for foreign object intrusion detection. Two digital micro cameras (640 × 480 pixels, 12 mm focal length) fixed 4.5 m above the track capture image pairs simultaneously. Feature points are extracted and matched across each image pair, and their three-dimensional coordinates are computed from camera parameters obtained by offline calibration. These coordinates are then analyzed to monitor foreign objects and raise alarms. The method is highly reliable and can locate foreign objects precisely, but its coverage is small (15 m), so it is suitable only for special track sections. In addition, no fast and accurate matching algorithm is yet available, and the approach is still exploratory [9].

The transport corridor contains complicated physical conditions such as road cracks and ground lamps, and methods based on a single two-dimensional image often cannot detect foreign objects accurately under these conditions [10]. This study therefore builds on conventional vehicle-mounted camera monitoring. First, image enhancement and other preprocessing steps improve the quality of the video images so that the system adapts to changing weather conditions. Second, based on the binocular stereo vision model, epipolar constraints and epipolar rectification are used to quickly match the common field of view of the two cameras and compute depth information progressively. Comparative analysis of the resulting depth information effectively excludes the influence of complex corridor conditions on the detection results, so that foreign objects in the transport corridor can be located accurately [11].

2 Research methods

A monocular imaging system has difficulty recovering the three-dimensional information of an object. By contrast, the two two-dimensional images that a binocular system acquires from different viewpoints allow the original three-dimensional information to be recovered effectively [12]. This study is therefore based on binocular imaging.

In Fig. 1, the object point M in the scene has coordinates (xc, yc, zc) in the camera coordinate system Oc − XcYcZc. Its projection onto the image coordinate system O − XY is M′, with coordinates (x, y), and the focal length is f. This mapping from three dimensions to two is given by Eq. (1) and can be written in matrix form as Eq. (2) [13].

$$ \left\{\begin{array}{c}x=f\frac{x_c}{z_c}\\ {}y=f\frac{y_c}{z_c}\end{array}\right. $$
(1)
$$ {z}_cm=\left[\begin{array}{c}f{x}_c\\ {}f{y}_c\\ {}{z}_c\end{array}\right] $$
(2)
Fig. 1

Pinhole imaging model

where m and n are, respectively,

$$ \left\{\begin{array}{c}m={\left[x\ y\ 1\right]}^{\mathrm{T}}\\ {}n={\left[{x}_c\ {y}_c\ {z}_c\right]}^{\mathrm{T}}\end{array}\right. $$
(3)
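Eqs. (1) to (3) can be illustrated with a minimal sketch. The Python functions below project a point given in camera coordinates onto the image plane. The function names and the example values (f = 12 mm, a point 10 m ahead) are illustrative assumptions, and the ideal pinhole model ignores lens distortion and the principal-point offset.

```python
def project(point_c, f):
    """Eq. (1): perspective projection of (x_c, y_c, z_c) onto the image plane."""
    xc, yc, zc = point_c
    if zc <= 0:
        raise ValueError("the point must lie in front of the camera (z_c > 0)")
    return f * xc / zc, f * yc / zc


def project_homogeneous(point_c, f):
    """Eq. (2): z_c * m = [f*x_c, f*y_c, z_c]^T; dividing by z_c gives m = [x, y, 1]^T."""
    xc, yc, zc = point_c
    zc_m = [f * xc, f * yc, zc]       # right-hand side of Eq. (2)
    return [v / zc for v in zc_m]     # homogeneous image coordinates m of Eq. (3)


# Illustrative values: f = 0.012 m, point 2 m right, 1 m up, 10 m ahead
x, y = project((2.0, 1.0, 10.0), 0.012)
```

The homogeneous form is the one used later for calibration and rectification, because it turns the projection into a matrix product followed by a single division.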

Foreign object recognition methods based on digital image processing generally operate on grayscale images, which improves detection efficiency. The camera captures color images in the field, so each color image must first be converted to grayscale. Each pixel of a color image consists of red, green, and blue components. Graying converts these three values into a single value according to a fixed weighting. The mathematical expression is [14]

$$ \mathrm{Gray}=0.299\mathrm{R}\left(i,j\right)+0.587\mathrm{G}\left(i,j\right)+0.114\mathrm{B}\left(i,j\right) $$
(4)

where Gray is the gray value, and R(i, j), G(i, j), and B(i, j) are the red, green, and blue components of pixel (i, j). Converting the color image with Eq. (4) gives the result shown in Fig. 2: Fig. 2a is the color road image and Fig. 2b is the grayscale road image.
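The graying step is a per-pixel weighted sum, so it can be sketched in a few lines of Python. The weights are those of Eq. (4), which are the ITU-R BT.601 luma coefficients. Representing the image as rows of (R, G, B) tuples is an assumption made only to keep the example dependency-free.

```python
def to_gray(rgb):
    """Eq. (4) applied to every pixel of an RGB image given as rows of (R, G, B) tuples."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def to_gray_u8(rgb):
    """Same conversion, rounded and clipped to 8-bit gray levels 0-255."""
    return [[min(255, max(0, round(v))) for v in row] for row in to_gray(rgb)]


# White, pure red, pure green, pure blue
gray = to_gray_u8([[(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255)]])
```

The weights sum to 1, so white stays at 255. Green contributes most to perceived brightness, which is why pure green maps to a much higher gray level than pure blue.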

Fig. 2

Color image grayscale. a Color image. b Grayscale image

The traditional Canny operator computes the gradient magnitude from finite differences in a 2 × 2 neighborhood. This makes it sensitive to noise and prone to false edges, and the detection result is coarse. In addition, its high and low thresholds are set manually and adapt poorly to different images. This study therefore improves the traditional Canny edge detection algorithm.
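The noise sensitivity of the traditional 2 × 2 scheme is easy to see in code. The sketch below uses the textbook Canny difference pair P_x = [I(i, j+1) − I(i, j) + I(i+1, j+1) − I(i+1, j)] / 2 and P_y = [I(i, j) − I(i+1, j) + I(i, j+1) − I(i+1, j+1)] / 2. Sign conventions for P_y vary between texts, and the function name is hypothetical.

```python
import math


def gradient_2x2(img):
    """Traditional Canny gradient from first differences averaged over 2x2 cells.

    Returns (magnitude, direction) arrays of size (h-1) x (w-1).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * (w - 1) for _ in range(h - 1)]
    ang = [[0.0] * (w - 1) for _ in range(h - 1)]
    for i in range(h - 1):
        for j in range(w - 1):
            px = (img[i][j + 1] - img[i][j] + img[i + 1][j + 1] - img[i + 1][j]) / 2
            py = (img[i][j] - img[i + 1][j] + img[i][j + 1] - img[i + 1][j + 1]) / 2
            mag[i][j] = math.hypot(px, py)    # gradient magnitude
            ang[i][j] = math.atan2(py, px)    # gradient direction in radians
    return mag, ang


# A vertical step edge responds only in the cells that straddle it
mag, ang = gradient_2x2([[0, 0, 10, 10]] * 3)
```

A single noisy pixel produces a response in every 2 × 2 cell that touches it. The small support of this operator gives no averaging against such spikes, and this motivates the stronger smoothing stage discussed next.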

An image to be processed usually contains both signal and noise, so the key to smoothing is to retain as much signal as possible while removing noise. A good filtering method is therefore needed to smooth the image before further processing. The traditional Canny operator smooths the image with a Gaussian filter, whose mathematical expression is [15]

$$ G\left(x,y\right)=\frac{1}{2\pi {\sigma}^2}\exp \left[-\frac{x^2+{y}^2}{2{\sigma}^2}\right] $$
(5)
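The variance trade-off discussed next can be made concrete by sampling Eq. (5) on a grid, which is how a discrete Gaussian smoothing kernel is usually built. This is a minimal sketch. The 3σ truncation radius is a common choice assumed here, not one stated in the text. Because the weights are renormalized to sum to 1, the constant 1/(2πσ²) of Eq. (5) cancels out.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Sample Eq. (5) on a (2*radius+1)^2 grid and normalize the weights to sum to 1."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))   # 3-sigma truncation (assumed)
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


k1 = gaussian_kernel(1.0)   # 7 x 7 kernel
k2 = gaussian_kernel(2.0)   # 13 x 13 kernel with a flatter peak
```

A larger σ spreads the weight over more neighbors, so the center weight drops. This gives stronger noise suppression in exchange for blurrier edges, as described below.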

Gaussian filtering is a low-pass filter, and the choice of the variance σ² is critical because it determines the bandwidth. A larger variance gives a narrower passband. This suppresses noise well but blurs edges through excessive smoothing, so edge details are lost. A smaller variance gives a wider passband, which preserves more edge detail but does not reduce noise adequately. This paper therefore smooths the image with extremum median filtering. Noise in an image has its own characteristics, and the extremum median filter uses them to decide whether each pixel is a signal point or a noise point and then processes the two differently. In this paper, extremum median filtering replaces the Gaussian filter of the traditional Canny operator. The noise criterion and the filtering rule are as follows [16]:

$$ {\mathrm{x}}_{ij}\in \left\{\begin{array}{ll}\mathrm{Noise}, & {x}_{ij}=\min \left(W\left[{x}_{ij}\right]\right)\ \mathrm{or}\ {x}_{ij}=\max \left(W\left[{x}_{ij}\right]\right)\\ {}\mathrm{Signal}, & \min \left(W\left[{x}_{ij}\right]\right)<{x}_{ij}<\max \left(W\left[{x}_{ij}\right]\right)\end{array}\right. $$
(6)

where [xij] is the digitized image, Signal and Noise denote signal points and noise points in the image, W[xij] is the window centered on pixel (i, j), and min(W[xij]) and max(W[xij]) are the minimum and maximum values of all points in that window.

The filtering method can be expressed as follows [17]:

$$ {\mathrm{y}}_{ij}=\left\{\begin{array}{c}\mathrm{med}\left(W\left[{x}_{ij}\right]\right),{\mathrm{x}}_{ij}\in \mathrm{Noise}\\ {}{\mathrm{x}}_{ij},{\mathrm{x}}_{ij}\in \mathrm{Signal}\end{array}\right. $$
(7)

where med(W[xij]) is the median of all points in the window W[xij]. To compare Gaussian filtering with extremum median filtering, 10% salt-and-pepper noise was added to the grayscale image of Fig. 2 and both filters were applied. The results are shown in Fig. 3a–c.
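Eqs. (6) and (7) translate directly into code. The sketch below is one possible implementation. The 3 × 3 default window and the clipping of the window at image borders are assumptions, because the text does not specify either. When a clipped border window has an even size, the upper median is used.

```python
def extremum_median_filter(img, radius=1):
    """Extremum median filter on a 2-D gray image (list of lists).

    A pixel equal to the minimum or maximum of its window is treated as
    noise (Eq. 6) and replaced by the window median (Eq. 7); all other
    pixels are kept unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            # Window W[x_ij], clipped at the image borders (assumed border handling)
            win = [img[r][c]
                   for r in range(max(0, i - radius), min(h, i + radius + 1))
                   for c in range(max(0, j - radius), min(w, j + radius + 1))]
            if img[i][j] in (min(win), max(win)):   # Eq. (6): extremum -> noise
                s = sorted(win)
                out[i][j] = s[len(s) // 2]          # Eq. (7): replace by the median
    return out


noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255          # salt
noisy[0][4] = 0            # pepper
clean = extremum_median_filter(noisy)
```

Pixels strictly between the window extremes are left untouched, so a clean step edge passes through unchanged. A plain median filter, by contrast, rewrites every pixel.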

Fig. 3

Filtering results with 10% salt-and-pepper noise. a Original image with added noise. b Result after Gaussian filtering. c Result after extremum median filtering

Figure 3a–c shows, respectively, the noisy image, the Gaussian-filtered result, and the extremum median-filtered result for the same salt-and-pepper noise. The filtering results show that extremum median filtering smooths the image more effectively, keeps it sharper, and maintains a good output signal-to-noise ratio. The improved Canny edge detection method proposed in this paper has three components. It smooths the image with extremum median filtering. It computes the gradient magnitude and direction with a weighting-coefficient method. It determines the high and low edge detection thresholds with an improved iterative threshold segmentation method. Together, these changes improve the accuracy of image edge detection.
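The text does not give the details of the improved iterative threshold segmentation. For reference, the sketch below shows the classical iterative (isodata-style) threshold selection that such methods typically build on. It is not the authors' improved variant. The helper `canny_thresholds` is hypothetical, and its choice of low threshold = 0.4 × high threshold is an assumed ratio.

```python
def iterative_threshold(values, eps=0.5, max_iter=100):
    """Classical iterative threshold: split at t, move t to the midpoint of the class means."""
    t = sum(values) / len(values)                 # start from the global mean
    for _ in range(max_iter):
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:                   # degenerate: one class is empty
            break
        t_new = (sum(low) / len(low) + sum(high) / len(high)) / 2
        converged = abs(t_new - t) < eps
        t = t_new
        if converged:
            break
    return t


def canny_thresholds(grad_mag, low_ratio=0.4):
    """Hypothetical helper: high threshold from the iterative method, low = ratio * high."""
    flat = [v for row in grad_mag for v in row]
    high = iterative_threshold(flat)
    return low_ratio * high, high


low, high = canny_thresholds([[10, 200], [10, 200]])
```

Deriving the thresholds from the gradient-magnitude distribution of each image, rather than fixing them by hand, addresses the poor adaptivity of manually set thresholds noted earlier.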

The improved Canny edge detection method was evaluated qualitatively. In MATLAB, edge detection was performed on the image of Fig. 3 with two operators. The first was the traditional Canny operator. The second was the improved Canny operator, which computes the gradient magnitude from finite differences of the first-order partial derivatives in the eight-neighborhood of each pixel. The detection results are shown in Fig. 4.

Fig. 4

Comparison of edge detection results of the traditional and improved Canny operators. a Traditional Canny edge detection. b Improved Canny edge detection

The edges detected by the traditional Canny operator are strongly affected by background interference. The detected corridor edges are fairly continuous, but they still contain many false edges. The improved Canny method gives better resolution, continuity, and completeness, with fewer false edges and a cleaner overall contour. This provides a good basis for extracting the corridor edges and establishing a foreign object detection window for the danger zone of a straight transport corridor.

This paper processes the video in the spatial domain, which means it operates directly on the pixels of the image plane. Spatial-domain methods fall into two groups: pixel-based processing and template-based processing. Pixel-based methods are the familiar global transformations, such as logarithmic, gamma, and sigmoid transformations and histogram equalization. Spatial filtering, by contrast, is template-based and operates on the pixel values within a neighborhood of each pixel. Histogram equalization is one of the most common and important image enhancement methods and is based on probability theory. It uses a gray-level transformation function to map the histogram of the original image to a uniformly distributed histogram, and it then remaps the original image according to the equalized histogram to achieve enhancement.

Let r ∈ [0, 1] denote the continuous gray level of the image, where r = 0 represents black and r = 1 represents white, and let s be the transformed gray level. The transformation of r is then defined as [18]:

$$ s=\mathrm{T}(r)\ 0\le r\le 1 $$
(8)

The transformation function is determined by the cumulative distribution function of the image's gray-level histogram. In some images, gray levels are concentrated in the low-value range, so details in dark regions are unclear. Equalization spreads these densely populated gray levels over a wider range and merges sparsely populated levels, so the available gray range is used more evenly. The transformation maps an image with a known gray-level probability distribution to a new image with a uniform distribution. When the histogram is uniform, the information entropy of the image is maximal, so the image carries the most information and looks clearest. The result is shown in Fig. 5.
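For a digital image with L gray levels, the continuous mapping s = T(r) becomes the textbook discrete form s_k = (L − 1) · Σ_{j ≤ k} n_j / N. Here n_j is the number of pixels at level j and N is the total pixel count. A minimal sketch is shown below. It assumes the image holds integer levels in [0, L) stored as a list of rows.

```python
def equalize(img, levels=256):
    """Histogram equalization: map each gray level through the scaled cumulative histogram."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    n = sum(hist)
    cdf, acc = [], 0
    for count in hist:
        acc += count
        cdf.append(acc / n)                            # cumulative distribution at each level
    lut = [round((levels - 1) * c) for c in cdf]       # s_k = (L - 1) * CDF(k)
    return [[lut[v] for v in row] for row in img]


# Four very dark levels are spread across the full 0-255 range
out = equalize([[0, 1], [2, 3]])
```

The mapping is global: every pixel with the same input level receives the same output level, whether it belongs to a foreign object or to background clutter. This is why equalization amplifies useless information along with useful information, as the next paragraph notes.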

Fig. 5

Comparison of histogram equalization image enhancement effects. a Image before histogram equalization processing. b Image after histogram equalization processing

In practice, applying histogram equalization directly to the original image often gives poor visual results. It effectively increases brightness and contrast, but it amplifies useless information along with useful information, which hinders subsequent computer processing. Histogram equalization is therefore seldom used on its own in practical applications, and in this research its use must be decided according to actual needs.

The system adopts a binocular stereo vision model. Under the epipolar constraint and the ordering constraint, pixels in the common area of the left and right views correspond to each other. Traditional binocular vision algorithms obtain three-dimensional information from the positions of the same feature point in the two images, and feature point matching is their main focus and difficulty. In this study, the cameras are calibrated first. The acquired images are then corrected for distortion, and epipolar rectification is applied so that corresponding points in the left and right views have equal vertical coordinates. The offset between their horizontal coordinates is then determined, which establishes the correspondence within the common field of view of the two cameras. To avoid feature matching and improve detection efficiency, the known correspondence is used to map the left view into the right view, and the mapped image is subtracted from the actual right view. If the common field of view contains no foreign object, the difference is close to zero. Otherwise, a distinct non-zero region appears in the difference map. This region is the foreign object, which is then located in the original view by the inverse transformation. To adapt to changing weather conditions during detection, the images are first preprocessed with image enhancement. To evaluate the method, a port transport corridor was simulated and analyzed, and the results are presented below.
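The matching-free detection step can be sketched under a simplifying assumption: after rectification, the empty corridor surface has a pre-calibrated disparity that depends only on the image row. The left view can then be shifted into a predicted right view and subtracted from the real one. The per-row disparity table, the threshold of 30 gray levels, and both function names are hypothetical. They illustrate the idea and are not taken from the paper.

```python
def predict_right(left, disparity_per_row):
    """Map a rectified left view into the right view.

    Assumes (hypothetically) that the empty corridor surface has a known
    disparity d per row, so a left pixel at column c + d appears in the
    right view at column c.
    """
    h, w = len(left), len(left[0])
    pred = [[None] * w for _ in range(h)]
    for r in range(h):
        d = disparity_per_row[r]
        for c in range(w):
            if 0 <= c + d < w:                 # outside the common field of view -> None
                pred[r][c] = left[r][c + d]
    return pred


def difference_mask(left, right, disparity_per_row, thresh=30):
    """True where the actual right view departs from the prediction by more than thresh."""
    pred = predict_right(left, disparity_per_row)
    return [[p is not None and abs(rv - p) > thresh for rv, p in zip(r_row, p_row)]
            for r_row, p_row in zip(right, pred)]


left = [[10 * c for c in range(8)] for _ in range(3)]
right = [[left[r][min(c + 2, 7)] for c in range(8)] for r in range(3)]
right[1][3] = 250                              # an object that does not lie on the surface
mask = difference_mask(left, right, [2, 2, 2])
```

Anything that does not lie on the calibrated surface violates the assumed disparity, so it shows up as a non-zero region in the mask. This is exactly the cue the text relies on, and it avoids any per-frame feature matching.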

3 Results

To evaluate the performance of the detection algorithm, experiments were conducted on an actual outdoor track surface to simulate a port transport track. The binocular camera was mounted on a tower, pointing perpendicular to the ground, to simulate the working mode of the transport vehicle, and images were sampled continuously. The binocular rig uses Firefly MV FFMV-03M2C digital cameras with IEEE 1394 interfaces, an image resolution of 640 × 480, and a video capture rate of up to 30 fps. The capture card is a four-port IEEE 1394b (FireWire 800) FWBX2-PCIE1XE220. The camera height H was set to 80 cm, based on the distance between the front of an actual port transport vehicle and the ground. According to the FAA standard, the smallest object that a foreign object detection system for a port transport corridor must detect is a cylinder with a radius of 3 cm and a height of 3.8 cm. Suitable foreign objects were selected as experimental objects. Port corridors were simulated on an ordinary road surface, and experiments were carried out under ordinary light, strong light, and low light. The experiments also took into account the complex road conditions that may exist in a port corridor.

The experiment first captures video with the binocular camera under normal lighting. Each frame of the video sequence contains a left-view image and a right-view image. The proposed algorithm applies distortion correction and epipolar rectification to the left and right grayscale images. This ensures that the two corrected images are aligned vertically and differ only by a known horizontal displacement. The difference between the corrected left and right images is then computed according to the pre-calibrated region mapping.

The simulation covers three analyses: the monitoring area update process (Fig. 6), the foreign object extraction process (Fig. 7), and foreign object matching between the left and right views (Fig. 8).

Fig. 6

Update process diagram of the monitoring area. a Video screenshot of the port transport corridor. b Identified rail edges. c Updated monitoring area

Fig. 7

Foreign object extraction. a Screenshot of a foreign object intrusion. b Extracted foreign object region. c Extracted foreign object

Fig. 8

Foreign object matching in left and right views

The figures above show the processing results for the left view of the binocular imaging system, and the right view is processed in the same way. Figure 8 shows the result of applying the matching algorithm to the foreign objects extracted from the left and right views. Table 1 lists statistics of the matched feature points.

Table 1 Information of left and right view feature points

4 Discussion and analysis

Figure 6a is a video screenshot of a port transport corridor in a certain area. The rail identification algorithm accurately identifies the rail edges, as shown in Fig. 6b. Based on the identified rail edges, the background of the monitoring area in the track video is updated, and the updated monitoring area is delineated as shown in Fig. 6c. Figure 7 illustrates the foreign object extraction process. Within the delineated monitoring area, each video frame is compared with the updated background image. If the sum of the differences exceeds a specified value, a foreign object is judged to have intruded into that section of the corridor, and the system raises an alarm. Figure 7a is a screenshot of a foreign object intruding into the corridor. Differencing against the background image of the monitoring area yields the foreign object region in Fig. 7b, and the grayscale image of the foreign object is extracted as shown in Fig. 7c.
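The alarm rule described above can be summarized in a short sketch. The absolute differences between the frame and the background are summed inside the monitoring area and compared with a preset value. The per-pixel threshold used to build the region mask, both threshold values, and the function name are assumptions for illustration only.

```python
def detect_intrusion(frame, background, roi, pixel_thresh=25, sum_thresh=500):
    """Background-difference check inside the monitoring area.

    Returns (alarm, mask): alarm is True when the summed absolute difference
    inside the ROI exceeds sum_thresh; mask marks ROI pixels whose difference
    exceeds pixel_thresh (the candidate foreign object region).
    """
    diff_sum = 0
    mask = []
    for f_row, b_row, roi_row in zip(frame, background, roi):
        mask_row = []
        for f, b, inside in zip(f_row, b_row, roi_row):
            d = abs(f - b) if inside else 0   # ignore pixels outside the monitoring area
            diff_sum += d
            mask_row.append(d > pixel_thresh)
        mask.append(mask_row)
    return diff_sum > sum_thresh, mask
```

Restricting the sum to the monitoring area keeps changes outside the corridor, such as moving scenery beside the track, from triggering the alarm.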

The simulation analysis gives the location of the intruding object's feature points in the camera coordinate system. The points cluster around 2.4 m on the x-axis, 1.9 m on the y-axis, and 30 m on the z-axis, so the intruding object is located at approximately (2.4, 1.9, 30). The size of the object can be estimated from the spread of its feature points. The results indicate a size of roughly 0.3 m × 0.3 m × 2 m, which is roughly consistent with the size of the actual object.
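Coordinates such as (2.4, 1.9, 30) follow from standard rectified-stereo triangulation: Z = f·B/d, X = (u − c_x)·Z/f, Y = (v − c_y)·Z/f. Here d is the disparity in pixels, B is the baseline, and (c_x, c_y) is the principal point. The sketch below applies this and then estimates size from the spread of the points. The focal length (800 px), baseline (0.12 m), and principal point (320, 240) are made-up values, not the paper's calibration.

```python
def triangulate(u, v, d, f, baseline, cx, cy):
    """Rectified stereo: pixel (u, v) in the left view with disparity d -> camera coordinates."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    z = f * baseline / d
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return x, y, z


def extent_and_center(points):
    """Bounding-box size and centroid of a cluster of 3-D feature points."""
    xs, ys, zs = zip(*points)
    size = (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
    center = (sum(xs) / len(xs), sum(ys) / len(ys), sum(zs) / len(zs))
    return size, center


# Hypothetical calibration: f = 800 px, B = 0.12 m, principal point (320, 240)
f, B = 800.0, 0.12
point = triangulate(320 + f * 2.4 / 30, 240 + f * 1.9 / 30, f * B / 30, f, B, 320, 240)
```

Depth error grows with Z² for a fixed disparity error. At 30 m, a short baseline therefore gives coarse depth, which is consistent with the text describing the size estimate as only roughly matching the actual object.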

Recognition results for foreign object intrusion in the port transport corridor fall into three cases. First, the image contains a foreign object and the system reports one, so the target is correctly identified as a target. Second, the image contains no foreign object but the system reports one, so a non-target is wrongly identified as a target. Third, the image contains a foreign object but the system reports none, so a target is wrongly identified as a non-target.
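The three cases correspond to true positives, false positives (false alarms), and false negatives (missed detections). The source does not report summary metrics. As a sketch only, the code below tallies the cases and derives precision and recall, which are two standard rates computed from them. The fourth combination (no object present and none reported) is not listed in the text and is simply ignored here.

```python
def tally(results):
    """Count the three outcome types and derive precision and recall.

    results: iterable of (object_present, object_reported) boolean pairs,
    one pair per test image.
    """
    results = list(results)
    tp = sum(1 for present, reported in results if present and reported)      # case 1
    fp = sum(1 for present, reported in results if not present and reported)  # case 2
    fn = sum(1 for present, reported in results if present and not reported)  # case 3
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```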

Based on an in-depth analysis of traditional edge detection operators, this study proposes an improved Canny edge detection algorithm and uses it to detect the edges of a straight port transport corridor. The corridor edges are then extracted with the Hough transform, and a foreign object detection window for the danger zone of the straight corridor is established from the extracted track edges. For curved corridors, the danger zone is determined by marking the foreign object detection area in the actual scene. The detection window for a curved corridor is established by first separating the image and then reassembling it. These steps also improve detection accuracy.
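The Hough transform step can be illustrated with a minimal voting sketch. Each edge pixel votes for every line ρ = x·cos θ + y·sin θ that passes through it, and the most-voted (ρ, θ) cells are taken as straight track edges. This plain accumulator is for illustration only. It uses one-pixel ρ bins and does no non-maximum suppression, so neighboring cells can repeat the same line. In practice a library routine such as OpenCV's `cv2.HoughLines` would typically be used.

```python
import math


def hough_lines(edge_points, n_theta=180, top_k=2):
    """Vote in (rho, theta) space: rho = x*cos(theta) + y*sin(theta), theta in [0, pi)."""
    acc = {}
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    # Strongest cells first (stable sort keeps the first-seen cell among ties)
    best = sorted(acc.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(rho, math.pi * t / n_theta, votes) for (rho, t), votes in best]


# Edge pixels of a vertical rail edge at x = 5
lines = hough_lines([(5, y) for y in range(20)], top_k=1)
```

Because collinear pixels accumulate in a single cell, the transform tolerates gaps and isolated false edges. This suits rail edges that are interrupted by ground lamps or cracks.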

In this study, the background image is extracted by double background modeling, and the two backgrounds are updated separately. Based on this background model, a target detection method that combines background differencing with inter-frame differencing is designed to detect foreign objects intruding into the port transport corridor. Pseudo-targets are then removed, which gives good foreign object detection results.
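The text does not specify how the two backgrounds are modeled or combined. The sketch below shows one common scheme, offered as an assumption rather than the authors' method. A slowly updated long-term background and a quickly updated short-term background are both maintained as running averages. Inter-frame differencing flags moving pixels. A pixel that differs from the long-term background but has already been absorbed into the short-term background is labeled as a static object, for example an item dropped in the corridor. The learning rates (0.01 and 0.5) and the threshold of 20 are arbitrary illustrative values, and the image is treated as a flattened list of gray values.

```python
def running_average(bg, frame, alpha):
    """Exponential running-average background update (assumed update rule)."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]


def classify_pixels(frame, prev, bg_long, bg_short, t=20):
    """Label each pixel of a flattened gray frame.

    moving : inter-frame difference exceeds t
    static : differs from the long-term background but matches the short-term one
    changed: differs from both backgrounds and is not (yet) moving or static
    bg     : matches the long-term background
    """
    labels = []
    for f, p, bl, bs in zip(frame, prev, bg_long, bg_short):
        if abs(f - p) > t:
            labels.append("moving")
        elif abs(f - bl) > t:
            labels.append("static" if abs(f - bs) <= t else "changed")
        else:
            labels.append("bg")
    return labels
```

Inter-frame differencing alone misses an object once it stops moving, and background differencing alone cannot tell a new object from slow lighting drift. Keeping two backgrounds with different update rates is one way to separate these cases.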

The image correlation coefficient and the ratio of the foreign object's area to the image area are used to identify intruding foreign objects intelligently. In addition, the train is separated from foreign objects according to how the foreground area grows or shrinks across the image sequence. This enables intelligent identification of the port transport vehicle and improves the adaptability of track intrusion identification.
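The two features named above can be computed as shown below. The correlation coefficient is the standard Pearson coefficient between two equally sized gray patches, flattened to lists. It could, for example, compare a candidate region with a stored template. The area ratio is the fraction of foreground pixels in a binary mask. How the paper combines these values into a decision is not stated, so the sketch stops at the features themselves.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation between two flattened gray patches of equal length."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:              # a flat patch has no defined correlation
        return 0.0
    return cov / math.sqrt(va * vb)


def area_ratio(mask):
    """Fraction of pixels in a binary mask (list of rows) that belong to the object."""
    flat = [v for row in mask for v in row]
    return sum(1 for v in flat if v) / len(flat)
```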

5 Conclusion

From the perspective of stereo vision, this paper proposes several new detection algorithms for a video-based foreign object detection system for port transport corridors, together with an image enhancement method. Lighting and weather conditions can produce low-contrast images. To handle them, the original image is analyzed by principal component analysis, the principal component is extracted as the luminance component, and the improved algorithm is applied to enhance it. The non-principal components are then restored as the color components, so that colors are not distorted. Finally, the entire image is adaptively compensated based on a global analysis to obtain the enhanced result. After analyzing the shortcomings of traditional target detection algorithms for foreign object detection in port corridors, this study proposes a detection method based on binocular stereo vision. First, the intrinsic and extrinsic camera parameters are calibrated, and the left and right views of the binocular camera are corrected by distortion correction and epipolar rectification. Second, the correspondence within the common field of view of the two views is established, and foreign objects are detected from the difference map obtained by region mapping. Experiments demonstrate the feasibility of the algorithm.
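The principal-component split described above can be sketched as follows. The first principal component of the RGB pixel cloud is computed by power iteration on the 3 × 3 covariance matrix and serves as the luminance axis. The residuals keep the non-principal (color) part, so the original pixels are recovered exactly when the scores are left unchanged. The enhancement applied to the luminance scores is not specified in the text and is therefore left out. The function names and the power-iteration approach are assumptions for illustration.

```python
import math


def pca_luminance(pixels, iters=100):
    """Split RGB pixels into mean, principal axis, scores and residuals.

    The first principal component (found by power iteration on the 3x3
    covariance matrix) plays the role of the luminance component; the
    residuals carry the non-principal (colour) part.
    """
    n = len(pixels)
    mean = [sum(p[k] for p in pixels) / n for k in range(3)]
    centered = [[p[k] - mean[k] for k in range(3)] for p in pixels]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)] for i in range(3)]
    v = [1 / math.sqrt(3)] * 3          # start along the gray axis
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:                   # flat image: keep the gray axis
            break
        v = [x / norm for x in w]
    scores = [sum(c[k] * v[k] for k in range(3)) for c in centered]
    residuals = [[c[k] - s * v[k] for k in range(3)] for c, s in zip(centered, scores)]
    return mean, v, scores, residuals


def reconstruct(mean, v, scores, residuals):
    """Inverse of pca_luminance; enhanced scores can be passed in place of the originals."""
    return [[mean[k] + s * v[k] + r[k] for k in range(3)] for s, r in zip(scores, residuals)]
```

In the pipeline described above, the enhancement would operate only on `scores` before `reconstruct` is called. Because the color residuals are untouched, hue is preserved while contrast is adjusted.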