Abstract

Sensing the navigational environment represented by navigation marks is an important task for unmanned ships and intelligent navigation systems, and it can be performed by recognizing images from a camera. To improve image recognition accuracy, this paper combines a contour accentuation algorithm with a classification model for navigation marks based on a multiple scale attention mechanism. Experimental results show that the method increases the accuracy of navigation mark classification from 95.98% to 96.53%. Based on the classification model, an intelligent navigation mark recognition system was developed for the Changjiang Nanjing Waterway Bureau, in which the model is deployed and updated through TensorFlow Serving.

1. Introduction

For unmanned ships and vessel traffic service (VTS), intelligent perception of the navigational environment is an important topic [1]. The navigational environment mainly includes two parts: the dynamic vessels and the navigational features marked by navigational aids. According to the IALA (International Association of Marine Aids to Navigation and Lighthouse Authorities), the term Aid to Navigation (AtoN) means any specific equipment, system, or service outside the ship that is specifically used to assist navigators in determining their location or safe route or to warn them of dangers or obstacles to navigation [2]. AtoN mainly consist of buoys and beacons: a buoy is a floating object moored to the bottom, and a beacon is a structure permanently set on the seabed or on land. Both can be categorized as “marks.” They have distinctive shapes, colors, top marks, and other auxiliary markings that can be observed during the daytime to indicate their purposes. The relevant information about navigation marks is usually obtained through the Electronic Chart Display and Information System (ECDIS) or the Automatic Identification System (AIS) [3]. However, detecting navigational aids visually and automatically through a camera remains a new challenge.

With the development of artificial intelligence technology, many intelligent detection technologies have been applied to VTS [4, 5] and smart ships [6, 7]. Among them, deep learning is currently widely used for the detection and classification of ships [8, 9]. The purpose of this type of application is to supplement information about ships that is not available in AIS [10]. For navigation marks, although their basic information can be obtained through ECDIS, sailors are still required to observe their real-time states visually, by eye or with a telescope [11]. At the present stage, the detection and classification of navigation mark images is not as widely studied as that of ships [12], and fewer references are available than for ship image classification and recognition. In previous research [12], we used deep learning to study the image recognition of navigation marks during the daytime and proposed a fine-grained ResNet-based classification model for navigation marks named ResNet-Multiscale-Attention (RMA). The accuracy of this model reaches 95.98% on a dataset of 10260 navigation mark images. However, the experimental results showed that the model still misclassifies some navigation marks, especially in images with inconsistent shapes.

To solve these problems, this paper further improves the classification model for navigation mark images. The contributions are highlighted as follows.
(i) An improved navigation mark classification method with contour accentuation is proposed, and its classification accuracy reaches 96.53%
(ii) An intelligent service system is developed and has been applied by the Changjiang Nanjing Waterway Bureau; it provides an image recognition service for navigation marks on the Yangtze River

The contents of this article are organized as follows. Section 2 describes the related works. Section 3 describes the improved classification model for navigation mark images by contour accentuation method. Section 4 provides practical experimental results and discussion. Section 5 illustrates the intelligent application system. Finally, conclusions and future work are given in Section 6.

2. Related Works

In deep learning, convolutional neural networks (CNNs) are well suited to visual recognition and image classification tasks. AlexNet [13], VGG [14], GoogleNet [15], ResNet [16], and DenseNet [17] are some of the networks that have attracted attention from researchers. Various CNN-based image classification methods have been applied in many fields, such as medical image analysis [18] and face recognition [19]. Some research on vessel recognition has also been reported. Shi et al. [20] put forward a new deep learning framework, which combined the underlying functions and could effectively use useful information to classify optical ship images. Oliveau et al. [21] proposed a new vessel classification theory based on semisupervised learning. Shin et al. [22] proposed a model that combines regions of interest with a convolutional neural network to improve the classification accuracy of ship images. Solmaz et al. [23] proposed a framework and a new loss function to recognize marine and land vehicles in a fine-grained way using multitask learning.

Compared with vessel images, different types of navigation marks may differ only subtly at certain specific positions, so their image classification is, to some extent, a fine-grained classification task. An important method for fine-grained classification is the attention mechanism, which essentially imitates the way humans observe objects. Google [24] proposed a novel recurrent neural network model that extracts information from images or videos by adaptively selecting a sequence of regions or positions and processing only the selected areas at high resolution. Google [25] also presented an attention-based model for identifying multiple objects in an image. In addition to research on attention mechanism algorithms, many scholars have applied the attention mechanism to image classification. Haut et al. [26] proposed a new visual attention-based classification algorithm. Yang [27] proposed a RetinaNet model based on the attention mechanism to match and classify target ships accurately. In our previous model for navigation mark image classification [12], an attention mechanism based on the fusion of feature maps at three scales was proposed to locate the attention area and extract its characteristics.

However, the attention mechanism weakens contour features. The results of the previous study [12] show that the RMA model misclassifies some images because of inconsistent appearance. The contour accentuation method can correct these problems [28], and it has been widely used in some fields. Shotton [29] proposed a new type of automatic visual recognition system based on local contour features, which can locate objects in space and scale, and confirmed that contour is a powerful cue for multiscale and multitype visual object recognition. Lin [30] developed a new technique for detecting fruits in natural environments based on contour information; their experiments showed that the proposed method was competitive for most types of fruits in natural environments, such as green, orange, circular, and nonround fruits. To obtain higher accuracy in ship recognition, a contour accentuation method was combined with a transfer learning-based ship recognition method to analyse ship images and detect ship types; the results showed that contour accentuation with transfer learning could obtain higher accuracy in ship image recognition [31]. Contour features are therefore helpful to visual recognition. In this paper, contour accentuation is expected to complement the effect of the attention mechanism, and it is combined with the RMA model for navigation mark classification to further improve accuracy.

Recently, some intelligent information systems for navigation mark management and service have been reported [32-34]. However, these systems were mainly developed on the basis of telemetry and remote control, and their mechanism for identifying navigation marks is different from image recognition. Qi et al. [35] proposed a maritime navigation mark system based on electromagnetic waves, and Zhang [36] proposed a navigation mark communication system based on WLAN. These systems mainly provide information services for navigation marks by position instead of by visual recognition. In this paper, a novel intelligent service system for image recognition of navigation marks is developed, and it is oriented to application scenarios in which images come from a camera.

3. Classification Models for Navigation Marks

This section firstly introduces the classification model of navigation marks called ResNet-Multiscale-Attention (RMA) model, then describes how to combine the RMA model with contour accentuation.

3.1. The ResNet-Multiscale-Attention (RMA) Model

In the daytime, navigation marks can be recognized by their shape, color, and other auxiliary features. However, some kinds of navigation marks have similar contours with only subtle differences. Accordingly, for visual recognition of navigation mark images, fine-grained image classification methods are better suited than general-purpose ones. Generally, in deep neural networks for classification, low-level features carry less semantic information but more information about the target’s position, whereas high-level features carry more semantic information but less detailed information about the target’s position. General-purpose models, which rely on the high-level features, usually do not perform well in fine-grained tasks [12].

To tackle the fine-grained classification of navigation marks, a model called RMA was proposed in which ResNet-50 is enhanced by adding a multiple scale attention mechanism [12]. As shown in Figure 1, in the network structure of RMA, the navigation mark images are first enhanced by an improved ResNet-50 and then classified by a second ResNet-50. The first ResNet-50 is designed to produce an attention matrix that captures the attention regions. Three-channel feature maps from different stages of ResNet-50 represent three detail scales; they are integrated to form an attention matrix by Convolution, Upsample, and Concat processes. The attention matrix is then multiplied element-wise with the input image to highlight the area that is favourable for classification. The second ResNet-50 performs the classification task and outputs the final probabilities of all navigation mark types.
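
The fusion of three scales into an attention matrix can be illustrated with a minimal NumPy sketch. It is not the RMA implementation from [12]: the 1x1 convolution weights, the sigmoid squashing, the nearest-neighbour upsampling, and all function names are assumptions made for illustration. The sketch only shows how one channel per scale can be upsampled, concatenated, and multiplied element-wise with a three-channel input image.

```python
import numpy as np

def conv1x1(feature_map, weights):
    """1x1 convolution that collapses C channels to one: (h, w, C) @ (C,) -> (h, w)."""
    return feature_map @ weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample_nearest(channel, out_h, out_w):
    """Nearest-neighbour upsampling of a 2-D map to (out_h, out_w)."""
    h, w = channel.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return channel[rows][:, cols]

def attention_matrix(feature_maps, weights, out_h, out_w):
    """One attention channel per scale, concatenated to shape (out_h, out_w, n_scales)."""
    channels = [
        upsample_nearest(sigmoid(conv1x1(fmap, w)), out_h, out_w)
        for fmap, w in zip(feature_maps, weights)
    ]
    return np.stack(channels, axis=-1)

def apply_attention(image, attention):
    """Element-wise product of the input image and the attention matrix."""
    return image * attention

# Toy data standing in for an image and three ResNet-50 stage outputs (assumed shapes).
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
feature_maps = [rng.random((16, 16, 8)), rng.random((8, 8, 16)), rng.random((4, 4, 32))]
weights = [rng.normal(size=f.shape[-1]) for f in feature_maps]

attention = attention_matrix(feature_maps, weights, 32, 32)
highlighted = apply_attention(image, attention)
```

In this sketch the three scales map to the three channels of the attention matrix, so the product has the same shape as the input image and can be passed to a second classifier unchanged.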

Experimental results on a navigation mark image dataset showed that the RMA model achieved a classification accuracy of about 95.98%, which was better than the 94.14% of ResNet-50.

3.2. RMA Model with Contour Accentuation

Contour features help visual recognition by enhancing the target in the image, which has been verified in many studies and in our other experiment on ship recognition. The multiple scale attention mechanism of RMA aims to locate the target’s region, whereas contour accentuation aims to enhance the target’s features. In this paper, the contour accentuation method is therefore combined with the RMA model to further improve the classification accuracy of navigation marks.

The original image of a navigation mark, shown in Figure 2(a), is a color image with three color channels: red (R), green (G), and blue (B). Its pixel matrix stores, for each pixel position, the values of the red, green, and blue components.

The function in Equation (1) is defined to measure the color difference between two pixels. If the color difference between a pixel and all its neighbours exceeds a critical value, the pixel is regarded as a contour point; otherwise, it is not on the contour. By Equation (2), the pixels on the contour are set to black and all other pixels are set to white, which yields the contour of the navigation mark image shown in Figure 2(b).

Furthermore, Equation (3) keeps the original color of the pixels that are not on the contour instead of setting them to white, so that an image with contour accentuation is obtained. In Figure 2(c), the navigation mark is clearly enhanced by the contour features.
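
The three steps above (color difference, contour extraction, contour accentuation) can be sketched as follows. Equations (1) to (3) are not reproduced in this text, so the sketch rests on stated assumptions rather than on the paper's exact definitions: the color difference is taken as the Euclidean distance in RGB space, the neighbourhood is the 8 surrounding pixels, and a pixel counts as a contour point when its mean difference to those neighbours exceeds the threshold. The function names and the scale of `threshold` are hypothetical and need not match the critical values discussed in Section 4.1.

```python
import numpy as np

def mean_neighbour_difference(image):
    """Mean Euclidean RGB distance from each pixel to its 8 neighbours (assumed metric)."""
    img = image.astype(np.float64)
    h, w, _ = img.shape
    # Replicate the border so that edge pixels also have 8 neighbours.
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    total = np.zeros((h, w))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            total += np.sqrt(((img - shifted) ** 2).sum(axis=2))
    return total / 8.0

def contour_mask(image, threshold):
    """Boolean mask that is True where a pixel is treated as a contour point."""
    return mean_neighbour_difference(image) > threshold

def contour_image(image, threshold):
    """Black contour on a white background, analogous to Figure 2(b)."""
    out = np.full(image.shape, 255, dtype=np.uint8)
    out[contour_mask(image, threshold)] = 0
    return out

def accentuate(image, threshold):
    """Original colors with contour pixels set to black, analogous to Figure 2(c)."""
    out = image.astype(np.uint8).copy()
    out[contour_mask(image, threshold)] = 0
    return out

# Toy image: reddish left half, light grey right half (made-up data).
toy = np.zeros((4, 6, 3), dtype=np.uint8)
toy[:, :3] = (200, 30, 30)
toy[:, 3:] = (240, 240, 240)
mask = contour_mask(toy, threshold=50)
enhanced = accentuate(toy, threshold=50)
```

On the toy image only the two columns on either side of the color boundary are marked as contour, and `accentuate` leaves every other pixel unchanged, which is the behaviour the preprocessing step relies on.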

To combine the contour accentuation method with the RMA model, the contour accentuation algorithm is used as an image preprocessing step, and the RMA model takes the navigation mark images with contour accentuation directly as inputs.

4. Experiments and Results

To validate the effectiveness of the RMA model with contour accentuation, a navigation mark image dataset is firstly preprocessed with contour accentuation and then trained and tested with the RMA model.

4.1. Dataset

A total of 10260 images of 42 kinds of navigation marks on the Yangtze River were collected. All images are cropped to a uniform size to form an original dataset, and they are then preprocessed with contour accentuation to create a contour enhanced dataset.

In Equation (2), the critical value is an important factor that determines how well the contour of the navigation mark is extracted. After contour accentuation, not only is the contour of the navigation mark enhanced, but other features in the background, such as waves, the mark’s shadow, and reflections on the water surface, may also be outlined to some extent, as shown in Figure 3(a). These outlines act as noise that disturbs the recognition task. Therefore, to eliminate the interfering noise in the background and highlight the navigation mark as much as possible, the critical value should be chosen carefully.

The effect of critical values from 1 to 5 was investigated. As shown in Figure 3, when the critical value is less than 3, the noise is obvious (Figures 3(a) and 3(b)); when it equals 3, the contour of the navigation mark becomes sharp and clear, and the noise in the background is suppressed (Figure 3(c)); when it is greater than 3, the contour remains almost unchanged (Figures 3(d) and 3(e)).

Therefore, the critical value was finally set to 3, and contour accentuation was performed on all images in the original dataset to form a new dataset. Figure 4 shows some of the images in the new contour enhanced dataset, in which “DCZYTHFB,” “ZADCCMFB,” “ZAZVXCMFB,” “YAGXCMFB,” “SDGGXFB,” etc. are the labels of different kinds of navigation marks. In both the original and the contour enhanced datasets, the images of each type are divided into training and testing parts at a ratio of 8 : 2.
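
A per-type 8 : 2 split of this kind can be sketched with the standard library alone. This is an illustrative sketch, not the authors' data pipeline: the function name, the fixed random seed, the rounding rule, and the file names are assumptions, and only the label strings are taken from the text.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_ratio=0.8, seed=0):
    """Split (path, label) pairs so that every label keeps the same train/test ratio."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        paths = by_label[label][:]
        rng.shuffle(paths)
        cut = int(round(len(paths) * train_ratio))
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test

# Made-up file names; the two labels are examples quoted in the text.
samples = [(f"DCZYTHFB_{i}.jpg", "DCZYTHFB") for i in range(10)]
samples += [(f"ZADCCMFB_{i}.jpg", "ZADCCMFB") for i in range(5)]
train, test = stratified_split(samples)
```

Splitting inside each type, rather than over the pooled images, keeps rare types represented in both parts, which matters because the number of images per type is unbalanced.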

4.2. Training Details

The RMA model is implemented in Python 3.7 with the TensorFlow 2.0 deep learning framework and trained on a workstation with two NVIDIA GeForce GTX 1080 graphics cards.

Since the number of images of the different navigation types in the dataset is unbalanced, the loss function is designed as Equation (4).

The weighting parameter in Equation (4) is a factor calculated in advance from the dataset; it gives the types with fewer samples a larger contribution to the loss function. To make the model converge faster, an SGD optimizer with momentum was used. For comparison, experiments were carried out on both the original dataset and the contour enhanced dataset.
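
The idea of weighting the loss by class frequency can be sketched as below. Equation (4) is not reproduced in this text, so the inverse-frequency weights and the weighted cross-entropy form used here are assumptions chosen as a common way to handle unbalanced classes; they are not claimed to be the paper's exact loss. All names are hypothetical.

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights (assumed form): rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log(p[y]) over the batch; probs[i][c] is the predicted probability."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(max(p[y], 1e-12))
    return total / len(labels)

# Toy batch: class 0 is three times as frequent as class 1.
labels = [0, 0, 0, 1]
weights = class_weights(labels)
probs = [[0.5, 0.5]] * 4
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights each class contributes equally to the total loss regardless of how many samples it has, so an error on a rare navigation mark type is penalised more heavily per image than an error on a common one.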

Figures 5(a) and 5(b) show the loss and accuracy curves of the RMA model on the two datasets, respectively. Furthermore, to verify the effect of contour accentuation, the classification accuracy of six different deep learning network structures on the two datasets was also investigated.

4.3. Experimental Results

Table 1 shows all the experimental results. All models have slightly higher accuracy on the contour enhanced dataset than on the original dataset, and RMA has higher accuracy than ResNet-50 and the other models on both datasets. The results verify that contour accentuation generally improves classification accuracy. Moreover, the results indicate that, for RMA, contour accentuation and the multiple scale attention mechanism complement each other well and improve the accuracy further.

The confusion matrices for misclassified images are shown in Figure 6. The red numbers indicate the number of classification errors. Rows are predicted types, and columns are true types. The matrices cover 678 images of 12 classes in the test dataset. The results show that misclassification is mainly caused by subtle differences between classes such as “DCZYTHFB” and “ZADCCMFB,” “GXJXFB” and “SDGXCMFB,” “GXJXFB” and “ZADCCMFB,” “SDGXGXFB” and “ZAZVXCMFB,” and “ZADCCMFB” and “DCZYTHFB.” The comparison of Figures 6(a) and 6(b) shows that the RMA model with contour accentuation reduces the total number of misclassified images from 14 to 7, with reductions for most types. The improvement from contour accentuation is more pronounced for navigation marks with different contours, such as “GXJXFB” and “ZVXZYTHFB,” “SDGXGXFB” and “ZUXZYFB,” “SDGXGXFB” and “ZVXZYTHFB,” “ZADCCMFB” and “ZUXTSFB,” “ZUXZYFB” and “ZUXWXSYFB,” and “ZVXZYTHFB” and “ZVXWXSYFB.”
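
The convention used in Figure 6 (rows are predicted types, columns are true types) and the count of misclassified images can be made concrete with a short sketch. The function names and the toy predictions are made up for illustration; only the two label strings come from the text.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Build a matrix with predicted types as rows and true types as columns."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[p]][index[t]] += 1
    return matrix

def misclassified(matrix):
    """Sum of the off-diagonal entries, i.e. the number of wrongly classified images."""
    return sum(v for i, row in enumerate(matrix) for j, v in enumerate(row) if i != j)

# Toy example with two of the labels quoted above (predictions are invented).
classes = ["DCZYTHFB", "ZADCCMFB"]
true_labels = ["DCZYTHFB", "DCZYTHFB", "DCZYTHFB", "ZADCCMFB"]
predicted = ["ZADCCMFB", "ZADCCMFB", "DCZYTHFB", "ZADCCMFB"]
matrix = confusion_matrix(true_labels, predicted, classes)
errors = misclassified(matrix)
```

Here the entry in row “ZADCCMFB”, column “DCZYTHFB” counts images of type “DCZYTHFB” that were predicted as “ZADCCMFB”, which is how an off-diagonal red number in Figure 6 should be read.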

To verify the effect of the contour accentuation mechanism, the navigation mark types mentioned above whose accuracy improved most were investigated further. As shown in Figure 7, the extracted contours of these navigation marks are clear, and there is little environmental noise. The features enhanced by the contour help the RMA model pay more attention to the navigation mark and enlarge the distinction between different types. This explains why contour accentuation can improve the classification accuracy of navigation marks.

5. Application of Intelligent Recognition of Navigation Marks

Based on the RMA model with contour accentuation, an intelligent recognition system for navigation marks was developed for the Changjiang Nanjing Waterway Bureau. As shown in Figure 8, the system has an architecture that separates the front-end from the back-end: the front-end focuses on rendering the client page (WEB or APP), the back-end focuses on business logic, and the two interact through an interface (REST APIs).

In the back-end, there are three platforms, which are deployed independently but interact with each other through interfaces. The web service platform is developed and deployed on the Spring Boot framework. It interacts with the front-end directly: it accepts the image in a request, transforms it into the required size and format, sends it to the recognition module, and receives the recognition result. In the recognition module, TensorFlow Serving is used to deploy the RMA models, and a REST API for navigation mark recognition based on the latest model is exposed to the web service platform. In the training module, the RMA model is trained periodically in TensorFlow. Meanwhile, the dataset is enlarged by the image collection process of the digital waterway system (the production system for channel maintenance in the Nanjing Waterway Bureau), and each newly trained model is saved with a version number, updated, and loaded into TensorFlow Serving.
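
The call from the web service platform to the recognition module can be sketched as a request against the TensorFlow Serving REST API, whose predict endpoint has the form `/v1/models/<name>[/versions/<n>]:predict` and accepts a JSON body with an `instances` list. The actual platform is built on Spring Boot, so this Python sketch only illustrates the request format. The host, the port 8501, the model name `rma`, the nested-list image encoding, and the class list are assumptions, not details of the deployed system.

```python
import json
import urllib.request

def predict_url(host, model_name, version=None, port=8501):
    """URL of the TensorFlow Serving predict endpoint, optionally pinned to a version."""
    base = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        base += f"/versions/{version}"
    return base + ":predict"

def build_request(url, image):
    """POST request whose body wraps one image (nested lists) in an 'instances' batch."""
    body = json.dumps({"instances": [image]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def top_class(response_body, class_names):
    """Pick the most probable class from a '{"predictions": [[...]]}' response."""
    scores = json.loads(response_body)["predictions"][0]
    best = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best], scores[best]

# Hypothetical model name and a tiny 1x2 RGB image; nothing is sent over the network here.
url = predict_url("localhost", "rma")
request = build_request(url, [[[0, 0, 0], [255, 255, 255]]])
# With a running server: body = urllib.request.urlopen(request).read()
label, score = top_class('{"predictions": [[0.1, 0.7, 0.2]]}',
                         ["DCZYTHFB", "ZADCCMFB", "GXJXFB"])
```

Omitting the version in the URL lets the server answer with the latest loaded model, which matches the described behaviour of exposing recognition based on the latest model; passing `version` pins a specific saved model instead.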

The front-end can be any of a variety of clients: Web, APP, or WeChat Mini Program. The client accepts an uploaded image, sends it to the web service platform, receives the recognition response, and renders it on the page, as shown in Figure 9.

6. Conclusions and Future Work

This paper applies deep learning to the recognition of navigation mark images. It proposes a navigation mark classification model based on the combination of a multiscale attention mechanism and contour accentuation. The effect of the multiple scale attention mechanism on classification accuracy was validated in our previous work on the RMA model, so this paper mainly focuses on the impact of contour accentuation. Experimental results on 10260 navigation mark images showed that, by enhancing the contour of the object, contour accentuation could improve the image classification accuracy of most general classification models. It also improves the RMA model, increasing its classification accuracy from 95.98% to 96.53%.

Based on the improved classification model, this paper further developed an intelligent service system for the recognition of navigation marks. The system has a flexible architecture based on front-end and back-end separation. It is connected with the digital waterway system to obtain a continuously updated dataset, and it realizes an automatic navigation mark recognition service that covers dataset preparation, model training, model deployment, and model update.

In the future, the critical value in the proposed contour accentuation algorithm could be optimized for different light conditions. In addition, to further enhance the accuracy of navigation mark image classification, adversarial neural networks could be studied and applied to the fine-grained classification of navigation mark images.

Data Availability

Access to the image dataset of navigation marks used to support the findings of this study is restricted, because it belongs to a third party, the Changjiang Nanjing Waterway Bureau of the People’s Republic of China.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the Fundamental Research Funds for the Central Universities under Grant 3132019400. Thanks are due to the Changjiang Nanjing Waterway Bureau of the People’s Republic of China for providing the image dataset of navigation marks and application scenario of the research results. This work was also partially supported by the National Natural Science Foundation of China (Nos. 61906043, 61902313, 61902072, 62002063, 61877010, 11501114, and 11901100), the Fujian Natural Science Funds (Nos. 2020J05112, 2020J05111, 2020J01498, and 2019J01243), the Funds of Education Department of Fujian Province (No. JAT190026), and the Fuzhou University (Nos. 0330/50016703, 0330/50009113, 510930/GXRC-20060, 510872/GXRC-20016, 510930/XRC-20060, 510730/XRC-18075, 510809/GXRC-19037, 510649/XRC-18049, and 510650/XRC-18050).