Abstract

To prevent public transport safety accidents and guide the safety management and decision-making of bus operators, this paper uses forewarning and other multisource data from public transport vehicles in Zhenjiang to construct holographic portraits of bus safety operation characteristics from the perspectives of time, space, and driver factors, and builds a BP neural network model to predict the fatigue driving risk and driving risk of bus drivers. Model checking and virtual simulation experiments are then carried out. The results show that the driver’s fatigue risk between 7:00 and 8:00 am is much higher than in other periods, and that driver fatigue forewarning events are most frequent when the bus speed is about 30 km/h. Drivers aged 30–34 years account for the largest proportion of vehicle abnormal forewarnings, drivers aged 40–44 years account for the largest proportion of fatigue forewarnings, and drivers with 15–19 years of driving experience account for the largest overall proportion of forewarning events. The probabilities of fatigue driving risk and driving risk forewarning increase sharply in the speed ranges (18, 20) km/h and (42, 45) km/h, and are almost zero when the speed is lower than 17 km/h and 41 km/h, respectively. On rainy days, the probability of fatigue forewarning during low peak hours is about 30% lower than during peak hours. The probability of driving forewarning during flat peak hours is 15% higher than during low peak hours and about 10% lower than during peak hours. This paper is the first to carry out research using real bus forewarning data covering the full time, the whole region, and the full cycle. The results have theoretical value and practical significance for scientifically guiding the safety operation and emergency management strategies of buses, improving the service level and operational safety of bus passenger transportation, and promoting the safe, healthy, and sustainable development of the public transportation industry.

1. Background Introduction

At present, China mainly evaluates the safety of buses based on the incidence of traffic accidents. The evaluation indicators and analysis methods are relatively limited, and accurate control, effective prevention, and emergency management countermeasures are still lacking. Since 2019, with the integration of BDS, video, radar, and other technologies, buses in some Chinese cities have been fitted with vehicle driving safety forewarning systems, which enable holographic perception, dynamic monitoring, and risk reminders throughout bus operation [1]. By acquiring the historical vehicle forewarning data of Zhenjiang Public Transport Company in Jiangsu Province, China, this paper is the first to carry out research using real bus forewarning data covering the full time, the whole region, and the full cycle. It mines the general rules and main hidden dangers of vehicle forewarning events and carries out objective analysis and situation prediction of bus operation risks [2]. The conclusions have practical significance for improving the safe operation of buses, carrying out optimized dispatching [3] and emergency management, eliminating hidden dangers in bus operation, and promoting the convenience, safety, and sustainable development of public transportation [4].

Many studies hold that driver fatigue and the driver’s driving state are the most important factors affecting urban public transport safety, and that the driver’s state is affected by the driver’s attributes, the external environment, and other aspects. Scholars have studied the factors affecting the safe operation of buses, as summarized in Table 1. A study of the factors that affect the severity of bus collisions [5] found that start inhibition, automatic door opening, bus materials, and internal structure are related to bus safety. Research on perception and driving behavior [6] has shown that drivers who have experienced accidents are more likely to have collisions in the future. Studies of the factors affecting road traffic accidents show that the advanced driver assistance system (ADAS) [7] can provide drivers with safety support and help avoid distractions. Besides, vehicle anticollision forewarning strategies [8, 9] have been formulated by studying the driver’s reaction time when a collision occurs. The Palm probability distribution method [10] has been used to study road accident risk under different weather conditions; the results show that the accident risk is higher in snow than in rain, and that the relative accident risk increases with precipitation intensity. A logistic regression model [11] has been used to study the correlation between the severity of traffic accidents and the driver’s age, gender, vehicle, road environment, and other factors; the results show that road infrastructure conditions and the driver’s age have a significant impact on accident severity. A logarithmic linear model [12] has been used to study the impact of time factors on the severity of bus driver collision injuries; the results show that driving late at night or in the early morning increases the risk of serious injury to bus drivers.

Many scholars have studied vehicle safety characteristics and management requirements. Firstly, historical traffic data from the USA for 2005 to 2009 were used to summarize the potential risk factors of public transportation safety accidents [13], and the driver was found to be the main factor in their occurrence. Studies have shown that eye movement and gear operation differ significantly as the driver’s attention changes [14]. The characteristics of steering wheel operation are also related to the characteristics of vehicle movement during lane changes [15]. Secondly, algorithms and models have been used to analyze the causes of traffic accidents and to predict them. A study using the decision tree algorithm [16] to examine the causes of vehicle collisions showed that human factors are the most important cause of traffic accidents. Based on research on the driver’s steering characteristics, an evaluation model for improving the steering stability of the car was established [17]. In [18], a backpropagation neural network model and a generalized linear mixed model were used to analyze multisource traffic data, which showed that traffic flow plays an important role in vehicle collision prediction. Thirdly, a variety of models have been built to better predict vehicle safety. A traffic accident model based on collaboration theory [19] was proposed to analyze accident scene data in combination with driving comfort thresholds. A dynamic prediction model of vehicle operation trajectory based on vehicle trajectory data [20] can calculate the likely collision position of a vehicle. The perceived safety of self-driving cars and their application value in transportation and road safety [21] were derived from an analysis of the driving habits of 1,205 regular vehicle drivers. A hidden Markov model [22] was proposed by analyzing a large amount of traffic trajectory data, and the model was verified to better predict the occurrence of traffic conflicts.

Scholars have also carried out many studies on the safe operation and management of buses. Firstly, preventive measures [23] have been proposed through the identification and risk analysis of bus drivers’ dangerous behaviors. The road traffic safety risk index evaluation method [24] has been applied to assess and analyze the hazard sources of road traffic safety risks and to construct a corresponding risk monitoring index system. Besides, traffic safety management countermeasures [25] aimed at the main problems of safety management have been proposed to reduce unsafe driver behavior, improve vehicle safety, reduce the accident rate, and ensure the safe operation of buses. Secondly, by analyzing eye movement data from lane changes, cornering, and straight driving [26], safety indicators of the driver’s visual perception of dangerous areas were proposed, and an evaluation method based on these indicators was established. In [27], a psychological fatigue evaluation system for bus drivers was constructed, and the authors proposed targeted suggestions to reduce driving fatigue. A method [28] was designed to evaluate the driver’s ability to predict potential danger and the rationality of the system. In [29], a method to analyze the safe operation of buses was proposed based on big trajectory data. In particular, the clustering characteristics of road safety factors [30] such as driver, vehicle, road, and environment were studied under different accident types. Different research methods have been proposed according to the actual situation of vehicle safety prediction to make the prediction results more accurate. Thirdly, parameters such as bus flow and bus ratio [31] were introduced into a bus speed model, and a bus speed control system was designed to monitor the vehicle running speed dynamically. A driver evaluation system based on the principal component analysis method [32] was established from questionnaire information collected from bus drivers; the results show that the driver’s driving habits and individual characteristics have a significant impact on driving behavior. Finally, a review of traffic congestion charging practice in Singapore and London [33] concluded that congestion charging should be based on scientific planning and should support the sustainable development of public transportation. Besides, index system analysis and test methods [34] have been used to classify the sustainable development of urban transportation, and an evaluation index system and evaluation model based on the theory of urban transportation sustainable development were established, which provides a new perspective on urban sustainable development research.

In summary, objective, real, comprehensive, and effective historical operation data are a prerequisite for the accurate study of bus safety operation and risk management. Existing studies mainly use vehicle accident data, vehicle trajectory data, laboratory data, and questionnaire survey data to study vehicle safety characteristics, dangerous driving behavior, or risk situations. Because vehicle accidents are contingent events and data collection is incomplete, it is difficult to analyze the bus operation state comprehensively and to predict safety risks accurately [35]. To overcome these shortcomings, this paper makes full use of the safety forewarning system installed on public transport vehicles, obtains large volumes of real historical data on all kinds of bus forewarnings, carries out model construction and simulation analysis, provides decision support for bus operation, dispatching, and safety management, and promotes the healthy, green, and sustainable development of urban public transportation [36].

2. Data Acquisition Process

The bus forewarning system installed by Zhenjiang Public Transport Company integrates various technologies such as ADAS yaw forewarning [37], fatigue driving video analysis, and BDS terminal [38]. It can realize the real-time upload of vehicle operating data and ensure the accuracy and reliability of the data. The forewarning equipment is shown in Figure 1. This research makes full use of the vehicle forewarning equipment and vehicle forewarning data platform of Zhenjiang Public Transport Company to obtain historical operating data of the bus.

2.1. Forewarning Equipment
2.1.1. ADAS Vehicle Yaw Forewarning System

ADAS stands for advanced driver assistance system. The system uses a camera mounted on the windshield to monitor the lane markings on the road ahead. When the system detects that the vehicle has deviated from its lane [39], it issues a forewarning to the driver.

2.1.2. Fatigue Driving Analysis Equipment

The fatigue driving analysis equipment uses AI video analysis technology [40] to recognize the driver’s facial characteristics. It records the driver’s fatigue characteristics and issues a warning to the driver when they are detected.

2.1.3. BDS

The system receives multifrequency satellite positioning signals to achieve precise positioning. It can calculate the distance to the vehicle ahead, take the relative speed of the two vehicles into account, determine the possible time to collision, and issue a forewarning to the driver.
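The collision-time calculation described above can be illustrated with a minimal sketch. It assumes a constant closing speed, and the function names and the 2.7 s threshold are hypothetical choices for illustration, not values taken from the forewarning system.

```python
import math


def time_to_collision(gap_m: float, own_kmh: float, lead_kmh: float) -> float:
    """Seconds until the gap closes at the current speeds (inf if not closing)."""
    closing_ms = (own_kmh - lead_kmh) / 3.6  # relative speed, km/h -> m/s
    if closing_ms <= 0:
        return math.inf  # the vehicle ahead is as fast or faster
    return gap_m / closing_ms


def forward_collision_warning(gap_m: float, own_kmh: float, lead_kmh: float,
                              threshold_s: float = 2.7) -> bool:
    """True when the time to collision falls below the assumed threshold."""
    return time_to_collision(gap_m, own_kmh, lead_kmh) < threshold_s


# Bus at 36 km/h following a vehicle at 18 km/h, first 20 m behind, then 10 m.
print(time_to_collision(20.0, 36.0, 18.0))
print(forward_collision_warning(10.0, 36.0, 18.0))
```

A production system would also account for acceleration and sensor noise; the constant-speed form is only the simplest reading of the description.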

2.2. Forewarning System and Data Platform

The forewarning system monitors and summarizes vehicle information in real time [41] and covers 7 forewarning types: eyes closed, yawn, glance about, lane departure, rapid acceleration, rapid deceleration, and forward collision. This paper groups these types into two categories: eyes closed, yawn, and glance about are classified as driver fatigue forewarnings, and rapid acceleration, rapid deceleration, forward collision, and lane departure are classified as vehicle abnormal state forewarnings, as shown in Figure 2.
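The two-category grouping can be expressed as a simple lookup. The sketch below uses the seven type names from the text; the function and constant names are hypothetical, and the exact strings stored by the platform are an assumption.

```python
from collections import Counter

# Grouping of the 7 forewarning types as described in the text.
FATIGUE_TYPES = {"eyes closed", "yawn", "glance about"}
ABNORMAL_TYPES = {"lane departure", "rapid acceleration",
                  "rapid deceleration", "forward collision"}


def forewarning_category(forewarning_type: str) -> str:
    """Map a raw forewarning type to one of the two categories."""
    key = forewarning_type.strip().lower()
    if key in FATIGUE_TYPES:
        return "driver fatigue"
    if key in ABNORMAL_TYPES:
        return "vehicle abnormal state"
    raise ValueError(f"unknown forewarning type: {forewarning_type!r}")


sample = ["Yawn", "lane departure", "eyes closed", "rapid deceleration"]
print(Counter(forewarning_category(t) for t in sample))
```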

The data platform mainly provides the current online status, forward forewarnings, driver forewarnings, the total number of abnormalities, vehicle distribution, forewarning type distribution, forewarning occurrence trend, and other data. This paper obtained 297,189 forewarning records from November 2019 to March 2020 through the forewarning platform system of Zhenjiang Public Transport Company. The original forewarning data mainly include the license plate number, forewarning time, forewarning type, forewarning level, forewarning speed, latitude and longitude coordinates of the forewarning point, location of the forewarning point, driver name, and other information.

2.3. Research Period and Weather Conditions

Since the system started trial operation at Zhenjiang Public Transport Company in October 2019, this paper selected a 27-day official operation period from November 2019 to March 2020 to conduct research, as shown in Table 2.

2.4. Data Cleaning

Since the raw data may contain missing values, inconsistent formats, abnormal values, and similar problems, data cleaning is an indispensable step. Data cleaning [42] should ensure the accuracy, completeness, consistency, uniqueness, timeliness, and validity of the data. There are mainly the following 4 methods [43]:
(1) Supplement incomplete data
(2) Detect and resolve erroneous or abnormal values
(3) Detect and eliminate duplicate records
(4) Detect and resolve data inconsistencies
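The four cleaning methods listed above can be sketched as a single pass over the records. This is an illustration only: the field names (`plate`, `time`, `type`, `speed_kmh`, `level`), the two timestamp formats, and the 120 km/h plausibility bound are all assumptions, not details of the actual platform export.

```python
from datetime import datetime

TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M")  # assumed formats


def parse_time(text: str) -> datetime:
    """Parse a timestamp written in any of the assumed formats."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


def clean_records(records, max_speed_kmh=120.0):
    cleaned, seen = [], set()
    for rec in records:
        rec = dict(rec)  # work on a copy, leave the input untouched
        # (4) resolve inconsistencies: one spelling per type, one time format
        rec["type"] = rec["type"].strip().lower()
        rec["time"] = parse_time(rec["time"])
        # (1) supplement incomplete data with an explicit placeholder
        if rec.get("level") is None:
            rec["level"] = "unknown"
        # (2) drop records whose speed is missing or implausible
        speed = rec.get("speed_kmh")
        if speed is None or not 0.0 <= speed <= max_speed_kmh:
            continue
        # (3) eliminate duplicates of the same event
        key = (rec["plate"], rec["time"], rec["type"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"plate": "A1", "time": "2019-12-01 07:30:00", "type": "Yawn ", "speed_kmh": 31.0, "level": 2},
    {"plate": "A1", "time": "2019/12/01 07:30", "type": "yawn", "speed_kmh": 31.0, "level": 2},
    {"plate": "B2", "time": "2019-12-01 08:00:00", "type": "lane departure", "speed_kmh": 300.0, "level": 1},
    {"plate": "C3", "time": "2019-12-01 09:15:00", "type": "eyes closed", "speed_kmh": 28.0, "level": None},
]
cleaned = clean_records(raw)
print(len(cleaned))
```

In the sample, the second record duplicates the first after normalization and the third has an implausible speed, so both are removed.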

The forewarning data set and the driver information data set are then associated and fused into a single data table [44]. Finally, 297,189 forewarning samples are associated with 1,435 driver data samples. The research data samples after the fusion are shown in Table 3.
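As a rough illustration of this association step, the sketch below joins forewarning records to driver attributes through a shared key. Joining on a `driver` name field is an assumption made for brevity (the source does not state the join key), and the field names and sample values are hypothetical.

```python
def fuse(forewarnings, drivers, key="driver"):
    """Attach driver attributes to each forewarning record (left join on key)."""
    by_key = {d[key]: d for d in drivers}
    fused, unmatched = [], []
    for warning in forewarnings:
        info = by_key.get(warning[key])
        if info is None:
            unmatched.append(warning)  # keep for manual review
        else:
            fused.append({**warning, **info})
    return fused, unmatched


drivers = [
    {"driver": "D001", "age": 42, "driving_years": 17},
    {"driver": "D002", "age": 31, "driving_years": 8},
]
forewarnings = [
    {"driver": "D001", "type": "yawn", "speed_kmh": 30.0},
    {"driver": "D002", "type": "lane departure", "speed_kmh": 44.0},
    {"driver": "D999", "type": "yawn", "speed_kmh": 25.0},
]
fused, unmatched = fuse(forewarnings, drivers)
print(len(fused), len(unmatched))
```

A unique driver identifier would be a safer key than a name if the platform provides one, since names can collide.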

3. Analysis of the Bus Forewarning Characteristics from Multiple Perspectives

Many factors are related to the safe operation of buses, and different factors affect bus forewarnings differently [45, 46]. To study bus forewarning characteristics under different weather conditions, road sections, driver characteristics, and periods, this paper makes a holographic portrait of bus operation from multiple perspectives, including the distributions of weather, time, space, speed, and driver characteristics [47], as shown in Figure 3. The multisource forewarning data are used to study how each factor influences bus forewarnings.

3.1. Weather Distribution

Vehicle operating safety is closely related to bad weather conditions [48]. This paper compares and analyzes the forewarning data of buses under four weather conditions: sunny, rain, fog, and snow. The results are shown in Table 4.

From the analysis of Figure 4, it can be seen that the proportion of forewarnings on sunny and foggy days is relatively large. The total proportion of forewarnings on sunny days reaches 31.29%, mainly because sun glare and dizziness make drivers drowsy and fatigued. The total proportion of forewarnings on foggy days reaches 31.27%, mainly because of low visibility, obstructed line of sight, and a low road adhesion coefficient; drivers must maintain a high degree of attention for a long time and are prone to fatigue.

The number of vehicle abnormal state forewarnings is greater than the number of driver fatigue forewarnings under all weather conditions. Vehicle abnormal state forewarnings mainly cover rapid acceleration, rapid deceleration, forward collision, and lane departure, which are closely related to the driver’s bad driving behavior and are also affected by road facilities, the traffic environment, and other constraints, resulting in a higher proportion.

3.2. Time Distribution

The forewarning data of buses are aggregated by time of day, and the result is shown in Figure 5. The analysis shows the following:
(1) Both driver fatigue forewarnings and vehicle abnormal state forewarnings follow a three-stage pattern over time: a rise, local fluctuations, and a fall.
(2) The first stage is a rapid rise. It covers [4:00, 7:00] for driver fatigue forewarnings and [4:00, 9:00] for vehicle abnormal state forewarnings, which is 2 hours longer, mainly because of the complicated traffic environment and other restricting factors.
(3) The second stage is a period of local fluctuation. The two periods are generally similar: driver fatigue forewarnings fluctuate within [7:00, 19:00], and vehicle abnormal state forewarnings within [9:00, 16:00].
(4) The third stage is a decline in two steps. During [16:00, 20:00], the reduction is relatively large; during [20:00, 22:00], the change is relatively small.
(5) Vehicle abnormal state forewarnings peak in the two periods [10:00, 11:00] and [16:00, 17:00], and driver fatigue forewarnings peak in [06:00, 09:00].
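The hourly aggregation behind a chart such as Figure 5 can be sketched as follows. The event tuples and function names are hypothetical; the sketch only shows how forewarnings might be counted per hour and per category, and how a peak hour could be read off.

```python
from datetime import datetime


def hourly_counts(events):
    """events: iterable of (timestamp, category). Returns {category: [24 counts]}."""
    counts = {}
    for timestamp, category in events:
        counts.setdefault(category, [0] * 24)[timestamp.hour] += 1
    return counts


def peak_hour(hour_counts):
    """Hour of day (0-23) with the most forewarnings; earliest hour wins ties."""
    return max(range(24), key=lambda hour: hour_counts[hour])


events = [
    (datetime(2019, 12, 1, 7, 5), "driver fatigue"),
    (datetime(2019, 12, 1, 7, 40), "driver fatigue"),
    (datetime(2019, 12, 1, 8, 10), "driver fatigue"),
    (datetime(2019, 12, 1, 10, 30), "vehicle abnormal state"),
    (datetime(2019, 12, 1, 16, 20), "vehicle abnormal state"),
    (datetime(2019, 12, 1, 16, 45), "vehicle abnormal state"),
]
counts = hourly_counts(events)
print(peak_hour(counts["driver fatigue"]), peak_hour(counts["vehicle abnormal state"]))
```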

3.3. Spatial Distribution

The driver fatigue forewarning and vehicle abnormal state forewarning data are imported into the electronic map of Zhenjiang, and the locations corresponding to the relevant forewarning samples are all mapped to the map, as shown in Figure 6.

Using the kernel density analysis method [49], the density distributions corresponding to the forewarnings are obtained. Figure 7 shows the following:
(1) The two forewarning types are generally consistent in their spatial distribution over the urban road network, but there are large differences in local areas.
(2) The driver fatigue forewarning density is concentrated on Zhongshan Road (Jiuhuashan Road-Jiefang Road), as shown in Figure 7(a). Zhongshan Road is the main road of Zhenjiang, and most of the nearby land is commercial. The area has a high density of people and traffic, which easily causes vehicle congestion. Drivers must stay highly alert for long periods in this complex traffic environment and are prone to fatigue.
(3) Vehicle abnormal state forewarnings are mainly concentrated at the intersection of Dongwu Road and Mengxi Road, which is located in the Beigu Mountain Scenic Area, as shown in Figure 7(b). Beigu Mountain is an AAAAA-level tourist scenic spot in Zhenjiang, Jiangsu. The volume of buses, private cars, tourist coaches, and pedestrians is large, and congestion on the surrounding road network is serious. In this traffic environment, drivers are prone to emergency maneuvers, which trigger vehicle abnormal state forewarnings.
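To make the kernel density idea concrete, the sketch below evaluates a two-dimensional Gaussian kernel density estimate at a query location. The Gaussian kernel, the bandwidth, and the assumption that coordinates have already been projected to metres are illustrative choices; the source does not state which kernel or bandwidth was used, and GIS tools may use a different kernel.

```python
import math


def kernel_density(points, query, bandwidth_m):
    """2-D Gaussian kernel density estimate at `query` (per square metre).

    points: list of (x, y) forewarning locations in projected metres.
    """
    qx, qy = query
    norm = 1.0 / (2.0 * math.pi * bandwidth_m ** 2 * len(points))
    total = 0.0
    for x, y in points:
        dist_sq = (x - qx) ** 2 + (y - qy) ** 2
        total += math.exp(-dist_sq / (2.0 * bandwidth_m ** 2))
    return norm * total


# Three forewarnings clustered near the origin and one isolated event.
points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (500.0, 500.0)]
near = kernel_density(points, (3.0, 3.0), bandwidth_m=50.0)
far = kernel_density(points, (250.0, 250.0), bandwidth_m=50.0)
print(near > far)
```

Evaluating the estimate on a regular grid and colouring the cells gives a density map of the kind shown in Figure 7.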

3.4. Speed Distribution

The period from December 2019 to January 2020 is selected as the research period. The vehicle speed at the time of each forewarning event is analyzed, giving a total of 297,189 data samples, which are summarized by speed in Table 5. The speed characteristics under different forewarning types are shown in Figure 8.

It can be seen from Figure 8 that, as the speed increases, the numbers of driver fatigue forewarnings and vehicle abnormal state forewarnings both fluctuate to a certain extent. The number of driver fatigue forewarnings peaks at 30 km/h and is small when the speed is less than 15 km/h or greater than 70 km/h. Since most buses operate in urban areas, running speeds on urban roads are not high. When the speed is greater than 60 km/h, the numbers of driver fatigue forewarnings and vehicle abnormal state forewarnings both remain at a low level. Both forewarning types contain some outliers. The speed at which driver fatigue forewarnings occur is generally greater than the speed at which vehicle abnormal state forewarnings occur.

To facilitate the analysis of the vehicle speed distribution characteristics under different forewarning density areas, this paper divides the forewarning occurrence areas into three types: low, medium, and high. The correlation analysis of the speed when the forewarning occurs in the three regions is carried out, and the characteristic law of the speed is studied, as shown in Figures 9 and 10.

It can be seen from Figure 9 that, in the low forewarning density area, the proportion of forewarnings is largest at speeds of 30–39 km/h and smallest below 20 km/h. In low forewarning density areas such as the suburbs, there are fewer vehicles, the roads are smooth, and forewarnings are few; the number of forewarnings is highest when the bus speed reaches about 35 km/h. In the medium forewarning density area, the proportion is largest at 30–39 km/h and smallest above 60 km/h. In the high forewarning density area, the proportion is largest at 20–29 km/h and smallest above 60 km/h. In high forewarning density areas such as the city center, traffic congestion keeps bus speeds low and a large number of forewarnings are generated; the number of forewarnings peaks at about 25 km/h. A comparison shows that, because road congestion differs, vehicle speeds are relatively low in high forewarning density areas and relatively high in low forewarning density areas.

To analyze the number of forewarnings per unit area in each region more reasonably, this paper defines the unit forewarning density of a region as the number of forewarnings that occur in the region divided by the area of that region.
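Written out with assumed symbols (none of the names below come from the original text), a per-unit-area density of this kind for a region $k$ is the forewarning count divided by the region’s area:

```latex
\rho_{k} = \frac{N_{k}}{A_{k}}
```

Here $N_{k}$ would be the number of forewarnings recorded in region $k$, optionally restricted to one speed band, and $A_{k}$ the area of that region. This is a sketch of the stated definition, not a reproduction of the authors’ formula.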

It can be seen from Figure 10 that the forewarning frequency at every speed is low in the low forewarning density area. In the medium forewarning density area, the forewarning frequency peaks at speeds of 30–39 km/h. In the high forewarning density area, the forewarning frequency is highest at speeds of 20–29 km/h.

3.5. Driver Characteristics

This paper takes 324 drivers of Zhenjiang Public Transport Company as the research object, analyzes the influence of drivers’ age, driving years, gender, and educational background on the forewarning of buses [50], and conducts research on the distribution law of forewarning under the action of various factors.

3.5.1. Driving Years

The statistical analysis of the forewarning data of bus drivers with different driving years is shown in Table 6. In terms of the total number of forewarnings, drivers with 15 to 19 years of driving experience have the most forewarnings among the four driving-year groups, accounting for about 35.66% of the total, and drivers with fewer than 5 years of driving experience have the fewest, accounting for about 0.68% of the total.

As shown in Figure 11, drivers with 15 to 19 years of driving experience have the largest number of both driver fatigue forewarnings and vehicle abnormal state forewarnings. Drivers in this experience group become bolder once they have accumulated driving experience, have more aggressive driving styles, and are prone to aggressive maneuvers, so they are more likely to trigger vehicle abnormal state forewarnings.

3.5.2. Age

The statistical analysis of the forewarning data of bus drivers of different ages is shown in Table 7. In terms of the total number of forewarnings, drivers in the 40–44 age group have the most forewarnings among the eight age groups, accounting for about 23.13% of the total. Drivers under the age of 25 have the smallest proportion, about 0.58%.

As shown in Figure 12, from the perspective of forewarning types, drivers in the 40–44 age group have the largest number of fatigue driving forewarnings, and drivers in the 30–34 age group have the largest number of vehicle abnormal state forewarnings. This is because drivers aged 30–34 have a more aggressive driving style, become bolder once they have accumulated driving experience, and are more likely to make aggressive maneuvers. Across age groups, drivers generally have more vehicle abnormal state forewarnings than driver fatigue forewarnings, which is related to drivers operating more aggressively when they consider the situation safe.

3.5.3. Gender

The statistical analysis of the forewarning data of bus drivers of different genders is shown in Table 8. Male drivers account for far more forewarnings than female drivers: approximately 91.44% of all forewarnings and about 91.26% of driver fatigue forewarnings, which is related to the high proportion of male drivers in public transportation companies. As shown in Figure 13, for male drivers the number of abnormal state forewarnings is higher than the number of fatigue forewarnings, while for female drivers the numbers of fatigue forewarnings and abnormal state forewarnings are very similar.

3.5.4. Educational Background

The statistical analysis of the forewarning data of bus drivers with different educational levels is shown in Table 9. Drivers with a junior high school education or below have the most forewarnings, accounting for about 64.28% of the total; drivers with a high school education have the fewest, about 12.98% of the total. As shown in Figure 14, the number of driver fatigue forewarnings and the number of vehicle abnormal state forewarnings change in roughly the same way across educational backgrounds.

4. Research on Risk Prediction of Public Transportation Safety Based on BP Neural Network Model

The BP neural network is a concept proposed by Rumelhart, McClelland, et al. It is a multilayer feedforward neural network trained by error backpropagation. The model can classify arbitrarily complex patterns, has excellent multidimensional function mapping ability, and is suitable for complex nonlinear systems such as bus safety risk prediction.

4.1. Basic Principle
4.1.1. Structure of BP Neural Network

The topological structure of the BP neural network is shown in Figure 15, which labels the input vector, the output vector, and the connection weights between layers.

The BP neural network can be regarded as a nonlinear function whose independent variables are the network inputs and whose dependent variables are the predicted values. When the number of input nodes is n and the number of output nodes is m, the BP neural network expresses a functional mapping from n independent variables to m dependent variables [51].

4.1.2. Error Backpropagation Algorithm

As a multilayer feedforward neural network, the BP neural network is characterized by forward signal transmission and error backpropagation. In forward transmission, the input signal is processed layer by layer from the input layer through the hidden layers to the output layer, and the state of the neurons in each layer affects only the state of the neurons in the next layer. If the output layer does not produce the expected output, the network switches to backpropagation and continuously adjusts the weights according to the prediction error so that the predicted value gradually converges. The error backpropagation algorithm of the BP neural network [52] can be expressed as follows:

In formula (2), $\delta^{l}$ is the learning signal of layer l, $\delta^{L}$ is the learning signal of the output layer, t is the label value, y is the predicted value, $X^{l}$ is the output signal of layer l, $X^{L}$ is the output signal of the penultimate layer, $W^{l}$ is the weight matrix between layers l and l + 1, and $W^{L}$ is the weight matrix between the penultimate layer and the last layer.
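The symbol definitions above match a widely used matrix form of the backpropagation recursion. As a sketch only, assuming a row-vector convention, an elementwise activation function $f$ with derivative $f'$, a quadratic cost at the output, and writing $\delta^{l}$ for the learning signal of layer $l$ (the symbols $f$, $\odot$, and $\delta$ are assumptions here, not taken from the source), formula (2) can be written as:

```latex
\begin{aligned}
\delta^{L} &= (t - y) \odot f'\!\left(X^{L} W^{L}\right), \\
\delta^{l} &= \left(\delta^{l+1} \left(W^{l+1}\right)^{\mathsf{T}}\right) \odot f'\!\left(X^{l} W^{l}\right).
\end{aligned}
```

Here $\odot$ denotes the elementwise product. The first line gives the learning signal at the output, and the second propagates it backward one layer at a time through the transposed weights.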

The weight adjustment function of the BP neural network [53] is as follows:

In formula (3), $\Delta W^{L}$ is the adjustment of the weight matrix between the penultimate layer and the last layer of the BP neural network, $\Delta W^{l}$ is the adjustment of the weight matrix between layers l and l + 1, $\eta$ is the learning rate, and E is the cost function.
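A gradient-descent form of the weight adjustment that is consistent with these definitions can be sketched as follows. It assumes a row-vector convention, writes $\eta$ for the learning rate and $E$ for the cost function, and uses $\delta^{l}$ for the learning signal of layer $l$ from formula (2); the notation is an assumption, not a reproduction of the source:

```latex
\begin{aligned}
\Delta W^{L} &= -\eta \frac{\partial E}{\partial W^{L}} = \eta \left(X^{L}\right)^{\mathsf{T}} \delta^{L}, \\
\Delta W^{l} &= -\eta \frac{\partial E}{\partial W^{l}} = \eta \left(X^{l}\right)^{\mathsf{T}} \delta^{l}.
\end{aligned}
```

Each weight matrix therefore moves against the gradient of the cost, with a step proportional to the layer’s input signal and the learning signal arriving from the layer above.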

4.2. Model Building
4.2.1. Establish a Network Structure

Determining the network structure is an important part of constructing a BP neural network, because it directly affects the training speed and prediction accuracy of the model. Generally speaking, the more hidden layers and nodes in the network topology, the stronger the fitting ability of the model and the higher its accuracy on the training data. However, an excessively complex network trains slowly and is more prone to overfitting, while an overly simple topology cannot establish a complex mapping between the feature variables and the predictions and struggles to achieve good prediction results. Based on experience and repeated trials, this paper finds that a structure with two hidden layers of 100 and 50 nodes, respectively, gives good prediction results [54].
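The following sketch builds a network with the two hidden layers of 100 and 50 nodes mentioned above and runs one forward pass. The input size of 4, the 2 output classes, the uniform weight initialization, and the example feature values are all assumptions for illustration; the hidden activations use tanh and the output uses softmax, as selected later in the text.

```python
import math
import random


def init_network(n_inputs, hidden=(100, 50), n_outputs=2, seed=0):
    """Return a list of (weights, biases) pairs, one per layer."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, n_outputs]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)  # assumed initialization range
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_out)]
                   for _ in range(fan_in)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def forward(layers, x):
    """Forward pass: tanh on hidden layers, softmax on the output layer."""
    for index, (weights, biases) in enumerate(layers):
        z = [sum(x[r] * weights[r][c] for r in range(len(x))) + biases[c]
             for c in range(len(biases))]
        if index < len(layers) - 1:
            x = [math.tanh(v) for v in z]
        else:
            peak = max(z)  # subtract the maximum for numerical stability
            exps = [math.exp(v - peak) for v in z]
            x = [e / sum(exps) for e in exps]
    return x


def n_parameters(layers):
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = init_network(n_inputs=4)
probabilities = forward(network, [0.3, 0.7, 0.0, 0.5])
print(n_parameters(network), probabilities)
```

Even with only 4 inputs the structure has several thousand trainable parameters, which is why the trade-off between capacity and overfitting discussed above matters.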

4.2.2. Selection of Learning Rate

The learning rate is an important parameter in model optimization, since it determines how fast the model learns and how well it converges. Too large a learning rate causes the model accuracy to oscillate and makes convergence difficult. Too small a learning rate leads to slow model adjustment. In this paper, 0.01 is selected as the learning rate of the model; with this value, the model converges faster and the oscillation amplitude is smaller [55].
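The effect of the learning rate can be seen on a toy problem. The sketch below minimizes the one-dimensional function f(w) = w² with plain gradient descent; this is an illustrative stand-in, not the paper’s model, and the rates other than 0.01 are arbitrary values chosen to show the contrast.

```python
def gradient_descent(learning_rate, w0=1.0, steps=50):
    """Minimize f(w) = w**2 (gradient 2*w); return every visited value of w."""
    path = [w0]
    for _ in range(steps):
        w = path[-1]
        path.append(w - learning_rate * 2.0 * w)
    return path


small = gradient_descent(0.01)   # steady, monotonic decrease
large = gradient_descent(1.1)    # overshoots: sign flips and magnitude grows

print(abs(small[-1]) < abs(small[0]))
print(abs(large[-1]) > abs(large[0]))
```

With a rate of 0.01 every step shrinks w by the same small factor, whereas a rate of 1.1 jumps past the minimum on every step and the iterates oscillate with growing amplitude.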

4.2.3. Activation Function Selection

The activation function in the BP neural network can increase the nonlinearity of the neural network so that the model has sufficient complex function mapping capabilities, and the applicability of different activation functions is also different [56].

(1) tanh Function. In this paper, the tanh function is selected as the transfer function of the model. The tanh function is the hyperbolic tangent function. It is nonlinear, monotonically increasing in its input, and differentiable everywhere, which meets the gradient computation requirements of the BP network, and its bounded output gives good fault tolerance. Besides, compared with the sigmoid activation function, the tanh function alleviates the vanishing gradient problem to a certain extent. Its formula is as follows:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (4)$$

In formula (4), tanh(x) is the function value of the hyperbolic tangent function, x is the input variable, and e is the natural constant.
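The comparison with the sigmoid function can be made concrete with a short sketch that evaluates both activations and their derivatives. The helper names are assumptions for illustration. The peak derivative of tanh is 1, whereas that of the sigmoid is 0.25, which is one common explanation for why tanh passes gradients back through the layers more readily.

```python
import math


def tanh(x):
    """Hyperbolic tangent written out from its exponential definition."""
    e_pos, e_neg = math.exp(x), math.exp(-x)
    return (e_pos - e_neg) / (e_pos + e_neg)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - tanh(x) ** 2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (0.0, 1.0, 2.0):
    print(x, tanh_grad(x), sigmoid_grad(x))
```

Both derivatives still shrink toward zero for large inputs, so the improvement is a matter of degree rather than a complete remedy.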

(2) Softmax Function. In this paper, the softmax function is selected as the classifier of the model output. The softmax function is the normalized exponential function, which maps a vector of real values to a discrete probability distribution over a finite number of classes. It normalizes the vector, highlights the maximum value, suppresses components far below the maximum, and thereby expresses the confidence that a sample belongs to each class. The formula is as follows [57]:

$$\mathrm{softmax}(X)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} \qquad (5)$$

In formula (5), X is the input vector; softmax(X)_i is the i-th component of the softmax function applied to X; x_i and x_j are the i-th and j-th elements of X, respectively; n is the length of X; and e has the same meaning as above.
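A minimal sketch of the softmax computation follows. Subtracting the largest input before exponentiating is a standard numerical safeguard that leaves the result unchanged; it is an implementation choice assumed here, not something stated in the paper.

```python
import math


def softmax(values):
    """Map a list of real scores to probabilities that sum to 1."""
    peak = max(values)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Two output scores, as in a two-class (forewarning / no forewarning) output.
probs = softmax([2.0, 0.0])
print(probs)
```

The larger score receives most of the probability mass, which is the "highlight the maximum value" behavior described in the text.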

4.2.4. Cost Function Selection

Cost functions are mainly divided into two types: the quadratic cost function and the cross-entropy cost function [58]. The quadratic cost function is mainly used for regression problems. For classification problems such as the one in this paper, the cross-entropy cost function is generally selected (labels are processed by one-hot encoding), and the formula is as follows:

$$E = -\sum_{i} t_i \ln y_i \qquad (6)$$

In formula (6), E is the cost function value, t_i is the i-th component of the true label, and y_i is the corresponding predicted value of the model.

Besides, the cross-entropy cost function avoids a drawback of the quadratic cost function: with the quadratic cost, a large error can coincide with a small activation-function gradient, which slows convergence.
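As a hedged illustration, the cross-entropy cost for a single one-hot labeled sample can be computed as below. The function name and the small clipping constant `eps` (used only to avoid taking the logarithm of zero) are assumptions for this sketch.

```python
import math


def cross_entropy(true_label, predicted, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(y, eps)) for t, y in zip(true_label, predicted))


label = [1, 0]  # one-hot label with the first class as the true class
confident = cross_entropy(label, [0.9, 0.1])
uncertain = cross_entropy(label, [0.5, 0.5])
wrong = cross_entropy(label, [0.1, 0.9])
print(confident, uncertain, wrong)
```

The cost grows as the probability assigned to the true class falls, so confidently wrong predictions are penalized most heavily.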

4.2.5. Data Preprocessing

(1) Normalization. To reduce the influence of the initialization value and accelerate the convergence of the BP neural network, normalization is generally applied as a preprocessing step. In this paper, the maximum-minimum method is used to normalize the continuous feature variables [59], and the formula is as follows:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (7)$$

In formula (7), x is the original feature value, $x_{\min}$ is the minimum value of the feature over all samples, $x_{\max}$ is the maximum value of the feature over all samples, and $x^{*}$ is the feature value after normalization.
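The maximum-minimum normalization can be sketched in a few lines. Returning zeros for a feature whose values are all identical is an assumption made here to avoid division by zero; the paper does not say how that case is handled, and the speed values are made up.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # Assumed convention for a constant feature (not specified in the paper).
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


speeds_kmh = [10, 20, 30]  # illustrative values only
print(min_max_normalize(speeds_kmh))  # [0.0, 0.5, 1.0]
```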

(2) One-Hot Encoding. To feed categorical and discrete variables into the model, such features must be mapped to Euclidean space. One-hot encoding is one of the most effective ways to do this. One-hot encoding, also known as one-bit effective encoding, uses a multibit status register to encode multiple states: a feature with m possible values becomes m binary features after one-hot encoding, exactly one of which is 1 for any given sample.
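A minimal sketch of one-hot encoding for a weather feature is shown here. The category list and its ordering are assumptions for illustration; the paper names sunny, rainy, foggy, and snowy conditions but does not specify how they are indexed.

```python
WEATHER_CATEGORIES = ["sunny", "rainy", "foggy", "snowy"]  # assumed ordering


def one_hot(value, categories):
    """Encode one categorical value as a list with a single 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


print(one_hot("foggy", WEATHER_CATEGORIES))  # [0, 0, 1, 0]
```

A feature with four possible values therefore contributes four binary inputs to the network.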

4.3. Construction and Application of the Prediction Model

Following the BP neural network model constructed in Section 4.2, 13 features such as weather conditions, driver data, driving period, and driving speed are taken as the input of the model, and the driver's alarm state is taken as the output. The network topology "13-100-50-2" is adopted, with the tanh function as the hidden-layer transfer function and the softmax function as the output classifier. The cross-entropy cost function is selected as the cost function of the model, and after repeated attempts, the learning rate is set to 0.01, which ensures stable convergence. The specific form of the model is shown in Figure 16 [60].
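The forward pass of a 13-100-50-2 network with tanh hidden layers and a softmax output can be sketched as follows. This is only an illustration of the topology: the weights are random and untrained, and the initialization scheme, function names, and input vector are assumptions rather than details from the paper.

```python
import math
import random

LAYER_SIZES = [13, 100, 50, 2]  # input, two hidden layers, output


def init_network(sizes, seed=0):
    """Create random weights and zero biases for each pair of adjacent layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # assumed initialization range
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, features):
    """Propagate one sample: tanh on hidden layers, softmax on the output."""
    activation = list(features)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = softmax(z) if is_output else [math.tanh(v) for v in z]
    return activation


network = init_network(LAYER_SIZES)
sample = [0.5] * 13  # placeholder for 13 normalized features
probabilities = forward(network, sample)
print(probabilities)  # two class probabilities that sum to 1
```

Training would additionally require backpropagating the cross-entropy cost and updating the weights at the chosen learning rate, which is omitted here.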

4.3.1. Fatigue Driving Prediction Model

(1) Investigation of Convergence and Dispersion. Two-thirds of the samples are randomly selected as the training set, and the remaining one-third are used as the test set. The BP neural network is trained iteratively for 500 cycles. The learning curve of the fatigue driving prediction model is shown in Figure 17 [61].
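The random 2/3 to 1/3 split can be sketched as below. The function name, the fixed seed, and the use of integer placeholders for samples are assumptions for illustration.

```python
import random


def split_samples(samples, train_fraction=2 / 3, seed=42):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = split_samples(range(300))
print(len(train_set), len(test_set))  # 200 100
```

Fixing the seed makes the split reproducible, which helps when comparing learning curves across runs.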

Figure 17 shows the following: the fatigue driving prediction model converges well, and the learning curve flattens around the 100th training cycle; during the whole 500-cycle iteration process, there is no large-scale oscillation, and the fluctuation amplitude gradually decreases as training proceeds; the model performs well on the test set and still reaches an accuracy of 79% even though it relies on a large number of static features; the model shows no obvious overfitting during training, with an accuracy difference of only 0.0034 between the training set and the test set.

(2) Sample Inspection. The sample labels use one-hot encoding, with position 0 representing forewarning and position 1 representing no forewarning. A per-sample error can therefore be obtained by randomly selecting the model's predicted values for 200 samples and subtracting the true values, as shown in Figure 18.

Figure 18 shows that, among the randomly selected prediction samples, 79% are predicted correctly, 18% are false positives (a forewarning is predicted when none occurred), and only 3% are false negatives (no forewarning is predicted when one occurred). The model therefore errs on the side of safety and has high prediction accuracy for state prediction.
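The per-sample error check can be sketched as follows, assuming predictions have already been converted to one-hot form with position 0 meaning forewarning. The category names and the five example samples are made up for illustration and do not reproduce the paper's data.

```python
from collections import Counter

FOREWARNING, NO_FOREWARNING = [1, 0], [0, 1]


def classify_errors(predicted, actual):
    """Count outcomes from error = predicted - actual at position 0."""
    counts = Counter()
    for pred, true in zip(predicted, actual):
        error = pred[0] - true[0]
        if error == 0:
            counts["correct"] += 1
        elif error > 0:
            counts["false_forewarning"] += 1   # predicted, but none occurred
        else:
            counts["missed_forewarning"] += 1  # occurred, but not predicted
    return counts


predicted = [FOREWARNING, FOREWARNING, NO_FOREWARNING, NO_FOREWARNING, FOREWARNING]
actual = [FOREWARNING, NO_FOREWARNING, NO_FOREWARNING, FOREWARNING, FOREWARNING]
summary = classify_errors(predicted, actual)
print(dict(summary))
```

Separating the two error types matters for safety applications, because a missed forewarning is usually costlier than a spurious one.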

4.3.2. Driving Risk Prediction Model

(1) Investigation of Convergence and Dispersion. Similar to the fatigue driving prediction model, 2/3 of the samples are randomly selected as the training set, and the remaining 1/3 are used as the test set. The BP neural network is trained iteratively for 300 cycles. The learning curve of the driving risk prediction model is shown in Figure 19.

Figure 19 shows the following: the driving risk prediction model converges well. The learning curve flattens around the 120th training cycle, but it fluctuates greatly from the 170th to the 210th cycle, which may be caused by the model moving from a local optimal solution toward the global optimal solution; the model reaches its best state by the end of the 300-cycle iteration process, and its convergence speed is faster than that of the fatigue driving prediction model. The model is slightly weaker than the previous model on the test set, but it still achieves an accuracy of 78%. The model shows no overfitting during training, and its performance on the test set is even better than on the training set.

(2) Sample Inspection. Randomly select the predicted value of 200 samples from the model, and subtract the true value from it to get a single sample error, as shown in Figure 20.

As can be seen from Figure 20, among the randomly selected prediction samples, 78% are predicted correctly, 14.5% are false positives (a forewarning is predicted when none occurred), and 7.5% are false negatives (no forewarning is predicted when one occurred). The model generally errs on the side of safety and has high prediction accuracy.

4.4. Research on Simulation of Risk Probability Prediction Based on the BP Model
4.4.1. Typical Driver Selection

This paper conducts a statistical analysis of 1565 drivers of the Zhenjiang Public Transport Company. For continuous features such as driving experience and age, the mean values (16 years and 39 years, respectively) are used as the feature values of the typical (virtual) driver; for categorical features such as educational background and gender, the modes (high school, male) are used. An example of normalized virtual driver sample data is shown in Table 10.
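The construction of a typical driver from means and modes can be sketched with the standard library. The three records below are made-up placeholders chosen only so that the result is easy to check by hand; they are not the paper's driver data, and the field names are assumptions.

```python
from statistics import mean, mode

# Hypothetical driver records (illustrative only).
drivers = [
    {"driving_years": 12, "age": 35, "education": "high school", "gender": "male"},
    {"driving_years": 20, "age": 43, "education": "high school", "gender": "male"},
    {"driving_years": 16, "age": 39, "education": "college", "gender": "female"},
]


def typical_driver(records, continuous, categorical):
    """Use the mean for continuous features and the mode for categorical ones."""
    profile = {key: mean(r[key] for r in records) for key in continuous}
    profile.update({key: mode(r[key] for r in records) for key in categorical})
    return profile


profile = typical_driver(
    drivers,
    continuous=["driving_years", "age"],
    categorical=["education", "gender"],
)
print(profile)
```

The resulting profile would then be normalized and one-hot encoded in the same way as the training samples before being passed to the model.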

4.4.2. Risk Probability Analysis during Peak Hours

The fatigue driving prediction model constructed in this paper is used to calculate the fatigue confidence of the virtual driver’s sample data under different weather, periods, and speeds. The simulation results are shown in Figure 21.

The figure shows that, for both fatigue driving forewarning and driving risk forewarning, the probability of occurrence increases with driving speed; when the vehicle speed is in the range (18, 20) km/h and (42, 45) km/h, the probability of fatigue driving risk forewarning and driving risk forewarning, respectively, rises sharply; when the vehicle speed is lower than 17 km/h or 41 km/h, the probability of fatigue driving risk forewarning and driving risk forewarning, respectively, is almost zero; under the same speed conditions, the probability of fatigue forewarning on snowy days is greater than on foggy, rainy, and sunny days, and the probability of driving forewarning on foggy days is greater than on snowy, rainy, and sunny days.

4.4.3. Risk Probability Analysis during Low Peak Hours

According to Figure 22, based on different speed conditions, the change characteristics of fatigue driving risk forewarning and driving risk forewarning probability are generally consistent with those in peak hours, indicating that high attention should still be paid to safe driving of vehicles in low peak hours; under the same speed conditions, the probability of fatigue forewarning in rainy days is about 30% lower than that in peak hours, and the difference in other weather conditions is small.

4.4.4. Risk Probability Analysis during Flat Peak Hours

According to the analysis in Figure 23, the change characteristics of fatigue driving risk forewarning and driving risk forewarning probability are generally consistent with those in peak hours and low peak hours; under the same speed conditions, the probability of driving forewarning under the four weather conditions is 15% higher than that in low peak hours and 10% lower than that in peak hours; at the same driving speed, the driving risk probability ranks in the order foggy, snowy, rainy, and sunny, indicating that driving risk is significantly related to weather conditions.

5. Conclusions

5.1. Research Result

This paper uses 297189 forewarning records of various types from Zhenjiang buses to analyze hidden risks and characteristic patterns. The distribution characteristics of bus forewarnings under different weather conditions, speeds, periods, locations, and driver characteristics are studied. The conclusions are as follows. Firstly, the probability of driver fatigue forewarning is greatest on sunny days from 7:00 to 8:00 am, and the probability of vehicle abnormal state forewarning is greatest on foggy days from 11:00 am to 12:00 noon. Secondly, when the vehicle is running at 30 km/h, the proportion of driver fatigue forewarning is the largest. Urban core areas are prone to trigger driver fatigue forewarnings, while tourist attractions are prone to trigger vehicle abnormal forewarnings. Finally, drivers with 15–19 years of driving experience have the largest proportion of fatigue forewarnings and vehicle abnormal forewarnings. Drivers aged 40–44 years have the largest proportion of fatigue forewarnings. Drivers aged 30–34 years have the largest proportion of vehicle abnormal forewarnings. Male drivers have the largest proportion of fatigue forewarnings and vehicle abnormal forewarnings.

The fatigue driving and driving risk prediction models based on the BP neural network are constructed, and simulation analysis is performed. The results show that, at the same driving speed, the driving risk probability ranks in the order foggy, snowy, rainy, and sunny days. During peak hours, the probability of fatigue forewarning on snowy days is greater than on foggy, rainy, and sunny days; the probability of driving forewarning on foggy days is greater than on snowy, rainy, and sunny days. When the vehicle speed is in the range (18, 20) km/h and (42, 45) km/h, the probability of fatigue driving risk and driving risk forewarning, respectively, increases sharply; when the vehicle speed is lower than 17 km/h or 41 km/h, the probability of fatigue driving risk and driving risk forewarning, respectively, is almost zero. The probability of fatigue forewarning during low peak hours on rainy days is about 30% lower than that during peak hours. The probability of driving forewarning during flat peak hours is 15% higher than that during low peak hours and about 10% lower than that during peak hours.

5.2. Practical Implications

The relevant research conclusions of this paper are of great practical significance for improving the passenger transportation capacity of buses and enhancing the management level. At the same time, it can improve the auxiliary decision-making for the safe operation and emergency management of buses, promoting the sustainable and healthy development of urban public transport safety.

5.3. Limitation and Future Research Scope

This study is not free from limitations. Firstly, the 297189 samples cover a total of only 27 days between November 2019 and March 2020, so the sample size is relatively small. Secondly, in the study of forewarning characteristics by driver gender, male drivers accounted for a larger proportion of the selected 324 drivers. Therefore, the conclusion that male drivers have a much higher forewarning rate than female drivers needs further verification. Finally, public transportation safety risk depends on multiple factors, whereas the current prediction is based only on the actual data obtained by the forewarning equipment.

Although the constructed model has certain accuracy, more influencing factors such as the type of road facilities, road traffic conditions, and types of bus stations need to be fully considered in the follow-up research.

Although the sample size is insufficient and there are some of the above shortcomings in the research, this paper realizes for the first time the use of real forewarning data of buses in the full time, the whole region, and full cycle to carry out research and the use of real data for objective evaluation, which is representative and innovative.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

S. D. and H. Y. conceptualized the study and prepared the methodology. C. L. and H. Y. handled the software, investigation, and visualization. S. D., C. L., and H. Y. validated the study. C. L. and S. D. performed data curation and wrote the original draft. S. D. supervised the study, administered the project, performed formal analysis, managed resources, acquired funding, and reviewed and edited the article.

Acknowledgments

This study was supported by the Research planning fund for humanities and social sciences of the Ministry of Education (19YJAZH011), support for the Open Project of Key Laboratory of Intelligent Traffication Technology and Traffication Industry (F262019016), Science and Technology Project of Traffication Department of Jiangsu Provincial Department of Communications (KY2018049), and Jiangsu Provincial Department of Science and Technology industry-university-research cooperation project “Traffic safety-oriented intelligent supervision and decision support system platform research and development” (BY2019263).