Machine learning methods for commercial vehicle wait time prediction at a border crossing

https://doi.org/10.1016/j.retrec.2021.101034

Abstract

Commercial vehicles crossing international land ports of entry (LPOEs) pass through multiple screenings and stops, which contribute to long queues at congested border crossings. Although delay measurement has become precise, stakeholders still lack predictive performance measures they can meaningfully use. Instantaneous performance measures are after-the-fact and of limited use to most stakeholders for proactive decision-making. Therefore, as part of this study, we investigated new data sources such as Light-Emitting Diode Detection and Ranging (LEDDAR) and Radio Frequency Identifiers (RFIDs) for calculating border crossing performance measures. Next, we developed a percentile-based outlier detection method for reducing noise in these large datasets. Then, we explored machine learning to predict short-term wait time at a US-Mexico border crossing using Gradient Boosting Regression (GBR) and Random Forest (RF) Regression. Finally, the predictions of the GBR and RF algorithms were compared and evaluated, along with those of a hybrid algorithm. The results encourage combining more sophisticated predictive algorithms and prediction methods on such datasets. The high variability in the data is a key challenge for machine learning algorithms and leads to unreliable predictions. This research helps to better understand the performance of LPOEs and to predict the magnitude of performance deterioration when it occurs.

Introduction

Commercial vehicle (CV) freight movement across the US-Mexico border is complex because of the large number of stakeholder entities involved. Additionally, the screening process for a US-bound CV (i.e., traveling from Mexico to the United States) is more complicated than the Mexico-bound process (Cornejo et al., 2017). A typical border crossing process at El Paso, Texas, for US-bound CVs (trucks) involves: (i) random inspection by the Mexican customs agency (Aduana), (ii) toll payment to the Mexican toll agency, (iii) inspection by U.S. Customs and Border Protection (CBP), and (iv) inspection by the Texas Department of Public Safety (DPS) to ensure compliance with Texas motor vehicle safety standards and regulations. Fig. 1 illustrates these steps at US-Mexico border crossings and the locations of the Radio Frequency Identifier (RFID) readers used to calculate performance measures.

The performance of these border crossings is measured using two performance measures: wait time and crossing time. Wait time is the time it takes a vehicle to reach the CBP primary inspection booth after arriving at the end of the queue. Crossing time is the time it takes a vehicle, after joining the queue, to complete the border crossing process by exiting the DPS inspection facility (Fig. 1). Currently, these performance measures are reported through the Border Crossing Information System (BCIS) (2). BCIS is designed to provide real-time and archived wait time and crossing time information for US-bound CVs.

The two primary border crossings on the US-Mexico border in El Paso, Texas, are Zaragoza (Ysleta-Zaragoza) and Bridge of the Americas (BOTA) (see Fig. 2). In addition to the travel time performance measures from BCIS (wait time and crossing time), various new technologies have been investigated over the last decade for obtaining reliable border traffic volumes (counts) at Zaragoza and BOTA. Recently, the LeddarTech™ Light-Emitting Diode Detection and Ranging (LEDDAR) scanner was shown to provide accurate counts of commercial vehicles crossing the congested border. This was a significant development because various technologies had previously failed to provide reliable counts, mainly because of queuing and overlapping axle/vehicle signatures. Further, LEDDAR can provide CV counts at a resolution similar to that of the BCIS wait time and crossing time, which was not possible before.

At both border crossings, some CVs hold a permit under the trusted-traveler/trusted-shipper program known as Free and Secure Trade (FAST), which minimizes their inspection wait time. FAST is a commercial clearance program for known low-risk shipments entering the United States from Canada and Mexico; it allows expedited processing for CVs that have completed background checks and fulfill certain eligibility requirements. CVs without this permit must use non-FAST lanes and undergo standard screening and checks. Recently, BCIS was upgraded to provide wait and crossing times disaggregated by FAST and non-FAST lanes, specifically for US-bound CVs.

With recent technological developments and initiatives, researchers now have access to real-time volumes at US-Mexico border crossings, archived and aggregated at shorter time intervals, as well as the lane-based (FAST and non-FAST) crossing and wait times for US-bound CVs provided by the recent BCIS upgrade. However, the current BCIS performance measures, wait time and crossing time, are instantaneous (or historical) and are not very useful for many stakeholders' decision-making. For instance, truckers and shippers are interested in predicted rather than current wait and crossing times, so that they can make proactive routing decisions to avoid delays. Similarly, CBP and DPS can use this information for staffing and resource decisions. In addition, these new real-time datasets contain noise, which calls for a standardized data cleaning process before the data are analyzed. Therefore, while data availability and resolution at these border crossings have increased, there is still a lack of meaningful performance measures for all stakeholders.
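The percentile-based outlier detection mentioned in the abstract is aimed at this kind of noise. Its exact thresholds and grouping are not given in this section, so the following is only a minimal sketch under stated assumptions: a 5th to 95th percentile band computed separately for each time bin, with hypothetical record fields `bin` and `wait_min`.

```python
from collections import defaultdict


def percentile(sorted_vals, q):
    """Percentile q (0-100) of an already-sorted list, with linear interpolation."""
    if not sorted_vals:
        raise ValueError("percentile of empty list")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def filter_outliers(records, lower_q=5.0, upper_q=95.0):
    """Keep records whose wait time lies inside the [lower_q, upper_q]
    percentile band of their own time bin.

    Assumption (hypothetical): each record is a dict with a "bin" key
    (e.g., a 15-min interval label) and a "wait_min" key (minutes).
    """
    by_bin = defaultdict(list)
    for rec in records:
        by_bin[rec["bin"]].append(rec["wait_min"])

    # Per-bin lower and upper cut-offs.
    bounds = {}
    for time_bin, values in by_bin.items():
        values.sort()
        bounds[time_bin] = (percentile(values, lower_q), percentile(values, upper_q))

    return [
        rec for rec in records
        if bounds[rec["bin"]][0] <= rec["wait_min"] <= bounds[rec["bin"]][1]
    ]


records = [{"bin": "08:00", "wait_min": w} for w in [30, 32, 31, 33, 29, 300]]
print(sorted(r["wait_min"] for r in filter_outliers(records)))  # [30, 31, 32, 33]
```

A percentile band always trims both tails, so the 29-min read is dropped along with the obviously spurious 300-min read. Binning by time of day keeps legitimately long peak-period waits from being discarded as outliers against off-peak values.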

Further, there are only a handful of studies on short-term predictive border crossing performance measures, and all of them were conducted at US-Canada border crossings. Hence, in this study, researchers investigated short-term predictive performance measures using machine learning for US-Mexico CV movement. The researchers developed data cleaning standards to depict trends in border crossing performance measures accurately, explored correlations among various variables, and finally implemented the best-performing algorithm for short-term wait time prediction.

Performance measures are critical for stakeholders' decision-making. For instance, government agencies (state departments of transportation [DOTs], metropolitan planning organizations [MPOs], etc.) need them to plan infrastructure and mobility improvements. Similarly, road users (commuters, shippers, truckers) need these performance measures to decide their route and departure time.

In the United States, the Highway Performance Monitoring System is a national-level highway information system that includes data on the extent, condition, performance, use, and operating characteristics of the nation's highways at high granularity (5-min or less). Similarly, the Urban Mobility Scorecard (Schrank et al., 2015) lists various highway performance measures that are widely reported to compare urban mobility across the United States. The literature contains a number of standard and non-traditional performance measures (Margiotta et al., 2015; Day et al., 2014), most of which quantify mobility on the highway system.

Short-term prediction of highway performance measures, e.g., the expected travel time to a destination for a particular departure interval or time, has become common among real-time information providers, transportation researchers, practitioners, and users. However, little effort has gone into predicting border crossing performance measures because of their operational complexity. Border crossings differ from typical highways because of variables such as specific hours of operation, the number of open lanes, screening time, approaching volume, commodity/shipment types, flow control and segmentation, queue length, and recurrent congestion.

Currently, the literature lacks predictive border-specific mobility measures because reliable and consistent data for most variables are unavailable at land ports of entry (LPOEs). In addition, the existing methods for calculating highway performance measures do not translate well to LPOEs, let alone support prediction based on known performance measures.

The essential steps in computing performance measures for highway or freight bottlenecks are identifying the extent of the corridor for which performance measures need to be calculated, calculating the reference speed required to define a benchmark (baseline) performance measure, calculating performance measures for each bottleneck entry route for each time period of interest, and ranking bottlenecks by impact (Margiotta et al., 2015). At LPOEs, however, it is difficult to define a fixed reference speed because no speed limit is defined for the corridor near an LPOE, and the typical criteria listed for freight bottlenecks are not viable for border crossings.

Villa described the design and deployment of a border crossing time measurement system for US-bound CVs and introduced border-related performance measures such as wait time and crossing time (Villa, 2015). Although indices such as monthly average border wait times appear in the literature, they do not incorporate short-term temporal variation and hence do not reflect actual conditions.

Short-term forecasting of highway performance (speed, travel time, queue length, etc.) has gained considerable attention from transportation researchers and practitioners. However, little has been done to predict traffic conditions or performance at border crossings (Moniruzzaman et al., 2016). The few studies that have attempted to predict border crossing performance measures include the use of microscopic simulation software and an Artificial Neural Network (ANN) model to forecast delays at the border crossing (Khan, 2010); an “Enhanced Spinning Network” approach to forecast hourly traffic volumes (Lin et al., 2014); a transient multi-server queueing model to predict border crossing delay (Lin et al., 2012, 2016); and an ANN model, trained as a multilayer feedforward neural network with backpropagation, to predict volume and crossing time (Moniruzzaman et al., 2016). Notably, all of these short-term forecasting studies were conducted at only two US-Canada LPOEs: the Ambassador Bridge, which connects Windsor, Canada, to Detroit, US, and the Peace Bridge, which connects Fort Erie, Canada, to Buffalo, New York, US.

Although it can be argued that similar short-term prediction methods can be adopted for the US-Mexico border crossing, it should be noted that the CV border crossing process at the US-Mexico border differs significantly from that at the US-Canada border, as it is deemed relatively “high risk” and lacks similar infrastructure. In addition, commodity types, traffic volumes, staffing, trade programs, tolls, screening processes, and sensor locations significantly affect operations. These differences are partly reflected in average crossing times: for instance, the average crossing times at the Ambassador Bridge and Peace Bridge are 20 min and 22 min, respectively, with low variability (Lin et al., 2014; Moniruzzaman et al., 2016), whereas the average crossing times at BOTA and Ysleta-Zaragoza are 60 min and 40 min, respectively, with high variance (Border Crossing Informati, 2017; Cornejo et al., 2017).

It should also be noted that the border crossing corridor can be seen as a highway facility made up of different segments with multiple traffic controls. This problem has been studied by applying queuing theory to capture and predict delay, queue length, or waiting time (Lin et al., 2012, 2016). Various components and parameters, including the arrival process, service process, number of servers, number of system places, and number of vehicles, can be used to capture the complex queue dynamics at the border crossing. Examining the vast queuing theory literature is beyond the scope of this study, but we encourage interested researchers to study the literature on applying queuing theory to border crossings.


Data analysis

Data for the BCIS performance measures are collected using RFID readers (R1, R2, R3, R4) installed at key locations along the path of US-bound trucks, from the starting point on a Mexican street to the exit of the state inspection facility on the U.S. side (Fig. 1):

  • R1 – end of the queue before the Mexican Customs (Aduana) lot.

  • R2 – exit of the Mexican toll booths.

  • R3 – U.S. CBP primary inspection.

  • R4 – exit of the DPS inspection facility.
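Reading the wait time and crossing time definitions against these reader locations, wait time corresponds to the interval from R1 (end of queue) to R3 (CBP primary inspection), and crossing time to the interval from R1 to R4 (DPS exit). The sketch below pairs reads by RFID tag under simplifying assumptions that are not from the BCIS specification: the `(tag_id, reader, timestamp)` record layout is hypothetical, each tag is assumed to make one trip in the input window, and only the earliest read at each reader is kept.

```python
from datetime import datetime


def segment_times(reads):
    """Compute per-truck wait time (R1 -> R3) and crossing time (R1 -> R4).

    reads: iterable of (tag_id, reader, timestamp) tuples, where reader is one
    of "R1".."R4" and timestamp is a datetime. Assumes each tag makes a single
    trip in the input window (hypothetical simplification).
    """
    first_read = {}
    for tag, reader, ts in reads:
        key = (tag, reader)
        # Keep the earliest read when a reader logs the same tag repeatedly.
        if key not in first_read or ts < first_read[key]:
            first_read[key] = ts

    results = {}
    for tag in {tag for tag, _ in first_read}:
        t1 = first_read.get((tag, "R1"))
        t3 = first_read.get((tag, "R3"))
        t4 = first_read.get((tag, "R4"))
        times = {}
        if t1 is not None and t3 is not None and t3 > t1:
            times["wait_min"] = (t3 - t1).total_seconds() / 60
        if t1 is not None and t4 is not None and t4 > t1:
            times["crossing_min"] = (t4 - t1).total_seconds() / 60
        if times:  # skip trucks missing the reads a measure needs
            results[tag] = times
    return results


reads = [
    ("T1", "R1", datetime(2019, 5, 1, 8, 0)),
    ("T1", "R3", datetime(2019, 5, 1, 8, 45)),
    ("T1", "R4", datetime(2019, 5, 1, 9, 10)),
    ("T2", "R1", datetime(2019, 5, 1, 8, 5)),
    ("T2", "R3", datetime(2019, 5, 1, 8, 35)),  # no R4 read yet
]
print(segment_times(reads)["T1"])  # {'wait_min': 45.0, 'crossing_min': 70.0}
```

Truck T2 gets a wait time but no crossing time because it has not yet been read at R4; in a real feed, such incomplete trips would need an explicit timeout rule.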

When a truck passes any of the readers from R1 through R4, its

Short-term prediction algorithm development

As described previously, truck volume counts and the number of open lanes showed a weak relationship with the actual wait times at Zaragoza. Hence, they were excluded as variables for estimating the border crossing time at the Zaragoza POE. The short-term prediction algorithms were developed using the available BCIS input data, which contain raw crossing times for each RFID sensor pair on the international bridge.
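One simple way to quantify a "weak relationship" such as the one reported for truck volumes and open lanes is a Pearson correlation screen against observed wait times. The analysis actually used in the study is not shown in this section; the function names, the toy series, and the |r| ≥ 0.3 cut-off below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no linear relationship to report
    return sxy / math.sqrt(sxx * syy)


def screen_predictors(target, candidates, min_abs_r=0.3):
    """Keep candidate predictors whose |r| with the target meets a
    (hypothetical) threshold; also return all scores for inspection."""
    scores = {name: pearson(series, target) for name, series in candidates.items()}
    kept = [name for name, r in scores.items() if abs(r) >= min_abs_r]
    return kept, scores


wait = [10, 20, 30, 40, 50]  # toy wait times (min), not study data
kept, scores = screen_predictors(
    wait,
    {"lag_wait": [12, 19, 31, 41, 49], "volume": [5, 3, 5, 3, 5]},
)
print(kept)  # ['lag_wait']
```

Correlation only captures linear association, so a screen like this is a first pass; a tree-based model can still exploit a nonlinear effect that a low r would hide.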

The algorithms are developed in Python code adopted from Scikit-learn
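The comparison of GBR, RF, and a hybrid described in the abstract can be sketched with scikit-learn's `GradientBoostingRegressor` and `RandomForestRegressor`. This is not the authors' code. The lag-feature design (the previous four intervals predict the next), the chronological 80/20 split, the default hyperparameters, mean absolute error as the metric, and a hybrid defined as the simple average of the two models' predictions are all assumptions, since this section does not specify them.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def make_lagged(series, n_lags=4):
    """Turn a 1-D series of interval wait times into (X, y): each row of X
    holds the previous n_lags observations and y is the next observation."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i - n_lags:i] for i in range(n_lags, len(series))])
    y = series[n_lags:]
    return X, y


def compare_models(series, n_lags=4, test_frac=0.2, seed=0):
    """Fit GBR and RF on lagged wait times and report test-set MAE for each,
    plus a hypothetical hybrid that averages their predictions."""
    X, y = make_lagged(series, n_lags)
    split = int(len(y) * (1 - test_frac))  # chronological split, no shuffling
    X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

    gbr = GradientBoostingRegressor(random_state=seed).fit(X_tr, y_tr)
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_tr, y_tr)
    pred_gbr = gbr.predict(X_te)
    pred_rf = rf.predict(X_te)
    pred_hybrid = 0.5 * (pred_gbr + pred_rf)  # assumed hybrid: simple average

    return {
        "GBR": mean_absolute_error(y_te, pred_gbr),
        "RF": mean_absolute_error(y_te, pred_rf),
        "Hybrid": mean_absolute_error(y_te, pred_hybrid),
    }


if __name__ == "__main__":
    # Synthetic 15-min wait times with a daily cycle (illustrative only).
    rng = np.random.default_rng(1)
    t = np.arange(480)
    demo = 30 + 10 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 2, t.size)
    print(compare_models(demo))
```

The split is kept chronological so that no model is trained on intervals that come after the ones it is asked to predict. Averaging can never give a larger MAE than the mean of the two models' MAEs, but whether it beats the better single model depends on the data, which is consistent with the abstract's caution that high variability limits reliability.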

Conclusions

While predictive highway performance measures have become fairly common, such prediction models have so far eluded border crossing performance measures. The key reasons for the lack of predictive border crossing performance measures are the complexity of border crossing operations (compared with highways) and the lack of reliable high-resolution data. Hence, in this paper, researchers explored newly available datasets at US-Mexico border crossings, a standardized data cleaning process for these datasets, and

Research contribution to state-of-the-art practice

This study's fundamental contribution was to explore newly available multiple data sources and machine learning for short-term prediction of border crossing performance measures at the US-Mexico border crossing. Each land port of entry (LPOE) is unique; however, with new technologies, the performance measures and underlying datasets being collected are improving. Hence, researchers leveraged this opportunity to explore the development of meaningful performance measures for

Author contributions

The authors confirm contribution to the paper as follows: study conception and design: Sushant Sharma and Dong Hun Kang; data assimilation and cleaning: Jose Oca; analysis and interpretation of results: Dong Hun Kang, Jose Oca and Sushant Sharma; draft manuscript preparation: Sushant Sharma, Dong Hun Kang, Abhisek Mudgal. All authors reviewed the results and approved the final version of the manuscript.

Declaration of competing interest

All authors of the submitted manuscript declare there is NO actual or potential conflict of interest including any financial, personal, or other relationships with other people or organizations within three years of beginning the submitted work that could inappropriately influence, or be perceived to influence, their work.

Acknowledgments

The research team would like to thank the Center for International Intelligent Transportation Research (CIITR) for funding this research. The Texas A&M Transportation Institute led the research. Any errors, inaccuracies, or omissions are the responsibility of the authors. The content of this research does not necessarily reflect the official views of the agencies mentioned.

References (17)
