1 Introduction

Material loss occurs on the rail running surface when wheels carry out a rolling–sliding motion on the rail because of the high temperature and substantial contact stresses between wheel and rail. The material loss which occurs on the contact surface of the rail and wheel is called wear [1]. Wear mechanisms include abrasive wear, adhesive wear, delamination wear, tribochemical wear, fretting wear, surface fatigue wear, and impact wear [2]. Significant changes take place in the rail profile as a result of wear [1]. Rail wear is mainly classified into two types: vertical and lateral wear. Vertical wear appears on the upper surface of the rail head, while lateral wear occurs on the side of the rail head [3]. Rail wear depends on various parameters such as the axle load, train speed, profiles of wheel and rail, material properties of wheel and rail, track curvature, traffic type, condition of the wheel–rail contact surface, contact pressure, lubrication, and environmental effects [1, 4]. Rail wear causes the location change of the contact points between wheel and rail, leading to deterioration of the wheel–rail contact geometry and instability of railway vehicles [5]. Material loss due to wear results in a significant decrease in motion stability and ride comfort, with an increased risk of derailment of trains. The amount of wear and the current shape of the rail head are the main criteria considered in rail maintenance and rail replacement activities on site [1]. Rail wear increases the costs of rail maintenance and track maintenance by reducing the service life of the rail [6]. Accurate prediction of rail wear may improve riding comfort, safety of railway operations, and efficiency of track maintenance by decreasing track maintenance costs and risk of derailment [7]. Therefore, establishing rail wear prediction models and examining effective parameters on rail wear are crucial in terms of cost, comfort, and railway safety [8].

Statistical models, which can be categorized as deterministic, probabilistic, or stochastic, have been used in previous research to estimate rail wear [9]. Costello et al. [10] developed a stochastic rail wear model by using the Markov process for rail wear simulation, based on 10 years of rail wear data from New Zealand’s railroad database. Zakeri and Shahriari [11] proposed a probabilistic deterioration model for the prediction of future rail condition and rail life based on wear by conducting rail wear measurements on a curved track over 6 months. Xu et al. [12] investigated significant factors affecting rail wear in high-speed railway turnouts by using a half-normal probability plot method and revealed that axle load, wheel–rail friction coefficient, profiles of wheel and rail, direction of passage, and vehicle speed had the greatest effect on turnout rail wear. Premathilaka et al. [13] developed a deterministic rail wear prediction model to prepare long-term strategic plans for the management of railway infrastructure in New Zealand. Jeong et al. [14] presented a probabilistic forecasting model for rail wear progress by using a particle filter method based on Bayesian theory, with rail wear data measured at the Seoul Metro. Wang et al. [15] proposed a rail profile optimization method to reduce rail wear by using support vector machine regression to fit the nonlinear relationship between rail profile and rail wear rate. Meghoe et al. [5] established relations between rail wear and railway operating conditions, including track geometry parameters, by means of metamodels obtained with regression analysis.

Although only a limited number of studies, listed above, have investigated rail wear by statistical methods, a number of studies have modeled track gauge degradation using such methods. These studies are included in the literature review of the present study because rail wear is the main cause of track gauge deterioration [16]. Falamarzi et al. [17] developed four linear multiple regression models to predict track gauge degradation by using data sets from Melbourne’s tram system, including both curve and straight sections. Elkhoury [18] developed two degradation models, a time-series stochastic model and a linear regression model, to estimate track gauge deterioration for curve and tangent sections of the tram network in Melbourne. Ahac and Lakušić [19] proposed mechanistic–empirical models for track gauge deviation by regression analysis, observing two types of Zagreb tram tracks with an indirect elastic rail fastening system and a stiffer direct elastic rail fastening system. Falamarzi et al. [20] generated two linear multiple regression models for the estimation of track gauge deviation utilizing the data set of the curve sections of the Melbourne tram network. Guler et al. [21] performed a multivariate statistical analysis to model track geometry deterioration, including track gauge degradation, by selecting a track section of approximately 180 km length in Turkey as the base for the model. Ahac and Lakušić [16] developed linear gauge degradation models for 35 types of tracks of the Zagreb tram network by regression analysis of the relationship between gauge deviation and track section exploitation intensity. Berawi et al. [22] presented three methodologies for the evaluation of geometrical track quality in terms of track gauge, profile, and alignment by using the measurement data recorded on the Portuguese Northern Railway Line. Westgeest et al. [23] analyzed track geometry measurement data containing the track gauge deviation by using regression analysis to identify the major contributors to track geometry deterioration and to assess the amount of necessary track maintenance. Screen et al. [24] examined operational data and investigated subthreshold delays of less than 4 min incurred by Tyne and Wear Metro trains in North East England. Darlton and Marinov [25] analyzed the suitability of tilting technology for the Tyne and Wear Metro system by designing and performing several tests revealing the possible impact on ride comfort, speed, and motion sickness.

The explanatory variables for the vertical and lateral rail wear models proposed in the present study were selected based on the previous studies mentioned in the literature review. Previous studies [1, 2, 4–8, 11, 12, 15, 16, 18, 23] state that traffic load, sometimes referred to as tonnage of passing trains or axle load, is one of the most influential parameters for rail wear. The effects of track curvature, which is associated with the curve radius, on rail wear are reported in previous studies [1–5, 15, 16, 18]. Previous studies [1, 2, 4–8, 12, 15, 16] have also revealed that rail wear depends greatly on vehicle speed. The influence of superelevation on rail wear has likewise been emphasized [2, 3, 5, 12]. Taking into account the findings obtained from previous studies, traffic load, track curvature, superelevation, and train speed were selected as explanatory variables for the rail wear models proposed in the present study.

Considering all studies mentioned in the literature review, none involved examination of vertical and lateral rail wear with a multiple regression analysis method by using traffic load data obtained from passenger counts, track-related data including track curvature, superelevation, and train speed, or wear data obtained from field measurements on an LRT line in use. The present study aims to fill this research gap in the existing literature. The purpose of this study is to investigate the effects of traffic load, track curvature, superelevation, and train speed on vertical and lateral wear of rail. A multiple regression analysis technique, one of the most substantial and commonly used statistical methods for prediction and/or explanation of a dependent variable by independent variables [26], was applied in this research. The Yenikapi–Ataturk Airport LRT line, one of the oldest and most intensely used railway lines in Istanbul, was selected as case study. For the purpose of calculating traffic loads on the Yenikapi–Ataturk Airport LRT line, passenger counts were conducted in all wagons of the train set, covering all stations of the line on different days and time intervals. Amounts of vertical and lateral wear were obtained by rail wear measurements on the LRT line. Values of traffic load, track curvature, train speed, and superelevation were determined for each kilometer where measurement of rail wear was performed. Two separate multiple linear regression models for vertical and lateral wear were developed to examine the effects of traffic load, track curvature, train speed, and superelevation on the amount of vertical and lateral rail wear.

The remainder of this manuscript is organized as follows: Rail wear measurements conducted on the Yenikapi–Ataturk Airport LRT line, data collection regarding location and date of rail replacements, and determination of values of track curvature, superelevation, and train speed for multiple linear regression analysis (MLRA) are described in Sect. 2. Passenger counts performed in the railway cars operating on the line and calculation of traffic loads considering the results of passenger counts are explained in Sect. 3. Section 4 presents the correlation matrices for vertical and lateral rail wear, and the results of two multiple linear regression models developed for the determination of effective parameters on vertical and lateral rail wear. Multicollinearity tests and cross-validation analyses carried out for both vertical and lateral rail wear models are explained in Sect. 5. Finally, Sect. 6 provides the conclusions drawn from this study and recommendations for the content of future research.

2 Data Collection

The Yenikapi–Ataturk Airport LRT line, the case study for this research, has a daily ridership of 400,000 passengers and is one of the oldest and most heavily used railway tracks in Istanbul, Turkey. The line operates 169 trips per day in each direction. The initial phase of the LRT line was put into service in 1989, new routes were constructed over time, and the LRT line took its current form with the opening of Yenikapi Station in 2014. The rail track, which has 18 stations, has a total length of 26.8 km [27]. The minimum horizontal curve radius on the railway track is 275 m, while the maximum superelevation is 140 mm. Rails used in the LRT line are 49E1 Vignole rail profiles in accordance with the European Standard EN 13674-1. The superstructure of the rail track consists of both ballasted and nonballasted track sections. The track section between Aksaray and Yenibosna Stations was constructed as ballasted track, whereas the track sections between Yenikapi and Aksaray Stations and between Yenibosna and Airport Stations were constructed as slab track. Both concrete sleepers and wooden sleepers are used in the railway track. The maximum speed of the trains, which operate in a four-wagon arrangement on the LRT line, is 80 km/h. A schematic map of the Yenikapi–Ataturk Airport LRT line with its 18 stations is shown in Fig. 1.

Fig. 1 Schematic map of Yenikapi–Ataturk Airport LRT line

2.1 Measurements of Rail Wear

Measurements of vertical and lateral wear of rails on the Yenikapi–Ataturk Airport LRT line were performed by using a rail head wear measuring device (Robel). The Robel device measures the amount of wear at certain points of the rail head by means of the needles on it according to the original rail profile that is not worn. The measuring device consists of a magnetic part, where the rail base is located, and four adjustable needles contacting the gauge corner and upper surface of the rail head. The Robel device is placed on the rail base in contact with the rail head, where the wear will be measured. The measurement of the gauge corner and upper surface of the rail head is conducted by the needles of the device contacting the rail head [28]. The rail head wear measuring device used in the Yenikapi–Ataturk Airport LRT line and the field application are shown in Fig. 2.

Fig. 2 Rail wear measurement with rail head wear measuring device on site

After the measuring device is removed from the rail, the values on it are read, and the amounts of vertical and lateral rail wear are recorded on a rail wear measurement form. According to Metro Istanbul Inc., which operates the Yenikapi–Ataturk Airport LRT line, allowable limits for vertical and lateral rail wear are determined as 15 mm. If the lateral or vertical wear of the rail is more than 15 mm or the sum of the lateral and vertical wear is more than 25 mm, the worn rail section should be replaced [28].
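The replacement rule stated above can be sketched as a small check. This is only an illustration of the quoted limits; the function and constant names are hypothetical and are not part of the operator's procedure.

```python
# Limits quoted in the text (mm). Names are illustrative assumptions.
SINGLE_WEAR_LIMIT_MM = 15.0    # limit for vertical or lateral wear alone
COMBINED_WEAR_LIMIT_MM = 25.0  # limit for vertical + lateral wear


def needs_replacement(vertical_mm: float, lateral_mm: float) -> bool:
    """Return True when a measured rail section exceeds any stated limit."""
    return (
        vertical_mm > SINGLE_WEAR_LIMIT_MM
        or lateral_mm > SINGLE_WEAR_LIMIT_MM
        or vertical_mm + lateral_mm > COMBINED_WEAR_LIMIT_MM
    )
```

Note that a rail can require replacement even when neither value alone exceeds 15 mm, for example 13 mm vertical plus 13 mm lateral wear, because the sum exceeds 25 mm.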

Within the scope of this study, vertical rail wear at 476 points and lateral rail wear at 451 points located on the Yenikapi–Ataturk Airport LRT line were measured between 30 October 2013 and 10 May 2016. Rail wear measurements were carried out in the time period between 01:00 and 05:00 a.m., when the LRT line was closed for operation. Using the data obtained from the rail wear measurements performed on the LRT line in 2013, 2014, 2015, and 2016, a rail wear measurement table was generated. The information in the rail wear measurement table contains:

  • Track section where rail wear measurement was carried out

  • Track where wear measurement was conducted (since the LRT line is a double-track railway)

  • Kilometer where the rail wear was measured

  • Rail (inner or outer rail) where the wear measurement was performed

  • Lateral wear amount of the rail (mm)

  • Vertical wear amount of the rail (mm)

  • Date of rail wear measurement

The data in the rail wear measurement table were prepared for use in multiple linear regression models. The amounts of vertical and lateral rail wear were used as dependent variables in the regression models.

2.2 Data Collection of Rail Replacements

One of the independent variables in multiple linear regression models is the traffic load calculated for each kilometer where the rail wear measurement was conducted. Values of traffic load should be determined for the time period between 1 January 2012 and 31 December 2016, which is the time frame considered within the scope of the study. To calculate the traffic loads accurately, it is necessary to have information about the location and date of rail replacements performed on the Yenikapi–Ataturk Airport LRT line. The reason is that the cumulative traffic load affecting the rail in a location where the rail replacement was carried out becomes zero at the date of the rail replacement. In other words, rail replacement has a direct impact on the cumulative traffic load affecting the rail. For this reason, data on the rail replacement activities performed before 30 October 2013, which is the beginning of the rail wear measurements on the LRT line, should be collected. In this context, the date of 1 January 2012 was taken as basis, and data on the rail replacement activities conducted on the Yenikapi–Ataturk Airport LRT line between 1 January 2012 and 31 December 2016 were collected. Daily reports prepared by Metro Istanbul Inc. between 1 January 2012 and 31 December 2016 were analyzed, and information about the date and location of the rail replacements on the line was listed. Afterwards, a comprehensive table including rail wear measurement data together with rail replacement data was prepared. In this table, location of rail wear measurement, date of wear measurement, vertical and lateral wear amounts of the rail, and if any, date of rail replacement performed before the wear measurement date of the relevant rail were presented.

2.3 Determination of Values of Track Curvature, Train Speed, and Superelevation

In the multiple linear regression models, the other independent variables, except for traffic load, are track curvature, train speed, and superelevation. Values of track curvature, train speed, and superelevation were determined for 476 points where vertical wear of the rail was measured and 451 points where lateral wear of the rail was measured on the Yenikapi–Ataturk Airport LRT line. Track curvature values were obtained from the profile of the LRT line. To calculate track curvature, the beginning and ending kilometers of horizontal curves, the radii of horizontal curves, the starting and ending kilometers of transition curves, and the radius of curvature of transition curves were used. For a rail wear measurement point located between the beginning and ending kilometers of a horizontal curve, the track curvature at the measurement point was calculated by the following equation [29]:

$${\text{Track curvature}} = \frac{1}{r}, \tag{1}$$

where r is the radius of the horizontal curve (m), and the unit of the track curvature is m⁻¹. For a rail wear measurement point located on a tangent (straight) section of the track, however, the track curvature is zero because the horizontal curve radius is infinite, as shown in Eq. (2):

$${\text{Track curvature}} = \frac{1}{r} = \frac{1}{\infty} = 0. \tag{2}$$

In the case where the rail wear measurement point is located between the starting and ending kilometers of a transition curve, the track curvature at the measurement point was computed as follows:

$${\text{Track curvature}} = \frac{1}{\rho_{x}}. \tag{3}$$

Here ρₓ is the radius of the transition curve at the point where the wear is measured (m), and the unit of the track curvature is m⁻¹ [29]. After the track curvature had been calculated, superelevation values were determined for each rail wear measurement point. Superelevation values for the horizontal and transition curves were obtained from the profile of the LRT line, while superelevation values for the straight track were zero. Finally, the train speed at each rail wear measurement point was determined by using the speed–distance diagram of the trains operated on the LRT line.
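Equations (1) to (3) share one form: the curvature is the reciprocal of the local radius, whether that radius is the constant radius of a horizontal curve, the local radius of a transition curve, or infinity on straight track. A minimal sketch follows; the function name is an assumption, and the local transition-curve radius is taken as a given input rather than computed.

```python
import math


def track_curvature(local_radius_m: float) -> float:
    """Curvature (1/m) for a local radius in metres.

    Pass math.inf for straight track, the curve radius r for a
    horizontal curve, or the local radius rho_x for a transition curve.
    """
    if local_radius_m <= 0:
        raise ValueError("radius must be positive (use math.inf for straight track)")
    if math.isinf(local_radius_m):
        return 0.0  # Eq. (2): straight track
    return 1.0 / local_radius_m  # Eq. (1) and Eq. (3)
```

For the minimum curve radius of 275 m reported for the line, `track_curvature(275.0)` gives the largest curvature in the data set, roughly 0.0036 m⁻¹.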

3 Determination of Traffic Loads by Passenger Counts

The number of passengers carried by the train in the track sections between stations on the LRT line must be determined to calculate the traffic loads at the rail wear measurement points. Data records on Istanbul-card, which is the contactless smart card used for transport fare payment on public transportation in Istanbul, were obtained from Metro Istanbul Inc. for the Yenikapi–Ataturk Airport LRT line. Using these data, the number of daily passengers boarding the train at each station was acquired. However, the passengers did not use their Istanbul-card while getting off the train, hence the number of passengers getting off the train at each station could not be determined. Therefore, passenger counts were performed on the Yenikapi–Ataturk Airport LRT line to calculate the number of passengers getting off the train at the stations and the number of passengers carried by the train in the track sections between stations.

3.1 Passenger Counts

A total of 120 passenger-counting studies were carried out in the wagons of the train sets operated on the Yenikapi–Ataturk Airport LRT line between 7 February 2018 and 29 April 2018. While 60 of the passenger-counting studies were performed in the Yenikapi–Airport direction, the remaining 60 studies were conducted in the Airport–Yenikapi direction. Passenger counts were performed on both weekdays and weekends to cover all stations on the LRT line and all wagons of the train set. Due care was taken to ensure that passenger counts were conducted to cover all working hours from 06:00 until 24:00, when the LRT line was open for operation.

Each train set operated on the Yenikapi–Ataturk Airport LRT line is composed of four wagons. Each passenger-counting study was carried out by two observers in one of the four wagons of the train set. Since each wagon has four doors for passenger boarding and descending, each observer in the wagon was responsible for two doors. In each passenger-counting study, the two observers boarded the wagon at the first station and traveled in the same wagon to the last station, counting the number of passengers getting on the wagon, the number of passengers descending from the wagon, and the number of passengers carried inside the wagon. The observers recorded these three numbers on passenger-counting forms during the count.

Due to the length of the wagons, two observers were required in one wagon to accurately count the number of passengers getting on and the number of passengers getting off the train. Since the train set consisted of four wagons, the numbers of passengers boarding and descending for the other wagons were calculated from the counts made by the two observers in that one wagon. In this calculation, the occupancy rate difference between the wagons of the train set was used. To determine the occupancy rate difference between the wagons, additional passenger counts were conducted on the Yenikapi–Ataturk Airport LRT line. These additional passenger-counting studies were again performed by two observers and labeled as “first wagon + middle wagon” or “last wagon + middle wagon.” In the first case, one observer counted passengers in the first wagon while the other simultaneously counted passengers in the middle wagon (second wagon). In the second case, one observer counted passengers in the last wagon (fourth wagon) while the other simultaneously counted passengers in the middle wagon (third wagon). In the additional passenger counts, for each station of the LRT line, observers counted the number of passengers boarding the wagon, the number of passengers getting off the wagon, and the number of passengers carried inside the wagon, as in the previous passenger counts. The occupancy rate difference between the first/last wagons and the middle wagons was calculated as 10.04% by comparing the number of passengers carried inside the wagon for the first, the last, and the middle wagons. For ease of calculation, the occupancy rate difference between the first/last wagons and the middle wagons was taken as 10%.

For a passenger-counting study performed in one of the middle wagons (second or third wagon), the number of passengers boarding, the number of passengers descending, and the number of passengers carried inside the other three wagons were determined by using an occupancy rate difference of 10%:

  • Since one of the remaining three wagons is a middle wagon, it shows the same features as the other middle wagon where the passengers were counted. Therefore, the number of passengers boarding, number of passengers descending, and number of passengers carried inside the wagon for this rail car were assumed to be the same as the values of the wagon where the passenger counting was conducted.

  • For the first wagon of the train set, the number of passengers boarding, number of passengers descending, and number of passengers carried inside the wagon were assumed to be 10% lower than the values of the middle wagon where the passengers were counted.

  • For the last wagon of the train set, the number of passengers boarding, number of passengers descending, and number of passengers carried inside the wagon were assumed to be 10% lower than the values of the middle railcar where the passenger counting was carried out.
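The three assumptions above can be sketched as a simple scaling from one counted middle wagon to the whole four-wagon train set. The function and constant names are hypothetical, and the sketch covers only the case described here, in which the counted wagon is a middle wagon.

```python
# First and last wagons are assumed to carry 10% fewer passengers
# than a middle wagon, as stated in the text.
END_WAGON_FACTOR = 0.90


def train_totals_from_middle_wagon(boarding: float, descending: float, on_board: float):
    """Scale counts from one middle wagon to the whole four-wagon train set.

    Two middle wagons take the counted values unchanged; the first and
    last wagons take 90% of them, giving a factor of 2 + 2 * 0.9 = 3.8.
    """
    train_factor = 2 * 1.0 + 2 * END_WAGON_FACTOR
    return (
        train_factor * boarding,
        train_factor * descending,
        train_factor * on_board,
    )
```

Under these assumptions, every quantity counted in a middle wagon is multiplied by 3.8 to obtain the train-level value at a station.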

Thus, for 120 passenger-counting studies performed, the total number of passengers boarding the train, total number of passengers getting off the train, and total number of passengers carried inside the train consisting of four wagons were obtained at each station of the LRT line. As an example of the passenger counts, the results of the passenger-counting study conducted in the direction of Yenikapi–Airport on 13 February 2018 between 07:42 and 08:17 a.m. are presented in Table 1. The journey duration from Yenikapi Station to Airport Station in one direction is 35 min, hence the passenger counting started at 07:42 and ended at 08:17 a.m.

Table 1 Results of passenger-counting study performed on 13 February 2018 between 07:42 and 08:17 a.m.

3.2 Determination of Traffic Loads

To calculate the traffic loads affecting rail at the rail wear measurement points, the following steps were taken in turn:

  1. For the 120 passenger-counting studies conducted, the ratio of passengers getting off the train at each station of the LRT line was calculated.

  2. The average daily ratio of passengers getting off the train for each station was determined by considering the peak hour traffic on weekdays and weekends.

  3. Depending on the track section where a rail wear measurement point was located, the number of passengers boarding the train at the relevant station was specified by using the daily Istanbul-card data at the stations.

  4. Depending on the track section where the wear of rail was measured, the number of passengers descending from the train at the relevant station was computed by considering the average daily ratio of passengers getting off the train.

  5. In the track section where the rail wear measurement was performed, the number of passengers carried inside the train was determined by using the number of passengers boarding the train and the number of passengers getting off the train at the relevant station.

  6. Traffic load affecting the rail at the rail wear measurement point was calculated according to the number of passengers carried inside the train in the relevant track section.

First, for the 120 passenger-counting studies carried out on the LRT line, the ratio of passengers getting off the train at each station was computed from the number of passengers boarding the train, the number of passengers descending from the train, and the number of passengers inside the train coming from the previous station, as follows:

$${\text{RPGT}} = \frac{\text{NPGTRS}}{{\text{NPTCPS}} + {\text{NPBTRS}}}. \tag{4}$$

Here, RPGT is the ratio of passengers getting off the train at a certain station, NPBTRS represents the number of passengers boarding the train at the relevant station, NPGTRS symbolizes the number of passengers getting off the train at the relevant station, and NPTCPS represents the number of passengers inside the train coming from the previous station. After obtaining the ratio of passengers getting off the train at each station for 120 passenger-counting studies, the stage of calculating the average daily ratio of passengers getting off the train for each station was started. The ratio of passengers getting off the train at each station, time periods specified by the peak-hour traffic on weekdays and weekends, and the number of daily trips performed in these time periods on the LRT line were used to determine the average daily ratio of passengers getting off the train for each station. Separate analyses were carried out for the Yenikapi–Airport and Airport–Yenikapi directions. Due to the difference in passenger density between weekdays and weekends, separate evaluations were conducted for weekdays and weekends by considering the peak hours. The reason for taking into account different time periods was the difference in passenger density between peak hours and off-peak hours. Moreover, the number of trips performed by trains in each time period in 1 day was different from each other. Therefore, different time periods were considered in modeling to accurately reflect the effects of the difference in passenger density and number of trips performed by trains on the traffic load.
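Equation (4) can be sketched for a single station in a single passenger-counting study as follows. The argument names are illustrative stand-ins for NPGTRS, NPTCPS, and NPBTRS, and returning zero for an empty train is an assumption made only to avoid division by zero.

```python
def ratio_getting_off(getting_off: float, from_previous_station: float, boarding: float) -> float:
    """RPGT = NPGTRS / (NPTCPS + NPBTRS), as in Eq. (4)."""
    denominator = from_previous_station + boarding
    if denominator == 0:
        return 0.0  # assumption: an empty train has no one getting off
    return getting_off / denominator
```

For example, with 100 passengers arriving from the previous station, 50 boarding, and 30 getting off, the ratio is 30 / 150 = 0.2.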

Peak hours on weekdays for the Yenikapi–Ataturk Airport LRT line were determined as occurring between 07:00 and 08:59 in the morning and between 17:00 and 19:59 in the evening by evaluating the results of the passenger counts. The hours not included in these two time periods were considered off-peak hours. Within the time frame between 06:00 and 24:00, when the LRT line was open for operation, five basic time periods were identified for weekdays by considering the passenger density obtained from the passenger counts:

  • Time period between 06:00 and 06:59

  • Time period between 07:00 and 08:59 (peak hours)

  • Time period between 09:00 and 16:59

  • Time period between 17:00 and 19:59 (peak hours)

  • Time period between 20:00 and 24:00

The average daily ratio of passengers getting off the train for each station on weekdays was calculated by using the ratio of passengers getting off the train at each station for the five main time periods on weekdays and the number of trips performed by trains in these five time periods in 1 day. Peak hours on weekends for the Yenikapi–Ataturk Airport LRT line were defined as 12:00–14:59 in the afternoon by assessing the results of the passenger counts. The hours not involved in this time period were off-peak hours. Within the working hours of the LRT line between 06:00 and 24:00, four basic time periods were determined for weekends by taking into account the passenger density acquired from the passenger counts:

  • Time period between 06:00 and 11:59

  • Time period between 12:00 and 14:59 (peak hours)

  • Time period between 15:00 and 19:59

  • Time period between 20:00 and 24:00

The time periods between 15:00 and 19:59 and between 20:00 and 24:00 on weekends were not analyzed together due to the difference in passenger density between these time frames according to the results of the passenger counts. Passenger density in the time period between 15:00 and 19:59 was higher than that in the time frame between 20:00 and 24:00. In addition, the number of trips performed by trains in the time period between 15:00 and 19:59 in 1 day was higher than that in the time frame between 20:00 and 24:00 in 1 day. For this reason, the time periods between 15:00 and 19:59 and between 20:00 and 24:00 were considered separately.

The average daily ratio of passengers getting off the train for each station on weekends was computed by using the ratio of passengers getting off the train at each station for the four main time periods on weekends and the number of trips performed by trains in these four time periods in 1 day. After the average daily ratios for weekdays and weekends had been obtained separately, the average daily ratio of passengers getting off the train for each station was calculated as the weighted average of these two values. The resulting average daily ratios of passengers getting off the train at each station for the Yenikapi–Airport and Airport–Yenikapi directions are presented in Tables 2 and 3, respectively.

Table 2 Average daily ratio of passengers getting off the train at each station for Yenikapi–Airport direction
Table 3 Average daily ratio of passengers getting off the train at each station for Airport–Yenikapi direction

In Table 2, the average daily ratio of passengers getting off the train at Yenikapi Station is zero since Yenikapi Station is the first station for the Yenikapi–Airport direction. On the contrary, the average daily ratio of passengers getting off the train at Airport Station is 100% because Airport Station is the last station for the Yenikapi–Airport direction. As presented in Table 3, since Airport Station is the first station for the Airport–Yenikapi direction, the average daily ratio of passengers getting off the train is zero. Conversely, the average daily ratio of passengers getting off the train at Yenikapi Station is 100% because it is the last station for the Airport–Yenikapi direction.
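The averaging over time periods described above can be sketched as a trip-weighted mean. The text states that the number of trips per time period was used, but it does not give the exact weighting formula, so the form below is an assumption. The 5:2 weekday-to-weekend weighting in the second function is likewise only an assumed default, since the weights of the final weighted average are not specified. All names are hypothetical.

```python
from typing import Sequence


def trip_weighted_ratio(ratios: Sequence[float], trips: Sequence[int]) -> float:
    """Average per-period ratios, weighting each period by its daily trip count."""
    if len(ratios) != len(trips):
        raise ValueError("ratios and trips must have the same length")
    total_trips = sum(trips)
    if total_trips <= 0:
        raise ValueError("total number of trips must be positive")
    return sum(r * n for r, n in zip(ratios, trips)) / total_trips


def combine_day_types(weekday_ratio: float, weekend_ratio: float,
                      weekday_weight: float = 5.0, weekend_weight: float = 2.0) -> float:
    """Weighted average of weekday and weekend ratios (weights are assumed)."""
    total = weekday_weight + weekend_weight
    return (weekday_ratio * weekday_weight + weekend_ratio * weekend_weight) / total
```

For a weekday, `ratios` and `trips` would each hold five entries, one per time period listed above; for a weekend day, four.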

The next stage of the traffic load calculation is to obtain the number of passengers boarding the train at the stations. Depending on the track section where the rail wear was measured, the number of passengers boarding the train at the relevant station was determined by using the daily number of Istanbul-cards recorded at the relevant station. At this stage, the table containing rail wear measurement data together with the rail replacement data mentioned in Sect. 2.2 was also utilized. If there is no rail replacement at the rail wear measurement location before the measurement date, the daily number of Istanbul-cards recorded at the relevant station is specified between the wear measurement date and 1 January 2012, which is the beginning of the time frame considered in this study. If there is any rail replacement at the rail wear measurement point before the measurement date, the daily number of Istanbul-cards recorded at the relevant station is determined between the rail replacement date and the wear measurement date.

In the next stage of the traffic load calculation, depending on the track section where the rail wear measurement was performed, the number of passengers getting off the train at the relevant station was calculated by using the number of passengers boarding the train, the average daily ratio of passengers getting off the train at the relevant station, and the number of passengers inside the train coming from the previous station. The equation for this calculation is as follows:

$${\text{NPGTRS}} = {\text{ADRPGT}} \times \left( {{\text{NPBTRS}} + {\text{NPTCPS}}} \right) .$$
(5)

Here, ADRPGT is the average daily ratio of passengers getting off the train at the relevant station, NPBTRS symbolizes the number of passengers boarding the train at the relevant station, NPGTRS represents the number of passengers getting off the train at the relevant station, and NPTCPS denotes the number of passengers inside the train coming from the previous station. In the next phase of the traffic load calculation, for the track section where the rail wear was measured, the number of passengers carried inside the train was computed by means of the number of passengers boarding the train and the number of passengers getting off the train at the relevant station. As an example, for the Yenikapi–Airport direction, where the stations of the LRT line were sorted as Yenikapi–Aksaray–Emniyet–…–Airport, the number of passengers carried inside the train in the track section between Aksaray and Emniyet Stations was determined as follows:

$${\text{NPCTAE}} = {\text{NPTCYS}} + {\text{NPBTAS}} - {\text{NPGTAS}} .$$
(6)

Here, NPCTAE is the number of passengers carried inside the train in the track section between Aksaray and Emniyet Stations, NPTCYS represents the number of passengers inside the train coming from Yenikapi Station, NPBTAS symbolizes the number of passengers boarding the train at Aksaray Station, and NPGTAS denotes the number of passengers getting off the train at Aksaray Station. As Yenikapi Station is the first station of the LRT line for the Yenikapi–Airport direction, the number of passengers getting off the train at this station is zero, and all the passengers boarding the train at this station arrive at the next station, Aksaray, which is the second station of the LRT line. Thus, the number of passengers inside the train coming from Yenikapi Station denoted by NPTCYS in Eq. (6) was obtained.
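The bookkeeping in Eqs. (5) and (6) can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function name, the three-station line, and all passenger numbers and ratios are hypothetical, and Eq. (5) is applied exactly as written above.

```python
def passengers_carried(boarding, alight_ratio):
    """Propagate passenger numbers along one direction of a line.

    boarding[i]     : passengers boarding at station i (NPBTRS)
    alight_ratio[i] : average daily ratio of passengers getting off
                      at station i (ADRPGT), 0 at the first station
                      and 1 at the last one
    Returns a list whose entry i is the number of passengers carried
    in the track section that leaves station i.
    """
    carried = []
    on_board = 0.0  # NPTCPS: passengers arriving from the previous station
    for boarded, ratio in zip(boarding, alight_ratio):
        alighting = ratio * (boarded + on_board)   # Eq. (5)
        on_board = on_board + boarded - alighting  # Eq. (6)
        carried.append(on_board)
    return carried


# Hypothetical three-station line: first, intermediate, last station.
example = passengers_carried([1000, 400, 0], [0.0, 0.25, 1.0])
print(example)  # [1000.0, 1050.0, 0.0]
```

At the first station the ratio is zero, so every boarding passenger is carried into the next section; at the last station the ratio is 1 and the train empties, which matches the behaviour described for Tables 2 and 3.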

The final stage of the traffic load calculation is the determination of traffic load affecting the rail at the rail wear measurement points. This was computed based on the empty weight of the train, total number of trips in one direction performed by trains for the number of days considered in the traffic load calculation, and the number of passengers carried inside the train in the relevant track section, as follows:

$${\text{TL}} = \left( {{\text{EWT}} \times {\text{TNT}}} \right) + \left( {{\text{NPCT}} \times {\text{AWP}}} \right) ,$$
(7)

where TL is the traffic load affecting the rail at the rail wear measurement point, EWT represents the empty weight of the train, TNT symbolizes the total number of trips in one direction performed by trains for the number of days considered in the traffic load calculation, NPCT denotes the number of passengers carried inside the train in the relevant track section, and AWP signifies the average weight of a passenger. The number of days considered in the traffic load calculation was identified by using the table including rail wear measurement data and rail replacement data. If there was no rail replacement at the wear measurement point before the measurement date, the number of days considered in the traffic load calculation is equal to the number of days between 1 January 2012, which is the start of the time period considered in this research, and the wear measurement date. If there was a rail replacement at the rail wear measurement point before the measurement date, the number of days considered in the traffic load calculation corresponds to the number of days between the rail replacement date and the wear measurement date. The total number of trips in one direction performed by trains was then obtained from the number of days considered in the traffic load calculation and the number of daily trips in one direction on the LRT line (169 trips per direction).

In Eq. (7), NPCT refers to the number of passengers carried inside the train for the number of days considered in the traffic load calculation in the relevant track section where the rail wear measurement was carried out. In this study, the average weight of a passenger was assumed to be 75 kg [30]. The empty weight of the train was determined from the weight of the four wagons without passengers. A wagon had six axles, and the axle load was 5 tons per axle; therefore, the empty weight of a wagon was calculated as 30 tons. Since the train set consisted of four wagons, the empty weight of the train was computed as 120 tons. Consequently, the traffic load (in tons) affecting the rail at the 476 points where vertical wear of the rail was measured and the 451 points where lateral wear of the rail was measured on the Yenikapi–Ataturk Airport LRT line was calculated according to Eq. (7).
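As a rough illustration of Eq. (7), the sketch below combines the constants stated in the text (six axles per wagon, 5 tons per axle, four wagons, 169 trips per day in one direction, 75 kg per passenger). The function names, the example dates, and the passenger total are hypothetical, and counting days as a plain date difference is an assumption, since the text does not state whether the end dates are inclusive.

```python
from datetime import date

AXLES_PER_WAGON = 6
AXLE_LOAD_T = 5.0            # tons per axle
WAGONS_PER_TRAIN = 4
TRIPS_PER_DAY = 169          # daily trips in one direction
AVG_PASSENGER_T = 0.075      # 75 kg expressed in tons
STUDY_START = date(2012, 1, 1)


def empty_train_weight_t():
    """EWT: empty weight of the four-wagon train set, in tons."""
    return AXLES_PER_WAGON * AXLE_LOAD_T * WAGONS_PER_TRAIN


def days_considered(measurement_date, replacement_date=None):
    """Days from the last rail replacement (or the study start) to the measurement."""
    start = replacement_date or STUDY_START
    return (measurement_date - start).days


def traffic_load_t(days, passengers_carried):
    """TL = (EWT x TNT) + (NPCT x AWP), Eq. (7), in tons."""
    total_trips = days * TRIPS_PER_DAY  # TNT
    return empty_train_weight_t() * total_trips + passengers_carried * AVG_PASSENGER_T


# Hypothetical measurement point with no earlier rail replacement.
days = days_considered(date(2013, 10, 30))
load = traffic_load_t(days, passengers_carried=50_000_000)
print(days, round(load))
```

Keeping both terms in tons (passenger weight converted from kilograms) avoids mixing units inside the sum.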

Note that passenger counts were carried out only to calculate the average daily ratio of passengers getting off the train for each station (since passengers did not use their Istanbul-cards while getting off the train). The number of passengers boarding the train at each station was obtained directly from the daily number of Istanbul-cards recorded at the stations between 1 January 2012 and 31 December 2016, as provided by Metro Istanbul Inc. Nevertheless, it is crucial for the validity of the data analysis to examine the different periods of time used in the traffic load calculation. Therefore, a descriptive step was performed by taking into account the Istanbul-card data recorded at the stations in 2016 and 2018 to investigate the presence of variations in passenger demand that could affect the traffic load calculation. For this purpose, the number of Istanbul-cards recorded at each station of the Yenikapi–Ataturk Airport LRT line in 2016 and 2018 was first obtained from Metro Istanbul Inc. Then, the total numbers of Istanbul-cards recorded at each station of the LRT line in 2016 and 2018 were compared with each other. As presented in Table 4, the number of Istanbul-cards recorded at each station of the LRT line in 2016 was close to that in 2018 on a station basis. Consequently, it is concluded that passenger demand at these stations in 2016 was close to that in 2018.

Table 4 Comparison of number of Istanbul-cards recorded at LRT line stations in 2016 and 2018

Another analysis of passenger demand was carried out by considering the number of Istanbul-cards recorded on the entire LRT line. For this purpose, the number of Istanbul-cards recorded on the entire LRT line in 2016 and that in 2018 were determined and compared with each other. As presented in Table 4, the total number of Istanbul-cards recorded on the entire track in 2016 is 118,411,591, while the total number of Istanbul-cards recorded on the entire track in 2018 is 119,671,402. Accordingly, the percentage change in the total number of Istanbul-cards recorded on the entire LRT line between 2016 and 2018 was calculated as 1.06%. The percentage change of 1.06% in the total number of Istanbul-cards is quite low, indicating that the passenger demand for the entire LRT line changed very slightly between 2016 and 2018. As a result, it is determined that no significant change was experienced in passenger demand between 2016 and 2018, either for the entire LRT line or by station. Since the number of passengers boarding the train at the stations was obtained directly from the daily number of Istanbul-cards recorded at the stations for the relevant dates and the passenger demand on the LRT line was quite similar over the years, the calculated traffic loads reflect the effects of demand and/or operational variations along the line with a very high accuracy for the relevant periods.
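The line-wide percentage change quoted above follows directly from the two totals reported in the text; the short check below reproduces it (the variable names are only illustrative).

```python
cards_2016 = 118_411_591  # Istanbul-cards recorded on the entire line in 2016
cards_2018 = 119_671_402  # Istanbul-cards recorded on the entire line in 2018

change_pct = (cards_2018 - cards_2016) / cards_2016 * 100
print(round(change_pct, 2))  # 1.06
```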

4 Development of Multiple Linear Regression Models for Rail Wear

The multiple regression analysis method, one of the most significant and commonly used statistical methods for identifying the nature of relationships between multiple variables [26, 31], was applied for this research. Multiple linear regression analysis is a general data-analytic procedure to relate a set of independent (predictor) variables to a dependent (criterion) variable, for both explanatory and predictive purposes, through an equation that is linear in its parameters [26, 32]. The general form of a multiple linear regression model with k predictor variables X1i,…,Xki and a criterion variable Yi can be written as:

$$Y_{i} = \beta_{0} + \beta_{1} X_{1i} + \cdots + \beta_{k} X_{ki} + \varepsilon_{i},$$
(8)

where i = 1,…,N and k = 1,…,K; Xki is the kth independent variable at the ith observation, Yi is the dependent variable at the ith observation, βk is the regression coefficient for the kth regressor, N is the number of observations, and εi is the error for the ith observation. The least-squares method is a standard approach in regression analysis to estimate regression coefficients. Regression coefficients obtained by the least-squares method in multiple regression minimize the sum of squared errors between the observed values and the model implied values of the dependent variable [26]. A regression coefficient indicates the expected change in the dependent variable related to a one-unit change in a certain independent variable while the other independent variables are held constant [33].
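To make the least-squares step concrete, the following sketch estimates the coefficients of Eq. (8) by solving the normal equations in plain Python. It is not the Excel procedure used in the study; the helper names and the tiny dataset are hypothetical, and the data are constructed without noise so that the true coefficients are known.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(xs, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (no noise).
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, -2, 0, 2]
coefficients = ols(xs, y)
print([round(b, 6) for b in coefficients])
```

Each fitted coefficient is the expected change in the dependent variable for a one-unit change in the corresponding predictor with the other predictor held constant, as described above.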

To define the strength and direction of the linear relationship between variables, a correlation coefficient is used as an illustrative measure. The correlation coefficient denoted by R takes values ranging from −1 to +1 [31]. A correlation coefficient value equal to 1 indicates a precise positive relationship in which both variables increase together. However, a correlation coefficient value equal to −1 indicates a precise negative relationship in which one variable increases while the other variable decreases [34]. A correlation coefficient value of zero implies no linear relationship between variables. The strength of the linear relationship increases as the value of the correlation coefficient approaches −1 or 1 [31]. The multiple correlation coefficient (multiple R) describing the degree of linear relationship between two or more independent variables and a single dependent variable is used to evaluate the quality of the estimation of the dependent variable [35, 36].

The most influential set of predictors in multiple regression is primarily identified by assessing the coefficient of determination, which is the square of the multiple correlation coefficient [33]. The coefficient of determination denoted by R2 is the proportion of variance of the dependent variable accounted for by the independent variables [35]. The coefficient of determination computed in a sample overestimates the true R2 in the population; therefore, the value of R2 needs to be corrected. The corrected value of R2 is called the adjusted R2. The adjusted R2, preventing problems with overestimation, gives a more accurate measure of the predictive power of the variables [33, 35].

An F-test in analysis of variance (ANOVA) is used to examine the overall significance of the regression by testing the hypothesis that all regression coefficients are jointly zero [37, 38]. The probability value denoted as p-value for the F-test is the indicator of the overall significance of the regression model. For a 95% confidence interval and a significance level of α = 0.05, if the p-value for the F-test is less than 0.05, the regression is overall significant, which means that at least one of the predictor variables is useful for the prediction of the dependent variable [31]. To evaluate the contribution of each independent variable to the regression model, a t-test examining the significance of each regression coefficient separately is used [31, 38]. The p-value for the t-test is taken into account to determine the predictor variables that can be useful to predict the dependent variable. For a 95% confidence interval and a significance level of α = 0.05, if the p-value for the t-test related to a certain predictor variable is lower than 0.05, then the relevant predictor variable has a statistically significant effect on the dependent variable [39].

It is recommended to examine the correlation matrix of independent variables to identify linear dependencies that may exist between them before carrying out a multiple regression analysis [34]. Independent variables highly related to each other are not preferred in multiple regression. The correlation coefficient between each pair of independent variables should not exceed 0.80; otherwise, the independent variables with a correlation greater than 0.80 may be suspected of multicollinearity. Multicollinearity is generally considered a problem because it indicates that the regression coefficients may be unstable and may vary significantly among samples. If two variables are extremely correlated, it makes no sense to treat them as separate variables [40].
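The screening rule described above (pairwise correlations between predictors should not exceed 0.80) can be sketched as follows. The function names and the three short data columns are hypothetical; the rail dataset itself is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear(columns, threshold=0.80):
    """Return the pairs of predictor names whose |r| exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > threshold
    ]


# Hypothetical predictors: "a" and "b" are nearly proportional, "c" is not.
predictors = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.5],
    "c": [4, 1, 3, 2],
}
print(flag_collinear(predictors))  # [('a', 'b')]
```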

4.1 Multiple Linear Regression Model for Vertical Rail Wear

To investigate the effects of traffic load and track parameters on the amount of vertical rail wear, a multiple linear regression model was developed in Excel. Independent variables in a multiple linear regression model for vertical wear include traffic load (tons), track curvature (m−1), superelevation (mm), and train speed (km/h), whereas the dependent variable is the vertical rail wear amount (mm). The sample size in the model consists of 476 points where vertical rail wear was measured on the Yenikapi–Airport LRT line, and the values of the independent variables were determined for each point. Primarily, a correlation matrix of dependent and independent variables was analyzed. The correlation matrix showing the correlation coefficients between each pair of variables for the vertical rail wear regression model is presented in Table 5.

Table 5 Correlation matrix showing correlation coefficients between variables

As seen in Table 5, the correlation coefficients between each pair of independent variables were obtained as 0.0603, 0.2393, −0.0882, 0.0825, 0.0921, and 0.1492, indicating a weak linear relationship between independent variables because the values of R are close to zero. The correlation coefficients between the dependent variable and each independent variable were determined as 0.9178, 0.0633, 0.2029, and −0.0818, revealing that traffic load was the only independent variable strongly related to the dependent variable. Due to the low correlation between independent variables, it is concluded that there is no obstacle to the use of all independent variables in multiple linear regression analysis. Regression statistics of the multiple linear regression model developed for vertical rail wear are presented in Table 6.

Table 6 Regression statistics of multiple linear regression model for vertical rail wear

According to Table 6, the multiple linear regression model yields a multiple correlation coefficient of 0.9180, implying a strong linear relationship between the dependent and independent variables because the multiple R value is close to 1. The coefficient of determination R2 and the adjusted R2 were obtained as 0.8427 and 0.8414, respectively. The adjusted R2 value indicates that 84.14% of the variance of the dependent variable can be explained by the independent variables. The standard error of the regression was determined as 0.0995. The F-test in ANOVA produced an F-value of 630.9581 and a p-value of 0.0000 as the significance F. Since this p-value is lower than 0.05, the regression is overall significant at the significance level of α = 0.05 (95% confidence interval), revealing that at least one of the predictor variables is useful for the prediction of the dependent variable. To examine the contribution of each independent variable to the regression model separately, a t-test was used. The coefficients table presented in Table 7 shows the t-statistic and p-value for the t-test applied for each independent variable along with the regression coefficients and their standard errors.

Table 7 Coefficients table of multiple linear regression model for vertical rail wear
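The regression statistics reported for the vertical wear model (476 observations, four predictors, R2 = 0.8427) can be cross-checked with the standard textbook formulas for the adjusted R2 and the overall F-statistic. These formulas are not stated in the text and are used here as an assumption about how the reported values were produced; because R2 is only available to four decimals, the recomputed F-value agrees with the reported one only approximately.

```python
from math import sqrt


def adjusted_r2(r2, n, k):
    """Adjusted R2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic of the regression computed from R2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k, r2 = 476, 4, 0.8427

print(round(sqrt(r2), 4))               # multiple R, reported as 0.9180
print(round(adjusted_r2(r2, n, k), 4))  # reported as 0.8414
print(round(f_statistic(r2, n, k)))     # reported as 630.9581
```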

The “intercept” in Table 7 is the constant term in the regression model described as the mean value of the dependent variable when all independent variables are set to zero. The significance of each predictor variable was determined based on the p-value for the t-test. As presented in Table 7, the p-value for traffic load was found as 0.0000. Since the p-value is lower than the significance level of α = 0.05, it is concluded that traffic load has a statistically significant effect on the amount of vertical rail wear. However, the p-values for track curvature, superelevation, and train speed were obtained as 0.6209, 0.3311, and 0.9352, respectively. Since these three p-values are greater than the significance level of α = 0.05, it is concluded that the track curvature, superelevation, and train speed do not have a statistically significant effect on the amount of vertical rail wear.

Another multiple linear regression model was established for vertical rail wear by making some changes in the independent variables. Explanatory variables in the multiple linear regression model include traffic load (tons), track curvature square (m−2), train speed square (km2/h2), and superelevation (mm), while the dependent variable is the amount of vertical rail wear (mm). The sample size of the model is 476. The correlation matrix of dependent and independent variables showing the correlation coefficients between each pair of variables is presented in Table 8.

Table 8 Correlation matrix showing correlation coefficients between variables

The correlation coefficients related to the replaced parameters in Table 8 are slightly lower than the corresponding correlation coefficients in the previous correlation matrix presented in Table 5. According to Table 8, the correlation coefficients between each pair of independent variables are close to zero, implying a weak linear relationship between independent variables. With an R value of 0.9178, traffic load is the only explanatory variable strongly related to the dependent variable. Regression statistics of the multiple linear regression model with the replaced independent variables are presented in Table 9.

Table 9 Regression statistics of multiple linear regression model with modified independent variables

The regression statistics in Table 9 are found to be very close to the regression statistics for the previous model presented in Table 6. A multiple R value close to 1 reveals a strong linear relationship between dependent and independent variables. The adjusted R2 value indicates that 84.16% of the variance of the dependent variable can be explained by the independent variables. The p-value obtained as 0.0000 shows that the regression is overall significant at the significance level of α = 0.05. A coefficients table of the regression model with the replaced independent variables is presented in Table 10.

Table 10 Coefficients table of multiple linear regression model with modified independent variables

As presented in Table 10, since the p-value for traffic load is lower than the significance level of α = 0.05, it is concluded that traffic load has a statistically significant effect on the amount of vertical rail wear. However, the p-values for track curvature square, train speed square, and superelevation are all greater than the significance level of α = 0.05, indicating that none of these three variables has a statistically significant effect on the amount of vertical rail wear.

4.2 Multiple Linear Regression Model for Lateral Rail Wear

A multiple linear regression model was established in Excel to analyze the effects of traffic load and track parameters on the amount of lateral rail wear. Independent variables in the multiple linear regression model for lateral wear include traffic load (tons), track curvature (m−1), train speed (km/h), and superelevation (mm), while the dependent variable is the amount of lateral rail wear (mm). The sample consists of 451 points where lateral rail wear measurements were conducted on the Yenikapi–Airport LRT line, and the values of the independent variables were determined for each point. Initially, a correlation matrix of dependent and predictor variables was examined. The correlation matrix presented in Table 11 shows the correlation coefficients between each pair of variables for the lateral rail wear regression model.

Table 11 Correlation matrix showing correlation coefficients between variables

According to Table 11, the correlation coefficients between each pair of predictor variables were obtained as 0.0560, 0.2327, −0.0810, 0.0836, 0.0996, and 0.1514, revealing a weak linear relationship between independent variables because the R values are close to zero. The correlation coefficients between the dependent variable and each predictor variable were determined as 0.8742, 0.0702, 0.2148, and −0.0686, indicating that traffic load was the only predictor variable strongly related to the dependent variable. As a result of the low correlation among independent variables, it is determined that there is no impediment to the use of all independent variables in multiple linear regression analysis. The multiple linear regression model developed for lateral rail wear yields the regression statistics presented in Table 12. The multiple linear regression model produces a multiple correlation coefficient of 0.8745, indicating a strong linear relationship between the dependent and independent variables because the multiple R value is close to 1. The coefficient of determination R2 and the adjusted R2 were found to be 0.7647 and 0.7626, respectively. The adjusted R2 value reveals that 76.26% of the variance of the dependent variable can be explained by the independent variables.

Table 12 Regression statistics of multiple linear regression model for lateral rail wear

As presented in Table 12, the standard error of the regression was determined as 0.0962. The F-test in ANOVA generated an F-value of 362.4583 and a p-value of 0.0000 as the significance F. Since this p-value is less than 0.05, the regression is overall significant at the significance level of α = 0.05 (95% confidence interval), showing that at least one of the independent variables is useful for the estimation of the dependent variable. The contribution of each independent variable to the regression model was evaluated by using a t-test. The coefficients table in Table 13 presents the t-statistic and p-value for the t-test applied for each independent variable together with the regression coefficients and their standard errors.

Table 13 Coefficients table of multiple linear regression model for lateral rail wear

The “intercept” represents the constant term in the regression model as presented in Table 13. The significance of each independent variable was identified by considering the p-value for the t-test. According to Table 13, the p-value for traffic load was found to be 0.0000. Since this p-value is lower than the significance level of α = 0.05, it is determined that traffic load has a statistically significant effect on the amount of lateral rail wear. However, the p-values for track curvature, superelevation, and train speed were obtained as 0.3698, 0.6541, and 0.9390, respectively. Due to these three p-values being greater than the significance level of α = 0.05, it is concluded that track curvature, superelevation, and train speed do not have a statistically significant effect on the amount of lateral rail wear.

Another multiple linear regression model was developed for lateral rail wear by making some modifications in the independent variables. Explanatory variables in the multiple linear regression model contain traffic load (tons), track curvature square (m−2), train speed square (km2/h2), and superelevation (mm), whereas the dependent variable is the lateral rail wear amount (mm). The sample size of the model is 451. A correlation matrix of dependent and independent variables is presented in Table 14.

Table 14 Correlation matrix showing correlation coefficients between variables

The correlation coefficients related to the modified parameters in Table 14 are slightly lower than the correlation coefficients in the previous correlation matrix presented in Table 11. As presented in Table 14, the correlation coefficients approaching zero between each pair of explanatory variables indicate a weak linear relationship between independent variables. Due to its R value of 0.8742, traffic load is the only independent variable strongly related to the dependent variable. Regression statistics of the multiple linear regression model with the modified independent variables are presented in Table 15.

Table 15 Regression statistics of multiple linear regression model with the modified independent variables

The regression statistics in Table 15 are very close to those of the previous model presented in Table 12. The multiple R value close to 1 signifies a strong linear relationship between dependent and explanatory variables. The adjusted R2 value indicates that 76.30% of the variance of the dependent variable can be explained by the explanatory variables. A p-value obtained as 0.0000 means that the regression is overall significant at the significance level of α = 0.05. A coefficients table of the regression model with the modified independent variables is presented in Table 16.

Table 16 Coefficients table of multiple linear regression model with modified independent variables

According to Table 16, the p-value for traffic load is less than the significance level of α = 0.05, implying that traffic load has a statistically significant effect on the amount of lateral rail wear. However, the p-values for track curvature square, train speed square, and superelevation are all higher than the significance level of α = 0.05, showing that none of these three variables has a statistically significant effect on the amount of lateral rail wear.

5 Results of Multicollinearity Tests and Cross-Validation Analyses

5.1 Multicollinearity Tests

Multicollinearity occurs when two or more explanatory variables of a multiple linear regression model are highly correlated, leading to a reduction of the reliability of the analysis. Multicollinearity can be detected by using a variance inflation factor (VIF), which measures the correlation between explanatory variables in the regression model. The VIF value for each explanatory variable is calculated according to Eq. (9) [41]:

$${\text{VIF}} = \frac{1}{{1 - R^{2} }} .$$
(9)

The VIF for each explanatory variable is computed by performing individual regression analyses using one explanatory variable as the dependent variable and the other explanatory variables as the independent variables; R2 in Eq. (9) is the coefficient of determination of that individual regression. The VIF value is mainly used to measure the severity of multicollinearity in the multiple regression model. A VIF value greater than 5 or 10 indicates multicollinearity problems with severe correlation between a given explanatory variable and the other explanatory variables [41].
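A short numerical illustration of Eq. (9) shows what the common thresholds mean in terms of the R2 of the individual regression of one explanatory variable on the others. The function name is illustrative and the R2 values are arbitrary examples, not results from the study.

```python
def vif(r2_aux):
    """Variance inflation factor from the R2 of the individual regression, Eq. (9)."""
    return 1.0 / (1.0 - r2_aux)


# Arbitrary example values of R2 for the individual regression.
for r2_aux in (0.0, 0.5, 0.8, 0.9):
    print(r2_aux, round(vif(r2_aux), 2))
```

A VIF of 5 therefore corresponds to an individual-regression R2 of 0.80 and a VIF of 10 to an R2 of 0.90, while a VIF close to 1 means the explanatory variable is almost uncorrelated with the others.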

For the vertical rail wear regression model, the VIF values of the explanatory variables (traffic load, track curvature, train speed, and superelevation) were calculated according to Eq. (9). The results are presented in Table 17. As presented in Table 17, the VIF values for all the explanatory variables are very close to 1. Since the VIF values for all explanatory variables are lower than 5, it is concluded that multicollinearity is not a problem for the vertical rail wear regression model.

Table 17 VIF values for explanatory variables of vertical rail wear regression model

For the lateral rail wear regression model, the VIF values of the explanatory variables (track curvature, traffic load, superelevation, and train speed) were computed according to Eq. (9). The results are presented in Table 18. As presented in Table 18, the VIF values for all explanatory variables are very close to 1. Since the VIF values are lower than 5 for all explanatory variables, it is concluded that multicollinearity is not a problem for the lateral rail wear regression model.

Table 18 VIF values for explanatory variables of lateral rail wear regression model

5.2 Cross-Validation Analyses

Cross-validation techniques are commonly used to evaluate the predictive performance of models by estimating the prediction error, and K-fold cross-validation is widely used for this purpose. In K-fold cross-validation, the data are randomly split into K approximately equal-sized parts. Generally, fivefold or tenfold cross-validation is preferred for computational reasons. In each iteration, the dataset is divided into two subgroups of unequal size: the regression coefficients are determined from subgroup 1 (the training dataset) and applied to subgroup 2 (the test dataset). Then, the prediction performance of the regression coefficients of subgroup 1 on subgroup 2 is tested [42, 43].

In this study, a fivefold cross-validation technique was used. For vertical rail wear model, the dataset was split into five approximately equally sized parts. In each iteration, regression coefficients of the training dataset were calculated by multiple linear regression analysis. Then, these regression coefficients were used to predict the dependent variable in the test dataset. To measure the accuracy of the prediction, the correlation coefficient (R) between the predicted values and the actual values was determined. In addition to R, the mean square error (MSE) of the predicted and actual values was calculated. The results of the cross-validation analysis performed for the vertical rail wear model are presented in Table 19.

Table 19 Results of cross-validation analysis performed for vertical rail wear model
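The fivefold procedure described above can be sketched as follows. This is a simplified illustration, not the analysis behind Table 19: it uses a single predictor instead of four, the data are synthetic with a small deterministic perturbation, and the function names, the seed, and the way the folds are formed are all assumptions.

```python
import random
from math import sqrt


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def pearson(u, v):
    """Pearson correlation coefficient between two sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def k_fold_cv(x, y, k=5, seed=0):
    """Return one (R, MSE) pair per fold, computed on the held-out part."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)         # random split into k parts
    folds = [idx[i::k] for i in range(k)]
    results = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        predicted = [b0 + b1 * x[i] for i in test]
        actual = [y[i] for i in test]
        mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(test)
        results.append((pearson(predicted, actual), mse))
    return results


# Synthetic data: a linear trend plus a small deterministic perturbation.
x = list(range(50))
y = [0.1 + 0.05 * xi + 0.01 * ((7 * xi) % 5 - 2) for xi in x]
results = k_fold_cv(x, y)
mean_r = sum(r for r, _ in results) / len(results)
mean_mse = sum(m for _, m in results) / len(results)
print(len(results), mean_r, mean_mse)
```

Averaging R and MSE over the folds gives summary figures of the same kind as the averages reported for the five iterations.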

As presented in Table 19, the correlation coefficients between the predicted and actual values were obtained as very close to 1 for all five iterations. The MSE scores between the predicted and actual values were determined as very close to 0 for all five iterations. The average correlation coefficient of the five iterations was calculated as 0.91785, and the average MSE of the five iterations was computed as 0.01046, indicating a strong linear relationship between the predicted and actual values. As a result, cross-validation analysis reveals that the predictive performance of the vertical rail wear regression model is satisfactory.

For the lateral rail wear model, a fivefold cross-validation analysis was performed in the same way as for the vertical rail wear model. The results are presented in Table 20. According to Table 20, the correlation coefficients between the predicted and actual values were close to 1, while the MSE values were close to 0 in all five iterations. The average correlation coefficient of the five iterations was 0.87184, implying a strong linear relationship between the predicted and actual values, and the average MSE was 0.00962, implying small prediction errors. The results of the cross-validation analysis indicate that the predictive performance of the lateral rail wear regression model is satisfactory.

Table 20 Results of cross-validation analysis conducted for lateral rail wear model

6 Conclusions and Recommendations for Future Research

The effects of traffic load, track curvature, superelevation, and train speed on the vertical and lateral wear of the rail were investigated by multiple linear regression analysis. The Yenikapi–Ataturk Airport LRT line, one of the busiest railway lines in Istanbul, was selected as the case study. Data on the date and location of rail replacements performed on the line were collected for the period between 1 January 2012 and 31 December 2016, which is the time period considered within the scope of the present study. Vertical rail wear at 476 points and lateral rail wear at 451 points on the LRT line were measured with a rail head wear measuring device between 30 October 2013 and 10 May 2016. To calculate the traffic loads acting on the rail at the wear measurement points, 120 passenger-counting studies covering all stations of the LRT line were conducted between 7 February 2018 and 29 April 2018. The passenger counts were carried out in all wagons of the train set on both weekdays and weekends and covered all hours during which the line was in operation. Based on the results of the passenger counts and the Istanbul-card data recorded at the stations, the number of passengers carried on each track section and the corresponding traffic loads were determined. Track curvature and superelevation values at the rail wear measurement points were obtained from the profile of the LRT line, while train speed values were taken from the “speed–distance” diagram of the trains operated on the line.

Two separate multiple linear regression models were developed to identify the parameters that affect the amount of vertical and lateral rail wear. The correlation matrix of the dependent and independent variables, examined before the regression analysis, revealed only a weak linear relationship among the independent variables. In the multiple linear regression model for vertical wear, the independent variables are traffic load, track curvature, superelevation, and train speed, and the dependent variable is the amount of vertical rail wear. The model produced a multiple correlation coefficient of 0.9180, indicating a strong linear relationship between the dependent and independent variables. The adjusted R² shows that 84.14% of the variance of the dependent variable can be explained by the independent variables. The F-test in ANOVA generated an F-value of 630.9581 with a significance F (p-value) of 0.0000, implying that the regression is significant overall at the significance level of α = 0.05. The significance of each predictor was assessed from the p-value of its t-test. The p-value for traffic load was 0.0000, which means that traffic load has a statistically significant effect on the amount of vertical rail wear. However, the p-values for track curvature, superelevation, and train speed were 0.6209, 0.3311, and 0.9352, respectively, so none of these three variables has a statistically significant effect on the amount of vertical rail wear.

In the multiple linear regression model for lateral wear, the independent variables are again traffic load, track curvature, superelevation, and train speed, whereas the dependent variable is the amount of lateral rail wear. The model generated a multiple correlation coefficient of 0.8745, implying a strong linear relationship between the dependent and independent variables. The adjusted R² indicates that 76.26% of the variance of the dependent variable can be explained by the independent variables. The F-test in ANOVA produced an F-value of 362.4583 with a significance F (p-value) of 0.0000, showing that the regression is significant overall at the significance level of α = 0.05. The contribution of each independent variable to the model was assessed from the p-value of its t-test. The p-value for traffic load was 0.0000, which signifies that traffic load has a statistically significant effect on the amount of lateral rail wear. However, the p-values for track curvature, superelevation, and train speed were 0.3698, 0.6541, and 0.9390, respectively, so none of these three variables has a statistically significant effect on the amount of lateral rail wear.
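The summary statistics reported for the two models can be cross-checked against one another with the standard formulas for the adjusted R² and the overall F statistic of a linear regression with k predictors and n observations. The sketch below assumes that every measurement point entered the regression (n = 476 for vertical wear, n = 451 for lateral wear) and that k = 4; these sample sizes are an assumption, and the helper names are hypothetical.

```python
def adjusted_r2(r, n, k):
    """Adjusted R^2 from the multiple correlation coefficient r."""
    r2 = r ** 2
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r, n, k):
    """Overall F statistic of the regression from r, n and k."""
    r2 = r ** 2
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

K = 4  # traffic load, track curvature, superelevation, train speed

# Assumed sample sizes: all measurement points used in each model.
vertical = (adjusted_r2(0.9180, 476, K), f_statistic(0.9180, 476, K))
lateral = (adjusted_r2(0.8745, 451, K), f_statistic(0.8745, 451, K))

print(f"vertical: adj. R2 = {vertical[0]:.4f}, F = {vertical[1]:.2f}")
print(f"lateral:  adj. R2 = {lateral[0]:.4f}, F = {lateral[1]:.2f}")
```

Under these assumptions the computed values agree with the reported adjusted R² and F-values to within the rounding of the four-decimal correlation coefficients, which suggests that the reported figures are internally consistent.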

Multicollinearity was checked by means of the variance inflation factor (VIF) of each explanatory variable in the vertical and lateral rail wear regression models. In both models, the VIF values of all explanatory variables (traffic load, track curvature, train speed, and superelevation) were very close to 1. Since the VIF values of all explanatory variables in both models are lower than 5, it is concluded that multicollinearity is not a problem for either the vertical or the lateral rail wear regression model.
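For reference, the VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on the remaining predictors. The sketch below illustrates this definition with NumPy on synthetic data; the function name `vif` and the generated data are hypothetical and do not represent the measured variables of the study.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on the other predictors plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

rng = np.random.default_rng(0)

# Four mutually independent predictors: VIF values stay near 1.
X_indep = rng.normal(size=(500, 4))
vif_indep = vif(X_indep)

# Make the fourth predictor almost a copy of the first: the VIF values
# of both columns rise far above the threshold of 5.
X_coll = X_indep.copy()
X_coll[:, 3] = X_coll[:, 0] + 0.1 * rng.normal(size=500)
vif_coll = vif(X_coll)
```

The contrast between the two cases shows why VIF values near 1 are read as evidence that the explanatory variables carry largely independent information.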

A fivefold cross-validation technique was used to evaluate the predictive performance of the vertical and lateral rail wear regression models. For the vertical rail wear model, the average correlation coefficient between the predicted and actual values was 0.91785 and the average MSE was 0.01046. For the lateral rail wear model, the average correlation coefficient was 0.87184 and the average MSE was 0.00962. In both cases, the high correlation coefficient indicates a strong linear relationship between the predicted and actual values, and the low MSE indicates small prediction errors. The cross-validation analyses therefore show that the predictive performance of both the vertical and the lateral rail wear regression models is satisfactory.

As a recommendation for future research, the effects of traffic load, track curvature, superelevation, and train speed on vertical and lateral rail wear could be investigated with other methods such as artificial neural networks, fuzzy logic, or genetic algorithms. In addition, it would be beneficial to analyze the effects of other parameters such as railway superstructure type (ballasted track or slab track), sleeper type (concrete or wooden sleepers), and structural characteristics of the track section (tunnel, viaduct, or grade crossing) on vertical and lateral rail wear. Since this study was carried out on an LRT line, it would also be valuable to conduct similar research on railway tracks with different features, such as tramways, metros, or high-speed railway lines.