Introduction

App-based ridehailing services, often known as Transportation Network Companies (TNCs), have revolutionized the customer experience in urban centers in recent years. TNC firms such as Uber and Lyft often provide more abundant, reliable, and cheaper service than taxis, their closest competitor (Brown and LaValle 2021), leading to rapid growth in ridership. Within San Francisco, for example, ridehailing accounted for 15% of intra-city vehicle trips in 2016 (SFCTA 2017).

A large number of studies have analyzed the consequences of ridehailing for travel behavior and congestion. The most common finding is that ridehailing induces users to make more trips, and that it shifts trips away from private cars, walking, and public transit (Rayle et al. 2016; Hampshire et al. 2017; Clewlow and Mishra 2017; Gehrke et al. 2019; Babar and Burtch 2020; Bradley et al. 2022). In San Francisco, ridehailing has been the largest contributor to increased congestion in recent years (Erhardt et al. 2019). However, ridehailing can improve mobility, particularly in neighborhoods where car ownership is low (Brown 2019a) and for older adults (Leistner and Steiner 2017). In some cases, ridehailing can also complement transit use by filling gaps in the reach of scheduled bus and rail services or providing first mile/last mile access (Hall et al. 2018; D. A. King et al. 2020), although most studies find that ridehailing takes trips away from transit (e.g. Dong 2020; Jin et al. 2019).

Less attention, however, has been paid to the strategies of ridehailing drivers, and in particular what they do between paid rides. Most analyses focus on the paid, with-passenger portion of a ridehail trip, but deadheading—such as driving to the next pick-up location and cruising while waiting for a trip request—may have major consequences for the environment and congestion (Ward et al. 2021). Driver choices regarding whether and where to park while waiting for the next trip also affect curbspace and parking availability. Thus, understanding deadheading behavior is important for developing municipal policies for regulating and pricing ridehail services, such as congestion surcharges, and for allocating and pricing curbspace (Strong 2015; Li et al. 2019; Marsden et al. 2020).

In this paper, we quantify the choices that ridehail drivers make between paid trips. We focus on the period of time when the driver is available (the app is turned on, but the driver has not yet accepted a trip request), which we call search travel or search time, and which is sometimes referred to as Period 1 or P1. We do not quantify other types of deadheading, which we define as any period when the ridehail vehicle is not occupied by a passenger.

We develop a method to partition search travel into cruising, repositioning, and parking segments, and apply it to a dataset of 5.3 million trips in San Francisco. We find that while almost all trips involve repositioning (traveling to another location where demand is expected to be higher), a surprising portion (29%) entail at least some cruising. We develop a regression model to quantify the factors associated with driver choices, and find that ridehail drivers appear to reposition to neighborhoods where ridehail demand is high, but the model also suggests that drivers may avoid neighborhoods with high proportions of residents of color. A key limitation of our analysis is that we have no way to assess a driver’s intent or reasoning; we are limited to examining their paths of travel.

The rapid growth of ridehailing means that our findings are relevant to policymakers dealing with present-day transportation challenges. However, our results also provide a preview of what might be expected in a future with autonomous vehicles, whose transportation and environmental consequences may bear many parallels to those of ridehailing.

Driver behavior: comparing taxis and ridehail

Taxi drivers in large cities often cruise along busy streets in search of a street hail, or reposition to major trip generators such as airports and hotels. In New York City, for example, cruising and repositioning account for 44% of miles driven by taxicabs, with an average of 2.9 miles of deadheading between trips (Abrams et al. 2007, p. 124). Driving around rather than waiting at a taxi stand may be rational from the taxi driver’s perspective, as it makes the vacant taxicab visible to prospective passengers, but its impacts from a social welfare perspective are mixed. On the one hand, a ready supply of available taxis reduces wait times for passengers; on the other, cruising taxis are highly visible contributors to congestion. Thus, limiting cruising has often been a key goal of taxicab regulators and a justification for limits on the number of taxicabs (Shreiber 1975; Yang et al. 2005; Abrams et al. 2007).

Another long-standing regulatory challenge has been to ensure the availability of taxis in low-income neighborhoods and communities of color, which typically experience longer wait times. Drivers often decline to accept calls for service to such neighborhoods, and also tend to reposition away from them after dropping off a passenger due to perceptions of lower demand, fears for their personal safety, and racial profiling (Davis 2003; Ingram 2003; Brown 2019b). Regulatory responses have included enforcement “stings,” but also programs such as New York City’s “green cabs,” which can only pick up passengers outside of the high-demand areas of Lower Manhattan and the airports (King and Saldarriaga 2018).

To what extent do these findings translate from taxis to ridehailing? Both sets of drivers should seek to maximize the expected net revenue from their next paid trip, and minimize search time and travel. The options open to taxi and ridehail drivers are also similar. They can park (or equivalently, wait at a taxi stand), cruise around while remaining in the same general neighborhood, or reposition to a different neighborhood where they expect demand to be higher. Cruising and repositioning are often conflated in the literature (e.g. Henao and Marshall 2019; Nair et al. 2020), but conceptually the two categories of search behavior (cruising and repositioning) are distinct.

While the options of taxi and ridehail drivers may be similar, their optimal strategies are likely to be considerably different because their costs and sources of information differ in four main respects. First, while taxi drivers must normally be conspicuous to passengers hailing a taxi on the street (Footnote 1), the app-based system used by ridehail firms renders such visibility unnecessary. Second, a first-in, first-out rule typically applies at taxi stands at hotels, airports, and other major trip generators. In contrast, ridehail drivers are subject to the opaque methods that ridehail firms use to match drivers with passengers, and the incentives that the firms use to encourage drivers to head to specific locations and to start or extend their shifts. Third, while taxi drivers might rely on heuristics or experience to identify high-demand locations, ridehail drivers have access to real-time information on demand patterns through their smartphone app. Fourth, taxi drivers may have lower costs for repositioning if, as in cities such as San Francisco, they have access to bus lanes or dedicated taxi stands.

As a result, one would expect ridehail drivers to cruise less frequently than taxi drivers. For a ridehail driver, parking is likely to provide similar prospects to cruising in terms of obtaining the next paid ride, without the costs of fuel and vehicle wear and tear. Since drivers can easily move if and when an enforcement officer arrives, they have little need to pay for parking either. Indeed, many online guides and fora for ridehail drivers (such as Reddit’s r/uberdrivers) exhort drivers to save money by parking rather than driving around in circles. However, the online fora also provide examples of drivers who are unsure of the optimum strategy, or who prefer to cruise. One Reddit user says: “I keep moving…I have loops I drive. I would probably park if I wasn't getting 40 mpg.” (Footnote 2)

The relative advantages of repositioning for taxis and ridehailing, in contrast to those for cruising, are not intuitively clear, but one might expect shifts in the destinations and times of repositioning. Given the dynamic information available to ridehail drivers, they might be expected to reposition to a broader range of destinations, not just the hotels and airports that are obvious sources of demand for taxis (Dempsey 1996; Schaller 2007).

Little empirical work, however, exists to support or refute these hypotheses. Data sharing by ridehail firms such as Uber and Lyft has been extremely limited, meaning that most researchers have focused on the paid portion of the trip, which is easier to observe through field or household surveys (e.g. Grahn et al. 2020; Brown and LaValle 2021). Deadheading behavior is harder to identify, and often, the distance driven while searching for rides is simply assumed (e.g. Tirachini and Gomez-Lobo 2020) or simulated based on assumptions of rational driver behavior (e.g. Komanduri et al. 2018; Gurumurthy et al. 2020). In almost all travel demand models, the vehicle dematerializes after dropping off a passenger, only to reappear on the network at the start of the next paid trip.

Among the exceptions, Henao and Marshall (2019) find that deadheading accounts for 41% of the miles driven by ridehail drivers, but this estimate is based on data from a single driver—the first author. Several studies use a dataset released by RideAustin to impute deadheading based on pick-up and drop-off locations. While the actual paths taken by drivers are uncertain, the data indicate that 37–45% of total miles driven were by deadheading vehicles (Komanduri et al. 2018; Wenzel et al. 2019). In California as a whole, analysis of data provided by ridehail firms (under a legal requirement) indicates that deadheading accounts for 39.5% of miles driven (CARB 2019). In Manhattan, a similar analysis puts the proportion at 40% (Schaller 2021). Geographically, the broadest estimates are made by Cramer and Krueger (2016) using proprietary data provided by Uber; they find that deadheading accounts for 39% of miles by Uber drivers across five major cities. Proprietary data from Uber and Lyft are also used by Martin et al. (2021), who find that search travel (a subset of deadheading) accounts for an average of 34% of total miles in three regions—San Francisco, Los Angeles, and Washington, DC. Finally, a study commissioned by Uber and Lyft puts the proportion of deadheading at 38–46% in a set of six metropolitan regions (Fehr and Peers 2019). Their breakdown indicates that 28–37% of the distance is driven while waiting for a ride request (i.e., search travel), and 9–10% while driving to the pick-up location after accepting a request.

These estimates are remarkably consistent. They suggest that deadheading by ridehail vehicles is substantial at about 40% of the total distance driven. This consistency comes in spite of different methodologies, data sources, and scopes—for example, whether they consider travel between a driver’s home and the first activation of the ridehail app, or whether they consider cruising or assume shortest-path travel distances. Surprisingly, estimates of deadheading for ridehail services are not much less than those for taxis in the pre-ridehail era, in spite of the information advantages held by the former.

Studies of racial equity, meanwhile, suggest that discrimination still exists in the ridehail market, although perhaps to a lesser extent than with conventional taxis. At the individual level, field audits that requested rides in Boston and Washington, DC found that cancellations doubled when using an African American-sounding name rather than a white-sounding name (Ge et al. 2020; Mejia and Parker 2021). Studies of wait time are mixed: aggregate wait times for ridehailing requests in Austin are longer in neighborhoods with a higher proportion of people of color, after controlling for residential and employment densities and average income (Yang et al. 2021), but a study in Seattle found no such effect (Hughes and MacKenzie 2016).

Research approach

Ridehail data

We used a unique dataset of 5.3 million ridehail trips in San Francisco from November 12, 2016 through December 21, 2016, compiled by researchers at Northeastern University by querying the Uber and Lyft Application Programming Interfaces (APIs), which give access to vehicle locations. The data returned by the servers include a unique identifier, vehicle type, and a vector of timestamped latitude and longitude coordinates that reflects each vehicle’s recent path. When a vehicle driver has accepted a ride and is no longer available, or has ended their shift, the vehicle no longer appears in the information returned by the servers. Similarly, when a vehicle driver drops off a passenger and becomes available again, or when a driver starts a shift, the vehicle appears in the information from the server. An important difference between the data revealed by Uber and Lyft is that while Uber appears to assign a new unique identifier to every vehicle after it has completed a trip, Lyft allows the vehicle identifiers to persist across the entire sampling period (Footnote 3).

Further details of data acquisition, processing, and validation are elaborated in Cooper et al. (2018), and a summary is given in the Online Appendix. The dataset has been used in several empirical analyses, most notably an assessment of the congestion impacts of ridehailing in San Francisco (Erhardt et al. 2019), and a profile of TNC activity in San Francisco (SFCTA 2017). However, those analyses focus on the occupied (paid) portions of the rides, rather than the search portions on which we focus here.

Each trip in the dataset consists of a sequence of points with geographic coordinates and a timestamp. On average, the points are 3.0 s apart. We cleaned the dataset to drop points with invalid coordinates, restricted the dataset to trips within the city of San Francisco, and excluded shared (e.g., Lyft Line) and delivery (e.g., Uber Eats) trips. Note that the dataset only includes points when the ridehail app is turned on and the driver is available to accept a ride, which we call search trips (so-called “P1” miles in California regulatory parlance). Our data do not capture travel between ride acceptance and passenger pick-up (“P2” miles).

We map-matched each trip to the OpenStreetMap road network in order to provide more accurate estimates of driving distances that are not affected by irregularities in the GPS trace. We used a three-stage process: (1) matching GPS points to OpenStreetMap (OSM) links using Mapillary’s publicly available map-matching algorithm (Footnote 4), (2) dropping links where the preceding and succeeding links directly connect, in order to eliminate out-and-back detours down side streets, and (3) interpolating gaps in the link sequence using the turn-restricted shortest path function in the pgRouting software package.
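Stage (2) of this process can be illustrated with a minimal Python sketch. It assumes that each matched link is a directed (from_node, to_node) pair and that two links "directly connect" when the first ends at the node where the second starts. The function name and data layout are hypothetical; they are not taken from the Mapillary matcher or from pgRouting, and the sketch only handles a single spurious link at a time.

```python
def drop_out_and_back(links):
    """Drop any link whose neighbours already connect to each other.

    `links` is a list of (from_node, to_node) tuples in travel order
    (an assumed representation of the matched link sequence).
    """
    cleaned = []
    for i, link in enumerate(links):
        if 0 < i < len(links) - 1:
            prev_link, next_link = links[i - 1], links[i + 1]
            # The previous link ends where the next one starts, so the
            # link in between is treated as a side-street detour.
            if prev_link[1] == next_link[0]:
                continue
        cleaned.append(link)
    return cleaned


# Link (2, 9) is a stub off the junction at node 2 and is removed.
route = drop_out_and_back([(1, 2), (2, 9), (2, 3), (3, 4)])
```

Gaps left in the cleaned sequence would then be filled by the shortest-path interpolation in stage (3), which is not shown here.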

Classification of behavior

We classified each point (Footnote 5) as short, parking, cruising, or repositioning as follows:

Short points are those on trips where either (1) there are fewer than six GPS points or (2) the trip duration is less than two minutes. For these trips, it was not possible to determine the driver’s intent. Except where indicated, short trips are excluded from the subsequent analysis.

Parking points are defined as a cluster of points within any three-minute interval where at least 90% of the points are within 7.5 m of each other. After identifying these clusters, each point within the cluster was classified as parking, and the parking location was defined as the closest point to the centroid of the cluster. To avoid classifying vehicles stuck in congested traffic as parked, we created exceptions where time- and location-specific traffic speeds (obtained from INRIX) were less than three mph, or where the GPS point was on a freeway. In these instances, the parking classification was not applied.
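The parking rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the analysis: coordinates are taken to be projected (meters), "within 7.5 m of each other" is approximated as "within 7.5 m of the window's median position", and the congestion and freeway exceptions are omitted. The function and parameter names are hypothetical.

```python
import math
from statistics import median


def flag_parking(times_s, points, window_s=180.0, radius_m=7.5, share=0.9):
    """Return one boolean per point: True where the point is flagged as parked.

    times_s: timestamps in seconds, ascending.
    points:  (x, y) positions in meters (assumed projected coordinates).
    """
    parked = [False] * len(points)
    for i in range(len(points)):
        # Extend the window to the last point within three minutes of point i.
        j = i
        while j + 1 < len(points) and times_s[j + 1] - times_s[i] <= window_s:
            j += 1
        if times_s[j] - times_s[i] < window_s:
            continue  # fewer than three minutes of data from this start point
        window = points[i:j + 1]
        cx = median(p[0] for p in window)
        cy = median(p[1] for p in window)
        near = [math.hypot(x - cx, y - cy) <= radius_m for x, y in window]
        if sum(near) / len(window) >= share:
            # Only the points inside the cluster are marked as parking.
            for k, is_near in enumerate(near, start=i):
                parked[k] = parked[k] or is_near
    return parked
```

Marking only the points inside the cluster, rather than the whole window, keeps the first few moving points after a stop from being labeled as parking.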

Cruising points are those that involve circling or backtracking. We first identified cruising at the trip level using the definition in Weinberger et al. (2020)—trips where the actual (map-matched) distance is at least 200 m longer than the shortest-path network distance. Within each cruising trip, however, the driver may not be cruising the entire time. Therefore, we identified the cruising portion of each trip as a function of the path of the squared displacement—the squared (Euclidean) distance from each point to the origin. This metric is often used in movement ecology studies to distinguish the movements of individual animals, such as deer collared with a GPS tracker, and can distinguish between migratory, non-migratory, and dispersing behavior (Killeen et al. 2014; Singh et al. 2016).

Specifically, if we plot the squared displacement over time, a positive slope indicates that the driver is moving away from the origin. A negative slope shows that the driver is returning towards their origin (i.e., the start of the search trip). After smoothing the standardized slope (Footnote 6), consecutive points with a slope of +1 form a positive segment, and consecutive points with a slope of −1 form a negative segment. We therefore classified a point as cruising if the trip involves cruising per the definition above and either (1) the point is on a negative segment, or (2) the point is on a positive segment, but its squared displacement is offset by a subsequent negative segment. Figure 1 provides an example.

Fig. 1

Example of cruising and repositioning segments. The driver’s route is shown in the left panel, with the right panel showing how squared displacement changes over the route. The first segment (marked in black) is classified as repositioning because the squared displacement keeps increasing, indicating movement away from the origin. The subsequent segments are classified as cruising because backtracking is involved. Each of the three pairs of cruising segments has a positive segment which is offset by a subsequent negative segment, as shown by the three pairs of segments labeled in the figure. For the pink cruising segment, the positive segment (1+) is offset by the negative segment (1−). Similarly, for the blue segment, (2+) is offset by (2−), and for the orange segment (3+) is offset by (3−)

Repositioning points constitute the remainder of the data set. In other words, all other points (i.e., those that are not classified as parking, cruising, or short) were classified as repositioning.
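The squared-displacement rule for cruising points can be illustrated with a minimal Python sketch. It assumes projected coordinates in meters and skips the smoothing of the slope. It also reads "offset by a subsequent negative segment" as "the vehicle is later closer to the origin than it is at this point"; that reading and the function name are assumptions rather than the exact procedure used in the analysis.

```python
def classify_cruising(points, is_cruising_trip=True):
    """Return one boolean per point: True where the point counts as cruising.

    points: (x, y) positions in meters, in time order; the first point is
    the origin of the search trip.
    """
    x0, y0 = points[0]
    sq_disp = [(x - x0) ** 2 + (y - y0) ** 2 for x, y in points]
    flags = [False] * len(sq_disp)
    if not is_cruising_trip:
        return flags  # the point-level rule only applies within cruising trips

    later_min = float("inf")  # smallest squared displacement at any later point
    for i in range(len(sq_disp) - 1, -1, -1):
        step = sq_disp[i] - sq_disp[i - 1] if i > 0 else 0.0
        moving_back = step < 0                              # negative segment
        offset_later = step > 0 and later_min < sq_disp[i]  # offset positive segment
        flags[i] = moving_back or offset_later
        later_min = min(later_min, sq_disp[i])
    return flags


# Drive out to x = 3, double back to x = 2, then continue to x = 5.
trace = [(0, 0), (1, 0), (2, 0), (3, 0), (2, 0), (4, 0), (5, 0)]
flags = classify_cruising(trace)
```

In this toy trace only the overshoot to x = 3 and the return to x = 2 are flagged; the remaining points, which make net progress away from the origin, would fall into the repositioning category.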

One key limitation of our analysis, discussed further in the conclusion, is that we are unable to link these patterns of behavior to driver reasoning and specific intent. Further uncertainty is added by the scraped nature of the data; while the validation discussed in the Online Appendix suggests that estimated trip volumes and patterns are consistent with independent data sources on ridehail activity, we could not directly verify that our data fully reflect search travel. In addition, our classification depends on several arbitrary thresholds, in particular the 200 m difference between the actual distance and the shortest path network distance, which is set to be longer than the typical 100 to 150 m long San Francisco blocks. The sensitivity analysis in the Online Appendix, however, shows only modest effects from varying this threshold. Eliminating it altogether increases our estimate of distance cruised from 23 to 25% of search travel, while doubling the threshold to 400 m reduces cruising to 20%.
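The trip-level thresholds can be written as two small Python predicates, which makes the role of the 200 m parameter in the sensitivity analysis explicit. The function names are hypothetical; the threshold values are those stated in the text.

```python
def is_short_trip(n_points, duration_s):
    """Short trips have fewer than six GPS points or last under two minutes."""
    return n_points < 6 or duration_s < 120


def is_cruising_trip(actual_m, shortest_path_m, threshold_m=200.0):
    """A trip involves cruising when the map-matched distance exceeds the
    shortest-path network distance by at least `threshold_m` meters."""
    return actual_m - shortest_path_m >= threshold_m


# The same trip under the baseline and the doubled threshold.
baseline = is_cruising_trip(1300.0, 1000.0)
doubled = is_cruising_trip(1300.0, 1000.0, threshold_m=400.0)
```

A trip with 300 m of excess distance counts as a cruising trip under the baseline threshold but not when the threshold is doubled to 400 m.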

Other data sources

We attached the covariates shown in Table 1 to each point. For most variables, we used data at the level of the Transportation Analysis Zone (TAZ), the geographical unit used in analysis by the San Francisco County Transportation Authority. We produced a weighted average for each point by aggregating the values for the TAZ containing the point and neighboring TAZs where the neighboring values were weighted using a distance decay function. This smoothing algorithm avoids abrupt changes in the values of the variables at TAZ boundaries, and also reflects how drivers are likely to perceive gradual changes in neighborhood demographics and parking supply. There are 981 TAZs in San Francisco, with a mean surface area of 0.12 km². We merged the TAZ-level covariates to the point-level data, and added lagged dependent variables (indicating prior driver behavior) and time of day and day of week variables for each point. For Lyft trips, we also calculated driver experience, measured as the number of trips by that particular driver observed in the dataset. (The Uber API does not provide a persistent driver identifier.)
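The distance-decay smoothing can be sketched in Python as follows. The text does not specify the decay function, so the exponential form, the 500 m scale, the use of TAZ centroids, and the unit weight on the containing TAZ are all assumptions made for illustration, as is the function name.

```python
import math


def smoothed_covariate(point_xy, own_value, neighbor_centroids, neighbor_values,
                       scale_m=500.0):
    """Weighted average of the containing TAZ's value and its neighbours' values.

    The containing TAZ gets weight 1; each neighbour gets exp(-d / scale_m),
    where d is the distance in meters from the point to that neighbour's
    centroid (an ASSUMED decay function).
    """
    px, py = point_xy
    weighted_sum, weight_total = own_value, 1.0
    for (cx, cy), value in zip(neighbor_centroids, neighbor_values):
        w = math.exp(-math.hypot(cx - px, cy - py) / scale_m)
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total
```

Under this form, a point near a TAZ boundary draws almost equally on both zones, while a point deep inside a zone is dominated by that zone's own value, which is the gradual transition the smoothing is meant to produce.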

Table 1 Descriptive statistics

Regression analysis

We used multinomial logistic regression to estimate the effects of the covariates in Table 1 and interaction terms on the driver’s decisions to reposition, cruise, or park. We used these variables because both basic theory and previous studies (e.g. Ghaffar et al. 2020; Grahn et al. 2020; Hughes and MacKenzie 2016) suggest their importance for ridehail demand and/or ridehail availability, in turn implying that they may affect a driver’s decision to reposition, cruise, or park. We included several measures of parking supply because they affect both ridehail demand and a driver’s ability to park.

To reduce serial correlation of the error terms, we downsampled the data to 1-min resolution. The downsampled dataset is about 5% of the full dataset, consistent with the average spacing of 3.0 s between points. For computational reasons, our regressions use a 40% subsample of this downsampled dataset. Because the distributions of most non-ratio numeric covariates are right-skewed, we applied a log transformation to the non-ratio covariates. This can further reduce serial correlation and the influence of extreme values. Since the covariates vary widely in magnitude, we also normalized all numeric covariates by subtracting the mean and then dividing by the standard deviation of each covariate.
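These preprocessing steps can be sketched in Python. The sketch works on a single trip, keeps the first point in each one-minute bin (the text does not say which point within each minute is retained), and uses log(1 + x) so that zero values remain defined; both choices and the function names are assumptions.

```python
import math
from statistics import mean, pstdev


def first_point_per_minute(times_s):
    """Indices of the first point observed in each one-minute bin."""
    keep, seen = [], set()
    for i, t in enumerate(times_s):
        minute = int(t // 60)
        if minute not in seen:
            seen.add(minute)
            keep.append(i)
    return keep


def log_standardize(values):
    """Log-transform a right-skewed covariate, then centre and scale it.

    Assumes the values are not all identical (non-zero standard deviation).
    """
    logged = [math.log1p(v) for v in values]  # log(1 + x): an assumed variant
    mu, sigma = mean(logged), pstdev(logged)
    return [(v - mu) / sigma for v in logged]


# Ten minutes of points at 3 s spacing: one point in twenty is retained.
times = list(range(0, 600, 3))
kept = first_point_per_minute(times)
share_kept = len(kept) / len(times)
```

With points 3 s apart, one-minute downsampling keeps one point in twenty, in line with the reported figure of about 5%.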

We also tested the robustness of our results to key modeling assumptions in two ways. First, we used a nested logistic regression to model a process where drivers first choose between repositioning and remaining in the same area, and if the latter, choose between cruising and parking. The hypothesis is that with low demand, drivers would prefer to reposition to another place, while with high demand the driver would choose between parking and cruising. Second, we aggregated the point level data to the TAZ level with different times of day and days of week, and then ran a fractional multinomial logistic regression of the ratio of points for each behavior on the covariates. Fractional logistic models are designed for aggregate data where the dependent variable is a proportion, rather than a binary or categorical outcome.

Results

Classification of driver behavior

We begin by presenting the broad patterns of driver behavior in terms of the choices between parking, cruising, and repositioning. Table 2 and Fig. 2 show the percentage of time and distance driven in each of the categories. Repositioning accounts for the majority of search time and distance traveled, and almost all trips involve at least a small amount of repositioning. Perhaps surprisingly given the fuel and wear-and-tear costs of cruising, more time is spent cruising than parking, and the average search trip cruises for nearly half a kilometer.

Table 2 Classification of driver search behavior
Fig. 2

Driver behavior when searching for rides. Short trips (defined as fewer than six GPS points or lasting less than two minutes) are not further categorized, as we have insufficient data to classify the drivers’ behavior

As shown in Table 2, the average search distance traveled is 0.98 km (0.6 miles). The average paid ride is 4.2 km (2.6 miles), based on a previous analysis of the same dataset (SFCTA 2017). Therefore, the search portion accounts for 19% of ridehail vehicle travel. Note that this estimate excludes travel before the driver activates the app, and between accepting a ride request and picking up the passenger.
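The 19% figure follows directly from the two averages quoted above, as this short Python check shows. The variable names are illustrative; the denominator is search distance plus paid distance only, matching the exclusions noted in the text.

```python
search_km = 0.98  # average search distance per trip (Table 2)
paid_km = 4.2     # average paid ride distance (SFCTA 2017)

# Share of vehicle travel spent searching, ignoring pre-activation
# travel and travel between ride acceptance and pick-up.
search_share = search_km / (search_km + paid_km)
print(f"Search share of travel: {search_share:.1%}")
```

The ratio is about 18.9%, which rounds to the 19% reported.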

Drivers for the two ridehail firms operating in San Francisco—Uber and Lyft—spend almost identical proportions of their time across the three categories of parking, cruising, and repositioning. However, search trips are longer for Lyft drivers (5.5 min and 1.35 km, compared to 3.6 min and 0.86 km for Uber drivers). Lyft drivers also have a smaller proportion of short search trips (36% compared to 46% for Uber). Since Uber accounts for three-quarters of the trips in our sample, it is possible that economies of scale lead to their drivers obtaining a paid fare more quickly, reducing the amount of search travel required.

There is surprisingly little geographic variation in the three behaviors across the city (Fig. 3). Drivers finding themselves in the ring of dense residential neighborhoods around the downtown core are more inclined to park rather than reposition or cruise, but the effects are not strong. Northeastern San Francisco—the densest part of the city—accounts for the largest share of search time (Fig. 3) and trip starts and ends (Fig. 4). There is a noticeable concentration of trip starts on freeway corridors, perhaps reflecting drivers turning on their app as they enter the city. Otherwise, there is no obvious geographic pattern in the number of search trip ends minus the number of trip starts (net trip flows), with Fig. 4 showing a patchwork quilt across the city. The exception is along freeways, where for obvious reasons there is a net movement away from these facilities.

Fig. 3

Geographic patterns in parking, cruising, and repositioning. A, B and C show the fraction of time within each TAZ spent parking, cruising, and repositioning respectively. Each category spans a ten percentage point range (e.g. 40–50% below average, 30–40% below, etc.). Most of the color hues are in the center of the distribution, especially for cruising, indicating that behavior is relatively uniform across the city. D shows the distribution of search time across the city, normalized to land area and expressed as thousand hours per square kilometer

Fig. 4

Net search flows. A and B show the number of search trip starts and trip ends in each TAZ, normalized to area and expressed as deciles. C shows the net movement, with red-shaded TAZs having more trip starts than ends (a net movement away) and blue-shaded TAZs having more trip ends than trip starts. (Color figure online)

Parking

We now consider the characteristics of parking events. The map in Fig. 5 (left panel) shows a concentration in the inner ring of dense residential neighborhoods. Within this general area, however, drivers find a range of parking options. Off-street parking is most visibly concentrated in grocery store surface parking lots, gas stations, and similar locations, where drivers may be able to linger for a short time before being moved on by security staff or parking attendants. On-street parking is spread more diffusely, but concentrations are evident along neighborhood commercial corridors. In some cases, ridehail drivers park on blocks where driveways, fire hydrants, loading zones, or other restrictions preclude parking for regular vehicles, but mean that curb space is readily usable by ridehail drivers who can quickly move if needed. These concentrations are most visible in an interactive online version of the parking map (right panel of Fig. 5), available at https://tncparking.sfcta.org.

Fig. 5

Concentrations of parking locations. Each location is weighted by the length of time parked. The right panel shows a screenshot from the interactive online map available at https://tncparking.sfcta.org. Blue symbols denote on-street parking, and red symbols denote off-street parking, with a gas station and surface lots at two grocery stores being readily apparent. (Color figure online)

Overall, almost all the time spent parking (93% of the total duration) occurs on-street. Non-metered on-street spaces (both legal and illegal) account for the majority of ridehail parking, with the largest share (31%) occurring on residential streets (Table 3 and Fig. 6). Parking at meters accounts for just over one-third of the aggregate time spent parked, but given that most drivers do not park at all while searching for a ride, this amounts to only 12 s in the average trip, of which 5 s are during metered hours. Thus, on a per-trip basis, the impact on parking availability is minimal, as is the revenue loss to the City (less than half a cent). However, given the 1.2 million ridehail trips per week in late 2016 (SFCTA 2017), the aggregate forgone meter revenue amounts to more than $200,000 per year, based on the typical meter rate of $2.50 per hour. This calculation also excludes time spent while loading or unloading passengers at meters, and stays of less than three minutes (the minimum length of a parking event in our analysis).
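The meter revenue estimate can be reproduced with a few lines of Python. The inputs are the figures quoted above; the function name is hypothetical, and applying the late-2016 weekly trip volume to all 52 weeks of the year is an assumption.

```python
def forgone_meter_revenue(metered_s_per_trip=5.0, rate_per_hour=2.50,
                          trips_per_week=1_200_000, weeks_per_year=52):
    """Return (dollars per trip, dollars per year) of unpaid meter time."""
    per_trip = metered_s_per_trip / 3600.0 * rate_per_hour
    per_year = per_trip * trips_per_week * weeks_per_year
    return per_trip, per_year


per_trip, per_year = forgone_meter_revenue()
print(f"per trip: ${per_trip:.4f}; per year: ${per_year:,.0f}")
```

Five seconds at $2.50 per hour is roughly a third of a cent per trip, which scales to about $217,000 per year at 1.2 million trips per week.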

Table 3 Time spent parked (hours per week)
Fig. 6

Distribution of time spent parked

Determinants of driver behavior

We now consider the associations between neighborhood characteristics and a driver’s decision to park, cruise, or reposition, using the logistic regression models discussed in the Research Approach section. Two coefficients are attached to each variable, indicating the associated change in the probability of repositioning and cruising respectively, compared to a baseline behavior of parking. All coefficients are shown in Table 4 and, with the confidence intervals graphically represented, in Fig. 7.

Table 4 Regression coefficients
Fig. 7

Confidence intervals for regression coefficients. Note that the chart omits the lag behavior coefficients, which are much larger than the other covariates

The variables are standardized, and so each coefficient represents the effect of a one-standard deviation change. A positive sign indicates that that behavior is more likely compared to parking, and a negative sign that it is less likely. For example, drivers are less likely to reposition away from TAZs with a high proportion of White residents (coefficient of −0.059), and slightly less likely to cruise (−0.005), compared to parking.
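To show how coefficients of this kind translate into choice probabilities, the following Python sketch evaluates a multinomial logit with parking as the baseline. The two slopes are the coefficients quoted above for the share of White residents; the intercepts are hypothetical values chosen only so that repositioning is the most common behavior, and all other covariates are ignored.

```python
import math


def choice_probabilities(z, intercepts, slopes):
    """Probability of each behavior for a covariate at z standard deviations.

    Parking is the baseline, so its linear predictor is fixed at zero.
    """
    scores = {"parking": 1.0}  # exp(0)
    for behavior in intercepts:
        scores[behavior] = math.exp(intercepts[behavior] + slopes[behavior] * z)
    total = sum(scores.values())
    return {behavior: s / total for behavior, s in scores.items()}


# Slopes for the share of White residents, as quoted in the text.
slopes = {"repositioning": -0.059, "cruising": -0.005}
# HYPOTHETICAL intercepts, for illustration only (not from Table 4).
intercepts = {"repositioning": 2.0, "cruising": 0.5}

at_mean = choice_probabilities(0.0, intercepts, slopes)
one_sd_up = choice_probabilities(1.0, intercepts, slopes)
```

With these illustrative intercepts, a one-standard deviation increase lowers the probability of repositioning and lowers the odds of cruising relative to parking, yet the absolute probability of cruising rises slightly because repositioning falls by more. The sign of a coefficient therefore describes a change relative to the baseline, not necessarily the direction of the probability itself.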

Given the large sample size, most of the coefficients are statistically significant at conventional levels. However, they are hard to interpret, given that there are three separate behaviors (parking, cruising, and repositioning) as well as interaction terms that allow our density coefficients to vary by time of day and day of week. Therefore, Fig. 8 plots the effects of each variable in terms of the probabilities of each behavior. Several findings emerge from these analyses.

Fig. 8

Probability of specific behaviors given change in key variables. The plots show the probability of repositioning, cruising, and parking against changes in several key independent variables, which are normalized so that the x-axis indicates standard deviations from the mean. All other variables are held at their means. For example, the upper-left plot shows that repositioning is the most common behavior, but even more so in high-residential density neighborhoods. As the prevalence of repositioning increases with density, that of cruising declines, while parking remains at similar levels

Ridehail drivers tend to reposition away from neighborhoods with more parking, especially on-street parking as shown in Fig. 8a. This perhaps indicates that individuals might choose to drive their own cars to neighborhoods with plentiful parking, meaning less demand for ridehail services in these areas. This demand-side effect appears to outweigh the advantage to ridehail drivers of readily available parking.

Drivers also tend to reposition away from neighborhoods with a higher proportion of residents of color, and are less likely to do so from neighborhoods with more White residents (Fig. 8c–e; see Footnote 7). These findings provide suggestive evidence that drivers avoid neighborhoods with more people of color, supporting the findings of the earlier research on both ridehail and taxi drivers discussed above.

As seen in Fig. 8g–i, the effects of density are perhaps initially counterintuitive. Drivers are more likely to reposition away from neighborhoods with higher residential or service employment density, even though these types of neighborhoods might be expected to generate more ridehail trips, whether due to the presence of bar and restaurant customers or the lower car ownership rates seen in dense residential neighborhoods. In contrast, drivers are less likely to reposition away from neighborhoods with a higher density of non-service employment. However, a more intuitive picture emerges when we consider how the effects of density change over the course of the day and week, through the interaction terms in the regression model. As illustrated in Fig. 9, while there is little change in the effect of density throughout the week, there are strong time-of-day effects. Drivers are more likely to reposition away from dense residential neighborhoods in the afternoon and evening, and less likely to do so in the morning and at night, presumably when more people are at home to request ridehail trips. The opposite patterns are seen with employment density (but not service and visitor employment density), with drivers more likely to reposition away from job-rich areas in the mornings, presumably when potential customers are traveling from home to work. In addition to perceptions of demand, lack of parking and traffic congestion may also be factors that affect repositioning decisions.

Fig. 9

Effect of density on driver behavior by day of week and time of day. The plots show how the changes in probability (measured by odds ratios) for residential, employment, and service/visitor density vary with the time of day and day of week. Positive changes mean that the probability of cruising (red bars) or repositioning (blue bars) increases more than the baseline probability of parking, and vice versa. For example, the impact of household density is similar on weekdays and weekend days (top left plot). But drivers are more likely to cruise in and reposition away from higher-density neighborhoods in the afternoon and evening, and less likely at night and in the mornings (top right plot). (Color figure online)

Figure 8f also plots the effects of driver experience (estimated using the Lyft subsample only). Full-time drivers are less likely to cruise and more likely to reposition, suggesting that they are more aware of areas of high demand. A Lyft driver who handles one trip per day cruises for 27% of the time between paid trips, while one who handles 10 trips per day cruises 23% of the time.

Conclusions

The choices made by ridehail drivers about where to go between trips determine the overall impacts of ridehailing on vehicle travel and associated congestion and pollution, as well as on parking availability. At the level of the fleet as a whole, there is a tradeoff between the two: more time spent parked means less vehicle travel, but potentially greater impacts on the availability of space for parking, drop-offs and deliveries. More repositioning, on the other hand, decreases the pressure on curb space and off-street parking and allows the fleet to operate more intensively, but at the cost of more vehicle travel, congestion, and pollution. A certain amount of repositioning creates system-level efficiencies and is inherent in the business model of ridehailing (and in effect differentiates ridehailing from private chauffeur-driven cars), given that demand is not perfectly symmetrical throughout the day. But a smaller fleet that parks less implies more repositioning.

Such tradeoffs between parking demand and vehicle travel would also apply to future autonomous vehicles, as demonstrated by Kondor et al. (2020) in the Singapore context. For a given number of trips, the more that the deployment of autonomous vehicles lowers parking demand, the greater the distance driven by deadheading vehicles. In this paper, we provide the first analysis of how ridehail drivers make these tradeoffs using a dataset of 5.3 million search trips in San Francisco.

We find that the average search segment between paid trips lasts 4.1 min, during which time drivers travel 1.0 km (0.6 miles). The average paid trip is 4.2 km (2.6 miles), meaning that searching for rides accounts for 19% of ridehail vehicle travel. Our estimated proportion of 19% is lower than the roughly 40% typically cited in the literature, but our data exclude the portion of the trip between accepting a ride request and picking up the passenger (i.e., “P2”). High demand and short distances within San Francisco may also account for our lower estimate, as previous studies have shown that deadheading tends to be lower in urban areas compared to suburbs and rural areas (e.g. Nair et al. 2020).
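The 19% figure follows directly from the two averages reported above. The short calculation below reproduces it, assuming one search segment per paid trip and, as noted, leaving out the P2 portion; the variable names are illustrative.

```python
# Averages reported in the text (kilometers).
search_km = 1.0  # distance driven per search segment between paid trips
paid_km = 4.2    # distance of the average paid trip

# Share of observed ridehail travel spent searching; the P2 leg
# (accepted request to pick-up) is not in the data and is excluded.
search_share = search_km / (search_km + paid_km)

print(f"Search share of vehicle travel: {search_share:.1%}")
```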

We classify points on each search trip as cruising, repositioning, or parking. Both repositioning and parking can represent rational behavior on the part of drivers seeking to minimize downtime and maximize revenue from their next trip. Indeed, our regression models suggest that drivers tend to make apparently reasonable choices between repositioning and parking, heading to high-demand locations based on the time of day. For example, they reposition away from dense residential neighborhoods in the afternoon and evening when demand is likely to be higher in other areas, but stay within those neighborhoods in the morning and at night. However, we also find suggestive evidence of racial disparities, supporting previous studies of both taxis and ridehailing (Ingram 2003; Ge et al. 2020) that indicate that drivers tend to avoid neighborhoods with high proportions of people of color. These disparities are relatively small and are not necessarily due to conscious or unconscious bias on the part of drivers. They may at least partly reflect the impact of other neighborhood characteristics that correlate with race, such as income and the presence of demand generators such as restaurants in predominantly White neighborhoods. Regardless of driver intent, though, the repositioning patterns that we identify are likely to lead to poorer availability and longer wait times in neighborhoods of color.

While cruising by traditional taxicabs makes them visible to potential passengers, it would seem to offer little advantage to a ridehail driver who can simply park instead. Therefore, perhaps our most surprising finding is that cruising accounts for 23% of search time and 22% of the search distance driven by ridehail drivers (excluding short trips). Cruising in lieu of parking means that the impacts on curb occupancy and meter revenue loss are smaller than might be expected, but those on congestion, pollution, and the other consequences of vehicle travel are greater.

Why do ridehail drivers cruise? This question is beyond our ability to answer with the present dataset, and future qualitative research might usefully probe driver decision-making processes. In some cases, a lack of available curb space or high levels of parking enforcement may be the cause. Possibly, drivers believe that they can game the trip allocation system by driving around to be closer to potential passengers, and thus be allocated a trip. Alternatively, psychological factors may be at work. Full-time drivers cruise less, suggesting that drivers learn over time that cruising is a suboptimal strategy.

More generally, our analysis is limited by the lack of information on a driver’s intent. The nature of our data means that we are limited to analyzing the paths of travel; we do not know why drivers park, cruise, or reposition, or to what extent their chosen strategies are successful in increasing their hourly earnings. Our results highlight the opportunity for future research, possibly qualitative, to investigate further the strategies, heuristics, and reasoning that drivers employ in search of their next paid trip, and the roles of factors such as parking availability, parking enforcement, and the real-time driver information provided by ridehail firms through their apps.

A clearer understanding of motivations through further research would also inform policy responses. In broad terms, however, we suggest that cruising might partly be reduced through tweaks to driver-facing ridehail apps, prompting drivers to find a safe place to park while waiting for their next ride. It may also be possible for cities and other government agencies to regulate deadhead time. Cities, meanwhile, might consider how ridehailing can take advantage of curbspace in front of residential driveways and other curb cuts that are used only occasionally. Some ridehail drivers already park in front of driveways on an informal basis, as they can quickly move if a resident needs to access their garage.

Ultimately, however, revising fee structures to be distance- and time-based, regardless of whether a passenger is in the vehicle, may be the most efficient way for cities to address the external costs of ridehailing including congestion and pollution. Ridehail firms would pay these fees, and determine whether and how to pass them on to passengers. In addition, place-based time charges might be used as a proxy for parking fees, and to encourage drivers to park in locations where they do not compete with other curbspace users. While such fees would initially disadvantage ridehail firms, drivers, and passengers compared to private car trips, they could serve as a testbed for a broader congestion pricing scheme. Moreover, many cities already levy ridehail fees or taxes on a per-trip or percentage basis, but these charges only apply to the paid, with-passenger portion of a trip. To more comprehensively address pollution, congestion, and other externalities caused by ridehailing, policy makers need to extend these policies to encompass what drivers do between trips.