Introduction

Rural development and its links with access to markets, services, and employment centres is a common theme of the roads development literature (Edmonds 1998; Booth et al. 2000; Porter 1997, 2002a, b; Bird et al. 2002; Christiaensen et al. 2003; Stifel et al. 2003). As travel times to markets increase, opportunities for non-farm employment, income, and education levels decrease (Jacoby 2000; Fafchamps and Shilpi 2003). Increases in travel times to paved roads decrease farmland values and profits (Shrestha 2012), while geographical isolation decreases children’s school enrolment and lifetime educational achievement (Shyam 2007). Rural areas in developing countries are particularly affected by underdeveloped transportation infrastructure, and gains in accessibility resulting from investments can be drastic (Airey and Taylor 1999). In these settings, where populations are sparsely located, improving transportation standards can be expensive on a per-capita or cost-effectiveness basis and investments should be selected meticulously (Mikou et al. 2019; Lebo and Schelling 2001).

Existing tools for evaluating and prioritising investments based on their service-based accessibility improvements in rural areas are limited because their modelling approaches focus on urban transportation, where travel is minimally constrained by topography and the traversal of rugged terrain on foot for long periods of times is uncommon (Geurs and van Wee 2004; Páez et al. 2012). In rural settings, the time travelled on foot before reaching a road is often measured in hours—if not days—and thus requires significant attention to more accurately estimate overall walking-vehicular travel times. There have, however, been some recent advances to accessibility models so that their outputs are more realistic and suitable for the evaluation of rural road upgrades. Example environments leveraging such models include rural (Tanser et al. 2006), mountainous (Nelson and LeClerc 2007; Kosmidou-Bradley and Blankespoor 2019), or riverine (Lubamba et al. 2013). Most notable of these is a model recently developed by Banick et al. (2021), suitable for use with all the aforementioned environments by combining the most advanced approaches to various components of rural accessibility evaluation into a single model. Their model considers the effects of roads, topography, land cover, altitude, and waterways on combined walking and vehicular speeds. Using inputs such as these, the model is able to estimate the shortest travel time from any populated place to one of many service types when using a combination of walking and vehicular travel. Furthermore, using condition-specific speed modifiers and land cover datasets determined at different points in time, travel times can also be estimated according to seasons. The resulting specificity enables policymakers to compare rural road construction plans in more detail, according to their predicted impact on multi-modal travel gains, and expected cost.

Estimating the accessibility improvements resulting from the construction of a proposed road or a combination of multiple roads using methods such as the model of Banick et al. (2021) is only one facet of rural roads construction planning. When a large number of potential roads exists, constructing all of them is challenging due to cost constraints. Transportation planners must typically select the most productive subset of roads in a manner that maximises the resulting benefits as measured with respect to multiple objectives, while remaining cost effective and within budget. To ease the cognitive burden in identifying superior development plans, multi-objective combinatorial optimisation solution approaches can be used. The optimisation of rural roads development using such solution approaches presents a unique and understudied challenge, since the traditional focus is to identify the most productive subsets of roads designed for urban, relatively developed settings—likely to underperform in rural, sparsely populated settings (Shrestha 2018). In recent times, however, research into optimisation approaches for rural roads development has received more attention (Shrestha et al. 2014; Kumar and Jain 2000; Scaparra and Church 2005; Murawski and Church 2009; Shrestha 2018)—yet exhibiting two key limitations. First, while accessibility models that are more suitable for use in rural settings exist, they have not been capitalised on by the rural roads optimisation literature. Rural roads optimisation approaches remain focussed on roads and vehicular travel, and do not take into account the often significant amount of walking time required to reach roads in rural settings. Second, a handful of solutions is obtained at specific budget levels without any clarification as to how the example budget levels were determined. These solution approaches are therefore most suitable for implementation when a final budget has already been determined. The earlier stages of rural roads development planning nevertheless requires that a target budget be determined—in these stages, valuable insights that can be provided by heuristics are not being capitalised on.

In this paper, an innovative heuristic solution approach is developed for roads planning in rural settings and is unique as it is capable of providing decision makers with a wealth of information that can play a defining role in prioritising proposed roads and, most importantly, can aid in setting a project budget more intricately than before. The heuristic is capable of determining thousands of solutions that offer high-quality trade-offs in service-specific accessibility improvements, across a large budget span. This provides numerous new analysis opportunities which can aid in identifying a final budget that ensures satisfactory benefits to local communities, while remaining cost effective. The heuristic has been developed jointly with the most recent and sophisticated accessibility evaluation model from Banick et al. (2021), and exhibits a number of unique advantages over existing methods. Although accessibility improvement is the focus in this paper, the framework remains compatible with other criteria which may be considered in future work—e.g. maximising the number of towns connected by new roads, or minimising the total length of new roads—and is not limited to any specific service types, nor specific budgetary constraints.

A recent roadworks planning problem in Nepal’s predominantly rural and mountainous Karnali province is used as a test instance. The Karnali provincial government and the Rural Access Programme (RAP) proposed 92 roads and associated budget charges in the latest draft of Karnali’s 5 year Provincial Transport Master Plan (Rural Access Programme (RAP3) 2019). The service-specific accessibility improvement values for these roads, determined by Banick et al. (2021), are used as inputs to the heuristic, which was possible to determine thousands of high-quality candidate roads development plans within a large budget range. When compared to the actual roads selected by Karnali’s government, heuristic-determined solutions outperformed their plan by 1.5–3 times when evaluated with respect to accessibility improvements to healthcare, financial services, and municipal headquarters at equivalent cost levels, or yielded 1–2 times the improvements at half the cost.

The main contributions of the work presented here are summarised below.

  • The work presented in this paper is the first implementation of multi-modal (walking-vehicular) travel improvements in rural roads planning optimisation, as opposed to the traditional approach of considering road-travel improvements only. Gains from roads construction plans are therefore maximised with intricate consideration of the often significant time required by rural populations to first travel on foot before reaching roads and switching to vehicular travel.

  • A novel mathematical model formulation for roads optimisation that is compatible with roads classified into hierarchies of connectivity with respect to the existing roads network is derived, and implemented within a more traditional integer-linear programming (ILP) solution approach.

  • The presented heuristic exceeds the sophistication of algorithms implemented within transportation contexts, whether urban or rural. Most notable is its ability to search for solutions within a budget span instead of at fixed budget levels, achieved by an innovative scheme for the representation of candidate roads plans.

  • Thousands of high-quality solutions spread across a large budget allows policy makers to examine trade-offs in construction benefits and costs more intricately than previously possible. This helps to focus policymakers’ attention to the most critical roadworks, and to identify which criteria to consider or ignore during the planning process.

The remainder of the paper is organised as follows. In Section "Background", relevant background information is provided regarding measures of rural roads accessibility, existing approaches to rural roads planning, and optimisation considerations and approaches of importance to the presented work. The Karnali study area and the 92 proposed roads are discussed in more detail in Section "Data and methods", along with more detailed descriptions of the accessibility cost-time model implemented in this paper, and the solution processes followed to search for optimal road plans. The results of the heuristic and ILP roads-combination searches are presented in Section "Results", demonstrating the differences in solutions at different budget levels and how seasonality affects the results. More detailed discussion on the meaning and significance of the results are provided in Section "Discussion", including a number of analysis opportunities and future improvements. The paper closes with a brief conclusion in Section "Conclusion".

Background

Measures of rural accessibility

The most consistent living standards improvements from rural roads are not economic in nature but instead linked to improved access to services such as health (Van de Walle and Cratty 2002), finance (Binswanger et al. 1993), or education (Adukia et al. 2020; Kristjanson et al. 2005). In a small scale study of three Nepali village development committees, Sapkota (2018) found that households with longer travel times to services have higher poverty levels and lower levels of health, education, and reported happiness. Elsewhere, Hettige (2006) finds that rural roads most consistently benefit the poor through improved access to services when comparing Asian Development Bank projects in the Philippines, Indonesia, and Sri Lanka.

Measuring accessibility baselines and improvements is critical to establishing rural roads’ benefits. Simple metrics like linear distance, densities of roads and services, or reported travel times in household surveys are easy to compute or parse from existing data sources but are often difficult to scale and influenced by idiosyncratic survey responses (Ahlström et al. 2011) or inaccurate in mountainous areas (Huber 2015). Nevertheless, these are commonly used in economists’ evaluations of rural roads; for instance, Banerjee et al. (2012) and Thapa and Shively (2017) measure accessibility in terms of road length density per administrative area in China and Nepal, respectively.

Planners and researchers looking for greater accuracy and scalability will typically employ geospatial cost-time models of accessibility in which travel is represented in terms of the minimum time required to cross each cell in a gridded (raster) representation of the terrain surface. These times are calculated based on some combination of the underlying roads and road speeds, land cover, terrain, and walking speeds, amongst other factors. An example of such a gridded representation of a road (as viewed from above) and then converted to a gridded on-road speed raster is provided in Fig. 1. The advantage of modelling the terrain in this manner is that grid cells can model travel on the terrain surface as on-roads-network (driving), off-network (walking), or as a combination of the two (walk to a road and then use vehicular travel). Relevant examples leveraging these off-road capabilities to analyse accessibility in difficult contexts include the analysis of Kosmidou-Bradley and Blankespoor (2019) on accessibility to services in rural Afghanistan, the study by Ahlström et al. (2011) on poverty’s relationship to accessibility in southern Sri Lanka, models of accessibility to markets in Africa by Harvest Choice and International Food Policy Research Institute (IFPRI) (2016), and global-scale analysis of wealth stratification by accessibility to urban centres by Weiss et al. (2018).

Fig. 1
figure 1

Converting vector roads data to raster speed data. Road speeds depend on the road and surface type, and the underlying slope

Existing approaches to rural roads planning

A significant portion of road construction in Nepal is characterised by a lack of policy coordination, little to no study of anticipated benefits, haphazard construction standards, and disregard for environmental safeguards—particularly at the local level (World Bank 2018). Roads are often built based on political expediency, without reference to studies or master plans (Coburn 2020; World Bank 2018). Municipal authorities keen to demonstrate progress but lacking planning capacity frequently plough roads into hillsides with little regard for environmental outcomes, investment efficiency, or even basic engineering principles (Coburn 2020; Banick et al. 2021). The implementation of accessibility models is an unfamiliar notion in the planning of Nepalese roadworks—the implementation of advanced optimisation models such as those applied in this paper is even less likely. This is unfortunate, given that Nepal’s scarce resources and harsh terrain merit meticulous, targeted approaches to maximise gains.

Efforts are underway to equip local authorities with planning tools that accommodate their limited data collection and analysis capacity. Over the last 15 years the Rural Access Programme has worked with local governments in Nepal to design and implement Transport Master Plans that provide a measurable, data-driven basis for prioritising proposed roads. The most recent iteration is the Provincial Transport Master Plan, a standardised tool for provincial governments to prioritise roads based on a spreadsheet matrix of cost projections, one day traffic density studies, policy priority, and the population totals of administrative centres touched by roads (Rural Access Programme (RAP3) 2019). An extract of the master plan is provided in Fig. 2.

Fig. 2
figure 2

Road construction alternatives as ranked in Karnali’s draft Provincial Transport Master Plan (PTMP) (Rural Access Programme (RAP3) 2019)

Multi-objective solution approaches to (rural) transport planning

Pareto-optimal road plans

When presented with a large number of proposed roads and when limited by budget constraints, numerous potential development plans—i.e. candidate solutions—that are selected as subsets from all proposed roads can be identified. Multi-objective solution approaches may be employed to discover candidate solutions that provide superior trade-offs in cost and specific objectives. Each candidate roads plan that is evaluated during the solution process corresponds to a single point in objective function space—this is illustrated in Fig. 3 by a number of example candidate plans evaluated with respect to two accessibility maximisation objectives (plotted on the axes). This example figure shows the solutions evaluated with respect to only two objectives, but the same principles apply for three or more objectives. Minimising cost is often considered and visualised as a maximisation objective, which is typically achieved by maximising 1/cost.

Fig. 3
figure 3

An illustration of the notion of Pareto optimality and the translation of candidate roads plans to objective function space. Numerous candidate solutions are evaluated with respect to the objective functions, and those that are determined to be Pareto optimal (non-dominated) are sought for decision making purposes (Heyns et al. 2020)

Solutions such as those in Fig. 3 are classified either as non-dominated (superior) or dominated (inferior) (Zitzler et al. 2004). The set of non-dominated solutions exhibit superior trade-off alternatives when compared to the dominated solutions, and form what is commonly known as the Pareto-optimal front, or simply the Pareto front, as may be observed in the figure. Only the solutions in the Pareto front need to be presented to decision makers because of their superior quality; that is, dominated solutions are avoided. The Pareto front is determined using the same principles of domination for problems involving more than two objectives, and the notion of Pareto optimality remains applicable, albeit harder to visualise.

Two solution search approaches are generally followed to determine the solutions on the Pareto front, namely ILP and heuristics. Both approaches are followed in this paper and are described in more detail below.

ILP solution approach

An ILP approach is popular for rural roads planning. Murawski and Church (2009) determined roads-upgrade solutions that simultaneously improve a road network’s accessibility to healthcare locations, while limiting the associated financial costs. Specifically in the context of Nepal, Shrestha et al. (2014) explored a bi-objective ILP optimisation model for rural road network construction and upgrading in hilly regions. Their objectives were to minimise the overall user operating costs (which can be thought of as a roads usage difficulty measure) and to maximise the population served by the network. Using a similar approach to Shrestha et al. (2014), Shrestha (2018) presented a multi-objective decision support model for rural road network upgrades and construction with the objectives of minimising four population- and distance-based criteria at different budget levels.

An ILP approach provides a simple way to solve optimisation problems when compared to heuristic approaches (which require more sophisticated code and sensitive iterative parameter testing), and typically returns solutions of superior quality. Each execution returns a single solution that optimises the selection of decision variables (the candidate roads) so that the objective function is maximised or minimised according to specification, while ensuring that any specified constraints are enforced. A strong characteristic of an ILP approach is that the corner points of the Pareto front, which effectively optimise with respect to only a single objective while ignoring the others, may be determined exactly (Arora 2011). This is because ILP solvers are inherently designed for single-objective optimisation and determining the corner points only considers one objective, while determining points with respect to multiple objectives (between the corner points) limits the solvers’ Pareto-approximation abilities (Stewart 2007). These corner points indicate the maximum extent of the true Pareto front, which is useful for investigating additional solutions determined between these points by ILP and heuristics approaches (Heyns et al. 2021). An illustration of such corner points is provided in Fig. 4 for a problem with three maximisation objectives. The shaded surface is the general shape of where other Pareto optimal solutions may be expected, such as illustrated in the figure for five additional Pareto optimal solutions. In practical applications, the Pareto optimal solutions do not necessarily lie near the shaded surface, but tend to follow its general shape. The points are projected onto the obj-1/obj-2 plane to help with viewing perspective and to provide an indication of the solutions’ objective function values.

Fig. 4
figure 4

An example of a three-dimensional Pareto front determined with respect to three maximisation objectives (Heyns et al. 2020). The corner points are determined with consideration given to a single objective only, while other solutions along the front may be determined using a weighted-sum objective function

In order to determine the solutions between the corner points with ILP solvers requires transforming the multiple objective functions into a single objective function using a weighted sum (Cohon 1978; Marler and Arora 2010). In its simplest form, a weighted-sum objective function for a problem with \(N_i\) objectives is given by

$$\begin{aligned} \mathrm {maximise}\ O_s = \sum _{i=1}^{N_i} w_i V_i \text {,} \end{aligned}$$
(1)

where the objective function values achieved with respect to each objective, \(V_i\), are combined using weights \(w_i\) that reflect the relative importance of each objective. By varying the objective weights in multiple runs, the solver may be “directed” to a specific region between the corner points, and Pareto optimal solutions along the shaded area (such as those in Fig. 4) may be traced out. The weighted-sum approach remains a popular choice to solve multi-objective optimisation problems from various recent applications, such as building design (Machairas et al. 2014), carpool matching (Xia et al. 2019), and land-use allocation (Yao et al. 2018), and has also been used for rural roads planning research (Shrestha et al. 2014; Shrestha 2018).

The weighted-sum approach does hold some disadvantages. Determining points on the Pareto front may require a large number of weight combinations when many objectives are considered (ReVelle and Eiselt 2005; Tong et al. 2009) and determining suitable weight combinations to suitably explore the Pareto front is a sensitive and laborious process (Marler and Arora 2004). When different cost levels are to be considered, the multiple weight combination runs also need to be repeated at each cost level. Finally, an ILP approach is more sensitive to computational complexity and the size of problems which it can solve, and returns fewer solutions when compared to a heuristic approach.

Heuristic solution approach

Heuristics are often employed for approximating the set of solutions on the Pareto front when the problem involves a significant computational challenge and becomes prohibitively large to solve within realistic computation times (Xiao et al. 2002; Zitzler et al. 2004; Jia et al. 2007; Heyns and van Vuuren 2018), and can obtain well-performing solutions within reasonable computation times compared to ILP. Heuristics target and explore promising regions of the solution spaceFootnote 1 in order to arrive at solutions that are approximated to be Pareto-optimal, and in the process avoid the computationally expensive consideration of solutions in inferior regions of the solution space. It has been demonstrated that heuristics are well capable of determining the true Pareto front within reduced computation times when compared to exact approaches (Kim et al. 2008; Heyns and van Vuuren 2016, 2018).

Heuristics are often considered for rural roads planning because they are easily compatible with the problem. Kumar and Tillotson (1991) investigated the use of an iterative “replacement-test” heuristic to solve their mathematical optimisation model, capable of determining numerous high-quality road network layouts that provide trade-offs in cost and travel times to fixed service/facility locations. They investigated a rural area in India, with travel time to market centres (which serves as a proxy for other essential services) investigated as the trade-off criterion against cost. Their objective was to provide all-weather access to every village in their study area, achieved by minimising the cumulative costs of upgrades to existing roads and the construction of new ones, while minimising the sum of travel costs required from each village to reach a selected service/facility location. This work was furthered by Kumar and Kumar (1999) and Kumar and Jain (2000), in which their model and solution approaches were extended to determine trade-offs for three accessibility criteria measured against cost (educational institution types were considered in this instance, but it remains possible to consider others). Scaparra and Church (2005) utilised and compared two heuristic approaches for solving their model that is capable of determining candidate road network development approaches in rural settings. Their model considers the improvement of all-season connectivity between villages and the efficiency of the routes between them—given a fixed budget. Their work delves deeper into heuristic approaches when compared to the preceding literature, which is a necessity as roads development projects increase in complexity.

Multi-Objective Evolutionary Algorithms (MOEAs) is one heuristic approach that may be considered to approximate a diverse set of trade-off solutions on the Pareto front (Fonseca and Fleming 1993; Purshouse and Fleming 2003). These solution approaches iteratively evolve an initial, randomly generated population of candidate solutions through carefully controlled evolution (based on natural principles) over multiple generations, finally arriving at a set of solutions that approximate the Pareto front (Deb et al. 2002). Due to their population-based nature (numerous solutions simultaneously being discovered and evolved), numerous solutions along the Pareto front can be approximated in a single run. Genetic algorithms (Deb et al. 2002) is a class of MOEAs that are well-suited to solving rural roads problems, and have been implemented for the optimisation of rural roads maintenance strategies (Mathew and Isaac 2014) and rural highway layout planning (Ma et al. 2014). Candidate solutions are represented in a chromosome format that is easily applied in the context of rural roads planning. Each individual road is assigned a number, and a candidate roads plan is represented as a chromosome string of selected road numbers. For example, a chromosome [23, 53, 99, 104, 156, 201] would therefore represent a plan with roads 23, 53, 99, 104, 156 and 201 selected for construction. Evolution-inspired selection processes and modification operators are iteratively performed on a randomly generated population of such chromosomes and terminates when the algorithm ceases to show significant improvement in the solution quality of successive generations, or when a maximum number of generations have been reached (Deb et al. 2002; Heyns 2016).

A modified version of the non-dominated sorting genetic algorithm-II (NSGA-II) (Deb et al. 2002) was developed for this paper, building on previous research and applications in which it has been used to evaluate the placement of radar and defence systems (Heyns and van Vuuren 2018), and cameras used for wildfire-detection (Heyns et al. 2021) and surveillance (Heyns and Van Vuuren 2015; Heyns 2021).

Data and methods

Karnali study area

Fig. 5
figure 5

Provinces and roads of Nepal. Note the relative sparsity of motorable roads in Karnali province

Nepal’s northwestern Karnali province (also known as Province 6, illustrated in Fig. 5), exhibits extreme levels of inaccessibility—even by Nepal’s standards. This is evident when comparing existing roads infrastructure in Karnali to those in other provinces in Fig. 5. Karnali has a predominantly rural population, and access to critical services is often measured in hours or days, walking is a primary transport modality, and it contains the only two districts (out of seventy-seven nationwide) whose headquarters are not connected by vehicular roads of any sort: Humla and Dolpa (Devkota et al. 2012). Karnali’s Multidimensional Poverty Index (MPI) headcount ratio is the worst of any province in Nepal and development is a policy imperative (National Planning Commission and Central Bureau of Statistics 2020).Footnote 2 Like much of Nepal, damages from rains and landslides further constrain access to services during the June-September monsoon season, when heavy rains intensely damage almost all of Karnali’s poorly maintained roads (Huber 2015). Significant reductions in travel speeds (40–80%) for affected road segments are common during this season, particularly for poorly engineered roads with inadequate drainage, and/or poor surfaces (Pantha et al. 2010). Since terrain and available budget effectively prohibit redundancy in roads, damage to a single road can severely impact entire regional populations.

Fig. 6
figure 6

Karnali’s population, roads, and services cluster in the lower altitude south (adapted from Banick et al. (2021))

Figure 6 shows the current distribution of the service locations relative to existing roads in Karnali. Because Karnali’s severe terrain impedes accessibility, many remote areas are poorly covered by existing roads and services, while branch roads usually only reach small populations and are costlier to construct in remote, mountainous areas in its sparsely populated north and northwest.Footnote 3 In Fig. 7 the existing and proposed roads and the populated areas that were considered for the planning of the Karnali Provincial Master Transport Plan are shown. The same roads and populated areas were considered in the solution approaches followed in this paper.

Fig. 7
figure 7

Existing and proposed roads that were considered for the planning of the Karnali Provincial Master Transport Plan and in the solution approaches followed in this paper

When upgrading road networks, it is common to assume that full road network connectivity is to be achieved; however, this is not always possible because enforcing full connectivity may result in lengthy, inefficient networks (Scaparra and Church 2005). This requirement is even more difficult or impossible to achieve in a context such as the Karnali roads planning problem, since the roads are scattered across a region that is larger than common in roads optimisation problems—with many proposed roads appearing in isolation, or to be added to local networks that are isolated from the regional network. Roads data quality is another obstacle in rural settings—in some instances very poor unpaved roads that connect to the larger known network exist, but are not included in the roads database due to their obscurity and the significant challenge associated with obtaining digitised versions of all such roads in remote and rural settings. For these reasons, full network connectivity is not a requirement in this paper, and optimisation results may return solutions with segments that appear to be isolated, but which nevertheless maximise accessibility to services on a local and regional scale.

The novel multi-season, multi-modal rural accessibility model

Model overview

The cost-time model of Banick et al. (2021) assumes that a person will begin their travel by walking, then proceed using vehicular travel (e.g. public transport) if a road exists, and then complete their journey by walking until their service destination is reached. The model considers the underlying topography, land cover, altitude, waterways, roads and road specifications to determine walking and vehicular travel speeds everywhere on the surface of the terrain, digitised into a high-resolution 30 \(\times\) 30 m gridded representation of the surface travel speeds. Combining the walking and vehicular speed grids forms a multi-modal “friction surface”—a grid with walking speeds where roads are absent, and vehicular speeds where roads are present (Banick and Kawasoe 2019; Banick et al. 2021). The time required to cross each cell can then be derived from its travel speed in the friction surface, and its spatial dimensions.

Multiple possible paths exist along the cells between an origin (populated place) and a destination service location. The model, however, identifies the least-cost-time path (fastest path) from the origin to the specific service’s site that requires the shortest accumulated travel time—calculated as the sum of the traversal times associated with each cell along the path between origin and destination. Using the same approach, travel times for the monsoon season can be determined by adjusting walking and vehicular speeds to reflect typical reductions experienced during the season.

Using this model, current multi-modal travel times with existing roads data can be determined between all populated places and services, after which proposed roads data can be “added” to the network and the updated travel times can be estimated. By determining the difference in travel times across populated areas before and after the addition of new roads, service-specific accessibility improvements resulting from the construction of different roads, or road-combinations, can be estimated. Banick et al. (2021) selected three services to capture a cross-section of developmental needs, and the same services are again investigated in this paper. These comprised health services sourced from the Ministry of Health of Nepal, formal financial institutions obtained from Nepal Rastra Bank (Nepal Rastra Bank 2017), and municipal headquarters compiled from assorted Nepali government documents. Freely available datasets are used in the model where possible, including the Shuttle Radar Topography Mission (SRTM) 30 m terrain model (Farr et al. 2007), High Resolution Settlement Layer (HRSL) raster population datasets (Facebook CIESIN 2016), OpenStreetMap road and river data (OpenStreetMap contributors 2019), and national land cover products (ICIMOD 2013).

A demonstration of the outcome of adding roads to the existing network and estimating their overall benefit according to the model of Banick et al. (2021) is provided in Fig. 8. In the figure, the actual roads that were eventually selected in the Karnali transport port plan (without use of the solution approaches in this paper) are shown, along with their resulting accessibility improvements with respect to municipal headquarters during the monsoon season. A number of additional roads-plan evaluation metrics are also observed in the figure; these are discussed and compared in detail when analysing solutions obtained in the results section. The accessibility improvements achieved by this roads plan with respect to the three selected services serve as a baseline with respect to which the optimisation-determined solutions are compared in this paper.

Fig. 8
figure 8

The roads selected in the Karnali transport plan and an example of its accessibility improvements achieved with respect to municipal headquarters during the monsoon season

It should be noted that road speeds are determined without consideration given to queueing or traffic delays, and the overall travel times therefore reflect regular traffic conditions. This simplified approach is considered suitable for rural roads and for the purposes of this paper, since rural roads are less heavily trafficked than urban ones and the occurrence of congestion is less regular.

Population-weighted gains

The final component introduced to the model of Banick et al. (2021) is to measure accessibility not only in terms of multi-modal travel times, but as a composite measure of person-hours (p-h)—graphically illustrated in Fig. 9 for a hypothetical scenario with exaggerated cell sizes. In (a) in the figure, the estimated total travel time to reach the service in the bottom-right corner is displayed, determined from each cell in the grid. In (b), populated places on the grid are displayed, including the size of the population and their fastest paths to the service. The p-h measure is then determined by multiplying the travel times in (a) with the population size in (b), resulting in the values displayed in (c) in the figure—with a value of 0 p-h where populations are absent. In this manner, when there is a reduction in travel time as a result of a change in roads infrastructure, the number of people that gain is also reflected in the model’s outputs.

Fig. 9
figure 9

Multiplying a Accessibility cost-time rasters against b Population values to compute c Person-hours. Exaggerated cell sizes are used for illustrative purposes—these are normally 30 \(\times\) 30 m

Referring back to Fig. 8, large areas of improvement without any population are observed, which would translate to a 0 p-h improvement. The values of interest are the p-h gains that overlap with each 30 \(\times\) 30 m populated area in the figure, scaled according to population as described above and in Fig. 9. The overall accessibility improvement with respect to a service and a specific season for a combination of roads is then the sum of all the total p-h gains—244 187 p-h for the roads selected in Fig. 8, with respect to municipal headquarters during monsoon season.

Sequences

Accessibility improvements are not only determined for individual roads, but for certain combinations of roads that are linked to each other (Banick et al. 2021). This is because there may be proposed roads that can only be constructed if others are constructed first (e.g. some roads that link directly to the existing road network should be constructed first, before others that branch off from them). Proposed roads are therefore assigned tiers, reflecting their dependencies on each other. A tier 1 road is considered essential for construction first, before any other proposed roads that are connected to it can be constructed. A tier 2 road depends on a tier 1 road being built, while a tier 2 road enables a tier 3 road to be built, and so forth.

A local “cluster” of proposed roads comprising multiple tiers and overlapping influences, as graphically illustrated with reference to an example group of roads in Fig. 10, is therefore a source of many construction alternatives, each requiring its own accessibility improvement analysis. Individual construction alternatives identified from such local groupings are henceforth referred to as sequences. Instead of evaluating the effects of individually constructed roads, individual sequences are evaluated in terms of their accessibility improvement. Single tier 1 roads are nevertheless evaluated individually, albeit categorised as a sequence.

Fig. 10
figure 10

From the entire set of proposed roads, groups that are connected physically or by overlapping areas of influence are identified. Such groupings, henceforth called sequences, offer numerous potential sub-groups of combinations, which can be considered as unique construction alternatives. Some roads in the groupings depend on the construction of others—resulting in internal hierarchies (tiers) (image adapted from Banick et al. (2021))

The process of identifying feasible sequences is manual and potentially time-consuming—increasing in required effort as the number of proposed roads increases, while also depending on the prevalence and complexity of local groupings. For example, from the 92 proposed roads in the Karnali study area, Banick et al. (2021) identified groupings of proposed roads that have local areas of influence that overlap with each other. The roads in these local groupings were then assigned tiers, and with consideration of different tiers, 261 plausible sequences were identified. Of the 92 individually proposed roads, 68 are tier 1 roads and can be considered in isolation, while the remaining 24 roads are auxiliary and only considered within sequence combinations.

In a real-world scenario, such hierarchies would be identified by experts in road construction and regional accessibility—the priority of Banick et al. (2021) and in this paper is to illustrate the accommodation of such hierarchies in the solution approaches.

Example accessibility model outputs

Repeating the process of determining total p-h accessibility improvements for 261 possible sequences, with respect to three services, for two seasons, resulted in Banick et al. (2021) obtaining a total of \(261\times 3\times 2 = 1\,566\) multi-modal p-h accessibility improvement measures. These measures were compiled in a master spreadsheet alongside each sequence’s projected cost determined with road segment costs from the Karnali Provincial Master Transport Plan (which considers factors such as road length and surface type). Two example sequences are presented in Table 1.

Table 1 Example outputs of the accessibility improvement modelling of Banick et al. (2021) which are used as inputs to the optimisation approaches applied in this paper. Example results for the first two road sequences are shown here

Comparing accessibility improvements to cost in this manner yields an easily interpretable measure of each segment’s projected cost-efficiency with regards to accessibility (p-h/NPR). In the optimisation processes followed in this paper, however, only the p-h and NPR values are used, since development plans are evaluated according to cost-efficiency in their totality and not by their constituent sequences.

Sequence exclusion lists

The purpose of the model of Banick et al. (2021) is to determine accessibility improvements for individual sequences. In this paper, however, optimal sequence combinations are sought. This task is complicated by the fact that some individual roads are present in more than one candidate sequence, as previously observed in example Fig. 10. The optimisation approaches should not allow for the same road to be included in a candidate sequence-combination plan more than once (its contribution to a solution’s cost and accessibility improvement should be measured once only). To prevent this from happening, an important additional input provided to the optimisation solution approaches are sequence exclusion lists.

A sequence’s exclusion list provides a list of all the other sequences that contain any one of the roads that is included in its own combination of roads. In other words, during the candidate solution “building processes” of the optimisation solution approaches, once a sequence is part of a candidate solution, any sequences in its exclusion list are excluded from consideration for further addition to the solution. This ensures that any sequences that share roads with those that are already included in the candidate plan are excluded from consideration. As more sequences are added to a candidate plan, its list of exclusions grows as a result. Each sequence’s exclusion list is populated by simply comparing its constituent roads with those in all other sequences and adding sequences with any overlaps to its list.

The constraint on duplicate roads selection using these exclusion lists and as enforced by the ILP mathematical problem formulation and the heuristic processes are described in the sections below.

Weighted-sum ILP solution approach

In the ILP weighted-sum approach followed in this paper, cost is employed as a maximum budget constraint at different levels across a specified budget span; multiple solutions are determined at each cost level with different service-accessibility weight sets.

In the first stage of ILP optimisation, the purpose is to find the corner points on the Pareto front. This is a single-objective optimisation problem, for which the mathematical formulation follows.

\(N_s\)

Denotes the number of sequences available for consideration.

\(N_a\)

Denotes the number of accessibility criteria considered.

ij

Denote indices of sequences, where \(i,j\in \left\{ 1,\dots ,N_s\right\}\).

a

Denotes the index of accessibility criteria, where \(a\in \left\{ 1,\dots ,N_a\right\}\).

\(v_{i,a}\)

Is the accessibility improvement value that sequence i achieves with respect to criterion a.

\(x_i\)

Is 1 if sequence i is selected, and 0 otherwise.

\(x_j\)

Is 1 if sequence j is selected, and 0 otherwise.

\(X_{i,j}\)

Is pre-determined for all ij pairs, and equals 1 if sequence i has an overlap (common road) with sequence j, and 0 otherwise.

\(c_i\)

Denotes the cost of sequence i.

C

Denotes the maximum plan cost (maximum budget).

The objective can then be written as

$$\begin{aligned} \mathrm {maximise}\ V_c = \sum _{i=1}^{N_s} x_i v_{i,a} \quad \forall ~ a \in \left\{ 1,\dots ,N_a \right\} \end{aligned}$$
(2)

subject to the constraints

$$\begin{aligned} \sum _{i=1}^{N_s} \sum _{j=i+1}^{N_s} x_i x_j X_{i,j}&\le 0 \end{aligned}$$
(3)
$$\begin{aligned} \sum _{i=1}^{N_s} x_i c_i&\le C \end{aligned}$$
(4)
$$\begin{aligned} x_i,x_j&\in \{0,1\} \text {.} \end{aligned}$$
(5)

The objective in (2) is to maximise the total improvement with respect to each accessibility criterion \(a \in \left\{ 1,\dots ,N_a \right\}\). Constraint (3) ensures that no sequence overlap (duplicate road selection) occurs—if sequences at i and j are selected, i.e. \(x_i = 1\) and \(x_j = 1\), the constraint requires that \(X_{i,j} = 0\) (no overlap between i and j). Constraint (4) ensures that the combined cost of all the sequences selected in the solution does not exceed the maximum cost, while constraint (5) specifies binary requirements on the decision variables (a sequence is either selected or not in its entirety, and not “partially built”).

In the second stage of ILP optimisation, the objectives in (2)—which were determined separately with respect to each individual criterion—are reduced to a single objective function using various combinations of weights, \(w_a\), assigned to each accessibility criterion, where \(\sum _{a=1}^{N_a} w_a =1\). When following a weighted-sum approach, the values of the weights are, however, not only significant relative to each other but also relative to their corresponding objective functions (Marler and Arora 2010). This is because the relative magnitudes of the separate objective functions may differ significantly. Consider a scenario such as that illustrated in Fig. 11, where the maximum objective function values that may be achieved for objective 1 is an accessibility improvement of 100k person-hours, and for objective 2 is an accessibility improvement of 250k person-hours. If a solution that lies between the two corner points of objectives 1 and 2 is desired—e.g. a solution that reflects equal importance between these corner points and does not consider objective 3—weights of \(w_1 = 0.5\), \(w_2 = 0.5\) and \(w_3 = 0\) would be assigned. However, the objective function effectively aims to maximise \((w_1V_1 + w_2V_2)\), which is not dictated solely by weights or relative magnitudes alone. In this case, the weighted sum will be dominated by criterion 2 because of the larger magnitude of which it is capable and the returned solution will be located towards (or even on) the corner point of objective 2.

Fig. 11
figure 11

Objective function values can vary significantly in minimum and maximum values of magnitude, and it is required to transform these values so that their magnitudes are similar (i.e. to normalise the values). This allows the objective weights in the weighted-sum approach to more effectively direct the search towards desired solution regions (Heyns et al. 2020)

When using weights to represent the relative importance of the objectives, it is therefore required to transform the objective function values so that their magnitudes are similar and so that some do not naturally dominate the aggregate objective function (Marler and Arora 2010). In order to achieve this, the objective functions are normalised so that they range between zero and one, so that the importance of weights are reflected more accurately on the resulting solutions. The maximum objective function values with respect to each criterion are already determined in stage 1 of the ILP approach—the single objective optimisation function in (2)—and are rescaled to a value of one. The minimum value with respect to each objective function is rescaled to zero—these minimum values remain to be determined. However, this process is not overly sophisticated since the values may be found at the corner points not associated with the objective.

Consider Fig. 11 once more, where the aim is to find a solution between the corners of objective 1 and objective 2. The minimum value that may be achieved with respect to objective 1 is simply found at the corner of objective 2, and the minimum of objective 2 is found at the corner of objective 1—when considering objectives 1 and 2 only. Using the minimum and maximum values for each objective function, values determined between these values may be scaled to between zero (at the minimum) and one (at the maximum). In this manner, the effect of objective function magnitudes on a weighted-sum may be eliminated and the search for solutions is directed by the weights in the weighted-sum. If solutions between all three corners are desired, the minimum of each objective still lies at one of the other corners, and the value at each corner needs to be compared to the others before the minimum may be identified. For example, when only considering objectives 1 and 2, the minimum for objective 2 lies at the corner of objective 1. Now, if objective 3 is also considered, then the minimum of objective 2 may be found at either the corner of objective 1 or objective 3. Looking at the figure and the projections of the corner points onto the obj1/obj2-plane, the minimum for objective 2 is now found at the corner of objective 3 and this minimum value becomes zero relative to other values.

Let \({\mathbb {A}} \subseteq \left\{ 1,\dots ,N_a \right\}\) denote the set of criteria for which a weighted objective function is sought, with the weights of the selected criteria adding to 1. Furthermore, let \(v_{a,max}\) denote the maximum value of accessibility criterion \(a\in {\mathbb {A}}\), determined in ILP stage 1. Similarly, let \(v_{a,min}\) denote the minimum value of accessibility criterion \(a\in {\mathbb {A}}\), which is pre-determined as the lowest value found with respect to the objective at the Pareto corner points of the criteria included in \({\mathbb {A}}\). The normalised objective function is then to maximise

$$\begin{aligned} \mathrm {maximise}\ V = \sum _{a\in {\mathbb {A}}} w_a \frac{\left( \sum _{i=1}^{N_s}x_i v_{i,a} - v_{a,min}\right) }{\left( v_{a,max}-v_{a,min}\right) } \quad \forall ~ a \in {\mathbb {A}} \end{aligned}$$
(6)

The objective in (6) is subject to the same constraints (3)–(5)

Variable chromosome length heuristic

As described in "Heuristic solution approach", the genetic algorithm solution approach employed here iteratively evolves populations of candidate solutions through carefully controlled evolution over multiple generations. The main differences between this heuristic and the NSGA-II of Deb et al. (2002) is the addition of a drop operator for budget exploration, and a more aggressive mutation strategy. A variable chromosome length (VCL) approach (Cramer 1985; Srikanth et al. 1995; Ting et al. 2009) was required because feasible candidate plans including varying numbers of sequences can be configured across the budget span—its implementation here is a first for roads optimisation.

Fig. 12
figure 12

An overview of the heuristic employed in this paper, which is based on the NSGA-II framework and principles of Deb et al. (2002) and borrows solution search methods from Jia et al. (2007), Densham and Rushton (1992), Teitz and Bart (1968), and Heyns (2016)

The heuristic commences by stochastically generating a population of N feasible solutions of varying costs—and therefore varying chromosome lengths—that fall within the specified budget range, which forms the current population as illustrated in Fig. 12. From the current population, well-performing solutions are favoured for selection into a “mating pool” of solutions from which the offspring population is created—weaker solutions nevertheless have a small chance to be selected into the mating pool, for diversity purposes (Deb et al. 2002). The size of the mating pool used in this paper is half the size of the current population.

Pairs of “parent” solutions are randomly selected from the mating pool N times, with each parent pair subjected to crossover (swap) and sequence-drop operations. In order to vigorously explore the budget span of the solution space, the crossover approach that is used here was inspired by an implementation of Jia et al. (2007) for large-scale medical emergency planning, and compatible with a VCL solution approach.Footnote 4 A crossover performed between two selected parent chromosomes creates one new “offspring” solution by combining the parents’ sequences with each other, as illustrated in Fig. 13. Then, sequences from one chromosome that are in the exclusion lists of the other are then removed (the choice of which chromosome to remove entries from is randomly selected). This results in the first potential offspring solution—feasible if within budget (Offspring 1 in Fig. 13). The new solution may exceed the maximum budget or be near the maximum limit. This is, in fact, desired; the subsequent repeated entry removal that follows returns over-budget solutions back within feasible range, and then explores solutions at various cost levels across the budget range.

Fig. 13
figure 13

The VCL crossover process that was followed in this paper. In the first step, two parents are selected and any possible duplicate entries are removed. Then, sequences from one chromosome that are in the exclusion list of the other are removed and the chromosomes combined, resulting in the first potential offspring solution (Offspring 1). In the final step, weakest sequences are repeatedly removed until the minimum budget is reached

In the implementation followed here, the weakest entry is repeatedly determined and removed using a “best-drop” removal process from the Global/Regional Interchange Algorithm of Densham and Rushton (1992). In this approach, each entry along the chromosome is separately removed to determine its removal effects. Each removal reduces cost and objective function values—the removal of a strong entry will typically result in significant weakening of the objective function values, while the removal of a less significant entry will typically have a small impact. In this manner, the weakest (least significant) entry may be identified. As illustrated in Fig. 13, removal of the weakest entry discovers a new candidate plan with a lower cost, while the removal of only the weakest sequences promotes the discovery of stronger solutions across the budget span. All the offspring solutions across the budget-span resulting from the repeated crossover-drop operations are combined, as in Fig. 12, followed by multi-swap mutation performed on each.

Mutation promotes the diversity of sequences in the chromosomes—and the population in general—by stochastically introducing unexplored sequences into the chromosomes (Deb et al. 2002). The mutation process followed here is a multi-objective modification to the classic Teitz-Bart Algorithm (TBA) (Teitz and Bart 1968)—a more detailed discussion on the TBA is provided by Heyns et al. (2020). When utilised within the NSGA-II, this modification returned impressive results for large geospatial optimisation problems when compared to the NSGA-II using traditional mutation approaches (Heyns 2016), and has already been used for solving a real-world optimisation problem related to anti-poaching (Heyns 2021).

Fig. 14
figure 14

The VCL mutation process followed in this paper. Each entry in the chromosome is swapped with an arbitrarily selected sequence that is not in the exclusions of the other entries, and each swap that is performed results in a new offspring

The mutation process followed here is illustrated in Fig. 14 and is repeated for each offspring from the crossover-drop process (which now enters the mutation process as a parent). Each of the parent’s entries are individually mutated from first to last, with each mutation returning a new offspring. For example, to mutate the first entry in Fig. 14, all exclusion lists from the sequences in the parent are identified—except for the sequence in the entry being mutated. A sequence that is not excluded is then randomly selected and replaces the current entry to be mutated. The new chromosome is evaluated with respect to all of the objective functions, and progresses if it exhibits an improvement over the parent with respect to any one of these. This process is then repeated on the original parent by mutating the second entry, and so forth.

The mutation approach described above results in multiple offspring for each parent from the crossover and drop stage, which are then reunited with their parents from before mutation, as illustrated in Fig. 12. From this large combination of solutions, a final offspring population of size N is identified—to ensure that only the strongest offspring solutions progress and so that the population size does not spiral out of control (resulting in intractable complexity). This is achieved by the Fast Non-dominating Sorting Algorithm (Deb et al. 2002) which applies non-dominated principles and preferences for solution diversity to filter populations down to a specified size. The final offspring population and the current population are then combined (size 2N), and the N best solutions from this combined population are once again identified using the Fast Non-dominating Sorting Algorithm. If convergence is observed or when a specified maximum generation count is reached, the algorithm terminates and the Pareto front approximation of the final population is returned for analysis. Otherwise, the N solutions become the current population and the process is repeated until it reaches its termination criterion. In this paper, a maximum number of generations was used to signal termination.

Solution search setup

The cost and exclusion list of each of the 261 candidate sequences for Karnali Province, along with their p-h accessibility improvement measures for each service-season combination were provided as inputs into the optimisation solution approaches. Minimum and maximum budget limits of 6 and 13 billion NPR were specified. For the upper bound, 13 billion NPR was used because this was the budget for the roads that were eventually selected by the Karnali provincial government (before this research was envisioned). The much lower limit of 6 billion NPR provided a large budget span, which allows for services to be compared in terms of rates of accessibility improvement that may be achieved at different budget ranges, and also allows for comparing how improvements between services relate to each other across the budget range (e.g. linear or exponential).

In the weighted-sum ILP approach, each execution sought a combination of sequences so that their accumulated person-hour accessibility improvements were maximised according a specific service-weight combination. Cost was considered as a maximum constraint and solutions were sought at cost levels beginning at 6 billion NPR and increasing in increments of 1 billion NPR up to 13 billion NPR, resulting in eight cost levels. The ILP runs were first executed at all the cost levels and possible weight combinations for the dry season, and then repeated once more for the monsoon season. In the first ILP stage, no weights are assigned because the objective function in (2) is repeated for each of the three service-accessibility criteria per season, at each cost level in order to determine the corner points on the Pareto front. In the second stage, the objective function in (6) is used. Equal weights were assigned with respect to all pairs of two services that can be compared from the three per season, e.g. set 1) \(w_1 = 0.5\), \(w_2 = 0.5\), \(w_3 = 0\), set 2) \(w_1 = 0.5\), \(w_2 = 0\), \(w_3 = 0.5\), and set 3) \(w_1 = 0\), \(w_2 = 0.5\), \(w_3 = 0.5\), resulting in three such weight combinations at each cost level, per season. Then, the three criteria were equally weighted together, i.e. \(w_1 = 0.33\), \(w_2 = 0.33\), and \(w_3 = 0.33\)—a single weight combination at each cost level, per season. The three runs from the first ILP stage together with the four weighted runs from the second stage totalled seven runs at each of the eight cost levels, resulting in a total of 56 weighted-sum runs per season—and therefore 112 total runs.

The purpose of the aforementioned weights and ILP executions were to obtain a variety of solutions for a specific season. However, to investigate how each specific service’s accessibility improvements compare between seasons, along with cost increases, one additional run was performed per service, at each cost level. For example, to obtain and compare improvements to health facility accessibility between seasons, at each cost level a weighted sum is used with equal weights of 0.5 assigned to maximise each season’s health facility accessibility improvement. For these season-comparison runs, with one solution obtained at each of the eight cost levels for each of three services, a total of 24 solutions were obtained.

No more weight combinations were investigated, as these were primarily used for visual comparison to the heuristic results (and visualising solutions with respect to more than three objectives is challenging). Thus, a global number of 136 weighted-sum runs and solutions were obtained. The runs were all executed with CPLEX and called from within MATLAB. This is because data preparation was done in MATLAB, and for each CPLEX run a different set of variables (depending on the criteria for which weights are assigned) are selected and provided as input—this was managed by MATLAB.

In the heuristic approach, the aim was to find combinations of road sequences so that their accumulated person-hour accessibility improvements with respect to the three services were maximised, and was also performed separately for each season. Cost was implemented as an additional minimisation objective—therefore a total of four objectives per season. The cost minimisation objective forces the heuristic to not only find solutions that maximise the accessibility objectives but also does so considering cost-optimality across the budget span. Without this objective the heuristic will be biased to and return solutions that are at the maximum budget level because more expensive solutions are able to return the largest accessibility improvements.

Because of the stochastic nature of the heuristic’s Pareto-front approximation process, the solutions returned by different optimisation runs will generally vary in quality. It is therefore standard practice to repeat the process multiple times (Knowles et al. 2006; Kim et al. 2008; Heyns 2021). The results of all the runs are then combined and a final attainment front—essentially the Pareto front of all the combined Pareto front approximations—is identified. Here, the heuristic was repeated fives times per season with a population size of \(N = 3000\), and solutions resulting from all the runs were combined into a single set of \(5\times 3000=15,000\) solutions per season—and thus a total of 30,000 solutions were determined in total by the heuristic. The authors’ personal optimisation code was used for the heuristic runs, executed in MATLAB. All runs were terminated after 45 generations.

Results

Solution sets

The solutions obtained by the heuristic and weighted-sum runs are presented in this section. It was possible to determine both sets of heuristic and ILP results within one day.Footnote 5 The ten heuristic runs terminated at an average of 3 h 42 m per run, requiring a global computation time of approximately 37 h to complete. This was, however, reduced to under 20 h by running multiple instances of MATLAB concurrently (this reduces the overall computation time to complete all runs, but lengthens the individual runs’ computation times because of resource sharing on the same PC). The ILP run times averaged at 2 m 19 s with a global computation time of 5 h 16 m to complete all 136 runs.

Regarding the heuristic’s total of 15,000 solutions obtained from the dry season runs, 9854 were determined to be approximately Pareto-optimal with respect to the others. For the monsoon season runs, this number was 9779 from its total of 15,000 solutions. There are therefore a combined total of 19,633 approximately Pareto-optimal solutions discovered in total by the heuristic for both seasons. For referring to specific solutions in the remainder of this paper, the 19,633 solutions were sorted from least to most expensive and assigned numbers from 1 through 19,633, i.e. solution 1 costs roughly 6 billion NPR, while solution 19,633 costs roughly 13 billion NPR.

Fig. 15
figure 15

A summary of the heuristic optimisation outputs in the shape of each season’s approximately Pareto-optimal solutions plotted against the accessibility objectives. The cost of the solutions are indicated as increasing in darkness from light grey at 6 billion NPR, to dark grey at 13 billion NPR. Example solutions and their selected roads are provided for solutions 6139 and 19,618, with objective function values displayed

Figure 15 visualises the approximately Pareto-optimal solutions obtained by the heuristic for each season. The cost of the solutions in Fig. 15 are indicated as increasing in darkness from light grey at 6 billion NPR, to dark grey at 13 billion NPR. Solutions number 6 139 and 19 618 are linked between their solution “dots” and their accessibility improvements are provided alongside an image of their selected roads. Figure 15 exemplifies how solutions may be presented to decision-makers to assist their analysis of options. A more evolved presentation might provide this data in interactive form.

In Fig. 16a–c, the dry season’s approximately Pareto-optimal solutions in Fig. 15 are now projected in 2D, along axes which are p-h accessibility improvements with respect to different service pairs. In Fig. 16d–f the solutions are presented in the same manner for the monsoon season’s solutions. The accessibility improvement achieved by the actual 13 billion NPR roads selected by the Karnali provincial government and as evaluated by the cost-time model implemented here are also displayed as a black cross in each plot. In all the plots in Fig. 16, selected solutions are indicated in the colours blue, green, orange, and red, which represent solutions at budget levels of 7, 9, 11 and 13 billion NPR, respectively. Only four of the eight cost levels’ solutions are displayed for visual clarity and to provide a general indication of the heuristic’s Pareto-front approximation capability compared to the ILP approach—other cost levels were used for more analysis in development and testing, and displayed similar characteristics as those presented in the figure.

When examining the colour solutions at each level, it may be observed that three are larger circles—these are the ILP solutions that were determined with the budget set at the indicated maximum value. Of these three ILP solutions at each budget level, the two outer points are the values determined by single-objective optimisation with respect to a single service-accessibility criterion, while the middle solutions were determined by the weighted-sum with respect to the two objectives in each plot (i.e. determined with two equal weights for the two objectives). The smaller colour circles are heuristic solutions that are extracted from the solution set at the specific budget level and with respect to the two objectives in each plot. To aid in understanding these additional “sub”-fronts, one can consider the heuristic solutions obtained for each season in Fig. 16 as a subset of the entire solution space of all possible solutions, and which have been identified to return optimal trade-offs with respect to the three accessibility criteria, and cost (i.e. a 4-dimensional Pareto front of solutions). Then, similar to this process of determining a Pareto front from the entire solution space with respect to four objectives, sub-fronts are extracted from the dry season’s 9854-solution set with respect to selected service pairs at each cost level, and repeated for the monsoon season’s 9779-solution set with respect to selected service pairs at each cost level.

Fig. 16
figure 16

ac Objective function values (in person-hours) of the dry-season heuristic solution set and selected ILP solutions, plotted against values achieved with respect to pairs of different services types. The values achieved during the monsoon season are similarly displayed in (d)–(f)

The combined heuristic solution set of 19 633 solutions and selected weighted-sum solutions are now presented in Fig. 17, with the axes representing accessibility improvements to health facilities with respect to the two seasons. Results are only shown for health facilities because of the highly linear nature of the solutions’ trade-offs in objective function values. Similar yet less significant patterns were observed with respect to financial institutions and municipal headquarters (examples for these services are provided by Heyns et al. (2020), albeit for a different heuristic solution set).

Fig. 17
figure 17

Objective function values (in person-hours) of the heuristic solutions and selected ILP solutions, plotted against values achieved with respect to health facilities in the two seasons

Analysis

Pareto front approximations

When analysing the sub-fronts in Fig. 16, the trade-offs between improving accessibility to services become clear. At each budget level, moving along solutions that improve in quality with respect to one objective causes a degradation in solution quality with respect to another objective. This provides insight into the decision-making process, as it is clear that the objectives are conflicting and that the trade-offs should be examined. The solutions exhibit constant improvement in the objective function values when moving from one budget level to the next, indicating a relatively linear relationship between cost and accessibility improvement. Particularly encouraging is that the heuristic solutions at the four budget levels displayed in the figure are similar in solution quality and follow the shape of the sub-fronts yielded by the ILP solutions. This is significant, since the heuristic solutions are determined thousands at a time, while each of the ILP solutions are determined individually.

When analysing the Pareto fronts in Fig. 17, the trade-offs between improving accessibility to health facilities during different seasons is highly linear and reveal that gains in accessibility in one season scale favourably to gains in another season. While similar patterns were also observed comparing solutions for financial institutions and municipal headquarters in the same manner, these were not as striking as for health facilities (such observations for financial institutions and municipal headquarters are visually presented and discussed in more detail by Heyns et al. (2020), albeit for a different heuristic solution set). A possible cause for health facilities’ gains being so significantly linear between seasons may be the wider spatial distribution and higher absolute number of health facilities (refer to Fig. 6), which reduces the occurrence of relatively extreme accessibility gains for any specific road sequence during any season. Observations such as these may lead to improved decision-making processes. In this case, it may be possible to eliminate the consideration of one of the seasons in healthcare access—effectively removing a criterion from the overall optimisation and analysis process. Alternatively, it may be possible to eliminate the consideration of one of the seasons from the entire optimisation and analysis process altogether. This would further reduce the computational complexity of the solution search, reduce computation times, likely result in improved solution quality, while also simplifying the analysis process. This would, however, be problem-specific and dependent on local preferences, and whether it is considered necessary to distinguish between separate seasons (which, in turn, would depend on the magnitude of the effect that seasonal changes have on transportation).

Significant improvement in solution quality is observed when comparing the heuristic solutions in Figs. 16 and 17 to the actual roads plan selected by the Karnali provincial government. The maximum accessibility improvements for solutions at or near the maximum 13 billion NPR budget are more than 3, 3, and 1.5 times those yielded by the actual selected roads with the same budget for health facilities, financial institutions, and municipal headquarters, respectively, for both seasons. Regarding the plot in 17, it is observed that even at 7 B NPR the majority of the solutions outperform the actual roads selected with respect to health facility accessibility, more than double in magnitude with respect to each season. Similar observations were made with respect to financial institutions, yet less but nonetheless noteworthy for municipal headquarters—examples from a different heuristic solution set are discussed by Heyns et al. (2020).

Accessibility improvement

Fig. 18
figure 18

Layouts of a Solution 6139, and b Solution 19,624, and their accessibility improvements and areas of influence with respect to municipal headquarters during the monsoon season

The roads selected by solutions 6139 and 19,624 from the combined heuristic solution sets are provided in Fig. 18—both solutions were approximately Pareto-optimal with respect to the monsoon season. Also indicated are their resulting accessibility improvements and areas of influence with respect to municipal headquarters during the monsoon season, as well as the underlying populated areas that benefit. It should be noted that areas benefiting by less than 15 min are not indicated in these maps, nor those presented in the remainder of the paper, in order to highlight more significant gains only and for visual clarity. The maps in Fig. 18 are an example of how alternative roads plans may be presented to decision-makers for visual assessment and comparison purposes.

The accessibility improvement maps in Fig. 18 are uniquely innovative, as they demonstrate how the multi-modal accessibility model implemented with this paper’s optimisation solution approaches take populations that are far away from roads into account, and incorporate their walking times in the evaluation of the solution combinations. Thus, overall walking-vehicular travel times are considered and roads are selected to maximise accessibility improvements across both travel modes. This is a significant improvement over traditional roads optimisation methods, which only consider improvements in roads-travel and not the associated walking times required to arrive at a road—effectively maximising improvements with consideration given to populations close to roads only. Furthermore, by scaling the gains to population density, the approach followed here returns solutions that provide more equal opportunities for improved accessibility, even for the most remote populations, regardless of how far they are located from roads. This is an essential step forward for the optimisation of roads in remote and rural settings, where significant walking times to reach roads and services are common.

The images in Fig. 18 show how more sequences are included in solution 19,624 compared to 6139 (21 compared to 13), resulting in a larger surface area where accessibility improvements are achieved, thus also enveloping and benefiting more populated areas as a result. Solution 19,624 is, however, associated with a higher cost of nearly the maximum budget of 13 B NPR, compared to 7.39 B NPR for solution 6139. The full set of objective function values obtained by solutions 6139 and 19,624, as well as those of the actual roads selected, are provided in Table 2. Also in the table are the values for two additional solutions, 6135 and 19,618, which are discussed in more detail later.

Table 2 Objective function values and additional measures obtained by four solutions discussed in the results, in addition to values for the actual roads selected by the Karnali government

Because of drastic falls in most roadworks’ performance and moderate decreases in walking speeds during the monsoon season, the relative access improvement from new, well-engineered roads is greater in the monsoon season than in the dry season. When multiplied by population underneath, the premium on monsoon season improvements from a new roadwork can be substantial. This is reflected in the higher ratio of gains for the monsoon season seen in Table 2; interestingly, this finding independently echoes engineers’ and planners’ preference for all-weather roads which can guarantee year-round access (Lebo and Schelling 2001). Thus by quantifying seasonal gains these models help align access studies with planners’ own needs.

Fig. 19
figure 19

Layout of solution 19 624 and its accessibility improvements and areas of influence with respect to financial institutions during the monsoon season

The accessibility improvement achieved by solution 19 624 with respect to financial institutions in the monsoon season is displayed in Fig. 19, for comparison with the same solution’s results with respect to municipal headquarters during the monsoon season in Fig. 18b. It may be observed how the areas of influence between the two services vary greatly, as a result of the difference in service locations. Financial institutions are concentrated in the south of Karnali, while municipal headquarters have a better distribution across the province. Access to financial institutions therefore exhibit a significant improvement in the eastern- and northernmost parts of the province, if these roads were to be constructed. It may also be observed in Table 2 how gains with respect to financial institutions outweigh those with respect to municipal headquarters and health facilities in both seasons for all the solutions—as a result of the concentration of financial institutions in the south, compared to more “spread out” locations of the two other services (refer to Fig. 6).

Sequences and road segments

The large number of solutions obtained by the heuristic allows for more nuanced analyses which may reveal characteristics of the roads/sequences considered. For example, it was found that 82 of the 261 sequences do not appear in any of the 19 633 solutions. The same 82 sequences were also not present in any of the ILP solutions. Of these 82 unselected sequences, only one is a single road—i.e. 81 of the 82 unselected sequences are a combination of two or more roads. Strikingly, the heuristic’s 10 most frequently selected sequences are single-road, as are 17 out of the 20 most selected. While single-road sequences account for only 68 of the 261 sequences (26.05% off all sequences), they comprise 73.9% of all sequences selected in the 19 633 solutions set. Single-road sequences are clearly favoured for selection by the heuristic.

Similar results were returned by the 136 ILP solutions. Of the 261 sequences, 177 are not selected in any of the 136 solutions. Of these 177 unselected sequences, 151 (or 85.31%) are multi-road sequences—once again alluding to multi-road sequences being weaker alternatives. Finally, the 8 most frequently selected sequences are single-road, as are 18 out of the 20 most selected—these are similar to the numbers observed for the heuristic solutions. Single-road sequences make up 74.2% of all the sequences selected in the 136 ILP solutions. These results mimic the heuristic’s bias towards single-road sequences, and could indicate that the efficiency and benefit of roads fall once they are added to a “core” tier-1 road. This provides empirical evidence that some roads are proposed solely as ancillary extensions to tier-1 roads, and would normally not be proposed in isolation.

The fact that the ILP approach rejects 177 of the 261 candidate sequences from its solutions, compared to 82 for the heuristic, can be attributed to the heuristic exploring the solution space far more vigorously than the ILP approach. Thus, the heuristic is bound to uncover solutions in regions of the solution space not explored by ILP, where sequences that were not selected by any of the ILP solutions are, in fact, well-performing when combined with other sequences. The heuristic approach therefore offers analysis opportunities that provide insights into the characteristics of sequences and solutions across the entire solution space, instead of being limited to certain discrete points returned by a weighted-sum ILP approach. In this instance, it can be concluded with confidence that the 82 sequences rejected by the heuristic approach could be ignored from further analysis, as they are not present in any of the heuristic’s 19 633 approximately Pareto optimal solutions. On the other hand, ignoring 177 sequences based on them not being present in a smaller set of 136 ILP-determined solutions would be ill-advised.

The selection frequency of individual sequences was also investigated. It was found that the five most selected sequences in the heuristic solution set were sequences number 218 (selected 17,717 times in 19,633 solutions), 217 (selected 17,002 times), 220 (selected 16,496 times), 261 (selected 15,600 times), and 223 (selected 14,450 times). For the ILP solutions, these were sequences number 261 (selected 133 times in 136 solutions), 218 (selected 130 times), 220 (selected 127 times), 196 (selected 122 times), and 213 (selected 119 times). Having such values available can help identify which sequences are the best-performing across the entire budget span and with respect to all the objectives, and the most critical to include in any new roads plan. Once again, it could be better advised to consider such values determined from the heuristic solution set—instead of from the ILP solutions—as these would return the most-selected sequences across the entire solution space instead of at a small number of discrete points.

Figure 20 reveals additional characteristics—and advantages—of solutions returned by the heuristic when compared to those obtained by ILP. In Fig. 20a the minimum, mean, and maximum number of sequences included in solutions obtained at different cost levels are indicated. For the ILP solutions, this is determined for the 17 solutions at each cost level, while for the heuristic the solutions were grouped together in bins of 0.1 billion NPR—therefore the first two bins are from 6.0 to 6.1 billion, and 6.1 to 6.2 billion NPR, and so on, so that there are ten bins between 6 and 7 billion NPR, ten more between 7 and 8 billion NPR, and so on.

The results in Fig. 20a show that the heuristic is capable of determining solutions that include fewer sequences than those obtained by ILP—significantly so when analysing the minimum values, less so when analysing the mean values, while exhibiting a larger number in general when observing the maximum values. Once again, this is possible because the heuristic explores the solution more vigorously than the ILP approach, and can therefore uncover solutions with certain characteristics that may be overlooked by the ILP approach. These observations are crucial for project and planning purposes, since fewer sequences should facilitate an easier management schedule. However, this would also depend on the actual number of individual road segments included in the sequences. For this purpose, the solutions were analysed in the same manner as in Fig. 20a, but in Fig. 20b the total number of individual road segments included in the solutions are displayed in minimum, mean, and maximum numbers. This figure reveals similarities when compared to sequences in Fig. 20a and indicates the heuristic’s ability to find solutions including fewer segments than their ILP counterparts at similar cost levels.

Fig. 20
figure 20

Minimum, mean, and maximum solution lengths in terms of a Number of sequences per solution, and b Number of individual road segments per solution. The ILP values were determined from the 17 solutions at each of its eight cost levels, while the heuristic values were determined for solutions grouped together in bins of 0.1 billion NPR

For further discussion related to these values, it is observed in Table 2 how solutions 6135 and 6139 have similar costs, yet there are noteworthy differences in their number of sequences and road segments. Solution 6 135 includes 20 sequences comprising 25 roads, compared to solution 6139’s 13 sequences comprising 17 roads. Similarly, in the maximum cost range, solution 19,618 includes 32 sequences and 42 roads, compared to solution 19,624’s 21 sequences and 32 roads. These values illustrate the sensitive nature of the decision-making process in comparing solutions, not just in terms of cost and objective function values but also in terms of project planning and management. The layouts of solutions 19,618 and 19,624 are provided in Fig. 21 and provide more insight into their values. It can be seen that solution 19,624 contains fewer roads, yet its similarity in cost compared to solution 19,618 is explained by its inclusion of longer roads (and road construction costs are related to their length), while solution 19,618 is a combination of primarily short roads. Such observations are also important for planning and management, since fewer sequences appears an attractive proposition, but this benefit is accompanied by the disadvantage of longer stretches of roads to plan and manage.

Fig. 21
figure 21

Comparison of the roads selected by a Solution 19,618 and b Solution 19,624, both costing 12.99 billion NPR. The images illustrate how solutions are not just compared in terms of cost and objective function values but also in terms of project planning and management related to the number of sequences and their associated length of roads to construct. Solution 19,618 contains 32 sequences compared to the 21 in solution 19,624 but, as can be seen in the figure, includes shorter road segments

Discussion

Summary

The marked gains in cost-effectiveness from the methods presented here justify their additional complexity, time investment, and expense. The heuristic and ILP optimisation approaches efficiently process and highlight the most promising combinations of rural roads based on the provided objectives (accessibility improvements) and constraints (budget). The system provides flexibility to policymakers and analysts by proposing solutions at various budget levels and can accommodate additional objectives—accessibility-related or otherwise—with low marginal effort. The end result is a workflow better equipped than existing tools to analyse and compare the accessibility improvements from rural roads construction in remote environments, and their links to desired development outcomes. Table 3 provides an overview of the effort (measured in days) required to arrive at the results presented in the previous section. Larger areas and/or additional service accessibility criteria (added as additional optimisation objectives) would increase the computations times for geospatial and optimisation analysis, and the interpretation and visualisation of results in such instances may similarly require more time.

Table 3 Level of effort required for the different processes that were followed to arrive at the results presented in this paper

Perhaps the most significant contribution of the work presented here is that rural roads planning is optimised with consideration given to remote populations’ entire journeys—not only the part that requires vehicular travel, as is the standard in traditional roads optimisation approaches. This ensures that roads are constructed to benefit even those that are nevertheless required to walk for long periods of time, by considering reductions in walking times to reach roads, followed by the consideration of vehicular travel time gains. Such an approach is essential in rural and remote settings, where travel times often comprise hours (or even days) of walking to or from roads—thus constituting a significant portion of their overall travel times.

Expert feedback

Expert feedback was sought from World Bank economists, geographers, and transportation specialists in a series of presentations and 1-on-1 meetings at the World Bank’s headquarters in Washington, DC, in February-March 2020 (Heyns et al. 2020). The reception was positive, with strong appreciation for the marked improvement in optimisation-based results over the actual roads selected by the Karnali provincial government. Critical feedback focused on the need to incorporate political economy considerations into the analysis process to better reflect decision-makers’ real-world constraints. The actual roads selected likely incorporated unobserved policy or social objectives, such as connecting important towns like Simikot, better integrating specific ethnic groups, boosting tourist access, or facilitating military movements to border regions.

One example of this is political dispersion, which was not investigated in this paper but by Heyns et al. (2020) following the World Bank meetings. A solution’s political dispersion was evaluated as the number of municipalities that, within their municipal boundaries, hold at least five kilometers of roads in the solution. While such political considerations are context-specific and cannot be predicted ahead of time, the political imperative to ensure equitable investment across a province, i.e. “spreading the money out”, is universal. As an example, Heyns et al. (2020) identified two solutions that reach 36 and 46 municipalities at 8.7 B NPR and 12.9 B NPR, respectively. This is improved over the actual roads selected in Karnali’s Provincial Transport Master Plan, which reaches 20. If all the proposed roads were built (which would cost 18.9 billion NPR), 52 out of Karnali’s 79 municipalities would hold at least five kilometres of new roads. Heyns et al. (2020) identified a particularly impressive solution that reaches 46 of the maximum 52 possible municipalities (if all roads were built) but at roughly two-thirds the cost of constructing all roads.

Considerations such as these may be determined without much effort post-optimisation, or be included in the optimisation process as additional constraints or objectives—e.g. maximise the number of municipal regions crossed by at least 5 km of roads, or only return solutions crossing at least 20 municipalities with 5 km of roads.

Advances in analysis opportunities

Several novel optimisation approaches were followed in this research. The most notable is the consideration of a budget span. This was achieved by novel variable chromosome length representations of candidate roads plans in the heuristic. The heuristic is capable of determining thousands of solutions which exhibit optimality with respect to multiple conflicting objectives, in reasonable computation times, across the budget span.

A significant benefit of using the heuristic to obtain such a large number of solutions that are optimal with respect to many objectives is that it provides decision-makers with more rapid analysis capabilities than possible with an ILP approach—particularly during the early planning stages. For example, if it is desired to analyse solutions at a specific budget level of 10.5 billion NPR (a cost level not explored in this paper), the Pareto optimal sub-fronts can rapidly be extracted (within seconds) from the solution sets that were determined here (as was performed in Section "Pareto front approximations"). Then, depending on the objective function values and associated number of sequences and road segments contained in the extracted solutions, it may be debated whether solutions at the specified cost level are adequate, or overperform or underperform with respect to the desired outcomes. Solutions at other cost levels can then be extracted based on these observations, if required, and analysed to gauge their suitability. This can aid decision-makers in finalising a budget and to identify the most significant objectives, followed by a final round of optimisation to “fine-tune” solutions to these specifications—using either the weighted-sum or heuristic solution approach, or both. When compared to such heuristic-determined capabilities, the ILP approach does not offer a rapid way to analyse solutions at unexplored budget levels. For example, if solutions at 10.5 billion NPR are desired (a cost level that was not considered during the ILP runs in this paper) the ILP approach needs to be repeated at the new budget level and for many different weight combinations.

While many analysis opportunities exist as a result of the work presented in this paper, one that was investigated by Heyns et al. (2020) is particularly interesting. By determining and comparing investment efficiency amongst proposed sequences with respect to each accessibility criterion, Heyns et al. (2020) found that the most efficient sequences are generally the most frequently selected, yet certain efficient sequences are not selected often and certain less efficient sequences are frequently selected. Efficient solutions may therefore be by superseded by even more efficient solutions using the same road sequences, and cheap but less efficient sequences may be useful complements to more efficient sequences and plans. The last point underlines the utility of an optimisation-based approach: judging roads in isolation ignores the significant complementaries that exist between choices, and highlights the importance of determining plans based on sequence combinations.

Other benefits to having a large number of solutions include added insight into (a) the relative linearity or concavity of a specific service’s accessibility improvements versus cost increases (as observed in Figs. 16 and 17), (b) the trade-offs that exist between accessibility to services and seasons (as observed in Table 2), and (c) the roads’ investment efficiency and selection frequency (discussed in "Sequences and road segments" and explored in more detail by Heyns et al. 2020). Analyses of such large numbers of solutions have not been performed in the literature, because capable methods have not existed until now. These insightful strategies may further help compensate for the delays and inefficiencies inherent to Nepal and similarly developing countries’ capital infrastructure planning bureaucracy, by focusing policymakers’ attention on the roadworks most critical to success within an overall optimised portfolio.

Limitations

Certain limitations to the approach outlined in this paper must be noted. The most significant is that due to resource and data constraints, accessibility gains were considered as a simultaneous phenomenon instead of staggered over the provincial government’s 5-year road construction window. A deeper analysis would specify a yearly budgetary max within the overall budget max and stagger a solution’s roads across the time period, with gains appropriately discounted. Furthermore, this analysis treats population as spatially fixed, while in reality population is dynamic and new road construction can induce population movements; for example, in Nepal new markets often spring up at newly built roads heads. This could create additional unobserved objectives if the government speculatively selected roads to create new market hubs or connect underpopulated remote villages to reduce population outflows.

The complexity of the analysis routines described here places them effectively out of reach of even national government planning agencies in Nepal and other countries at its level of development, let alone local ones. Thus government will need to employ universities and private sector companies with specialised expertise to replicate this analysis in new areas. Given that the studied investment planning exercise happens on a 5 year basis this seems a manageable extra expense. A further limitation is the difficulty of communicating the results to policymakers compared to traditional instruments. Welfare-based evaluation methods ease the cognitive burden on decision makers by measuring investment returns in terms of familiar NPR values or easily parsable rates of return. To compensate, thorough explanatory visuals for the methods and results were produced. Nevertheless, correctly interpreting a 3D bubble chart of person-hours is inherently more challenging than a table of Net Present Values (NPVs) or Internal Rates of Return (IRRs). This erodes decision makers’ ability and willingness to use the results presented here. Therefore real-world implementations will need to invest additional time and effort in correctly explaining results and guiding policy makers to use them appropriately.

Potential improvements and future research opportunities

Additional services would nuance the analysis. For instance, the analysis could incorporate accessibility to schools, water points, agricultural extension services, or other important services, even at a village or household level. A variation on this theme is to differentiate qualitatively between services, for instance by separating hospitals from health posts or major from minor financial institutions and then re-running the analysis with each as an objective.

The heuristic is a novel development and by no means considered a finished version—the focus here was to develop the heuristic to investigate and illustrate the potential for new analysis by obtaining thousands of high-quality solutions. As such, there exist numerous opportunities for improvement. A simple join crossover operator was used here because it results in an offspring with a higher cost level from which the lower levels may be explored by repeated entry removal. It may nevertheless be interesting to see how more traditional crossover methods affect the results, particularly in terms of budget exploration. Similarly, the aggressive mutation strategy works well to explore large solution spaces and has also been successful in other recently applied research (Heyns 2021). It would nevertheless be worthwhile investigating simpler strategies that are less computationally expensive, such as simply replacing a single entry in a solution instead of all of the entries. This may result in faster computation times, but at the cost of slower exploration of the solution space and therefore more generations to perform before terminating the heuristic process. Furthermore, many of the heuristic’s offspring solutions during the offspring creation phase are similar and only differ from each other marginally in terms of their objective function values and selected roads plans. In future implementations, solutions that exhibit such marginal gains compared to each other may be filtered out and only certain ones retained—reducing the overall size of the offspring populations and the total modification and sorting operations required during the solution process. This can significantly reduce overall computation times, but more importantly improve solution quality because the relative diversity of the populations are improved—an important characteristic for efficient solution space exploration.

Sensitivity tests could be conducted on optimisation results to verify the stability of preferred solutions to cost overruns or accessibility miscalculations. Investigating (near-)optimal solutions with the greatest degree of stability across reasonable cost overruns can help identify sequences that help guard against the construction industry’s notoriously poor cost control and, alternatively, could also help to identify sequences that should be avoided because of high degrees of instability. Similarly, the optimisation routines could offset Nepal’s budgetary and planning inefficiencies—not unique to Nepal and generally observed in developing countries—by recommending which core road sequences to finish first.

Conclusion

In this paper, rural roads optimisation was performed for multi-modal travel in rural settings—the first such implementation in roads optimisation. Optimising roads plans for multi-modal travel using the accessibility model of Banick et al. (2021) ensures that roads are selected based on their benefit to even the most remote populations that require significant walking times to reach roads before being able to utilise vehicular transport. By implementing this state-of-the-art accessibility model with heuristic and ILP optimisation approaches, the utility of a multidisciplinary approach to arrive at meaningful and substantially improved roadworks plans in mountainous and data-poor environments was demonstrated.

Optimisation of roads plans was performed using 92 recently proposed roads in Nepal’s remote Karnali province—a project that required a subset of roads to be selected within a limited budget. Accessibility changes linked to each proposed road, or sequences of roads, with respect to three services, during dry and monsoon seasons, were determined using the multi-modal model of Banick et al. (2021), and were provided as inputs to a heuristic and ILP solution approach. Road-combinations were then determined with the objectives of maximising accessibility values for populated areas with respect to the three services, and minimising cost, for each season.

The heuristic presented in this paper is a new solution approach, and was capable of determining large numbers of road-combination plans that offer high-quality trade-offs in objective function values. The returned solutions were distributed across a large budget span, realised by using a genetic algorithm with solutions represented by chromosomes of variable lengths. The approach overcomes the limitations of traditional optimisation methods, which offer a limited number of road-combination plans to analyse and compare and return solutions at specified budget levels only.

The results obtained by the heuristic were compared to solutions determined by the ILP approach—for which a new mathematical model formulation was proposed—and compared favourably in terms of objective function values. Despite the heuristic obtaining thousand of high-quality solutions compared to a small number by the ILP approach, these were obtained within reasonable overall computation times. The solutions determined by both solution approaches substantially outperformed the roads selected by the local Karnali government—numerous plans that exhibit improved accessibility outcomes were obtained at half the budget of what was actually spent.

The heuristic approach offers new opportunities for in-depth analysis for final roads selection. Examples that were discussed and illustrated include, amongst others, the influence of seasonality on roads-combinations’ estimated accessibility improvement, the number of sequences and individual roads included in solutions at different cost levels, and roads selection frequency.. The heuristic is capable of identifying key roads and making similar observations compared to an ILP approach, but with additional insight and other analytical opportunities—without much extra effort required. The work presented here forms part of a greater push for data-driven analysis in World Bank roads development projects, and is earmarked for application in the near future.