Introduction

Petroleum exploration is a competitive activity, and, historically, exploration data, decision-making inputs and processes, and drilling outcomes were highly confidential. However, some exploration companies are now more willing to discuss their exploration plans and expectations on the Internet and in social media (e.g., FAR Limited, 2018; Melbana Energy, 2019; Lundin Energy, 2021), and many governments strive to increase transparency in exploration regulations and reporting (e.g., Norwegian Petroleum Directorate, 2018; Canada–Newfoundland & Labrador Offshore Petroleum Board, 2021; United Kingdom Oil & Gas Authority, 2021). These developments facilitate projects in which open source, open access, and crowdsourced data can be used to learn about petroleum exploration forecasts and outcomes (Milkov, 2017, 2020).

Petroleum explorers routinely predict outcomes of drilling, and such forecasts have enormous financial consequences for companies (Capen, 1976; Rose, 1987; LaCosta and Milkov, 2022). Independent and government studies published over the last four decades consistently indicate that exploration companies commonly underestimate the probability of geological success (PoS) and overestimate success-case volumes, with no improvement over time (Lewis, 1984; Rose, 1987; Fosvold et al., 2000; Ofstad et al., 2000a, b; Blystad and Søndenå, 2005; Milkov, 2017; Bagley and Bond, 2018; Norwegian Petroleum Directorate, 2018; Janssen, 2019; Hoetz et al., 2020). In the past, exploration companies themselves often admitted the existence of large gaps between their exploration predictions and outcomes (Sluijk and Parker, 1986; Alexander and Lohr, 1998; Johns et al., 1998). However, more recent company reports tend to present rather cheerful and self-complimentary accounts of their performance (Möller, 2015; Rudolph and Goulding, 2017; Boyd, 2019; van Mastrigt and Quinn, 2021). Did exploration companies improve their processes and become better at exploration assessments in recent years or not? Critical unbiased studies of open data may help answer that question.

Here, we compare pre-drill petroleum exploration assessments and drilling outcomes in Mexico using open source presentations and reports of the National Hydrocarbons Commission (Comisión Nacional de Hidrocarburos, or CNH), an agency of the Mexican Federal government. The focus of this study is on (1) the pre-drill assessed geological PoS values and actual geological success rates, and (2) the pre-drill estimated success-case volumes and actually found recoverable resources of petroleum fluids. Mexico has a long history of petroleum exploration and production going back to the beginning of the twentieth century (Ryan, 1979). More than 5,000 exploration wells have been drilled in the country, and they discovered ~ 320 billion barrels of oil equivalent (bboe) of resources in place (Guzmán, 2013). However, to our knowledge, there are no previous studies comparing pre-drill predictions and actual exploration drilling outcomes in Mexico.

Definitions

A segment is a subsurface feature that represents a potential petroleum pool (Stabell, 2000; Milkov, 2015). This is the smallest unit in the assessment of the geological PoS and success-case resource volumes. Exploration wells may target prospects with one or more segments. Within one basin, prospects with multiple segments that have at least some independent risk factors should, in general, be less risky and should result, on average, in more discoveries than prospects with single segments (Delfiner, 2003; Milkov, 2021).

There are two common definitions of the probability of geological success (geological PoS). In the first definition, the geological PoS is the probability of making a petroleum discovery, i.e., finding petroleum fluids that would freely and sustainably flow into the well from the penetrated subsurface segment(s) given the actions of a prudent operator. This definition is concerned only with the probability of finding producible (movable) petroleum, and it implies no specified minimum volume of petroleum or commerciality (Meisner and Demirmen, 1981; Harbaugh, 1984; Clapp and Stibolt, 1991; Nederlof, 1994; Otis and Schneidermann, 1997; Johns et al., 1998; Rose, 2001; Milkov, 2015, 2021). In the second definition, the geological PoS is the probability of finding a certain pre-defined geological minimum volume (or more) from the success-case volumetric model. This definition was developed at Exxon (Gehman et al., 1975; White and Gehman, 1979). The geological minimum volume is typically smaller than what would be needed for economic (commercial) success and is commonly in the range of 1–10 million barrels of oil equivalent (mmboe) of recoverable petroleum fluids (e.g., Sykes et al., 2011; Rudolph and Goulding, 2017). The first definition of the geological PoS listed above is perhaps more common in the petroleum exploration industry.

The geological success rate is the proportion of discoveries among all drilled wells. In the case of good assessment of geological PoS values, for a large number of prospects/wells, the average geological PoS and the actual geological success rate should be the same.

Success-case volumes are volumes of recoverable petroleum fluids expected to be discovered if the well finds movable petroleum with sustainable flow. Most modern companies estimate the success-case volumes using a probabilistic methodology, i.e., they multiply distributions of values for volumetric parameters (gross-rock volume, net-to-gross ratio, porosity, petroleum fluid saturation, formation volume factor, and recovery factor) using a Monte Carlo approach (Gehman et al., 1975; Murtha, 1994; Snow et al., 1996; Rose, 2001; Milkov, 2015, 2021). The outcome of such assessment is the distribution of recoverable resources, and companies commonly report the results as P values (typically P90, P50, and P10) and the mean value. P90 is defined as 90% probability of exceeding the P90 estimated value, P50 is defined as 50% probability of exceeding the P50 estimate, and P10 is defined as 10% probability of exceeding the P10 estimate. The mean value (the expected value) is a measure of the center of a probability distribution (e.g., Bratvold and Begg, 2010).
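The probabilistic workflow described above can be sketched in a few lines of Python. This is a minimal illustration and not any operator's actual method: the input distributions, their parameters, and the function name `simulate_recoverable_mmbbl` are assumptions made purely for the example. The only fixed quantity is the conversion of 6.2898 barrels per cubic metre. Note the industry convention that P90 is the value exceeded in 90% of trials, so it corresponds to the 10th percentile of the simulated distribution.

```python
import numpy as np

BBL_PER_M3 = 6.2898  # barrels per cubic metre

def simulate_recoverable_mmbbl(n_trials=100_000, seed=0):
    """Monte Carlo sketch of recoverable oil (million barrels).

    All distributions and parameters below are hypothetical.
    """
    rng = np.random.default_rng(seed)
    gross_rock_volume_m3 = rng.lognormal(np.log(2.0e8), 0.6, n_trials)
    net_to_gross = rng.uniform(0.4, 0.8, n_trials)
    porosity = rng.uniform(0.15, 0.25, n_trials)
    oil_saturation = rng.uniform(0.6, 0.8, n_trials)
    formation_volume_factor = rng.uniform(1.1, 1.4, n_trials)
    recovery_factor = rng.uniform(0.2, 0.4, n_trials)

    # Multiply the sampled parameters trial by trial.
    in_place_bbl = (gross_rock_volume_m3 * net_to_gross * porosity
                    * oil_saturation * BBL_PER_M3 / formation_volume_factor)
    return in_place_bbl * recovery_factor / 1e6

volumes = simulate_recoverable_mmbbl()

# P90 is exceeded in 90% of trials, i.e., it is the 10th percentile.
p90, p50, p10 = np.percentile(volumes, [10, 50, 90])
mean_volume = volumes.mean()
print(f"P90={p90:.1f}  P50={p50:.1f}  P10={p10:.1f}  mean={mean_volume:.1f} mmbbl")
```

Because the product of several positive random variables is right-skewed, the mean in such a simulation typically exceeds the P50, which is one reason why the statistical meaning of a single reported volume matters.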

Data, Methodology, and Assumptions

The pre-drill data for this project come from recorded CNH sessions publicly available on YouTube (https://www.youtube.com/channel/UCB5ZTtQx6tzV4zBdaNY-qhA). In these sessions, CNH representatives present pre-drill summaries of prospects/wells that operators evaluated, included in their inventories of opportunities, and/or planned to drill. We transferred information such as prospect/well name, well type (exploration or development), targets (segments), expected petroleum fluid phase, geological PoS, and success-case volumes from these presentations into a database. CNH reports drilling results relatively quickly after wells are completed and posts them as publicly available documents at https://hidrocarburos.gob.mx/statistics/. Various outcomes of the wells, such as post-drill status (e.g., dry or discovery), discovered volumes, petroleum fluid type, and commerciality, were added to our database after they became publicly available.

The database for this study includes 375 exploration prospects/wells presented at CNH from September 2018 to February 2022. Although the CNH also discusses appraisal and development wells in their sessions, this study includes only the true exploration wells targeting new petroleum accumulations. The exploration forecasts for the prospects/wells such as the targeted formation(s), expected fluid type, geological PoS values, and expected volumes presented in CNH sessions were apparently provided by the operators.

The CNH does not explicitly state which definition of geological PoS is used for the values reported in their summaries of prospects/wells. Further, it is possible that different operators use different definitions. Still, it appears from the presented materials that the definition of the geological PoS for prospects/wells in Mexico is the probability of finding movable petroleum fluids. We will use that definition for the purpose of this study.

In its public presentations, the CNH reports only one expected success-case resource value for each presented prospect/well and does not specify the statistical meaning of these values. These values are, most likely, mean values or, possibly, P50 values. For the purpose of this study, we assume that the reported success-case volumes are mean values. Further, there were no clear statements on whether the presented geological PoS values and volumes were assessed for the prospect or for the well. The difference is that a well drilled significantly down-dip from the crest of the structure is expected to have lower geological PoS but higher success-case volumes (Schneider and Cook, 2017; Milkov, 2021). It is more likely that the geological PoS values and resources presented at CNH sessions are for the prospects. Still, that is not certain, and we use the term “prospect/well” throughout this study.

The prospects have varying subsurface understanding, and some of them perhaps would be better described as “leads” (especially the ones that have been discussed in CNH sessions but have not been drilled), but we do not have sufficient data to properly distinguish between prospects and leads according to existing definitions (e.g., Milkov, 2015). Some prospects/wells have been presented in CNH sessions two or more times, and the information from the latest session was used in this study.

A total of 85 prospects/wells from the database were drilled in 2019–2021 with completion dates from May 2019 to November 2021 and at least some known geological outcomes (discovery, dry or unknown due to mechanical accident or targeted segment (target) not being reached, discovered petroleum fluid phase, discovered recoverable resources). Figure 1 shows locations of discoveries and dry wells.

Figure 1

Map showing locations and outcomes of 80 wells drilled in Mexico in 2019–2021

Most of the presented prospects/wells (265 out of a total 375, or 71%) and drilled wells (51 out of a total 85, or 60%) were operated by the Mexican state-owned petroleum company Pemex (Table 1), both onshore and in shallow-water (water depth < 500 m) offshore environments. Other operators include 10 international oil companies (IOCs: Cairn, Chevron, CNOOC, Eni, Equinor, Hokchi (Pan American Energy), Lukoil, Petronas, Repsol and Shell). These companies together evaluated 38 prospects/wells and drilled 22 of them, in shallow-water and in deep-water (water depth > 500 m) offshore environments. There are also 10 local oil companies (LOCs: Bloque VC 01, IHSA, Jaguar, Maycastle, Newpek, OB12, Pantera, Roma Energy, Secadero and SMB). These companies together evaluated 72 onshore prospects/wells and drilled 12 of them. Although Pemex-operated wells are dominant in this study, there are also many prospects/wells operated by 20 other companies, and the results of this study are representative of a large part of the exploration industry.

Table 1 Statistical characterization of prospects/wells and discoveries evaluated and drilled by operators in Mexico in 2018–2022

Not all generated prospects and drilled wells in Mexico in the studied time period are included in this study. Only those prospects/wells that were presented in CNH sessions and in CNH reports are considered. Prospects/wells for which geological PoS values and success-case volumes were reported for individual segments (targets) and not aggregated into total prospect/well expectations (e.g., the Bitol well operated by Cairn and the Ixcanul well operated by Pemex) are excluded because it is difficult to compare such predictions with reported well outcomes. Still, our results cover ~ 90–95% of all prospects/wells evaluated and drilled in Mexico in the considered time period.

Results

Probability of Geological Success and Rate of Success for Prospects/Wells

The estimated geological PoS values for 375 evaluated prospects/wells vary from 7 to 92% and average at 33 ± 1%. The distribution of all PoS values is displayed in the box plot diagram in Figure 2. The distribution of PoS values for 85 drilled prospects is narrower (14–77%), and the average value is 34 ± 1%. The greater average PoS for actually drilled prospects is expected because the companies, in general, prefer to drill less risky prospects. However, the 1% difference in PoS values between all evaluated prospects/wells and drilled prospects/wells observed in this study is rather small.

Figure 2

Estimated probabilities of geological success for all evaluated prospects/wells discussed in CNH sessions and for the drilled prospects/wells. Each subset of data is displayed using a box plot, which shows the distribution of values as a histogram, the average (mean) value (black stars) with its 95% confidence interval displayed as a black vertical bar, the median value (dashed red line), the first quartile (Q1), the third quartile (Q3), the lower adjacent value (LAV), the upper adjacent value (UAV), and outliers. The first quartile (Q1) is the median of the lower half of the data set; this means that about 25% of the numbers in the data set lie below Q1 and about 75% lie above Q1. The third quartile (Q3) is the median of the upper half of the data set; this means that about 75% of the numbers in the data set lie below Q3 and about 25% lie above Q3. The LAV is the smallest observation that is greater than or equal to the lower inner fence, which is the first quartile minus 1.5 × IQR, where IQR stands for the interquartile range. The UAV is the largest observation that is less than or equal to the upper inner fence, which is the third quartile plus 1.5 × IQR

The geological success rate for the 80 wells with known exploration results (excluding five drilled wells that had mechanical issues and/or did not reach the targeted segment(s)) is 58% (Table 1). This is significantly higher than the average pre-drill estimated geological PoS of 35 ± 2% for these prospects/wells (Fig. 3). Explorers significantly underestimated PoS values (overestimated the geological risks). Figure 4 shows the probability density and exceedance probability of the predicted number of discoveries based on pre-drill PoS values assigned to 80 prospects/wells. The most likely number of discoveries was 28, and there was only ~ 9% probability of making 34 or more discoveries. Still, the explorers made many more discoveries than they predicted. The actual number of discoveries was 46. The pre-drill probability of making 46 (or more) discoveries was only around 0.0015%. Since at least some wells were likely drilled down-dip from the structural crests, the pre-drill underestimation of PoS values was even more severe than recorded above. Explorers working for Pemex, IOCs and LOCs, on average, underestimated the geological PoS values (Table 1). Explorers at LOCs underestimated the PoS values the most: their average pre-drill PoS value was 33%, while the actual success rate was 67% (n = 12). Explorers at Pemex assessed average PoS at 36%, and their wells delivered a success rate of 61% (n = 46). Explorers at IOCs assessed average PoS at 33%, and their wells delivered a success rate of 45% (n = 22), so their geological PoS assessments were relatively more realistic, but still too low.
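The distribution of the predicted number of discoveries can be derived from individual PoS values by repeated convolution, which gives the exact distribution of a sum of independent success/failure trials with unequal probabilities. The following is a minimal sketch under two assumptions that are not from the source: the prospects are treated as statistically independent, and, because the individual PoS values are not listed here, every one of the 80 prospects is assigned a hypothetical PoS of 0.35 (the reported average). The function name `discovery_count_pmf` is ours.

```python
import numpy as np

def discovery_count_pmf(pos_values):
    """Exact probability of k discoveries, k = 0..n, for independent prospects."""
    pmf = np.array([1.0])  # with no prospects, zero discoveries is certain
    for p in pos_values:
        # Adding one prospect: it is either dry (1 - p) or a discovery (p).
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

# Hypothetical input: 80 prospects, each with PoS = 0.35.
pos_values = np.full(80, 0.35)
pmf = discovery_count_pmf(pos_values)

expected_discoveries = float(np.dot(np.arange(pmf.size), pmf))
most_likely = int(np.argmax(pmf))
prob_46_or_more = float(pmf[46:].sum())

print(f"expected={expected_discoveries:.1f}, most likely={most_likely}")
print(f"P(46 or more discoveries)={prob_46_or_more:.2e}")
```

With real data, `pos_values` would hold the pre-drill PoS of each drilled prospect; a wider spread of individual values narrows the resulting distribution slightly relative to the equal-PoS case.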

Figure 3

Estimated probabilities of geological success (displayed as box plot per description in the caption of Fig. 2) for all prospects/wells with known drilling outcomes and separately for those that became discoveries and dry holes. The actual success rates for these groups of prospects/wells are also displayed

Figure 4

Predicted number of discoveries from the pre-drill assessments of geological PoS values for 80 prospects/wells. The actual number of discoveries was 46

Explorers in Mexico underestimated the geological PoS values (overestimated the geological risks) for all subsets of prospects/wells with < 20%, 21–40%, 41–60%, and 61–80% PoS values (Fig. 5). The largest subset with 21–40% PoS values includes 52 drilled prospects/wells with average PoS 30 ± 1%, but the success rate for that group of wells was 52%, so the difference is 22%. That difference is similar or even larger for the other three subsets of prospects/wells with 0–20% (difference 33%), 41–60% (difference 19%), and 61–80% (difference 29%, but there are only two prospects/wells in that subset).
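As a quick consistency check, the per-bin differences quoted above can be weighted by the number of drilled prospects/wells in each PoS bin (the counts given in the caption of Fig. 5). The sketch below uses only those reported figures; the variable names are ours.

```python
# PoS bin -> (number of drilled prospects/wells,
#             success rate minus mean pre-drill PoS, in percentage points)
bins = {
    "0-20%": (8, 33),
    "21-40%": (52, 22),
    "41-60%": (18, 19),
    "61-80%": (2, 29),
}

total_wells = sum(n for n, _ in bins.values())
weighted_gap = sum(n * gap for n, gap in bins.values()) / total_wells

print(f"{total_wells} wells, count-weighted underestimation = {weighted_gap:.1f} points")
```

The count-weighted gap of about 23 percentage points agrees with the overall difference between the 58% success rate and the 35% average pre-drill PoS for the 80 wells.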

Figure 5

Comparison of pre-drill estimated geological PoS values and actual geological success rate for 80 drilled prospects/wells. The pre-drill estimated PoS values are presented for groups of prospects/wells with PoS values 0–20% (n = 8), 21–40% (n = 52), 41–60% (n = 18) and 61–80% (n = 2) as mean values (crosses) and the standard error of mean (error bars)

The 34 prospects/wells that were judged as less risky prior to drilling (PoS > 35%, average PoS 48%) turned into 25 discoveries with the overall success rate 74% (Fig. 6). The group of 46 more risky prospects (PoS < 35%, average PoS 25%) had lower success rate of 46%. The 46 drilled prospects/wells that became discoveries had average pre-drill geological PoS 38 ± 2%, while 34 wells that became dry had average geological PoS 30 ± 2% (Fig. 3). Among wells operated by Pemex, 28 discoveries had pre-drill PoS 40 ± 3% while 18 dry holes had pre-drill PoS 30 ± 2%. Among wells operated by IOCs, 10 discoveries had pre-drill PoS 37 ± 6% while 12 dry holes had pre-drill PoS 30 ± 4%. Among wells operated by LOCs, eight discoveries had pre-drill PoS 34 ± 6% while four dry holes had pre-drill PoS 31 ± 4%.
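The group statistics in this paragraph can be cross-checked against the averages and success rates reported earlier. The sketch below takes the reported counts and mean pre-drill PoS values for discoveries and dry holes, then computes the separation between the two means, the pooled mean PoS, and the success rate for each operator group. Only rounded values from the text are used, so the pooled means are approximate; the function name `summarize` is ours.

```python
# (number of wells, mean pre-drill PoS in %) as reported in the text
groups = {
    "All operators": {"discoveries": (46, 38), "dry": (34, 30)},
    "Pemex": {"discoveries": (28, 40), "dry": (18, 30)},
    "IOCs": {"discoveries": (10, 37), "dry": (12, 30)},
    "LOCs": {"discoveries": (8, 34), "dry": (4, 31)},
}

def summarize(group):
    n_disc, pos_disc = group["discoveries"]
    n_dry, pos_dry = group["dry"]
    n_total = n_disc + n_dry
    return {
        # Difference in mean pre-drill PoS between discoveries and dry holes
        "separation": pos_disc - pos_dry,
        # Count-weighted mean pre-drill PoS of the whole group
        "pooled_pos": (n_disc * pos_disc + n_dry * pos_dry) / n_total,
        # Actual geological success rate in %
        "success_rate": 100 * n_disc / n_total,
    }

summary = {name: summarize(group) for name, group in groups.items()}
for name, stats in summary.items():
    print(f"{name}: separation={stats['separation']} points, "
          f"pooled PoS={stats['pooled_pos']:.1f}%, "
          f"success rate={stats['success_rate']:.1f}%")
```

The pooled values reproduce the reported averages to within rounding (about 35% for all operators, 36% for Pemex, and 33% for both IOCs and LOCs).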

Figure 6

Prospects/wells sorted in the order of decreasing geological PoS and showing the outcomes of drilling (discovery or dry)

Geological success rates vary between different operators from 0 to 100% (Table 1). Companies that operated more wells more closely approached the average success rate by the industry (Fig. 7). That finding is similar to the observations of Milkov and Navidi (2020) based on a much larger data set.

Figure 7

Geological success rate versus the number of operated exploration wells by individual companies, all local oil companies (LOCs), all international oil companies (IOCs) and the entire industry (21 companies). The size of circles indicates the number of companies with similar results (e.g., there were three companies that each operated one well and had zero success rate)

Probability of Geological Success and Rate of Success for Prospects/wells with Different Number of Segments

The assessments in this study come from multiple basins and plays in Mexico. It is prudent to assume that, in general and on average, evaluated prospects with more segments should have higher geological PoS values (Delfiner, 2003; Milkov, 2021). However, that is not observed in this data set. Evaluated prospects/wells with one segment have average geological PoS 36 ± 2% (n = 211), which is greater than that of prospects/wells with two segments (33 ± 1%, n = 98) and with three, four or five segments (27 ± 2%, n = 28) (Fig. 8). There are 79 drilled prospects/wells with known number of segments and known well outcomes (Table 2). The trend of decreasing geological PoS with increasing number of segments somewhat holds for them: 31 prospects with one segment had average geological PoS 36%, 36 prospects with two segments had average PoS 36%, and 12 prospects with three, four, or five segments had average PoS 30%. (However, note the significant standard errors of these mean values due to relatively small sample sizes, Table 2.) The actual geological success rate among drilled prospects with one segment is 61%, and it is 56% for prospects with two segments, and 58% for prospects with three, four or five segments.

Figure 8

Estimated probabilities of geological success (displayed as box plots per description in the caption of Fig. 2) for evaluated prospects/wells with one, two and three-to-five segments (targets)

Table 2 Pre-drill average probabilities of geological success for 79 drilled prospects/wells with one, two and three, four or five segments, and actual success rates for these prospects/wells

Predicted and Discovered Petroleum Volumes

The success-case volumes in evaluated prospects vary from 1.1 mmboe to 1492 mmboe (n = 375, average 96 mmboe, median 34 mmboe) (Fig. 9A). The onshore prospects are generally smaller, from 1.1 to 1388 mmboe (n = 241, average 45 mmboe, median 27 mmboe). The offshore prospects are relatively larger, from 5.4 to 1492 mmboe (n = 134, average 186 mmboe, median 51 mmboe). Among the drilled 85 prospects/wells, the onshore prospects are generally smaller (n = 37, from 1.14 to 1388 mmboe, average 82 mmboe, median 30 mmboe) than the offshore ones (n = 48, from 5.4 to 1318 mmboe, average 234 mmboe, median 81 mmboe). This is expected because offshore prospects should be larger to become commercially successful and to justify exploration efforts. IOCs operating offshore have, on average, larger prospects than the state company Pemex, which operates onshore and in shallow-water environment (Fig. 9C, Table 1).

Figure 9

Estimated values of geological PoS versus success-case volumes for all evaluated prospects/wells (A, B) and for drilled wells (C, D). Panels A and C separate onshore and offshore prospects/wells (A) and type of companies (C), and display the volumes on a logarithmic scale. Panel B shows all prospect/wells with slightly negative correlation between PoS values and volumes. Panel D shows all drilled prospect/wells with essentially no correlation between PoS values and volumes. Note: 1 mmboe (million barrels of oil equivalent) = 158,987 m3 of oil

It is often expected in the industry that the relatively larger prospects are also more risky (i.e., have lower geological PoS values) (Rose, 1987; Binns and Corbett, 2012). However, there is only a small negative correlation between geological PoS values and success-case volumes (R² = 0.0065, n = 375) for evaluated prospects/wells in Mexico (Fig. 9B) and essentially no such correlation for drilled prospects/wells (Fig. 9D).

Figure 10 shows the comparison of expected success-case volumes and the actual found volumes for 71 prospects/wells that were drilled and for which the volumetric results have been reported. Thirty-four wells were dry, i.e., they found no resource volumes. Thirty-seven wells were discoveries.

Figure 10

Comparison of mean success-case volumes estimated pre-drill and the found volumes in all drilled prospects. Some of the key prospects/wells mentioned in the text are indicated. Note: 1 mmboe (million barrels of oil equivalent) = 158,987 m3 of oil

These 37 discoveries, before they were drilled, were expected to contain from 2.6 to 650 mmboe (average 93 mmboe) of recoverable resources (Fig. 11). In reality, these discoveries contain from 0.09 to 377 mmboe (average 80 mmboe). There is a wide range of predictive accuracy, which is consistent with previous similar studies (Capen, 1976; Uman et al., 1979; Rose, 1987; Fosvold et al., 2000). Companies underestimated the pre-drill resources by as much as ~ 12 times (the Kuun prospect/well operated by Pemex) and overestimated the pre-drill resources by as much as ~ 190 times (the Trevino-2001 prospect/well operated by Newpek). For most discoveries (21 out of 37, or 57%), the discovered volumes were more than two times larger or smaller than the predicted pre-drill estimates. Thirty-two of the 37 discoveries (86%) lie within one order of magnitude envelope relative to the predictions (Fig. 11B). Twenty discoveries contain less petroleum resources than the expected volumes, and 17 discoveries contain more petroleum resources than was expected (Figs. 11 and 12). The median ratio of found to predicted resources is 0.76, reflecting mild pre-drill overestimation of recoverable resources in individual prospects (Fig. 12). The arithmetic mean ratio is 1.5, and it reflects the disproportionate influence of four extreme underestimates (Figs. 11 and 12).
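The accuracy statistics used in this paragraph (the median and arithmetic mean of the found-to-predicted ratio, and the share of discoveries within a factor of 2 or 10 of the prediction) are simple to compute. The sketch below is a minimal illustration; the four predicted and found volumes are made-up values, not data from this study, and the function name `volume_accuracy` is ours.

```python
import numpy as np

def volume_accuracy(predicted, found):
    """Summarize found/predicted volume ratios for a set of discoveries."""
    ratio = np.asarray(found, dtype=float) / np.asarray(predicted, dtype=float)

    def share_within(factor):
        # Fraction of discoveries with found volume between
        # predicted / factor and predicted * factor.
        return float(np.mean((ratio >= 1.0 / factor) & (ratio <= factor)))

    return {
        "median_ratio": float(np.median(ratio)),
        "mean_ratio": float(np.mean(ratio)),
        "within_factor_2": share_within(2.0),
        "within_factor_10": share_within(10.0),
    }

# Hypothetical volumes in mmboe, for illustration only.
predicted = [100.0, 100.0, 100.0, 100.0]
found = [60.0, 180.0, 900.0, 5.0]
stats = volume_accuracy(predicted, found)
print(stats)
```

The arithmetic mean ratio is pulled upward by a few large underestimates, whereas the median is not, which is why the two statistics can point in opposite directions, as they do in this study (median 0.76, mean 1.5).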

Figure 11

Comparison of mean success-case volumes estimated pre-drill and the found volumes in discoveries. Panel A shows data plotted on arithmetic scale, and panel B shows data on log–log scale. Note: 1 mmboe (million barrels of oil equivalent) = 158,987 m3 of oil

Figure 12

Cumulative distribution of prediction errors (expressed as the ratio of found to predicted volumes) in forecasts of recoverable petroleum resources

Prospects that were judged during the pre-drill evaluation as less risky (PoS 41–80%, n = 13) and that became discoveries contain smaller average and total predicted and found volumes than prospects with moderate geological risks (PoS 31–40%, n = 12) (Fig. 13). The most risky prospects with PoS 21–30% (n = 12) that became discoveries had the largest pre-drill average volumes, but contain less resources (on average and in total) than expected.

Figure 13

Average (A) and total (B) predicted and found petroleum resources in discoveries that had low, average and high geological PoS values prior to drilling. Note: 1 mmboe (million barrels of oil equivalent) = 158,987 m3 of oil

The future dry holes have pre-drill estimated success-case volumes from 1.1 to 1012 mmboe (average 204 mmboe, n = 34). The future discoveries have pre-drill estimated success-case volumes from 2.6 to 650 mmboe (average 93 mmboe, n = 37).

Pemex operated 43 prospects/wells for which the volumetric results are known (Table 1). They expected to discover a total of 855 mmboe (accounting for risks), but found 2059 mmboe, i.e., ~ 2.4 times more. For 25 prospects that became discoveries, Pemex expected the average discovery to be 54 mmboe and the total volume to be 1345 mmboe, but the average discovery is 82 mmboe, and the total volume is 2059 mmboe. Therefore, Pemex significantly underestimated the success-case volumes in their pre-drill evaluations. Importantly, large underestimations of resources for three prospects, the Racemosa (factor of 5), the Tupilco-3001 (factor of 7) and the Kuun (factor of 12) (Fig. 10), contributed most significantly to the total underestimation of resources by Pemex.

LOCs operated 10 prospects/wells for which the volumetric results are known. They expected to discover total 56 mmboe (accounting for risks), but found 118 mmboe, i.e., ~ 2.1 times more. On average, LOCs significantly underestimated pre-drill volumes.

The IOCs operated 18 prospects/wells for which the volumetric results are known (Table 1). They expected to discover total 2415 mmboe (accounting for risks), but found 789 mmboe, i.e., ~ 3.1 times less or 33% of the expected volumes. For six prospects that became discoveries, the IOCs expected the average discovery to be 324 mmboe and the total volumes 1941 mmboe, but the average discovery is 132 mmboe, and the total volume is 789 mmboe. The Sayulita discovery operated by Eni had the largest discrepancy (475 mmboe) between the predicted mean success-case volumes (650 mmboe) and the discovered volumes (175 mmboe, or only 27% of the expected volumes). On average, the IOCs significantly overestimated the success-case volumes in their pre-drill evaluations.
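The operator-level totals reported above can be combined to check the delivery ratios and the industry-wide totals. The sketch below uses only the risked (PoS-weighted) expected volumes and found volumes quoted in the text; the variable names are ours.

```python
# Operator group -> (risked expected volume, found volume), in mmboe
operators = {
    "Pemex": (855, 2059),
    "LOCs": (56, 118),
    "IOCs": (2415, 789),
}

# Found volume as a multiple of the risked pre-drill expectation
delivery_ratio = {name: found / risked
                  for name, (risked, found) in operators.items()}

total_risked = sum(risked for risked, _ in operators.values())
total_found = sum(found for _, found in operators.values())

for name, ratio in delivery_ratio.items():
    print(f"{name}: found / risked expected = {ratio:.2f}")
print(f"All: risked expected {total_risked} mmboe, found {total_found} mmboe")
```

The summed risked expectation comes out 1 mmboe above the industry-wide total of 3325 mmboe quoted in the Discussion, a difference attributable to rounding of the group totals; the found total of 2966 mmboe matches exactly.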

Discussion

Consistent Underestimation of Geological PoS Values

Explorers in Mexico consistently underestimated geological PoS values for drilled prospects/wells and made many more discoveries than predicted (Figs. 3, 4, 5 and 6). This finding is consistent with the results of similar studies published in the last century (Holmes et al., 1985; Rose, 1987; Alexander and Lohr, 1998) and in more recent independent studies of exploration performance of an individual company that participated in 25 wells offshore Norway in 2011–2015 (Milkov, 2017) and a group of 10 companies that operated 126 wells in the Netherlands in 2005–2018 (Janssen, 2019). Quirk et al. (2018) documented that most (~ 75%) practitioners in the industry think that exploration teams typically assess the PoS values either correctly or assign too high PoS values, but such opinions are not supported by findings in our study of recent exploration in Mexico.

A positive finding is that explorers in Mexico, in general, correctly distinguished between less risky and more risky prospects, even though they assigned too low PoS values to them (Figs. 3 and 5). However, it is unclear if the 8% separation in average pre-drill PoS values between discoveries and dry holes (Fig. 3) indicates a good exploration performance. This metric addresses the resolution of PoS assessments, which refers to the decisiveness of forecasts or “discrimination” aspect of forecasting quality (Murphy, 1993). Forecasts with good resolution have high probabilities for things that actually happen and low probabilities for things that do not (Tetlock and Gardner, 2015). In the extreme but perhaps unattainable scenario with perfect exploration forecasting, all prospects that become discoveries (the group with 100% geological success rate) should have pre-drill PoS values around 100% and all prospects that become dry (the group with 0% success rate) should have pre-drill PoS values around 0%. (This could be, for example, commitment wells that had to be drilled.) However, there are few publicly available studies of this metric. It is unclear if PoS assessment abilities of explorers behind the wells in Mexico (with 8% separation) reflect the current standard abilities in the industry. Blystad and Søndenå (2005) reported that the average PoS for 300 drilled prospects/wells offshore Norway (1990–2002) was 29% while it was 33% for the 134 prospects/wells that became discoveries. They did not report the average pre-drill PoS for dry holes, so we cannot calculate the exact percent separation, but it was perhaps 8–10%. This indicates little or no progress in explorers’ abilities to distinguish between future discoveries and dry holes through PoS assessments over the last 30 years. Analysis of 25 exploration wells in which one company participated offshore Norway in 2011–2015 showed that geoscientists at that company could not distinguish between 12 future dry holes (average pre-drill geological PoS 27%) and 13 future discoveries (average pre-drill geological PoS 25%), and the separation was negative 2% (Milkov, 2017).

There is a discrepancy between the expected greater PoS values and greater success rates for prospects with a large number of segments and the observed opposite assessments and outcomes (Fig. 8 and Table 2). Reasons for that are unclear. One simple explanation is that petroleum explorers behind those prospects/wells in Mexico do not use a proper assessment methodology for prospects with multiple segments (Milkov, 2021). This may be caused by a lack of technical understanding and/or workflow to aggregate segments into prospects, a lack of appropriate software for segment aggregation, or a lack of management support for proper segment aggregation. Another explanation is that most one-segment prospects come from less risky plays than most prospects with two, three, four, and five segments. In more risky plays (or more risky parts of a play), explorers and managers may consciously stack up multiple targets to justify drilling the prospect/well, even though their intended increase in geological PoS values is not reflected in the PoS values for the prospects/wells due to the lack of a proper aggregation process. The lower actual success rate for prospects/wells with multiple segments (Table 2) indicates that such drilling opportunities are indeed more risky. A focused evaluation of individual segments and historical success rates in their respective plays may help explain the observed discrepancy, but this is beyond the scope of our study.

It is informative to compare our findings from Mexico with findings from similar studies of geological PoS assessments in other countries in the past (Table 3). The average geological PoS for 300 prospects drilled offshore Norway in 1990–2002 (Fosvold et al., 2000; Ofstad et al., 2000a,b; Blystad and Søndenå, 2005) was 29%, and the actual success rate was 45% (Blystad and Søndenå, 2005). The degree of underestimation of geological PoS (difference 16%) recorded in Norway at the end of the last century is similar to the underestimation of geological PoS in Mexico (n = 80, average pre-drill PoS 35%, success rate 58%, difference 23%) in recent years recorded in this study. Another recent example is from 126 wells in the Netherlands drilled in 2005–2018. These wells had average pre-drill PoS 53%, but actual success rate was 64% (difference 11%, Janssen, 2019).

Table 3 Comparison of pre-drill assessments of geological PoS and actual geological success rates in Norway (Blystad and Søndenå, 2005), the Netherlands (Janssen, 2019), and Mexico (this study)

Poor Pre-drill Assessment of Petroleum Resources

Comparison of pre-drill versus post-drill volumetrics for individual prospects (Figs. 10, 11 and 12) suggests that the pre-drill evaluations of resources are rather poor, and the companies should improve their assessment methodologies and processes. One caveat here is that there are no ranges (described, for example, with P90 and P10 values) available for pre-drill volume assessments, and we do not know if discovered volumes were within the predicted ranges (Norwegian Petroleum Directorate, 2018). Another caveat is that resource volumes in discoveries are estimated based on one discovery well; they are uncertain and subject to revision (Rose, 1987) and can become larger or smaller during appraisal and production (Tennyson, 2005). Data from 40 fields in Norway suggest that petroleum reserves estimated after the discovery, at the time of making plans for development and operation of the fields, were, on average, almost 20% higher than after production startup (Carstens, 2021), suggesting that the reserves tend to get smaller. Finally, we note that there are no good and widely accepted criteria to judge the quality of volumetric predictions for individual prospects. Uman et al. (1979) and Rose (1987) seemed to be satisfied with the quality of the predictions when they observed that the vast majority of points on the log–log plot of found resources versus predicted resources fall within the band defined by a factor of 10 above and below the perfect 1:1 line. However, we do not think that assessment errors by a factor of 10 (e.g., when explorers predict 500 mmboe but find 50 mmboe) are acceptable, and we suggest that the industry should strive to significantly improve its volumetric assessments.

One common approach to judge the quality of exploration predictions is to compare the cumulative risked volumes (i.e., mean success-case volumes multiplied by the geological PoS) with the cumulative found volumes (Holmes et al., 1985; Clapp and Stibolt, 1991; Nederlof, 1994; Johns et al., 1998; Blystad and Søndenå, 2005; Milkov, 2017). This approach accounts for expectations and outcomes for both dry holes and discoveries. Figure 14 shows the results of such an evaluation for 71 prospects in Mexico, for which the pre-drill predictions and post-drill results are available. The total volume expected to be discovered (on a risked basis) was 3325 mmboe. The total discovered volume is 2966 mmboe, or ~11% less than the total expected resources. This is a good result. However, the close agreement of cumulative pre-drill assessed resources and the discovered resources resulted from several prospects/wells that were drilled close to the end of the sequence and delivered significantly more resource volumes than predicted. After 53 drilled wells, the total discovered volume was 1264 mmboe, or only 49% of the expected 2553 mmboe. Most of the discrepancy between the expected total volume and the found volume for the first 53 drilled wells resulted from five dry holes with large estimated risked volumes, i.e., large success-case volumes and PoS values that were not low (Yaxchilan Este operated by Petronas, Alom operated by Cairn, Ehecalt operated by Eni, Xakpun and Xuyi operated by Pemex, Figs. 10 and 14). The sequence of wells from 53 to 71 (Fig. 14) has four large discoveries (Tupilco-3001, Tum, Kuun and Racemosa) operated by Pemex, which all delivered much larger volumes than predicted pre-drill (Figs. 10 and 11). A similar observation was made in the analysis of the performance of one company by Alexander and Lohr (1998). The total discovered volume in that study was similar to the total pre-drill prediction, but the total prediction dropped to 66% of actual if only one discovery is removed from the summary.
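The cumulative comparison described above can be sketched in a few lines. The four prospects below are hypothetical values invented for illustration (they are not from the Mexico data set), and the function name `cumulative_risked_and_found` is likewise an assumption. Only the totals of 3325 mmboe and 2966 mmboe come from the text.

```python
from itertools import accumulate

# HYPOTHETICAL prospects in drilling order:
# (mean success-case volume in mmboe, geological PoS, found volume in mmboe)
prospects = [
    (200.0, 0.30, 0.0),    # dry hole with a large risked volume
    (100.0, 0.50, 80.0),   # discovery, smaller than predicted
    (400.0, 0.25, 0.0),    # dry hole with a large risked volume
    (50.0, 0.40, 150.0),   # late discovery, much larger than predicted
]


def cumulative_risked_and_found(prospects):
    """Return running totals of risked (volume x PoS) and found volumes."""
    risked = [volume * pos for volume, pos, _ in prospects]
    found = [found_volume for _, _, found_volume in prospects]
    return list(accumulate(risked)), list(accumulate(found))


cum_risked, cum_found = cumulative_risked_and_found(prospects)

# Fraction of the cumulative expectation delivered after each well.
delivery = [f / r for f, r in zip(cum_found, cum_risked)]

# Totals quoted in the text for the 71 prospects in Mexico.
paper_shortfall = 1 - 2966 / 3325  # about 0.11, i.e. ~11% less than expected
```

In this toy sequence the totals agree only because the last well over-delivers; partway through the sequence the found volume lags well behind the expectation, which is the same pattern the text describes for the first 53 wells.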

Figure 14

Cumulative pre-drill assessed risked volumes and discovered volumes versus the time sequence of the drilled wells. Note: 1 mmboe (million barrels of oil equivalent) = 158,987 m³ of oil

We conducted the Cumulative Area Test (CAT) analysis to test the rank correlation between predicted volumes and actual found volumes and to assess the forecasting efficiency for individual prospects (Sluijk and Parker, 1986; Nederlof, 1994). This analysis helps highlight situations in which there is a perfect correspondence between the sum of expected volumes and the sum of actual volumes, but there is little correlation between the individual predictions and actuals. Figure 15 shows the CAT analysis for the studied 71 recent exploration prospects/wells in Mexico. The vertical axis represents the cumulative volume of discovered resources. The horizontal axis represents the hypothetical sequence numbers of the wells testing the prospects. These wells can be ordered in a large variety of ways (71!, or about 85 × 10^100 hypothetical sequences). The diagonal (blue dashed) line is the average of all possible hypothetical ranking orders and represents the average cumulative plot if the prospects are drilled in random order (0% forecasting efficiency of the rank). The upper (brown dashed) curve is a plot of actual outcomes in hindsight in decreasing size, that is, the theoretical best possible ranking (+100% forecasting efficiency). The lower (orange dashed) curve is a plot of actual outcomes in hindsight in increasing size, that is, the theoretical worst possible ranking (−100% forecasting efficiency). The actual line (green) is a plot of cumulative actual outcomes in the order of decreasing predicted sizes. The better the ability of explorers to predict the actual rank order of outcomes, the closer the cumulative plot of the actual volumes in decreasing predicted size (green line) will approach the cumulative plot of the actual volumes in decreasing actual size (brown line). The rank correlation (called the CAT correlation, or CAT-r) is defined as the ratio of the area between the random line and the observed line to the area between the random line and the best-case curve, and can vary from +1 to −1, with 0 indicating no correlation. In the case of the 71 recent prospects/wells in Mexico (Fig. 12), the CAT analysis shows a CAT-r of about +0.2, which suggests a rather poor ability of petroleum explorers to predict the actual rank of drilling outcomes (and much worse than in the cases presented in the two original papers by Sluijk and Parker (1986) and Nederlof (1994)).
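As an illustration of the CAT-r definition above, the following is a minimal sketch under stated assumptions, not the implementation used in this study. It assumes that areas are measured under the cumulative curves by the trapezoidal rule starting from the origin. The function name `cat_r` and the example volumes are hypothetical.

```python
from itertools import accumulate


def cat_r(predicted, actual):
    """CAT rank correlation between predicted and actual found volumes.

    Assumption: areas are measured under cumulative curves with the
    trapezoidal rule, and the random-order line is the straight diagonal.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    n = len(actual)
    total = sum(actual)

    def area(volumes_in_drilling_order):
        # Area under the cumulative curve, starting at (0, 0).
        cum = [0.0] + list(accumulate(volumes_in_drilling_order))
        return sum((a + b) / 2 for a, b in zip(cum, cum[1:]))

    # Observed curve: actual volumes in order of decreasing predicted size.
    order = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    observed = area([actual[i] for i in order])
    # Best case: actual volumes in order of decreasing actual size.
    best = area(sorted(actual, reverse=True))
    # Random order: straight diagonal from (0, 0) to (n, total).
    random_line = n * total / 2

    if best == random_line:
        return 0.0  # all outcomes equal, so ranking carries no information
    return (observed - random_line) / (best - random_line)


# HYPOTHETICAL volumes in mmboe, for illustration only.
predicted = [500.0, 200.0, 100.0, 50.0]
actual = [50.0, 0.0, 300.0, 20.0]
example = cat_r(predicted, actual)
```

Under these assumptions a perfect ranking gives +1 and a fully inverted ranking gives −1, which matches the range stated in the text.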

Figure 15

Forecasting efficiency displayed on the Cumulative Area Test (CAT) plot. Note: 1 mmboe (million barrels of oil equivalent) = 158,987 m³ of oil

Significantly overestimated and underestimated volumes in individual prospects compensated for each other and resulted in seemingly good predictions of total volumes for all discoveries. It is important to investigate the pre-drill versus post-drill volumes at the level of individual prospects and discoveries to avoid the potentially false conclusion of having a good exploration process if the total resource volumes in discoveries are similar to total volumes in corresponding prospects.

A common finding from previous similar studies comparing pre-drill assessed volumes with found volumes is that explorers, on average, overestimated the prospective resources (Lewis, 1984; Sluijk and Parker, 1986; Rose, 1987; Johns et al., 1998; Fosvold et al., 2000; Ofstad et al., 2000a, 2000b; Milkov, 2017; Norwegian Petroleum Directorate, 2018; Janssen, 2019; Hoetz et al., 2020). There is a general consensus that the industry is overly optimistic with exploration volumetrics (Quirk et al., 2018). However, our results from the studied data set in Mexico suggest a more complicated picture in which the overall volume predictions for all prospects (on a risked basis) and for prospects that became discoveries (on a success-case basis) are not too bad (Fig. 14), but the predictions for individual prospects are poor (Fig. 11). Our finding that IOCs overestimate pre-drill volumes is consistent with the results of recent independent studies of exploration performance of international companies (Milkov, 2017; Norwegian Petroleum Directorate, 2018; Janssen, 2019). However, we note that Pemex and the LOCs significantly underestimated the pre-drill resource volumes.

It is worth comparing our findings in Mexico with similar studies in other countries (Table 4). The 300 wells drilled offshore Norway in 1990–2002 found 55% of total predicted resources (Blystad and Søndenå, 2005). The 14-year exploration program with 126 wells in the Netherlands delivered 70% of total predicted volumes (Janssen, 2019). Our study documents a much better delivery of the total expected volumes by 71 wells drilled in Mexico in 2019–2021, only 11% less than the total predicted volumes. Unfortunately, this does not imply that explorers improved their ability to estimate volumes pre-drill. Assessments of success-case volumes in individual prospects in Mexico were rather poor (Fig. 11), and there was a large gap (51%) between expected and found volumes for the first 53 drilled wells (Fig. 14).

Table 4 Comparison of pre-drill assessments of total expected (risked) resource volumes and actual discovered volumes in Norway (Blystad and Søndenå, 2005), the Netherlands (Janssen, 2019), and Mexico (this study)

Conclusions and Lessons Learned

We compiled a data set of 375 evaluated exploration prospects/wells and 80 drilled wells in Mexico using open source data and then compared pre-drill assessments with the drilling results. The average estimated geological PoS for drilled wells was 35%, which is much lower than the actual success rate of 58%. Contrary to theoretical expectations, prospects with two and with three to five segments have lower pre-drill geological PoS and lower actual success rates than prospects with one segment. While the total discovered volumes are very close to the total pre-drill risked expectations, resources in individual discoveries differ significantly from the expectations. Our observation of generally poor exploration forecasts by companies is similar to the results from other independent studies, but differs from the most recent self-complimentary reports by companies themselves (Möller, 2015; Rudolph and Goulding, 2017; Boyd, 2019; van Mastrigt and Quinn, 2021). This highlights the value of open source and open access data for critical unbiased evaluation of exploration forecasts and performance.

The problem of underestimation of geological PoS and poor assessments of success-case volumes is persistent in the exploration industry. This problem was highlighted in the first statistical comparisons of pre-drill predictions and drilling outcomes in the 1980s (Lewis, 1984; Rose, 1987) and has not been fixed since then. Petroleum explorers make more discoveries and find different petroleum volumes than they expect and promise to investors. There are various reasons for that. However, the history of exploration predictions versus results (Tables 3 and 4) demonstrates that petroleum explorers do not learn from history in two senses. First, explorers did not learn to make better exploration predictions over the last 30 years. Second, explorers continue to neglect base rates (relevant historical success rates and sizes of discoveries) in assessments of geological PoS values and success-case volumes (Harbaugh, 1984; Milkov, 2017).

Importantly, the industry-average exploration success rates increase with time because the petroleum-finding technology and skills constantly improve (Milkov and Navidi, 2020). However, the tendency to underestimate the geological PoS values (i.e., overestimate geological risks) and incorrectly assess (typically overestimate) the success-case volumes does not change with time. This is somewhat similar to the general inability of energy experts to make correct forecasts about future oil prices (Wachtmeister et al., 2018) and petroleum production, imports, exports, and ultimately recoverable reserves (Sorrell and Speirs, 2010; Reed et al., 2019). Future oil prices are hard to predict because of dominating epistemic uncertainty (unknown unknowns) in the forecasting models (Reed et al., 2019). However, subsurface predictions of the geological PoS and success-case volumes, which are more foresights than forecasts, deal mostly with stochastic uncertainties (known unknowns), and we believe that the exploration industry can and should improve such predictions.

Recommendations

In contrast with the findings of this independent study based on open source data, recent studies of exploration performance originating from the exploration companies themselves do not show problems with pre-drill assessments and instead highlight and praise generally good exploration predictions (Möller, 2015; Rudolph and Goulding, 2017; Boyd, 2019; van Mastrigt and Quinn, 2021). There are various reasons for that. However, this observation indicates the need for better independent tracking and auditing of exploration forecasts versus outcomes. Government regulators can facilitate such audits by implementing mandatory disclosure of exploration predictions by companies for each prospect/well that they plan to drill. Agencies such as CNH, NPD, and the UK Oil and Gas Authority have already taken some good steps toward this. Ultimately, the entire industry will greatly benefit if transparent reporting of pre-drill assessments and drilling outcomes becomes available for exploration wells drilled around the world.

Many reasons may explain the lack of improvements in exploration assessments over the last 30 years. In our view, these reasons are mostly related to the culture of how exploration assessments are conducted and rewarded and how they influence the decision-making (i.e., which prospects are drilled) in companies. Rose (1987, p. 10) cited “motivational bias related to commercial and corporate pressures” as a plausible explanation for poor exploration assessments and supported that hypothesis by the observation of relatively better assessments by less biased government geoscientists (Uman et al., 1979). There is much anecdotal evidence, but no published proof, that many companies implicitly encourage explorers to “sell” their prospects to the management, do not track the forecasting abilities of individual explorers, and do not explicitly punish explorers for poor pre-drill predictions. Johns et al. (1998) presented similar observations. If that is true, the potential ways to improve the assessments are easy to identify, even if they may be difficult to implement.

Exploration assessments are forecasts (Rose, 1987; Milkov, 2020), and companies should aim to improve the forecasting abilities of individual petroleum explorers and exploration teams. A wider implementation of assessment algorithms such as risk tables (Milkov, 2015) can make the assessments less biased and more consistent (less noisy), as has been demonstrated in many other realms of forecasting and decision-making under uncertainty (Meehl, 1954; Goldberg, 1970; Grove et al., 2000; Kahneman, 2011; Kleinberg et al., 2018; Kahneman et al., 2021). A more disciplined use of relevant base rates (historical data on actual success rates and sizes of discoveries) can help close the gaps between exploration expectations and outcomes (Murtha, 1994; Milkov, 2017), as, again, has been shown in other forecasting realms (Kahneman and Tversky, 1973; Lyon and Slovic, 1976; Bar-Hillel, 1980; Koehler, 1996; Tetlock and Gardner, 2015; Pennycook and Thompson, 2016). Instead of relying on individual experts, companies may benefit from using “wisdom of the crowd” either within or outside the company (Milkov, 2020), which, once again, is a proven approach to obtain more reliable forecasts (Armstrong, 2001; Surowiecki, 2004; Cowgill and Zitzewitz, 2015; Simperl, 2015; Da and Huang, 2020). Petroleum exploration companies do not have to invent new ways to become better at exploration predictions and should borrow readily available approaches and methodologies from other domains that use forecasting.

The overarching aim should be to move away from assessments by respect-experts and rely more on algorithmic (mechanical) predictions and/or assessments by experts with proven track records of successful exploration forecasts. Kahneman et al. (2021) distinguish two kinds of experts. The “true experts” operate in domains where judgements are trackable and verifiable (e.g., weather forecasting and health care), and it is easy to find out who is a good expert by seeing how often they have been right in the past. The “respect-experts” make judgements that do not have a true correct value, and the confidence in their judgement is entirely based on the respect from their peers (as is common in wine tasting and book and movie reviewing).

Petroleum exploration is a domain in which the expert judgements are verifiable. (The outcomes of wells can be compared to pre-drill predictions.) Many large exploration companies do regular look-backs and compare the pre-drill predictions with the drilling outcomes at the company level (Sluijk and Parker, 1986; Rose, 1987; Clapp and Stibolt, 1991; Alexander and Lohr, 1998; Johns et al., 1998; Rudolph and Goulding, 2017; van Mastrigt and Quinn, 2021). However, companies (at least to our knowledge) do not explicitly track the forecasting abilities of individual explorers and do not have evidence-based lists of their best exploration forecasters. Instead, companies and decision-makers routinely rely on the respect-experts, who commonly have more years of experience, deeper expertise in some specific geoscience topics, and greater influencing abilities through their personality traits or official positions as advisors or assurers. These explorers may have only slightly better (if at all) forecasting abilities than other, less experienced or less respected explorers (Milkov, 2020).

We offer a simple and relatively easy-to-implement business solution to improve exploration forecasts and therefore performance. Companies should identify, use, and rely upon the true exploration experts, i.e., those whose exploration predictions prove largely correct when verified by drilling results. Good forecasting skills should be rewarded and honed. Implementing this solution would require a significant cultural change in exploration companies. However, the industry-wide historical look-backs from Norway and the Netherlands and this study from Mexico indicate that previous and current approaches to improve exploration predictions clearly do not work, so new ones are needed.