Introduction

Information about total removals by a fishery is vital to detect and manage impacts on stocks and ecosystems and so contribute to the long-term sustainability of the fishery. However, if this knowledge comes from reported catches, then it only represents the landed portion of catches (hereafter referred to as landings). That is, such data do not give a complete picture of total extractions because of discarding at sea and any catches that are misreported or not reported at all.

Many of today’s stock assessments use reported catch statistics to estimate population abundance and fishing mortality, which lead to management recommendations, so it is vital that all catches are accounted for. Inaccurate reporting can affect estimations for those assessments (Dickey-Collas et al. 2007; Rudd and Branch 2017) and have specific effects on outputs concerning undersized fishes, such as recruitment (Punt et al. 2006; Dickey-Collas et al. 2007). For non-commercial species, a lack of understanding about total catches will limit knowledge of a fishery’s impact on the wider ecosystem, particularly on species of conservation importance (Gray and Kennelly 2018). Knowledge of such bycatches is also necessary for eco-labelling initiatives, such as Marine Stewardship Council certification. In addition to environmental impacts, discarding is also perceived as a waste of resources. Public ownership of wild fisheries resources exists up to the point of retention, so discarded fish are effectively in permanent public ownership (Gray and Kennelly 2018). Governments and managers therefore have an obligation to monitor and reduce this wastage in the public interest. Wasted resources also have the potential to become new market opportunities, improving utilisation and economic sustainability.

A discard ban (also referred to as a landing obligation) can be an effective tool towards accounting for all catches in a fishery, as all catches are supposed to be landed and reported. In a global review of discard ban strategies, Karp et al. (2019) concluded that the success of a discard ban depends largely on the ability to enforce it, coupled with the acceptance and compliance of stakeholders. They also noted that discard bans may introduce complications in gathering high quality data on catches and discards at sea, and so restrict the ability to verify the effectiveness of a ban. These limitations are evident in recent global estimations of discards by Pérez Roda et al. (2019) and Gilman et al. (2020), where discard rates for Norway and Iceland had to be assumed due to low data availability.

Norway first introduced a discard ban on cod (Gadus morhua) and haddock (Melanogrammus aeglefinus) in 1987 to address declining stocks of these species in the Barents Sea. A suite of regulatory measures was also introduced alongside, collectively referred to as the ‘Discard Ban Package’ (see Gullestad et al. 2015 for full description). The measures included real-time closures, compensation for the landing of illegal catches, and development of gear selectivity, all of which aimed to remove incentives for discarding by encouraging the avoidance of unwanted catches. Over the following decades, the discard ban was extended to include more species such that now, under the Marine Resources Act 2008, there is an obligation to land and report all catches. Under the current legislation, there are still exemptions to the obligation (Footnote 1), which include any fish that are alive when discarded, as well as certain protected species that must be released back into the sea immediately regardless of whether they are alive or dead, but these must still be recorded in the catch logbook even though they were not retained.

There have been no direct studies that quantified the impact of the Norwegian discard ban on discarding practices, either as it developed or in the ensuing years. Nedreaas et al. (2015) reconstructed total catches for numerous fisheries between 1950 and 2010, reporting an overall decrease in unreported catches after the introduction of the discard ban. Other estimates of discards and unreported catches in Norway (Dingsør 2001a; Valdemarsen and Nakken 2002) indicate low levels of discarding relative to the global average (Pérez Roda et al. 2019), whilst numerous studies have provided snapshot estimates for individual fisheries (e.g. Hylen and Jacobsen 1987; McBride and Fotland 1996; Dingsør 2001b; Breivik et al. 2017). The available estimates, both nationally and for individual fisheries, have been constrained by a lack of at-sea observations throughout time, focussing on shorter timescales and specific fisheries where data are available.

We therefore acknowledge that the Norwegian discard ban is difficult to enforce (Gezelius 2006; Gullestad et al. 2015; NOU 2019), and that the level of discarding in Norwegian waters is still relatively unknown (Gullestad et al. 2015; Nedreaas et al. 2015). The monitoring and management of unwanted catches is a core component of ecosystem-based fisheries management generally (Pikitch et al. 2004; Bellido et al. 2011), but for it to be effective, a better understanding is needed of the scale and causes of unreported catches, and the impacts on ecosystems. However, there is currently no system in place to provide regular estimates of unreported catches in Norway, which are necessary for stock assessments for commercial species and evidence-based management of bycatches.

In this review we aim to identify best practices used globally to estimate unreported bycatches and discards and determine if they can be applied to Norwegian fisheries under a discard ban. To achieve this, we have broken down the process into three stages: (1) defining the scope of a study, (2) data collection, then (3) the estimation procedure used. At each stage, we critically evaluate approaches from the literature to identify best practices, then assess the extent to which they can be applied to Norwegian fisheries, giving focus to the influence of the discard ban. A schematic diagram for this process is shown in Fig. 1, listing the themes addressed at each stage. Through this process, we identify best practice guidelines for estimations of unreported catches which are applicable to fisheries under a discard ban, whilst identifying knowledge gaps and limitations which should be addressed to improve estimations.

Fig. 1 Schematic diagram of the themes addressed in this review at each stage of the process for estimating unreported catches

Defining the scope of estimating unreported catches

Defining the scope of a study beforehand helps to guide decisions on data collection and the estimation procedure. In addition, a well-defined scope will provide a firmer understanding of what inferences can be made once an estimation is obtained.

We have not considered some sources of unreported catches in this review because they are out of scope. Marine recreational fishing has been shown to contribute substantially to total mortality in European fisheries, with evidence that removals from recreational fisheries can exceed commercial fishing in some cases (Radford et al. 2018). Therefore, recreational fisheries must be considered and accounted for in total removals. However, large differences in sampling approaches are needed to adequately address their unique dynamics (e.g. in fishing gear, catch and release practices) (National Research Council 2006), meaning that quantifying unreported catches in recreational fisheries is out of the scope of this review. Mortality of organisms that encounter fishing gear underwater but are not caught, which can occur after escapement from gear before it is hauled, either through physical injury or stress (Veldhuizen et al. 2018), is not accounted for in total extractions. The same applies to habitat damage caused by fishing gears, particularly bottom trawls, which damage benthic community structures and habitats (Kaiser et al. 2006). Finally, mortality by abandoned fishing gears, known as ghost fishing, can continue to occur indefinitely. Whilst it can have large environmental impacts, it is often addressed in a different management framework (Gilman 2015) and requires a different sampling methodology to quantify mortality.

Terminology

The definitions used in this review are based on those of Kelleher (2005), with specific adaptations highlighted. A fishery is defined as a group of similar fishing gears targeting one or more species in a fishing area or zone. The catch (also referred to as ‘gross catch’) is all biological material retained by the fishing gear and brought on board the vessel. This differs from the definition given by Kelleher (2005) because estimating unaccounted mortality whilst the gear is underwater is not possible using on-board catch sampling methods considered here (see above). After the catch is brought on board and sorted, landings are the portion of the catch that is brought ashore. Discards are defined as that portion of animals in the catch which is thrown away or dumped at sea before landing for whatever reason. It does not include shells, corals, plants, or inorganic materials (sometimes considered a concern of environmental impact), nor processing waste such as offal and carcasses. Discards include slipping, an event typically associated with purse seine fisheries where catches are released before being brought on board. Bycatch is the catch of non-target animals, which can either be landed or discarded. This includes juveniles and undersized specimens of the target species. Unreported catches contain any catches that are not reported upon landing under a landing obligation. They can be separated into three general categories: unmandated catches, illegal catches, and discards (Pitcher et al. 2002). These are expanded upon in the next section.
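The accounting relationships implied by these definitions can be sketched in code. The following is a minimal illustration only: the class and field names are hypothetical, the weights are invented, and it assumes for simplicity that unmandated and illegal catches are landed rather than discarded.

```python
from dataclasses import dataclass


@dataclass
class CatchRecord:
    """Weights (tonnes) for one species in one fishery; all fields are hypothetical."""
    reported_landings: float  # landed and reported
    unmandated: float         # landed, but with no mandate to report (assumed landed)
    illegal: float            # landed but illegal or misreported (assumed landed)
    discards: float           # thrown away at sea, including slipping

    @property
    def landings(self) -> float:
        # Portion of the catch that is brought ashore
        return self.reported_landings + self.unmandated + self.illegal

    @property
    def unreported(self) -> float:
        # The three general categories of unreported catches
        return self.unmandated + self.illegal + self.discards

    @property
    def catch(self) -> float:
        # Gross catch brought on board = landings + discards
        return self.landings + self.discards


rec = CatchRecord(reported_landings=100.0, unmandated=2.0, illegal=3.0, discards=15.0)
print(rec.landings, rec.unreported, rec.catch)
```

Under these assumptions the gross catch equals reported landings plus unreported catches, which is why estimating the unreported portion is needed to recover total removals from official statistics.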

The terms ‘discard ban’ and ‘landing obligation’ are used synonymously in many descriptions of discard reduction policies and are used as such in this review. However, they are also two distinct legal terms. By definition, a discard ban makes the act of discarding illegal, whilst a landing obligation creates the legal requirement to land and report all catches. This is seen in the history of Norwegian discard policy, where the act of discarding was banned in Norway in 1987, but it was only in 2009 that a “landing obligation” was introduced. In contrast, the reform of the EU common fisheries policy in 2014 introduced a discard ban and landing obligation simultaneously. Therefore, to assess the effectiveness of a discard ban, total discards must be quantified. The same assessment for a landing obligation requires the quantification of unreported catches to assess the extent to which reported landings reflect total catches.

Unreported catches

Unmandated catches

Global reviews of discard ban policies by Borges et al. (2016) and Karp et al. (2019) found no examples where the discarding of all species is prohibited. Instead, some discard bans have focussed on species with quota regulations, aiming to ensure that all catches count towards total catch allowances (e.g. the European Union and New Zealand), whilst others apply to a defined list of species that includes non-quota and non-commercial species but is not exhaustive (e.g. Norway and Iceland).

While numerous discard bans have addressed the issue of mandatory reporting, there remain difficulties in the resolution of such reports. For some species groups, there can be no mandate to differentiate between individual species. This is particularly the case for elasmobranchs, for which there are substantial knowledge gaps in bycatch information worldwide (Oliver et al. 2015) due to difficulties in species identification and a general lack of reporting. Fishmeal production facilities on-board vessels cause similar problems if individual species contributions are not reported. Whilst all catches will have technically been accounted for in these situations, the lack of detail means they should still be classed as unreported catches for the purposes of estimation and management advice regarding individual species.

The Norwegian discard ban applies to all species in principle, but subsequent legislation has confined mandatory reporting to a list of 55 species or species groups. The overall resolution of species reporting is high across fisheries, but a small percentage of species are reported only to a higher taxonomic level. These are almost entirely elasmobranchs (especially skates and rays), for which species reporting is poor, reflecting the global trend mentioned above. In addition, an increase in fishmeal factories on Norwegian trawlers has led to increased utilisation of unwanted catches but, as above, does not contribute to data about individual species.

Illegal catches

Illegal catches consist of those fish caught that the vessel had no legal right to take (i.e. due to being in closed areas or various gear regulations) or catches intentionally misreported upon landing (Pitcher et al. 2002). Intentional misreporting involves altering catch weights on official records, concealing illegal catches underneath legal catches in boxes, or exploiting difficulties in species identification. This is done to avoid prosecution for illegal fishing, to prevent catches being counted towards quotas, or to get a better price than if the catch were legally landed. Fishing in illegal areas or periods requires a presence at sea to detect infringements, whilst intentional misreporting of landings requires portside inspections. Illegal catches are further complicated if one species is misreported as a different species, which results in a combination of under- and over-reporting. On-board fishmeal production or offal processing facilities can also be used to intentionally hide illegal catches. Methods for identifying the species composition of highly processed products require genetic techniques which are rapidly developing, but the detection of low-represented species is still particularly difficult and costly, rendering it currently unfeasible to routinely screen landed fishmeal (Vlachavas et al. 2019).

A study by Pitcher et al. (2009) found that there is poor compliance in fisheries globally. Across all countries, there are difficulties in controlling illegal fishing due to a mixture of poor policy implementation and lack of surveillance. The study assessed compliance with the United Nations Code of Conduct for Responsible Fisheries, finding that Norway had the highest score globally. Since 1990 when a new catch-monitoring system came into force in Norway, it has become increasingly difficult to misreport fish upon landing, especially for offshore fisheries (Gezelius 2006). The new system requires that daily catch logbooks and remote vessel monitoring at sea must match the information in sales notes completed when landing catches, reducing the risk of catches being misreported whilst at sea. Additionally, it is the joint responsibility of buyer and seller to report landings using approved weighing equipment. Finally, unannounced inspections mean that opportunities or incentives to misreport landings have been reduced, improving confidence that official records accurately reflect what is landed (Gezelius 2006).

Discards

Discarding is caused by a complex combination of regulatory, environmental, and economic factors (Rochet and Trenkel 2005; Feekings et al. 2012; Pennino et al. 2017), all of which vary between fisheries and species. We therefore discuss the specific discard risks for different species groups in the next section. Discarding is further characterised by the conscious decision of skipper or crew to discard. Although discards can be reduced through regulations, improvements in gear selectivity, and improved utilisation of catches, some unwanted bycatches remain unavoidable. Fishing gears are seldom perfectly selective, and there is always the risk of non-compliance. In most cases, a discard ban will reduce discarding compared with fisheries without any discard regulations (Karp et al. 2019), but in worst-case scenarios a ban could increase the risk of discarding if monitoring and control is insufficient (Borges et al. 2016) or if additional management methods do not address any new problems that a discard ban creates (Pennino et al. 2017).

Slipping is considered a type of discarding in this review because, like general discarding, it occurs during the hauling process, involves a decision by the skipper, and can result in high mortality rates (ICES 2020). Slipping most often occurs in purse seine fisheries as fishing strategies are more targeted towards very specific species and size groups, and catches are larger such that only a small number of hauls are needed to reach quota limits. As catches can be sampled before hauling the entire net, slipping becomes a solution to avoid undesirable catches. Slipping also occurs in trawl fisheries, but this is most commonly due to safety concerns, such as excessively large catches, damaged gear, or poor weather conditions. However, these issues have become easier to mitigate as technology has developed.

Species-specific considerations

Different species groups are at risk of misreporting for different reasons and have different degrees of conservation concern (Hall 1996). Estimation procedures and output requirements will differ depending on the species and the need for estimating unreported catches (Anon 2003; Punt et al. 2006; Stock et al. 2018). It is therefore necessary to explore how catches can be categorised and what risks they are exposed to in order to determine the appropriate estimation procedure.

Target species

Due to their commercial value, target species typically undergo stock assessments to regulate their harvesting to achieve long-term sustainability. Therefore, one of the main goals for estimating unreported catches is to improve the accuracy of catch data used in stock assessments. Perretti et al. (2020) suggested that unreported catches should be accounted for, even if there is only a small possibility of their occurrence. This is based on evidence that the largest biases occurred when unreported catches were ignored, compared to accounting for them when they were not present. Rudd and Branch (2017) found that constant misreporting of catches can still produce sustainable estimates of recommended catches, but if misreporting varies over time, then estimates of important parameters become more inaccurate, and catch recommendations become more sensitive to the reporting rate. As a result of poor information on unreported catches, stock assessments can assume a constant value based on expert knowledge or long-term averages. However, this can introduce unknown biases in many aspects of a stock assessment. Whilst it is important to account for unreported catches, a constant rate will hide temporal trends, and may be unintentionally detrimental to the stock assessment.
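The way a constant assumed rate can hide temporal trends can be illustrated with simple arithmetic. The sketch below is not the simulation approach of Rudd and Branch (2017); the catch series, reporting rates and function name are invented for illustration, and it assumes that reported catch is simply true catch multiplied by a reporting rate.

```python
def correct_with_constant_rate(reported, assumed_rate):
    """Scale reported catches up by a single assumed reporting rate."""
    return [r / assumed_rate for r in reported]


# Hypothetical example: true catch is flat, but reporting deteriorates over time
true_catch = [100.0, 100.0, 100.0, 100.0]
reporting_rate = [0.9, 0.8, 0.7, 0.6]
reported = [c * p for c, p in zip(true_catch, reporting_rate)]

# A long-term average rate is assumed for every year
mean_rate = sum(reporting_rate) / len(reporting_rate)
corrected = correct_with_constant_rate(reported, mean_rate)

# Relative error of the corrected series against the (normally unknown) truth
relative_error = [(est - c) / c for est, c in zip(corrected, true_catch)]
for year, err in enumerate(relative_error, start=1):
    print(f"year {year}: relative error {err:+.2f}")
```

In this invented case the corrected series overestimates catch in the early years and underestimates it in the later years, so a flat true catch appears to decline even though the average correction is right.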

Target species are generally included under a discard ban as they are typically subject to quota regulations. As a result, they are particularly vulnerable to high-grading, where lower value catches are discarded to make space for those with higher value to maximise the value of quota (Kelleher 2005; Batsleer et al. 2015). The risk of high-grading increases when approaching the quota limit, as a fisher aims for the highest return on the remaining quota. It can also be influenced by seasonal restrictions, minimum size requirements, low market value and storage restrictions during a trip (Batsleer et al. 2015). Despite the complex drivers behind high-grading, it results in the discarded portion having a different size distribution to the portion landed (Batsleer et al. 2015). Whilst high-grading is often based on the minimum landing size (Batsleer et al. 2015), it can also result in discarding of sizable fish if a vessel is actively targeting the largest of individuals (Stratoudakis et al. 1998). This was the case in Norwegian Barents Sea fisheries prior to the discard ban, where high-grading was legal.

Once the target species quota is filled, discarding will not be size selective as all catches of that species must be discarded to avoid penalties (Batsleer et al. 2015). This is especially relevant to ‘choke’ species, a species with low quota that when reached can force a vessel to stop fishing early, even though quotas for other species are available. Over-quota discarding involves large amounts of fish being discarded occasionally, as it is dependent on remaining quota, catch composition and available space on board. Aside from regulatory discarding behaviours listed above, a target species would otherwise be discarded only if damaged. This can occur from the prolonged soaking of passive gears leading to decay or predation, or the overcrowding in the codend of a trawl. Depending on the gear type, species and environmental conditions, damages may or may not be size based (Veldhuizen et al. 2018).

It is particularly in age- or length-based stock assessments where high-grading needs to be considered. Whether assuming a flat rate of discarding across all size groups, or constant size-based discarding across years, not accounting for the high variability in discarding of smaller size groups between years can mask annual variations in recruitment (Anon 2003; Dickey-Collas et al. 2007; Cook 2019), restricting the ability to detect strong incoming year classes that do not appear in reported landings (Punt et al. 2006). However, Punt et al. (2006) showed that if it is over-quota discarding that is the main cause of discarding, then it is unnecessary to account for size-based discarding patterns in the model. Instead, discards have the same length composition as landings so they can be combined to provide total catch estimates. Where both drivers are acting simultaneously, Cook (2019) demonstrated that only accounting for size-based discarding is inadequate if over-quota discarding is also occurring, which can account for as much as 40% of catches.
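The contrast between the two discarding drivers can be sketched numerically. This is an illustration under stated assumptions rather than the models of Punt et al. (2006) or Cook (2019): the function names and all numbers are hypothetical, over-quota discards are assumed to share the length composition of landings exactly, and high-graded discards are assumed to have their own observed length composition.

```python
def catch_at_length_overquota(landings_at_length, discard_total):
    """Over-quota discards: raise landings proportionally across all lengths."""
    total_landed = sum(landings_at_length.values())
    raising = (total_landed + discard_total) / total_landed
    return {length: n * raising for length, n in landings_at_length.items()}


def catch_at_length_highgrading(landings_at_length, discards_at_length):
    """High-grading: add a separate discard length composition to landings."""
    lengths = sorted(set(landings_at_length) | set(discards_at_length))
    return {
        length: landings_at_length.get(length, 0.0) + discards_at_length.get(length, 0.0)
        for length in lengths
    }


# Hypothetical numbers of fish by length class (cm)
landings = {40: 200.0, 50: 500.0, 60: 300.0}

overquota = catch_at_length_overquota(landings, discard_total=100.0)
highgraded = catch_at_length_highgrading(landings, {30: 60.0, 40: 40.0})

print(overquota)
print(highgraded)
```

Both invented scenarios add 100 discarded fish, but only the high-grading case changes the shape of the length distribution by adding small fish that never appear in the landings.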

Justifying the assumption of either negligible or constantly unreported catches is especially important in multinational fisheries in Europe where each country contributes catch data to stock assessments. The magnitude of biases introduced by such assumptions depends on the relative contribution to total catches by that nation. Species with migratory behaviour may be vulnerable to different national fisheries at each life stage. As a result, the need to account for unreported catches of smaller fish (Anon 2003) would become the responsibility of nations whose fisheries overlap with nursery grounds, where the risk of high-grading is higher.

Bycatch species

Discarding of bycatch species with commercial value is primarily driven by market prices and storage space during trips but they can also be vulnerable to high-grading if subject to quotas (Batsleer et al. 2015), as well as becoming choke species if that quota is low relative to other species caught. There is also the risk that non-quota species are used to misreport species with limited quota. Commercial species that do not undergo detailed stock assessments may still be managed for their long-term sustainability. In these cases, size-based estimates may not be necessary, but total catches or numbers landed are still required to quantify total fishing mortality.

Non-commercial bycatches, sometimes referred to as ‘incidental’ catches, are those species that fishers have no intention of catching. Fish in this group can either be directed to fishmeal or discarded, creating a high risk of being unreported. Some of these species could have potential commercial value but are discarded or landed as fishmeal because there is currently no market for them. In these situations, quantifying unreported catches would help to assess the potential to develop a targeted fishery. New knowledge on catches could complement scientific survey data to build a stock assessment which would provide evidence for a sustainable fishery. This would increase the value of the product, improve utilisation, and may help relieve pressure on more heavily fished alternatives if developed sustainably. Incidental catches also include endangered, threatened and protected species such as marine mammals, seabirds and sharks, and ‘charismatic’ species (Hall 1996) which when caught as bycatch can create a negative perception of the fishery (Gray and Kennelly 2018) and be a strong factor in influencing discard policy (Bellido et al. 2011).

Inaccurate estimates of unreported catches of non-commercial bycatch species will impact on management decisions, sustainability certifications for fisheries, and national import requirements. Management of unwanted catches is focussed on their avoidance under the Norwegian discard ban, so an estimation should consider the factors that influence their capture. For example, Cosandey-Godin et al. (2014) identified that bycatches of Greenland shark (Somniosus microcephalus) were confined to small geographical areas for the duration of each fishing season, but that these areas shifted between years, indicating that active spatial management is necessary to reduce bycatches. Sex- and age-biases are common in estimations of seabird bycatch (Gianuca et al. 2017), as the sex and age of birds may influence their habitat or feeding behaviour, which in turn could affect their vulnerability to fishing gear. When monitoring non-commercial species to assess biodiversity and ecosystem function, neglecting fisheries bycatches will lead to an over-optimistic view of sustainability.

A fishery-based estimation of unreported catches

Based on various expert workshops and national reporting systems, it is commonly agreed that it is best to estimate unreported bycatches and discards by fishery (FAO 2015; ICES 2007a; NMFS 2011; Kennelly 2020). Framing the issue of unreported catches in a fisheries context allows for the consideration of unique dynamics and the broader ecosystem. For example, the management actions to reduce discards on one species may have a negative effect on mortality of other species through displacement (Gilman et al. 2019). A fishery-based approach will also complement the structure of sustainability certification assessment. Nevertheless, catch data requirements can differ between stocks depending on the selected assessment model and data availability. Therefore, for estimates of unreported catches to be useful, they should be of a similar type as those used in the stock assessments (Anon 2003), or appropriate for the available management options. This means that whilst estimations should be fishery-based, they should not disregard potential variations between species which would influence data collection requirements and the estimation procedure.

The management framework developed in Norway since the discard ban (Gullestad et al. 2017) provides the foundations for a fishery-based estimation of unreported catches. Fisheries are continuously assessed to prioritise issues such as the gear selectivity of different species groups and direct consideration of discards. Individual stocks also receive a similar assessment, which helps to identify individual risks and demand for further knowledge for specific species. Norwegian stocks are also classified based on their economic importance and management objectives (Table 1). Within the table it is important to note that some species of low economic importance are grouped together due to limited knowledge. Estimates of unreported catches of individual species within these groups could help to distinguish them as a defined stock for targeted management.

Table 1 Summary of Norwegian stock classifications.

Difficulties in enforcement and surveillance at sea mean that there is still a continued risk of discarding under the Norwegian discard ban. As a result, it is likely that discarding is still the main source of unreported catches in many fisheries. Improvements in the Norwegian reporting system and at-sea surveillance by the Norwegian Coast Guard and Directorate of Fisheries in recent decades have reduced the risk of discarding, illegal catches, and misreporting (Gezelius 2006; Gullestad et al. 2015). In 2019 the Norwegian Coast Guard conducted 1138 inspections and 738 aircraft surveillance hours with long range photo and video recording (Anon 2020). The use of drones and aircraft surveillance has greatly increased the ability to observe fishing vessels without detection. Nevertheless, there is always some degree of risk of illegal fishing. We have also argued why low-resolution reporting of fishmeal and certain species groups (e.g. sharks and rays) should be classified as unreported catches, even though they have been reported. Therefore, where there are no direct observations of discarding, caution should be used when interpreting the sources of unreported catches.

In fisheries using on-board fishmeal production, it is misleading to assume that unreported catches are a result of discards. Fishmeal production is a positive alternative to discarding, but can still be a source of unreported catches, so acknowledging the contributions will help to improve reporting requirements. Even with direct observations of discarding, it may be important to quantify the mortality of discarded fish, considering the exemption for discarding of live fish under the Norwegian discard ban. Discard survivability can be considerably higher in coastal fisheries where handling times are shorter (ICES 2020), whilst survivability of slipped catches in purse seine fisheries is highly variable, depending on a much wider range of factors, related both to fishing practices and environmental parameters (Tenningen et al. 2012, 2019; Gilman et al. 2013; ICES 2020). In such cases, contributions of discards to total fishing mortality may be overestimated if 100% mortality is assumed. In both these examples, poorly informed interpretations of results could be detrimental to the public image of the fishery and could lead to misguided management and enforcement decisions.
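The overestimation that follows from assuming 100% mortality is a matter of simple arithmetic, sketched below. The function name, the discard weight and the survival rate are all invented for illustration; real survival rates vary by fishery, gear and conditions as described above.

```python
def dead_discards(discards, survival_rate):
    """Discards that contribute to fishing mortality, given a survival rate in [0, 1]."""
    if not 0.0 <= survival_rate <= 1.0:
        raise ValueError("survival_rate must be between 0 and 1")
    return discards * (1.0 - survival_rate)


# Hypothetical example: 50 t discarded, 40% assumed to survive release
discards_t = 50.0
assumed_full_mortality = dead_discards(discards_t, survival_rate=0.0)
with_survival = dead_discards(discards_t, survival_rate=0.4)
overestimate = assumed_full_mortality - with_survival

print(f"dead discards assuming 100% mortality: {assumed_full_mortality:.1f} t")
print(f"dead discards with 40% survival:       {with_survival:.1f} t")
print(f"overestimate of discard mortality:     {overestimate:.1f} t")
```

In this invented case, treating all discards as dead would overstate their contribution to fishing mortality by 20 t out of 50 t.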

Data collection

The various methods for collecting data on bycatches and discards have been discussed extensively (ICES 2000; Cotter and Pilling 2007; Faunce 2011; Suuronen and Gilman 2020), providing a consensus on many of the benefits and limitations. However, more recent discussions on fisheries data collection under a discard ban (e.g. Kraan et al. 2013; Mangi et al. 2013; James et al. 2019) encourage a new evaluation of methods to address the influences of a ban and the consideration of novel methods and technologies. In this section, we gather the available data sources in Norwegian fisheries, as well as addressing data collection methods not currently used in Norway. Considering the limitations of the discard ban, we evaluate their ability to provide reliable data for estimating unreported catches, taking into account practical and social considerations.

Scientific observers

By far the most trusted method of sampling catches globally is by using on-board scientific observers (Anon 2003; Kelleher 2005; ICES 2007a; Suuronen and Gilman 2020). They are the major source of fisheries data collection in many countries (Karp et al. 2019), such as in the USA where numerous fisheries have achieved 100% coverage (NMFS 2011). Their benefits include the ability to gather a broad range of data including catch composition, biological sampling, post-release survival and species identification (Suuronen and Gilman 2020), all of which can be collected based on a well-defined statistical sampling design to allow for a simple estimation procedure (Lohr 2010). Notwithstanding the above, the presence of an observer may influence fishing behaviour, known as the observer effect (Benoît and Allard 2009), whilst rejections or vessels being unsafe for observers could potentially bias the representativeness of sampled vessels. These effects are likely to be increased under a discard ban, where the presence of an observer would increase the risk of changing behaviour if the observer could witness illegal activity.
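As an illustration of the kind of simple estimation procedure that a designed observer sample permits, the sketch below applies a ratio estimator, a standard survey-sampling technique. It is not necessarily the estimator used in any study cited here; the function name and trip data are hypothetical, and the sketch assumes that observed trips are representative of the fleet, which the observer effect described above may violate.

```python
def ratio_estimate(observed_trips, fleet_landings):
    """Raise observed discards to fleet level using a discard-to-landings ratio.

    observed_trips: list of (discards, landings) per observed trip, in tonnes.
    fleet_landings: total reported landings of the whole fleet, in tonnes.
    Returns (ratio, estimated fleet discards).
    """
    total_discards = sum(d for d, _ in observed_trips)
    total_landings = sum(l for _, l in observed_trips)
    if total_landings <= 0:
        raise ValueError("observed landings must be positive")
    ratio = total_discards / total_landings
    return ratio, ratio * fleet_landings


# Hypothetical observer data: (discards, landings) in tonnes for three trips
observed = [(2.0, 20.0), (1.0, 10.0), (3.0, 20.0)]
ratio, fleet_discards = ratio_estimate(observed, fleet_landings=1000.0)
print(f"discard ratio: {ratio:.2f}, estimated fleet discards: {fleet_discards:.0f} t")
```

If fishing behaviour changes when an observer is on board, the observed ratio will not match the unobserved trips and the raised estimate inherits that bias, regardless of how many trips are sampled.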

Many observer programmes worldwide require observers to report illegal activity on-board (Ewell et al. 2020). Arguments for merging scientific and monitoring roles include the moral obligation to report illegal activity, and improvements in compliance (especially with 100% coverage). However, for programmes focussing on unreported catches under a discard ban, there is an argument for the separation of roles (Cotter and Pilling 2007; Mangi et al. 2013). Even where observations are purely scientific, fishers could still have concerns about the later use of such data, which could influence fishing behaviour or data quality. A review of 17 mandatory scientific observer programmes worldwide by Ewell et al. (2020) found that all programmes have issues with some aspect of the safety of their observers, regardless of the responsibility to monitor compliance. These issues include a lack of measures to address intimidation, obstruction and blackmail and, at worst, to investigate the disappearance or death of observers at sea. The risks to observer safety and welfare will be mitigated if observer roles are separated, but it is nevertheless important to consider that the presence of the discard ban will likely have negative effects on data quality from such programmes.

Higher observer coverage can reduce bias in estimates of unreported catches, but increasing the coverage without addressing rejection rates may weaken this improvement, or at worst increase bias (Lohr 2010). Increasing coverage is restricted by the high costs involved in maintaining an observer programme (Borges et al. 2004; Mangi et al. 2013). This is particularly the case in Norway, where implementing an extensive scientific observer programme has previously been seen as logistically difficult, particularly for smaller demersal vessels. The extensive coastline has many landing sites that are separated by long fjords and mountains, making harbour access difficult for observers.

Remote electronic monitoring

The use of remote electronic monitoring (REM) is rapidly developing as an alternative to at-sea observers. For example, most recently REM programmes have been developed in commercial fisheries in Australia to improve the reliability of data from industry logbooks whilst reducing costs (Emery et al. 2019). Improved data reliability is also the reason for numerous European countries trialling REM in response to the landing obligation (Needle et al. 2014; Ulrich et al. 2015; James et al. 2019). Despite the infancy of REM technology, it is broadly seen as a vital tool in the future of fisheries monitoring (van Helmond et al. 2020), with its efficacy demonstrated as a mandatory requirement (Emery et al. 2019). Nevertheless, James et al. (2019) highlighted that REM cannot provide physical samples such as otoliths for age determination, or data on maturity and sex, all of which can be necessary for stock assessments. Therefore, any data collection programme that uses REM must also include at least some form of human sampling.

Except for a vessel monitoring system, Norway does not have an REM programme for the scientific monitoring, control or enforcement of catches. Part of the reason is technological limitations and high costs (NOU 2019), although both will likely improve as the technology develops (Suuronen and Gilman 2020). However, a more fundamental reason for the lack of uptake is privacy concerns (NOU 2019), which are a serious barrier to the acceptance of REM programmes.

Enforcement and surveillance sampling

The Norwegian Directorate of Fisheries runs the Monitoring and Surveillance Service (MSS), an on-board observer programme for control purposes, which is divided into two categories. Observers can observe passively, gathering data on gross catches whilst the vessel is undergoing normal fishing activity, or they can hire a vessel for a specific objective, such as to identify bycatch hotspots for real-time closures. When MSS observers are passively observing, the observer effect could increase as skippers are concerned about reasons for the data collection. When vessels are hired, data do not represent normal fishing as samples will be clustered, confined to certain areas and times, and possibly contain more bycatches. However, if observations overlap with the active fishery, their representativeness could be justified.

The Norwegian Coast Guard also gathers data on catch compositions through at-sea enforcement inspections. Inspectors board vessels during the hauling procedure so that the skipper has selected the fishing ground without prior influence of the inspection, but vessel selection may be biased by a risk-based enforcement strategy. Alongside comparing logbooks to catches on board, inspectors take a representative sample of length measurements for commercial species to determine if the current haul contains a high proportion of undersized fishes.

MSS and Coast Guard inspectors are obliged to report any illegal activity they observe, making it highly unlikely for discarding to occur in their presence. Nevertheless, MSS and Coast Guard sampling is done on gross catches, so it still offers relevant information for estimating unreported catches through comparison with reported catches from vessels in the same area and time. An estimation of total retained catches in the Norwegian Economic Zone by Aanes et al. (2011) used Coast Guard inspections, stating that vessel selection is based solely upon the proximity to the pre-defined patrol route. Passive sampling by the MSS was used as the primary data source for the prediction of historical cod bycatch in the Barents Sea shrimp fishery (Breivik et al. 2017). Potential observer effects were deemed to be negligible due to the nature of the monitoring programme, but the authors did highlight that such assumptions should be reconsidered if the method is transferred to other fisheries.

Self-sampling

An alternative to observer sampling is self-sampling of catches by fishers, either throughout the entire fleet or by a defined group of vessels, known as a reference fleet (or study fleet). Mangi et al. (2013) distinguish a reference fleet from other forms of fisher self-sampling by its enhanced data collection role. The Norwegian Reference Fleet is a collaboration between the Institute of Marine Research (IMR) and the fishing industry, in which active fishing vessels are paid to collect data about their fishing activity and catches during normal fishing operations. It is divided into a coastal and an offshore segment, covering both demersal and pelagic fisheries using gears such as trawls, purse seine, Danish seine, gillnets, longlines and traps.

The Norwegian Reference Fleet offers a direct source of information about discards as they are explicitly reported in samples. Coastal vessels began recording discards in 2005, whilst offshore vessels began in 2019. Prior to 2019, offshore vessels recorded gross catches. Sampling protocols differ between offshore and coastal vessels, and between gears, but the general routine involves constant reporting of landed catches and fishing activity, with biological sampling and reporting of discards (or gross catches) at regular intervals (Clegg and Williams 2020). Purse seine vessels also report details of slipping events.

All data recorded by the Norwegian Reference Fleet are property of IMR and are physically isolated from other catch records. An agreement between enforcement and surveillance authorities, IMR and fishers ensures that data shall not be requested for inspection or enforcement purposes. Even though this agreement is not legally binding, there have been no incidents in which the agreement was compromised in the history of the programme, creating a trustful environment for fishers. This trust is core to the effectiveness of the programme. Reflecting upon the history of self-sampling programmes in New Zealand (Starr 2010), USA (Johnson and van Densen 2007), Ireland (Hoare et al. 2011; Lordan et al. 2011), the United Kingdom (Mangi et al. 2018) and the Netherlands (Kraan et al. 2013), long-term success relies on maintaining commitment and a strong communication channel between fishers and scientists. With membership in a reference fleet comes ownership in the scientific process, improving two-way support and communication between scientists and fishers and promoting transparency, which in turn will benefit other stakeholders, such as fisheries managers.

To maintain high quality data in the Norwegian Reference Fleet, IMR offers regular training, and IMR staff are assigned to vessels to maintain the sampling programme, regularly visiting vessels and checking incoming data. These data undergo the same quality assurance procedures as scientific survey data before being added to the database. One risk to data quality in long term self-sampling programmes is sampling fatigue (Hoare et al. 2011; Mangi et al. 2018). To alleviate this, the Norwegian Reference Fleet offers four-year contracts to vessels with direct monetary payment for sampling in compensation. An external evaluation of the Norwegian Reference Fleet by Bowering et al. (2011) concluded that based upon these quality assurance procedures, the programme meets the fundamental needs for effective scientific sampling of catches.

The reliability of self-sampling data has been open to question more than data collected by independent observers (Mangi et al. 2013). Based on scientific principles, data collectors should be disinterested in the scientific process. We must therefore acknowledge that fishers collecting the data may have a conflict of interest in the results from the data. Without regular quality control and validation, there is no direct evidence that proper, unbiased sampling protocols are consistently followed. Kraan et al. (2013) concluded that acceptance of self-sampling data by scientists can be hindered by a lack of trust in how the data are collected. The best practice for statistical data validation is to compare self-sampling data with a secondary source of data of known reliability (Fox and Starr 1996; ICES 2007b; Faunce 2011; Kraan et al. 2013), such as from scientific observers, remote electronic monitoring or scientific surveys. Importantly, such validation needs to be considered at all temporal scales to ensure that data quality is consistently maintained (Lordan et al. 2011) such that users have confidence in the data (Bell et al. 2017).

Whilst the Norwegian Reference Fleet maintains a strong quality control system, little has been done to validate it and there is no routine procedure in place for comparison with other reliable data sources. There is potential to investigate if data quality changes when IMR staff are on-board. Similarly, inspections by the Norwegian Coast Guard or passive observations by the MSS are done by independent observers and could therefore offer a suitable comparison. Nevertheless, qualitative evidence of reliability is available through multiple studies estimating the bycatch of species of high conservation importance, namely seabirds (Fangel et al. 2015; Bærum et al. 2019) and porpoises (Bjørge et al. 2013) in coastal gillnet fisheries. Reporting of seabirds and sea mammals by the Norwegian Reference Fleet is notably higher than through official reporting channels, indicating a greater willingness to record sensitive data for scientific purposes.

A fundamental aspect of a reference fleet is its representativeness of the wider fishing fleet (Mangi et al. 2013). The vessel selection process in the Norwegian Reference Fleet limits the use of a truly random sampling design, as it is legally required to follow a publicly transparent tender process (Clegg and Williams 2020). Vessels can voluntarily submit applications, which could introduce bias in vessel selection. Willingness to participate will increase the reliability of data but, as is the case with rejections in observer programmes, vessels willing to participate in a reference fleet may behave differently to those unwilling. To account for this, contracts are awarded based on gear and vessel specifications, fishing patterns and coverage to mitigate bias and ensure stratification throughout fisheries. For a non-random vessel selection where the statistical properties of the sample are unknown, using statistical tests to assess representativeness is not recommended (Anon 2003). Instead, the vessel characteristics and fishing behaviour of sampled vessels can be compared with those of the wider fishery to determine representativeness on a case-by-case basis (Anon 2003). Such studies have been done for the Norwegian Reference Fleet in general (Bowering et al. 2011), but individual studies should be done prior to implementing programmes in specific fisheries. For example, a comparison of estimates of seabird bycatches in the Norwegian coastal gillnet fishery using Norwegian Reference Fleet data and access-point surveys of the broader fleet (Fangel et al. 2015) yielded identical results, giving evidence for the representativeness of reference fleet data for the reporting of non-commercial and controversial bycatches.

Industry data and mandatory reporting

Under a discard ban, official landings statistics are a record of all species landed by commercial vessels and are therefore the reference to which unreported catches are compared. Norwegian vessels must fill out a daily logbook which records information about individual hauls, including locations and total weights of catches per species. Upon returning to port, a landing note is generated which contains all catches on that trip. Through the daily logbook and landing notes, catch and effort data are available for the entire Norwegian fishing fleet.

Regarding data on gross catches, the data collection methods discussed so far have focussed on active sampling programmes which require some form of human observation. However, modern fishing vessels use various electronic instruments to routinely gather data whilst fishing, either for commercial purposes or for mandatory reporting. The most well-known example involves satellite tracking of vessel movement, which is now widely used for control, surveillance and for scientific research (e.g. Aanes et al. 2011). Other sources of industry data include weighing of catches in the codend or on platform scales, and onshore grading machines used in fish markets to grade catches before sale (Mangi et al. 2013).

There are continual difficulties in biological sampling of catches in Norway, leading to large uncertainties in age and length compositions of catches for many fisheries (Bowering et al. 2011). An intercept sampling programme run by IMR samples landings at specific harbours north of 62 °N latitude, although the programme focusses mainly on coastal vessels landing whole fresh fish. For vessels with on-board factories landing processed and frozen catches, intercept sampling requires the defrosting of products, which affects their value and makes such sampling unfeasible. Instead, there is the potential to obtain size-based data of fishes during the grading process on board factory vessels before they are frozen, when species are identified then sorted into weight grades. Importantly, the weights of individual fish are recorded for each haul, offering a higher resolution of information necessary for accurate size distributions both spatially and temporally (Plet-Hansen et al. 2020).

There are aims to develop technology to monitor the entire harvesting process in Norway (NOU 2019). This involves automatic recording of catches at the earliest possible stage after hauling, including species identification and individual weights. Such a system would vastly improve knowledge on total extractions from fisheries and reduce the need for estimation studies if there is evidence for high compliance and reliability of data. However, until this goal is met, data from the on-board grading process could provide size-based information on landed catches which can be compared with gross catches to infer unreported catches.

Scientific surveys

Where fisheries-dependent data are unavailable or are inadequate due to reasons such as rare encounters or poor coverage, scientific survey data are a possible alternative (Fox and Starr 1996; Cook 2013). If a survey overlaps with the target fishery in both space and time then it could offer systematic, random sampling robust enough for statistical analysis (Fox and Starr 1996), albeit with caveats. Scientific surveys are very expensive compared to fisheries-dependent data, restricting their spatial and temporal coverage. The survey fishing gears commonly use finer meshed nets to catch a broad range of size classes and species, and towing times are often shorter. If these factors can be accounted for, then scientific survey data can be used in place of, or to enhance, fisheries-dependent data.

Opportunities can arise where specific survey gear has been calibrated against commercial gear in the fishery, allowing for appropriate conversions (e.g. Mayo et al. 1981; Hylen and Jacobsen 1987; McBride and Fotland 1996; Dingsør 2001b). However, routine estimations would require regular calibration studies to reflect developments in gear technology and fishing patterns by the commercial fleet. Otherwise, conversions can be based on theory (Heath and Cook 2015), or under the strong assumption of ‘knife-edge’ size selection of species at a certain length such as the minimum landing size (Mayo et al. 1981), which will introduce further uncertainty. Scientific survey design is generally of a high quality relative to fisheries-dependent sampling programmes, as scientific surveys can be highly controlled, and involve less risk and opportunism. However, the calibration methods required due to the use of non-commercial gears outweighs these benefits. Updating calibrations is not sustainable in the long-term for regular estimates of unreported catches, especially as modern fishing technology rapidly develops. Therefore, studies that have used this approach have acknowledged it is only useful in the absence of direct observations of fishing activity (McBride and Fotland 1996).

More recently, unreported catches have been estimated directly in the stock assessment modelling process, using scientific survey indices and reported catches (Hammond and Trenkel 2005; Bousquet et al. 2010; Heath and Cook 2015; Cadigan 2016), and can also incorporate observations of discarding if available (Cook 2019). In extreme cases where catch reporting is deemed highly unreliable, it can be disregarded completely in favour of an assessment using only research survey data (Cook 2013). Incorporating estimations into the stock assessment model bypasses the need to calibrate fishing gears and will benefit from continual developments in modelling tools and techniques. Whilst improvements could be made to how unreported catches are incorporated into stock assessment models, Cook (2013) acknowledges such a method should not be seen as a replacement for methods incorporating catch data, but instead be an additional tool for comparison where catch data are unreliable.

Utilising multiple data sources

Direct observations still provide the best opportunities for estimating unreported catches, despite the difficulties in observing normal fishing activity at sea under a discard ban. Self-sampling of catches by the Norwegian Reference Fleet alleviates the issue of trust, as data shall not be used for enforcement purposes, and has improved the relationship between science and industry such that results are accepted. Control and enforcement data should not be completely disregarded as a viable data source, despite issues of vessel selection and observation biases. They can serve to enhance scientific sampling programmes where data gaps are present and help particularly in closed areas when identifying bycatch hotspots. The appropriateness of surveillance or enforcement observations needs to be determined for each study, requiring expert knowledge of the sampling methodologies to justify their use. Finally, scientific survey data are beneficial only where direct information is unavailable or unreliable (Cook 2013; Heath and Cook 2015), although there are examples of benefits where direct observations of discards have been included in the stock assessment model, utilising both fisheries-dependent and -independent data sources (Punt et al. 2006; Cook 2019).

New data collection methods should also be considered to improve data quality, either as an improvement to current sampling programmes (e.g. REM technologies) or where data are not available. For example, on offshore pelagic vessels, enclosed catch systems limit the opportunities to sample catches at sea. To gain sufficient information in this situation, catch volumes could be monitored using sensors in the pipe system and storage tanks, with complementary portside sampling providing information on catch composition.

Estimation procedure

A good estimation of unreported catches should be unbiased, precise, and simple (ICES 2007a). However, the scope and design of a study will affect the extent to which this goal can be met. A well-chosen estimator can account for various sources of bias and provide an accurate estimate of the uncertainty. Conversely, a poor estimator can introduce further biases and give a misleading view of uncertainty. In this section, we consider how all the themes discussed so far can influence the choice of the best available estimator.

Design- and model-based approaches

Estimates of unreported catches or discards can be obtained using standard formulae for extrapolations based on defined sampling programmes (e.g. Cochran 1977; Lohr 2010), known as the design-based approach. Design-based estimators rely on probabilistic sampling to ensure that the sample is representative of the population (Lohr 2010), but it is recognised that high rejection rates or vessels being unsafe for observers mean that the samples can drift away from a truly probabilistic selection (Table 2). Alternatively, estimates of unreported catches or discards can be obtained using a modelling approach by estimating a set of unknown parameters that explain variations. Model-based estimators do not require probabilistic sampling, but can benefit from randomisation of important covariates, although it is necessary for the range of each covariate to be adequately covered in samples (Cotter and Pilling 2007). Where there are direct observations of discards, these samples can be extrapolated using either a design- or model-based approach. In the absence of direct observations, gross catches can be extrapolated to get an estimate of total catches in the fishery, then compared to reported catches to infer misreporting.

Table 2 Summary of design- and model-based solutions to issues surrounding the estimation of unreported catches

General applications of design-based estimators have been adapted for estimating discards and bycatches, producing best practice guidelines for various types of sampling (e.g. Anon 2003; ICES 2007a; Vigneau 2006). They acknowledge that the optimal procedure is highly case-specific, meaning there cannot be a simple, straightforward method applicable generally. It is therefore necessary for every new study to identify the suitable estimators based on the sampling design and assumptions, then systematically compare them (ICES 2007a). It is common to assume that discards are proportional to an auxiliary variable such as catch or effort, allowing for extrapolation using a ratio estimator (Cochran 1977). However, a review by Rochet and Trenkel (2005) found that in all 17 case studies they considered, both catches and effort were either not influential or had a non-linear relationship with discards. In reality, studies are often constrained by data availability. The auxiliary variable required for extrapolation needs not only to be recorded during sampling, but also documented reliably for the entire fishing fleet. It is therefore possible that studies may only be able to use one procedure to obtain an estimate. In these cases, preliminary studies are still necessary to identify issues beforehand (Borges et al. 2005), as basing estimates on assumptions can introduce unknown bias and uncertainty.
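The ratio estimator described above can be illustrated with a minimal sketch. It assumes that discards are proportional to an auxiliary variable (here landed weight) that is documented for the entire fleet. The function name and all numbers are hypothetical and serve only as an illustration, not as a recommended procedure.

```python
def ratio_estimate(sample_discards, sample_aux, fleet_aux_total):
    """Ratio estimator: scale the sampled discard rate by the fleet-wide auxiliary total."""
    if len(sample_discards) != len(sample_aux):
        raise ValueError("discards and auxiliary values must be paired by sample")
    aux_sum = sum(sample_aux)
    if aux_sum <= 0:
        raise ValueError("the auxiliary variable must sum to a positive value")
    discard_rate = sum(sample_discards) / aux_sum  # discards per unit of auxiliary variable
    return discard_rate * fleet_aux_total


# Hypothetical sampled trips: discards and landings in tonnes.
discards = [12.0, 0.0, 30.0, 8.0]
landings = [100.0, 50.0, 250.0, 100.0]
fleet_landings = 10_000.0  # hypothetical reported landings for the whole fleet

estimate = ratio_estimate(discards, landings, fleet_landings)
```

Replacing landings with effort in the same function gives the alternative estimate, so the two can be compared where both auxiliary variables are available.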

Earlier workshops developing estimation methodologies did not give a large consideration to model-based estimators, mainly due to the absence of suitable case studies (ICES 2000, 2007a). However, over the last two decades there have been advances in techniques for dealing with complexities such as clustered sampling (Harrison et al. 2018), low encounter rates (Martin et al. 2005), spatial–temporal correlation (Rue and Martino 2009) and their extensions to multispecies estimations (Thorson et al. 2017). The appropriate application of these methods can result in reduced bias (Breivik et al. 2017) or improved precision (Stock et al. 2018). These methods have also seen improved computation times and more open-source support, making them more accessible to fisheries studies.

Factors affecting the choice of estimator

If high-grading is to be investigated, then a size-based estimation is necessary. Liggins et al. (1997) compared mean lengths of retained fish sampled at sea and landed catches. Although this was to detect bias in sampling of retained catches at sea, applying the same analysis with gross catches at sea would provide a method for detecting high-grading. This was used by Pálsson (2003) to compare the size distributions of aggregated samples at sea and onshore to model the probability of discard at length (see also Borges et al. 2006), which can then be extrapolated to quantify unreported catches in the entire fishery. Alternatively, multiple fish lengths or ages can be modelled simultaneously using a multivariate modelling approach (Thorson 2019). The Norwegian Reference Fleet is currently the primary source of age- and length-based data in many Norwegian fisheries. An external evaluation of the programme (Bowering et al. 2011) collated comments from various stock assessment working groups to identify that low sampling coverage of vessels and for certain gear types has impacted on the precision of estimates. Where age-length keys are used to estimate catch at age from fisheries, this has resulted in difficulties in estimating catches for those size-groups that are under-represented. The port intercept sampling programme in northern Norway only covers coastal fisheries, and is merged with Norwegian Reference Fleet data to improve size-based data for stock assessments. However, this is based on the assumption that all catches are landed, which requires an estimate of unreported catches to justify. Therefore, the quantification of high-grading is also restricted by the absence of size-based data on landings.
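As a hedged illustration of a size-based estimation, the sketch below applies a probability of discard at length to gross catch numbers at length. The logistic form, the parameter names (l50, slope) and all values are assumptions made for illustration and are not taken from the cited studies.

```python
import math


def discard_probability(length_cm, l50, slope):
    """Assumed logistic ogive: probability that a fish of a given length is discarded.

    l50 is the length at which half of the fish are discarded;
    slope (> 0) controls how quickly the probability falls with length.
    """
    return 1.0 / (1.0 + math.exp((length_cm - l50) / slope))


def expected_discards(gross_numbers_at_length, l50, slope):
    """Expected number discarded in each length class of the gross catch."""
    return {
        length: n * discard_probability(length, l50, slope)
        for length, n in gross_numbers_at_length.items()
    }


# Hypothetical gross catch numbers by length class (cm).
gross = {30: 200, 40: 500, 50: 800, 60: 400}
discards_at_length = expected_discards(gross, l50=40.0, slope=3.0)
total_discarded = sum(discards_at_length.values())
```

In practice the ogive parameters would be estimated by comparing size distributions sampled at sea and onshore, rather than fixed as they are here.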

Multiple species estimations may be necessary in highly non-selective fisheries or when obtaining estimates for multiple fisheries for a national or global study. Comparisons can be made by using the same design-based estimator across all species or fisheries (Table 2). For example, global discard studies (Kelleher 2005; Pérez Roda et al. 2019; Gilman et al. 2020) assume a relationship between discards and reported landings, as landings data are more readily available than fishing effort. However, this relationship is not always justifiable (FAO 2015; Kennelly 2020), with discards being more often correlated with fishing effort. Therefore, in cases where both landings and effort are available, both should be used to allow for comparisons. For model-based estimators, a univariate approach can assume the same covariates are driving discarding across all species (Stock et al. 2018), but this is understandably not ideal for species with very dissimilar life histories or catch patterns. An alternative is to determine important drivers for each species (Bremner et al. 2009), which would improve accuracy, but could quickly become unfeasible as the number of species and covariates increased. Finally, multiple species can be modelled simultaneously in a joint species distribution modelling framework (Thorson et al. 2015, 2016). This addresses issues of multi-model approaches, whilst improving accuracy. The approach is particularly beneficial for rare or under-sampled species, where information on the co-occurrence of more frequently observed species can be used to improve accuracy of estimates.

Post-stratification is used due to the inability to select strata before sampling (as is true for the Norwegian Reference Fleet, and a likely scenario in many observer programmes), but it may result in certain strata being under-sampled. A model-based estimator allows unsampled strata to ‘borrow’ knowledge from similar strata where sample sizes are too small for a design-based estimate (Lohr 2010) (Table 2). Nevertheless, ad hoc solutions to poorly sampled strata are available for design-based estimators, such as collapsing the stratification, assuming values based on similar strata, or excluding the stratum from the study (Anon 2003). Stratification is partly based on the hypothesis that environmental conditions influence discards (Rochet and Trenkel 2005). Therefore, solutions to unsampled strata can cause misleading results and should always be justified (Stratoudakis et al. 1999). Any biases introduced from imputation would have little impact if strata were unsampled due to low fishing activity. However, if estimates for heavily fished strata must be imputed, then the imputation method requires a stronger justification.
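The ad hoc treatment of unsampled strata in a design-based estimate can be sketched as follows. This is a minimal example that assumes a discard-per-effort rate in each stratum and imputes the pooled rate across all samples where a stratum has too few observations. The stratum labels, function name and values are hypothetical.

```python
def stratified_total(samples, fleet_effort, min_n=1):
    """Stratified discard total with pooled-rate imputation for poorly sampled strata.

    samples: {stratum: [(discards, effort), ...]} from the sampling programme
    fleet_effort: {stratum: total fleet effort}
    Returns the estimated total and the list of strata that were imputed.
    """
    pooled_discards = sum(d for obs in samples.values() for d, _ in obs)
    pooled_effort = sum(e for obs in samples.values() for _, e in obs)
    pooled_rate = pooled_discards / pooled_effort

    total, imputed = 0.0, []
    for stratum, effort in fleet_effort.items():
        obs = samples.get(stratum, [])
        sampled_effort = sum(e for _, e in obs)
        if len(obs) >= min_n and sampled_effort > 0:
            rate = sum(d for d, _ in obs) / sampled_effort
        else:
            rate = pooled_rate  # assumption: stratum behaves like the pooled sample
            imputed.append(stratum)
        total += rate * effort
    return total, imputed


# Hypothetical quarterly strata; Q3 has fleet effort but no samples.
samples = {"Q1": [(10.0, 100.0), (20.0, 100.0)], "Q2": [(5.0, 100.0)]}
fleet_effort = {"Q1": 1000.0, "Q2": 2000.0, "Q3": 500.0}
total, imputed = stratified_total(samples, fleet_effort)
```

Returning the imputed strata alongside the total makes it easy to report how much of the fleet effort, and therefore of the estimate, rests on imputation.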

Probabilistic sampling of rare encounters requires special adaptations in sampling design, which will likely not be accounted for in sampling programmes focused on the broader fishery (Table 2). This can either be in the form of sampling a rare population, such as an endangered species, or the observation of rare but extreme events (Lohr 2010), such as slipping of large catches in purse seine fisheries. Using standard formulae for common occurrences with rare encounters could result in biased estimates and an incorrect estimation of variance (Lohr 2010). Sampling can be adapted to account for this but could be impractical alongside the standard sampling programme for other species. Solutions include the delta-lognormal method (Pennington 1983), where zeros are treated separately to occurrences in the estimator, or zero inflated modelling methods (Martin et al. 2005).
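The idea of treating zeros separately from occurrences can be shown with a simplified delta-lognormal mean. The sketch below uses a plug-in form (the proportion of non-zero samples multiplied by the lognormal mean of the positive values), which is an assumption chosen for brevity and is not the exact estimator given by Pennington (1983).

```python
import math
from statistics import mean, variance


def delta_lognormal_mean(catches):
    """Simplified delta-lognormal mean: P(positive) * exp(mu + sigma^2 / 2)."""
    n = len(catches)
    if n == 0:
        raise ValueError("at least one sample is required")
    positives = [c for c in catches if c > 0]
    if not positives:
        return 0.0  # the species was never encountered in the samples
    p_positive = len(positives) / n
    logs = [math.log(c) for c in positives]
    mu = mean(logs)
    sigma2 = variance(logs) if len(logs) > 1 else 0.0  # sample variance of log catches
    return p_positive * math.exp(mu + sigma2 / 2.0)


# Hypothetical bycatch weights per haul (kg); most hauls contain none.
hauls = [0.0, 0.0, 0.0, 2.5, 0.0, 0.0, 4.0, 0.0, 1.2, 0.0]
mean_bycatch_per_haul = delta_lognormal_mean(hauls)
```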

The estimation of total mortality from slipping requires the consideration of more factors in addition to the estimation of rare events. The low number of total fishing operations in purse seine fisheries will alter assumptions about sampling coverage and representativeness compared to other fishing methods. For example, although Reference Fleets sample each vessel and fishing operation without replacement, low sampling coverage can allow for the assumption of replacement to allow for the use of simple estimators (Lohr 2010). However, this assumption may not hold in purse seine fisheries where there are relatively low numbers of vessels and fishing operations each year. The contribution of slipping to total mortality is highly dependent on a complementary study on survivability. Depending on the timing of the slipping event, catch size and species, mortality rates can range from 1 to 100% (ICES 2020). It is difficult to accurately measure or estimate the weight of slipped catches before they are released (Tenningen et al. 2019). Therefore, a good understanding of mortality from slipped catches would first require estimates of the rate of slipping events, the total biomass of the slipped catches, and the survivability post-release. The diverse methodological and statistical requirements for estimating each of these steps may explain why slipped catches are understudied relative to other sources of unreported catches.
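To show how the three components combine, the following sketch multiplies a number of slipping events by a mean slipped biomass and a mortality rate. The function name and input values are hypothetical. Only the 1 to 100% mortality range is taken from the text above.

```python
def slipping_mortality(n_events, mean_biomass_t, mortality_rate):
    """Total mortality (tonnes) = events x mean slipped biomass x mortality rate."""
    if not 0.0 <= mortality_rate <= 1.0:
        raise ValueError("mortality_rate must be a proportion between 0 and 1")
    if n_events < 0 or mean_biomass_t < 0:
        raise ValueError("event count and biomass must be non-negative")
    return n_events * mean_biomass_t * mortality_rate


# Hypothetical season: 20 slipping events of 50 tonnes on average.
low = slipping_mortality(20, 50.0, 0.01)   # lower bound of the reported mortality range
high = slipping_mortality(20, 50.0, 1.00)  # upper bound of the reported mortality range
```

With these inputs the upper bound is one hundred times the lower bound, which illustrates why the survivability component can dominate the uncertainty in the total.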

General issues of complexity should also be considered when communicating complex models to stakeholders. Poor communication can lead to misinterpretation, misuse, and mistrust of the results (Cartwright et al. 2016). When selecting a more complex approach, there is a responsibility to involve stakeholders during the modelling process. Scientists should also ensure that decisions and assumptions are transparent and well communicated, so that they do not restrict the ability of stakeholders to understand and criticise the results. There was previously an argument for considering the computation time of complex models. However, with advancements in computing power and software development, such run times are now measured in hours or minutes (Rue and Martino 2009; Cosandey-Godin et al. 2014; Breivik et al. 2017).

Performance of estimators

With advances in statistical modelling, there is a strong case for using model-based approaches to estimate unreported catches. Another argument is the reduced dependence on the probabilistic sampling designs necessary for a design-based estimation (Cotter and Pilling 2007). The representativeness of probabilistic sampling may be compromised by rejections or inaccessible vessels, or by the inability to sample randomly, as in the case of non-random vessel selection in the Norwegian Reference Fleet.

The benefits of design-based estimators are their versatility and simplicity, so for modelling to be justified, any improvements from increased complexity should outweigh the simplicity of a design-based approach (Stock et al. 2018). Despite the increasing popularity of modelling approaches, there is still no firm understanding of how they compare to simpler design-based methods. Both design- and model-based approaches can account for a wide range of complexities in an estimation (Table 2). In each case, there will likely be one approach that performs better, but this is dependent upon how such performance is defined.

A common measure of performance of an estimator is the trade-off between accuracy and precision (Amande et al. 2012; Stock et al. 2018). For commercial species, stock assessments require accurate estimates of total catches in the fishery, whilst the monitoring of catches of rare species over time favours precision over accuracy, as the relative changes are important in explaining their vulnerability to capture by fishing patterns over time (Stock et al. 2018). This has been demonstrated by Stock et al. (2018) and Breivik et al. (2017), who both compared spatial–temporal models to standard design-based estimators. Stock et al. (2018) found that model-based approaches performed best across the 15 species considered, despite a small increase in bias. In contrast, Breivik et al. (2017) found that a modelling approach reduced bias in estimates, but uncertainty was not estimated for the design-based estimators to allow for a comparison. Considering this trade-off can therefore be a useful tool for deciding the best estimator, also taking into account the factors discussed in the previous section and data availability.
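One way to make this trade-off concrete is through the standard decomposition of mean squared error into squared bias plus variance. The sketch below assumes a simulation setting in which the true total is known and each estimator has been applied to repeated simulated samples. The function name and the example values are hypothetical and are not taken from the studies cited.

```python
import math
from statistics import mean, pstdev


def estimator_performance(estimates, true_value):
    """Summarise accuracy (bias) and precision (sd) of repeated estimates."""
    bias = mean(estimates) - true_value
    sd = pstdev(estimates)  # population sd, so rmse**2 == bias**2 + sd**2
    rmse = math.sqrt(mean([(e - true_value) ** 2 for e in estimates]))
    return {"bias": bias, "sd": sd, "rmse": rmse}


# Hypothetical simulated estimates of a true total catch of 10 units.
true_total = 10.0
precise_but_biased = [12.0, 12.1, 11.9, 12.0]
unbiased_but_noisy = [6.0, 14.0, 8.0, 12.0]

perf_a = estimator_performance(precise_but_biased, true_total)
perf_b = estimator_performance(unbiased_but_noisy, true_total)
```

Which of the two summaries matters most depends on the intended use: a stock assessment would weight bias heavily, whereas trend monitoring for a rare species might accept some bias in return for a smaller standard deviation.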

Where unreported catches are estimated within a stock assessment model, there is not the same opportunity to gather multiple estimators for comparison. However, performance can still be evaluated through general best practices for model validation, such as through the reduction of total error in the model (Perretti et al. 2020), and the final model can be tested using well-established procedures such as simulation testing (Cadigan 2016; Cook 2019), cross validation (Heath and Cook 2015) and sensitivity analysis (Heath and Cook 2015).

Conclusions

This review has identified a range of best practices for estimating unreported catches which, whilst in the context of Norwegian fisheries under a discard ban, are framed to be relevant to other discard bans globally where similarities can be identified. We have explored a broad range of aspects related to the estimation of unreported catches, and therefore offer the main conclusions below:

  1. If there are no direct observations of discards, then unreported catches can be estimated by comparing gross catches with landings. This limits the interpretation of results and management recommendations where the relative contributions of individual sources cannot be determined, or where survivability of discards should be considered.

  2. For estimates to be effective, their intended use should be considered in the presentation of results. This includes considering the data structure in a stock assessment or current management plans, and good communication of accuracy and uncertainty.

  3. Unreported catches should be estimated on a fishery-by-fishery basis to effectively include fishery-related factors and account for potential consequences on management of other species.

  4. Self-sampling of gross catches and discards has the potential to address some of the data collection issues created by the discard ban. Cooperative research can improve trust and transparency between fishers and scientists, which in turn improves the acceptance of data and results (Johnson and van Densen 2007; Starr 2010; Lordan et al. 2011; Kraan et al. 2013; Mangi et al. 2018).

  5. The reliability of self-sampling is more open to question than that of independent scientific observers. There are still concerns from the scientific community regarding the reliability of self-sampled data, which must be addressed statistically by comparing self-sampled data with another data source of known reliability.

  6. Studies can benefit from utilising multiple data sources, either to fill in data gaps or to increase observations, but potential biases should be considered.

  7. Representativeness of data should be evaluated prior to each study to assess the risk of bias in estimates. Differences in regulations, harvesting strategies and sampling protocols make it inadvisable to generalise across fisheries.

  8. Model-based estimators should be applied, especially where sampling designs are non-random. However, comparisons should be made with design-based estimators to justify the increase in complexity (Table 2). A useful way to determine the best estimator is to consider the trade-off between bias and precision, which is in turn determined by the desired use of the estimate.
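The comparison of gross catches with landings in the first conclusion above can be sketched as a simple ratio calculation. This is an illustration only: it assumes that sampled vessels record their gross catch, that their reported landings are known, and that the ratio observed in the sample applies to the whole fishery. The function name and figures are hypothetical.

```python
def unreported_catch(sampled_gross, sampled_landings, fleet_landings):
    """Illustrative ratio estimate of unreported catch for one fishery.

    Assumption: the gross-catch-to-landings ratio in the sampled vessels
    is representative of the whole fleet.
    """
    if sum(sampled_landings) <= 0:
        raise ValueError("sampled landings must be positive")
    ratio = sum(sampled_gross) / sum(sampled_landings)
    fleet_gross = ratio * fleet_landings  # raised estimate of total gross catch
    return fleet_gross - fleet_landings


# Hypothetical values in tonnes for two sampled vessels and the whole fleet.
estimate = unreported_catch(
    sampled_gross=[110.0, 220.0],
    sampled_landings=[100.0, 200.0],
    fleet_landings=3000.0,
)
```

As the first conclusion notes, a difference of this kind says nothing about which source of unreported catch contributes to it, or about how much of it survives.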

A fishery-based approach to estimating unreported catches can be readily incorporated into the Norwegian management system, which requires knowledge of total extractions of all species from fisheries, as well as graded objectives for individual fisheries, commercial stocks and bycatch species (Gullestad et al. 2017). Use of the fisheries and stock tables (Gullestad et al. 2017) should help to prioritise studies depending on their demand for estimates of unreported catches.

Various studies have estimated unreported catches in Norway for commercial species as both target species (Aanes et al. 2011) and bycatch (Breivik et al. 2017), as well as incidental catches of species with high conservation importance (Bjørge et al. 2013; Fangel et al. 2015; Bærum et al. 2019). They have utilised a wide variety of data sources and estimation procedures to extrapolate directly from sampled catches or infer from indirect sources. We argue that the Norwegian Reference Fleet has the greatest potential for estimating unreported catches in a wide range of fisheries in Norway. However, it will be necessary to consider multiple estimators to account for the various fleet segments, gear-specific sampling protocols and the characteristics of each fishery. Therefore, where methods are trialled, it should be considered whether generalisations to similar fisheries are justifiable. Furthermore, methodologies should be reviewed at defined intervals to address changes in representativeness, sampling protocols, and advances in gear technology.

In considering the usefulness of Norwegian Reference Fleet data, the above recommendations for evaluating the representativeness of data need to be addressed. The vessel selection procedure in the Norwegian Reference Fleet aims for representativeness through expert judgement and random selection from eligible vessels. To assess the extent to which this process behaves like a simple random sample, a dedicated study may help to explore the representativeness on a broader scale, whilst identifying those fisheries where the vessel selection procedure or sampling protocols could introduce bias.

The focus on self-sampling in this review is not without regard to the benefits of other methods, but rather due to the demand to identify and evaluate the data sources that are currently available in Norway. Following this, the benefits of REM (Emery et al. 2019) and industry data sources (Plet-Hansen et al. 2020) should be considered to improve future estimations. For example, incorporating REM into the Norwegian Reference Fleet would reduce workload to allow for more extensive sampling of hauls. Utilising data from fish grading systems on board factory vessels could address the current data gap in many Norwegian fisheries regarding detailed size distributions of landed catches (Bowering et al. 2011). The current mandatory reporting requirements generate size-based data which are too coarse for comparison with size distributions of gross catches from the Norwegian Reference Fleet.

Finally, the estimation of unreported catches from slipping is at a much earlier stage in Norwegian fisheries. This is partly because it involves multiple studies to understand the extent, scale, and survivability of slipping events. Sampling protocols in the Norwegian Reference Fleet include the recording of slipping events, but their suitability has not yet been determined. We therefore recommend investing in exploratory studies prior to a dedicated estimation to address questions such as data requirements, appropriate sampling designs, and what approaches are suitable to synthesise the knowledge of scale and survivability to arrive at an estimation of total mortality.