1 Introduction

Recently, a paper has been published by Koster et al. (2019) entitled “Mineral Oil Hydrocarbons in Foods: is the data reliable?” A careful review revealed that this publication falls short in considering several aspects of assessing the reliability and thus comparability of analytical data produced by different laboratories.

One of the fundamental prerequisites for a sound comparison of analytical data is the equivalence of the measurand, i.e. the identity and amount of the compound(s) addressed by the analytical procedure(s), targeted by each laboratory. This is of special importance for so-called 'operationally defined measurands'. These are measurement targets for which the identity (e.g. chemical structure) and/or the measured amount(s) depend at least partly on one or more of the (experimental) conditions chosen for the applied analytical procedure, from sample preparation to data evaluation. This is the case for most of the analytes in chemical food analysis. Consequently, results from different laboratories in interlaboratory comparisons on identical subsamples are only comparable if each laboratory applies the same analytical procedure, including harmonised approaches for, e.g., extraction, clean-up, epoxidation and evaluation of the chromatographic signal. However, according to the paper, the laboratories contributing to the study of Koster et al. provided only partial details of the analytical methods they used and employed different procedures. Therefore, the likelihood of targeting different measurands was high. This precludes establishing a uniform metrological traceability of the analytical results and therefore does not allow a meaningful comparison of the reported data.

2 Further analyses of data and information

Koster et al. took a total of six samples of different food types, divided them and sent aliquots of each sample to 10 laboratories with the request for analysis of mineral oil saturated hydrocarbons (MOSH) and mineral oil aromatic hydrocarbons (MOAH). It needs to be emphasised that no information on the homogeneity of the analyte distribution in each food type was available. However, homogeneity data are crucial when discussing the results of an interlaboratory study. For this reason alone, an evaluation of the analytical data presented in Koster et al. (2019) must be classified as questionable. Furthermore, there were no instructions regarding the carbon range to be reported. In addition, the competence, i.e. the performance level, of the contracted laboratories had not been investigated.
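To illustrate what homogeneity data could look like, the following is a minimal sketch of a duplicate-based check of the kind described in ISO 13528: a number of units of the test material are each measured twice, the between-unit standard deviation is estimated, and it is compared with a fraction of the standard deviation used for assessing the laboratories. The function names, the factor of 0.3 as the acceptance limit, the value chosen for `sigma_pt` and all numbers are assumptions for illustration only; none of them come from Koster et al.

```python
from math import sqrt
from statistics import stdev


def between_sample_sd(duplicates):
    """Estimate the between-unit standard deviation s_s.

    duplicates: list of (a, b) pairs, one pair of duplicate results per unit.
    """
    g = len(duplicates)
    if g < 2:
        raise ValueError("at least two units are needed")
    # Standard deviation of the unit means (contains both between-unit
    # variation and a share of the analytical repeatability).
    unit_means = [(a + b) / 2 for a, b in duplicates]
    s_x = stdev(unit_means)
    # Within-unit (repeatability) standard deviation from duplicate differences.
    s_w = sqrt(sum((a - b) ** 2 for a, b in duplicates) / (2 * g))
    # Remove the repeatability share; clamp at zero if it dominates.
    return sqrt(max(0.0, s_x ** 2 - s_w ** 2 / 2))


def sufficiently_homogeneous(duplicates, sigma_pt, factor=0.3):
    """Assumed criterion: s_s must not exceed factor * sigma_pt."""
    return between_sample_sd(duplicates) <= factor * sigma_pt


# Hypothetical MOSH results in mg/kg for five units, measured in duplicate.
hypothetical_units = [(4.0, 4.2), (4.1, 3.9), (4.3, 4.1), (3.9, 4.0), (4.2, 4.2)]
print(sufficiently_homogeneous(hypothetical_units, sigma_pt=1.0))
```

Such a check only shows whether inhomogeneity is small relative to the expected spread between laboratories; it does not replace harmonisation of the analytical procedure itself.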

Bearing these conceptual drawbacks in mind, the results showed a considerable scatter of data. The authors claim to have demonstrated that the laboratories “simply fail to deliver robust and reliable test results in several food matrices” and attribute this to the following reasons:

  (i) the lack of validated sample (pre-)treatment steps for the different matrices;

  (ii) different reporting related to carbon ranges;

  (iii) the lack of use of confirmatory techniques to verify GC-FID results.

Although we acknowledge that MOSH/MOAH analysis results can carry a relatively large uncertainty, we dispute the severity of the claims in the paper of Koster et al. MOSH/MOAH are highly complex analytes in terms of composition, sample treatment, analysis and signal interpretation. It requires not only a significant level of laboratory expertise to analyse the samples and interpret the resulting chromatograms, but also considerable competence on the part of the end user to assess the analytical data correctly. One of the main difficulties is distinguishing MOSH/MOAH from naturally occurring hydrocarbons or hydrocarbon oligomers. It is, therefore, not a surprise that a larger variability of results was observed compared to findings for more established analytes, which often elute as single peaks. As the demand for MOSH/MOAH analysis has risen quickly, it is even less astonishing that some laboratories have not yet achieved an acceptable performance level. Nevertheless, the published data show that the results are reasonably reproducible, albeit with room for improvement. The exceptions are laboratories 1 and 10, which almost consistently deviate from the results of the other laboratories.
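One way to make such a judgement about deviating laboratories transparent is to compute robust z-scores, which use the median and the scaled median absolute deviation so that a few extreme results do not distort the reference values. The sketch below is only an illustration under stated assumptions: the laboratory labels, the concentrations and the limit of |z| > 3 are hypothetical and are not taken from Koster et al.

```python
from statistics import median


def robust_z_scores(results):
    """Return {lab: z} using the median and the scaled MAD as location and scale."""
    values = list(results.values())
    centre = median(values)
    # Median absolute deviation, scaled to be comparable with a standard
    # deviation for normally distributed data.
    mad = median(abs(v - centre) for v in values)
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("scale is zero; robust z-scores are undefined")
    return {lab: (v - centre) / scale for lab, v in results.items()}


def flag_labs(z_scores, limit=3.0):
    """List laboratories whose absolute z-score exceeds the assumed limit."""
    return sorted(lab for lab, z in z_scores.items() if abs(z) > limit)


# Hypothetical MOSH results in mg/kg for ten laboratories (invented numbers).
hypothetical_mosh = {
    "A": 9.8, "B": 4.1, "C": 3.6, "D": 4.4, "E": 3.9,
    "F": 4.0, "G": 4.3, "H": 3.8, "I": 4.2, "J": 0.9,
}
print(flag_labs(robust_z_scores(hypothetical_mosh)))
```

With these invented numbers, only the two laboratories with extreme results are flagged, while the remaining eight agree closely. A flag of this kind is a starting point for examining the procedure of the laboratory concerned, not a verdict on the method as a whole.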

As explained above, the application of harmonised procedures by all laboratories is crucial for obtaining MOSH/MOAH results which can be compared. The publication of the JRC Guidance Document on the sampling, analysis and data reporting for the monitoring of Mineral Oil Hydrocarbons in food and food contact materials (Bratinova and Hoekstra 2019) is a step in this direction. Since the guidance document was published around the same time as the study of Koster et al. was performed, it was not to be expected that the laboratories had already followed this guideline, so the effect assumed by Koster et al. seems highly questionable. The statements in the published paper even demonstrate that the JRC Guidance has not been fully followed, as Section 5 of the Guidance demands the unambiguous reporting of the analytical procedure applied by each laboratory.

With regard to point (i), even without the availability of internationally fully harmonised methods, it should be obvious from the current literature that analytical results for MOAH from infant formula will not be reliable without the application of epoxidation, as also stated in the JRC Guidance. The same applies to food containing palm oil (possibly the biscuit powder used by Koster et al.). An experienced MOH analyst would immediately reject results from laboratories that did not include an epoxidation step (labs 1, 3 and 4). This omission raises doubts about the results provided by these laboratories. Once again, this points to the need for laboratories to develop a high level of expertise in order to perform reliable MOSH and MOAH analysis.

In their conclusions, Koster et al. wrote that "the JRC guidance document does not provide a harmonized method for food categories other than vegetable oils and fats", implying that it should. However, the JRC Guidance was never intended to be a work instruction or Standard Operating Procedure. Rather, it gives guidance on the performance requirements of the analytical approach, so as not to limit new developments in this area, and it covers all foods, as stated in its scope.

With respect to point (ii), the “different reporting related to the carbon ranges” is an aspect on which the JRC Guidance is very clear, so its application should lead to rapid harmonisation.

Regarding the “lack of use of confirmatory techniques to verify the GC-FID results” (point (iii)), it should be understood that confirmatory measurements can help to verify whether the compounds in the sample are of mineral oil origin, but that they do not verify the quantitative data themselves. Presently, the most powerful method for characterisation of the MOSH and MOAH humps is comprehensive two-dimensional gas chromatography (GC × GC). FID is the best detector to quantify MOSH and MOAH, and GC × GC–MS is the best technique for verification. Nevertheless, for the large majority of analyses a correct interpretation is possible without GC × GC. This important clarification has been published before (Bratinova and Hoekstra 2019; Biedermann et al. 2017; BfR 2012).

There are more inconsistencies in the details of the paper. Koster et al. support the claim of lacking robustness of the analytical method by listing results in Figures 1 and 3. Some of these results should not have been considered for the meta-analysis at all, such as a result of laboratory 1 for infant formula (IF) in Table 3, which is described in Table 2, footnote f, as not resulting from the presence of MOSH. Similarly, the results from laboratory 1 for MOAH in biscuits were described in the text as not being MOAH (section Biscuit powder in Results). The authors criticise repeatedly that the JRC Guidance does not make the use of mass spectrometry as a confirmatory technique a necessary requirement for samples positive for MOH. They falsely claim, "The CEN method (CEN EN 16995:2017) proposes mass spectrometry as a confirmatory technique to help characterise the chemical composition of mineral oils that supports a root-cause analysis". The exact CEN EN 16995:2017 statement is "In case of suspected interferences from natural sources, the fossil origin of the MOSH and MOAH fraction can be verified by examination of the pattern by GC–MS", which is very similar to the phrasing in the JRC Guidance and leaves the experts to decide on a case-by-case basis. Furthermore, Koster et al. have demonstrated in their own study that those confirmatory techniques are not the ultimate solution, as two of the three laboratories using mass spectrometry (labs 1 and 3) failed to characterise the hump as MOSH (see p. 80).

Overall, there is a lack of scientific rigour apparent in the publication. Apart from the points mentioned above, no attempt was made to investigate where individual laboratories may have failed. This would have been a contribution towards identifying potential procedural problems and thus enhancing reproducibility. For instance, several hypotheses are offered for some of the reported values, especially for the outlying results, where an in-depth examination would have been necessary. Despite the methodological shortcomings in Koster et al. with respect to the reported carbon ranges and the lack of homogeneity testing of the samples, the results for many of the analyte/matrix combinations are consistent (though sometimes with a broad distribution). In the last few months, it has become apparent that infant formula is a very difficult matrix. Many of the analytical problems seem to be associated with the total extraction of MOH, which could be a challenge for any lipophilic analyte in such a matrix. MOSH/MOAH could be encapsulated in the spray-dried infant formula particles. Therefore, an initial release of the analytes (e.g. by saponification) seems to be the only way to extract and determine the total MOSH/MOAH content. However, the authors do not provide any information on that step. Consequently, the reported data should not have been grouped together, as the measurand may have differed depending on the extraction technique.

3 Conclusion

In summary, the interlaboratory comparison performed by Koster et al. demonstrated how scattered analytical results on presumably homogeneous samples can be if the participating laboratories are not measuring the same targets. The authors did not take into account that, by applying different methods, the participating laboratories had in reality addressed different measurands, making their results incomparable for any user. No attempt was made to analyse the reasons for the variation by comparing the raw data. Moreover, laboratory results with obvious technical flaws were not excluded. We envisage that ongoing and increased harmonisation of the analytical steps in the procedure for MOSH and MOAH analysis will improve the comparability and reproducibility of laboratory results in the future. This will require taking the principles of good analytical practice into account, enhancing collaboration within the scientific community and building competence.