The slogan of this year’s World Patient Safety Day on 17 September is ‘medication without harm’. The campaign is focused on ‘traditional’ drugs (chemical compounds), for which regulations and standards have been developed to ensure that they are safe before being given to patients. However, recent advances in medical research have led to the availability of a range of technologies, such as cellular and gene therapies as well as digital and artificial intelligence (AI)-based medical devices, that do not follow the standard research and development or regulatory pathways, prompting the need to take a much broader perspective on safety standards.

In the specific case of AI, regulatory bodies have begun to take steps to guide preclinical development, so that the formal approval process and potential market entry, which require that interventions are proven to be effective as well as safe in patients, can be made more consistent. To demonstrate the potential relevance and safety of AI in the real world, translational AI and machine learning studies are recommended to follow the US Food and Drug Administration (FDA)’s guiding principles on good machine learning practice for medical device development. These state the need to develop and test models in populations that are representative of the intended patient population with regard to age, sex, race and ethnicity, and to use independent training and test datasets. Moreover, models based on AI or machine learning are defined by the FDA as ‘software as a medical device’ (SaMD) and acknowledged to be complex interventions that can expose patients to unexpected harm. In such cases, safety concerns are not limited to algorithmic errors in diagnosis or prognosis, but extend to the introduction of unintended bias into the delivery of care.
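
As a rough illustration of two of these principles, the sketch below shows how training and test data could be kept independent at the patient level and how model performance could be reported per demographic subgroup. It is a minimal example, not a procedure taken from the FDA guidance; the column names (‘patient_id’, ‘sex’, ‘age_group’, ‘label’) and the use of AUROC as the metric are illustrative assumptions.

```python
# Minimal sketch of two good machine learning practice principles:
# (1) independent training and test sets, held out at the patient level,
# (2) performance reported per demographic subgroup.
# Column names and the AUROC metric are illustrative assumptions.
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupShuffleSplit


def split_by_patient(df: pd.DataFrame, test_size: float = 0.2, seed: int = 0):
    """Hold out whole patients so no individual appears in both training and test data."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))
    return df.iloc[train_idx], df.iloc[test_idx]


def subgroup_performance(model, test_df: pd.DataFrame, feature_cols,
                         group_cols=("sex", "age_group")) -> pd.DataFrame:
    """Report discrimination (AUROC) separately for each demographic subgroup."""
    rows = []
    for keys, sub in test_df.groupby(list(group_cols)):
        if sub["label"].nunique() < 2:  # AUROC is undefined if only one class is present
            continue
        scores = model.predict_proba(sub[feature_cols])[:, 1]
        auc = roc_auc_score(sub["label"], scores)
        rows.append({**dict(zip(group_cols, keys)), "n": len(sub), "auroc": auc})
    return pd.DataFrame(rows)
```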

How adverse events present and are assessed in patients requires rethinking in the context of ‘variable’, iterative interventions. Although the AI algorithms approved by the FDA so far have been locked before entering the market, models that continue to learn and develop once exposed to real-world populations have to be regulated differently from ‘static’ interventions. The FDA-issued SaMD action plan outlines how the “iterative improvement power” of AI could be facilitated and controlled by asking manufacturers to outline anticipated modifications for pre-market review, such as those that relate to performance or to the inputs used and that would involve re-training SaMDs with new datasets and/or changing the algorithm architecture. How these modifications will be implemented in a way that manages the risk to patients also needs to be specified. Discussions between ‘interested stakeholders’ and the FDA are ongoing as to the types of modification and the methodology for implementation that should be included in this ‘predetermined change control plan’. After rollout, manufacturers are likely to be required to continuously monitor SaMD safety and to keep regulatory bodies updated. Such real-world monitoring of performance and adverse events requires the highest levels of engagement, transparency and commitment from manufacturers; how this will be achieved is yet to be defined. In addition to ensuring that the software is user friendly, so as to prevent errors in delivering care, manufacturers will have to establish patients’ trust in SaMDs and appropriately upskill end users, including healthcare personnel and patients, to notice individual safety signals that might become apparent only at a larger scale. However, clarification is needed as to what degree end users will be involved in the post-marketing monitoring of SaMDs. To limit the extent to which end users have to carry the burden of reporting and monitoring adverse events, user friendliness and systems for safety monitoring should be factored into the preclinical development of SaMDs from the outset.
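
To indicate what continuous post-market monitoring might involve in practice, the hypothetical sketch below tracks a single performance metric (sensitivity) over rolling batches of confirmed outcomes and raises a safety signal when it falls below a pre-specified threshold. The threshold, batch size and alerting mechanism are assumptions made for illustration, not requirements drawn from the SaMD action plan.

```python
# Hypothetical sketch of post-market performance monitoring for a deployed SaMD:
# confirmed outcomes are compared against the model's predictions in batches, and
# a safety signal is recorded when sensitivity drops below an agreed threshold.
# The threshold and batch size are illustrative choices, not regulatory values.
from dataclasses import dataclass, field


@dataclass
class PerformanceMonitor:
    threshold: float = 0.80   # minimum acceptable sensitivity (illustrative)
    batch_size: int = 500     # confirmed outcomes reviewed per batch (illustrative)
    alerts: list = field(default_factory=list)
    _tp: int = 0
    _fn: int = 0
    _seen: int = 0

    def record(self, predicted_positive: bool, truly_positive: bool) -> None:
        """Log one case once its true outcome is known."""
        if truly_positive:
            self._tp += int(predicted_positive)
            self._fn += int(not predicted_positive)
        self._seen += 1
        if self._seen >= self.batch_size:
            self._review_batch()

    def _review_batch(self) -> None:
        positives = self._tp + self._fn
        if positives:
            sensitivity = self._tp / positives
            if sensitivity < self.threshold:
                # In practice this would trigger escalation to the manufacturer
                # and, where required, to the regulator.
                self.alerts.append(
                    f"sensitivity {sensitivity:.2f} below threshold {self.threshold}"
                )
        self._tp = self._fn = self._seen = 0
```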

There is also potential for SaMDs to exacerbate existing health inequalities and systemic biases in the real world, representing risks to patient safety on a large scale. For example, AI-based prediction models applied to chest X-rays are more likely to falsely predict that patients are healthy if those patients belong to underserved populations. Training and testing SaMDs in diverse populations before release is crucial to prevent such outcomes, as is avoiding the use of socially constructed categories, such as race, in SaMDs. The above example also illustrates the significant risk of underdiagnosis when SaMDs are used as the sole tool for clinical decision making. Drawn up by experts from a wide range of SaMD stakeholders, including researchers, editors and manufacturers, the consensus-based ‘DECIDE-AI’ guidelines recommend that researchers describe how risks to patient safety, or harm inflicted, were identified and minimized in their studies. To further mitigate risk to patients, values such as social equity have to be maintained and accounted for in AI-supported clinical decision making. In addition, industry involvement and the proprietary status of models must not be allowed to compromise transparency and independent evaluation before public release; the lack of such evaluation contributed to the poor real-world performance of the proprietary ‘Epic sepsis model’, a sepsis prediction model, after its release. Large-scale collection of patient data using wearable technologies may enable safety signals that represent systemic bias to be picked up, provided that risks such as causing unnecessary anxiety in patients through false-positive signals, or amplifying systemic bias, can be limited in the future.
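
As an illustration of how such underdiagnosis might be audited before release, the sketch below compares false-negative rates (patients with disease wrongly predicted to be healthy) across population groups and flags groups that fare markedly worse than the overall rate. The grouping column and the flagging margin are assumptions made for the example; they are not drawn from the DECIDE-AI guidelines.

```python
# Minimal sketch of a pre-release underdiagnosis audit: compare false-negative
# rates across population groups and flag groups that exceed the overall rate
# by a chosen margin. Column names and the margin are illustrative assumptions.
import pandas as pd


def underdiagnosis_audit(df: pd.DataFrame, group_col: str = "population_group",
                         margin: float = 0.05):
    """Return the overall false-negative rate and a per-group report with flags."""
    diseased = df[df["label"] == 1]                      # patients who truly have disease
    overall_fnr = (diseased["prediction"] == 0).mean()   # overall rate of missed disease
    report = (
        diseased.assign(missed=lambda d: d["prediction"] == 0)
        .groupby(group_col)["missed"]
        .agg(fnr="mean", n="count")
        .reset_index()
    )
    report["flagged"] = report["fnr"] > overall_fnr + margin
    return overall_fnr, report
```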

Ensuring patient safety in the context of AI-based and other medical innovations is an ongoing challenge that requires the engagement of all stakeholders, including researchers, regulatory bodies, manufacturers and healthcare providers, as well as patients and their advocates. On this World Patient Safety Day, we call for a broader societal discussion of what safety and harm mean in the context of evolving medical technologies.