1 Background

The recently launched National Mission on Biodiversity and Human Well-Being (NMBH)1 aims to conserve and restore the rich but rapidly degrading biodiversity of India. Launched by the Prime Minister’s Science, Technology and Innovation Advisory Council in 2019, the NMBH is designed to bring together several disciplines which impact and are impacted by biodiversity. Driven by respected research institutions in India, the NMBH is the first step toward developing the science and building the capacity needed to integrate biodiversity into the areas of agriculture, disaster management, climate change, bioeconomy, ecosystem services and health. The COVID-19 pandemic, which followed the launch, gave a fillip to the component on biodiversity and health as the role of zoonotic diseases came into the limelight2. The world over, the scientific community is focused on the emerging trans-disciplinary approaches of One Health, a discipline that characterizes the relationships between biodiversity and human and public health3.

The glue that binds the components of the ambitious mission is a geospatial database for cataloguing and mapping life (CML). The design of the CML is largely based on the experience of the India Biodiversity Portal (IBP4), which has been designed to support researchers and interested citizens in collection and collation of biodiversity related data sets. Concurrently, many other systems for biodiversity data have been created around the world, such as GBIF5, with applications ranging from species identification6 to reintroduction7. Modern algorithms using big data driven machine learning (ML)8 and neural networks (NN)9, coupled with sensors with new capabilities such as bioacoustics10 and analytical approaches such as genomics11, are used to complement traditional approaches of biodiversity conservation in situ and in vivo.

Meanwhile, data and models about human health are also becoming increasingly complex, as medical discoveries utilize new computation assisted approaches for health management from prevention to cure for the human body12. In fact, biomedical technologies for curing human health ailments are being projected as the next frontier of growth for the global economy toward an ageless generation13.

2 Human and Public Health Meets Ecosystems

The COVID-19 pandemic has provided an impetus for establishing a closer relation between individual and public health. In looking to quickly tide over this global emergency, the medical community has been spurred on to develop a vaccine to protect the public and reduce individual risk. While a vaccine from the best minds in biomedical research will be welcomed by one and all, public health and biodiversity experts are now under pressure to speed up their work on preventive approaches: early warning systems, measures that delay and hopefully even prevent such outbreaks, and better management of outbreaks when they do occur.

The existing surveillance apparatus rightly concentrates on early outbreak detection among people, and includes containment and response. While new standards for interoperability14 are being adopted in India for clinical health of individuals, standards are silent about including causal information, such as wild and domestic animal surveillance for understanding the dynamics of the pathogen-host cycles between outbreaks. Such long-term longitudinal surveillance provides insight into disease burden and helps detect possible predictable patterns in outbreaks at a much lower economic cost than responding after the pathogens emerge15.

In an attempt to create an integrated mechanism for surveillance, detection and treatment of such zoonoses, a multi-disciplinary engagement was established in 2008 in the form of the Roadmap to Combat Zoonoses in India (RCZI) initiative35. The RCZI identified key thrust areas and provided several strategies for research and action. Yet large-scale, long-term integrated surveillance involving human, veterinary and wildlife monitoring has failed to materialise36. As a consequence, we still lag in our understanding of the burden and dynamics of emerging and re-emerging infectious diseases (ERID).

The Indian government’s Integrated Disease Surveillance Project (IDSP), launched in 2004, sought to establish a decentralised, state-run, India-wide surveillance programme. This programme began with the establishment of surveillance units at the district level, each led by a district surveillance officer and supported by a rapid response team to respond to outbreaks. The IDSP has generated a clear information flow on outbreaks of 22 conditions and publishes periodic reports of outbreaks on its website16.

While the outbreak detection and rapid response functions are taken care of by the IDSP, the programme is unable to integrate human and animal (livestock and wildlife) surveillance. This is not surprising given that the IDSP is structured within the department of health and thus has limited scope for convergence with other departments. Independent evaluations of the IDSP have pointed out the need to strengthen it and have identified key limitations in achieving timely outbreak detection and proactive monitoring of ERIDs17. An integrated human and animal surveillance system that collects primary data on disease parameters from people, livestock and wildlife is needed, as it will improve both our understanding of the dynamics of ERIDs and our response to them, locally and at the level of policy.

Globally, there are increasing demands for the establishment of responsive and scientifically sound surveillance systems to better understand the connections between deforestation, wildlife, and pandemic risk18 and, possibly, to predict outbreaks and the spread of ERIDs. Recent reviews of surveillance systems have recognized that these need to be strengthened in developing countries. There is also moderate evidence to suggest that most efforts in strengthening the response to zoonoses have been focused on “laboratory capacity and technical training, with relatively little attention given to the collection of field data, particularly at the interface between human and livestock populations”19.

3 Artifacts: In Silico Models of One Health

The biomedical profession is developing advanced algorithms using machine learning and neural networks to derive hypotheses with strong correlations to enable drug discovery for medicines and vaccines to address human health20. The health industry has been captivated by cost savings through efficient transactions and better diagnostic outcomes through the use of artificial intelligence (AI) techniques21. In fact, current systems of medical informatics focus on human biology only, with most of the research efforts evolving to solve health problems of the individual22. Even in the developed health care systems in the west, the vision of future medical systems does not include much about zoonotic diseases23. Some AI techniques are being used to further derive correlations using large data sets for individual human-centric medicine24.

Meanwhile, there is much to be done to develop proactive, in silico models of One Health for public health applications in the prevention and management of outbreaks. When causal models of outbreaks are known, e.g., free-ranging dogs causing zoonotic diseases, targeted management approaches can be designed using modern tools such as agent-based modeling25. However, the main difficulty in developing in silico causal models of One Health is the lack of data that can help us characterize the ecosystem of pathogens, in which the human is simply one actor, albeit the one we tend to focus on. Scientists are calling for the NMBH to create a decentralized, national system of surveillance of zoonotic disease outbreaks26 which will also collate data about ecosystems and biodiversity, since it is their degradation due to human actions which leads to ERIDs. But is that enough?

In fact, modeling such complex ecosystems requires us to understand the myriad behavioral patterns of pathogens and other actors, which possess different contextual mechanisms of problem-solving intelligence, best described in the “ants on a beach” parable in Herbert Simon’s classic 1969 book, The Sciences of the Artificial27. It is, therefore, quite understandable that research in One Health calls for decades-long, painstaking, and heroic efforts to discover causal linkages28 which can provide sufficient data for deriving correlations with confidence29, and which can then be used as predictive causal models. Surveillance databases need to be coupled with such causal models in the form of knowledge bases to create useful artifacts, i.e., in silico models of One Health.

4 Reasoning with Incomplete Information

The One Health system for data management is a necessary and immediate requirement to enhance our understanding and for rapid response to outbreaks. When such a data management system is available and continually updated, and if we know a well founded causal “law of nature”, we can deduce conclusions from observations. For example:

Causal law: IF all < humans with Ixodes tick bites in the US > have < Lyme disease > .

Observation: < Arundhati > is a < human with Ixodes tick bite in the US > .

Deduction: THEN < Arundhati > has < Lyme disease > .

Deductive rules are represented by the famous syllogism that:

Causal law: IF all < men > are < mortal > .

Observation: < Socrates > is a < man > .

Deduction: THEN < Socrates > is < mortal > .
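
The deductive pattern above can be sketched in a few lines of Python. This is only an illustrative sketch, not the design of any One Health system: the names `Rule` and `deduce` are hypothetical, and facts are assumed to be simple (subject, category) pairs.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Causal law: IF x is a <premise> THEN x is/has <conclusion>."""
    premise: str
    conclusion: str


def deduce(rules, facts):
    """Apply every rule repeatedly until no new (subject, category) fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for subject, category in list(known):
                derived = (subject, rule.conclusion)
                if category == rule.premise and derived not in known:
                    known.add(derived)
                    changed = True
    return known


rules = [Rule("man", "mortal")]        # causal law
facts = {("Socrates", "man")}          # observation
conclusions = deduce(rules, facts) - facts
print(conclusions)  # {('Socrates', 'mortal')}
```

The conclusion is only as reliable as the causal law it starts from, which is exactly the limitation discussed next.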

However, the complexity of ecosystems and zoonotic diseases rarely presents such simple situations for the application of the rules of deductive logic. Definitive causal laws of nature are simply not established or well founded. Therefore, the analytical approach will still be reactive in nature and largely dependent on correlations between observations and hypotheses generated by integrating knowledge from diverse disciplines such as public health, epidemiology, and biodiversity. The research question is whether knowledge from disparate sources can be captured and utilized to create causal models which, in turn, are capable of generating hypotheses for a proactive response to ERIDs.

Recent developments in ML and NN have proliferated in the data analytics community to solve many complex problems. Similar to traditional time series forecasting methods, ML and NN algorithms work well when there is no dearth of data30. Some slight variations in the applications of such algorithms also allow for “learning” and deriving models that fit reality to an acceptable degree31. In fact, all such more or less statistical methods allow causal models to be derived from large datasets. Virologists created the metaphor in Fig. 1 to represent this kind of problem solving for predicting the occurrence of Kyasanur Forest Disease (KFD) in India.

Figure 1: Induction for predicting KFD (an illustration of Boshell’s cup of coffee. Credit: Ita Mehrotra, https://science.thewire.in/science/kyasanur-kfd-rajagopalan-boshell/).

That is:

Case n = 1:

Observation: IF < KFD Virus > is < Present > 

Observation: IF < population > is < Susceptible to KFD > 

Observation: IF < Climate and Environment > is < Conducive for KFD > 

Observation: IF < Vector Population > is < Present for KFD > 

Observation: IF < Susceptible Monkey > is < Present for KFD > 

Observation: IF < Arundhati > is < a human in the population > 

Observation: IF < Arundhati > has < KFD > 

Case n = 2:

Observation: IF < KFD Virus > is < Present > 

Observation: IF < population > is < Susceptible to KFD > 

Observation: IF < Climate and Environment > is < Conducive for KFD > 

Observation: IF < Vector Population > is < Present for KFD > 

Observation: IF < Susceptible Monkey > is < Present for KFD > 

Observation: IF < Arnab > is < a human in the population > 

Observation: IF < Arnab > has < KFD > 

… and so on for all known humans (or mathematically, as n → all members in the population) …

Induction: THEN All < humans in the population > have < KFD > 

The corresponding syllogism is:

Observation: < Socrates > is a < man > .

Observation: < Socrates > is < mortal > .

Observation: < Plato > is a < man > .

Observation: < Plato > is < mortal > .

Observation: < Aristotle > is a < man > .

Observation: < Aristotle > is < mortal > .

Induction: THEN all < men > are < mortal > .

The rules of inductive logic are not as automatically applicable as the rules of deductive logic. However, when one has statistically representative datasets of the population, inductive rules can enable low-risk reasoning with some predictive capability. History is replete with stories of poor inductive reasoning leading to beliefs which were difficult to revise. Galileo would have agreed.
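
A minimal Python sketch of the inductive pattern follows. It simply tallies how many observed cases support a generalization; the names `Generalization` and `induce` and the counterexample at the end are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generalization:
    category: str
    attribute: str
    observed: int    # members of the category seen so far
    supporting: int  # of those, how many also had the attribute

    @property
    def holds_so_far(self) -> bool:
        # True only while every observed member supports the claim.
        return self.observed > 0 and self.supporting == self.observed


def induce(cases, category, attribute):
    """Tally how well 'all <category> are <attribute>' fits the cases."""
    relevant = [props for _, props in cases if category in props]
    supporting = [props for props in relevant if attribute in props]
    return Generalization(category, attribute, len(relevant), len(supporting))


cases = [
    ("Socrates", {"man", "mortal"}),
    ("Plato", {"man", "mortal"}),
    ("Aristotle", {"man", "mortal"}),
]
rule = induce(cases, "man", "mortal")
print(rule.holds_so_far, rule.observed)  # True 3

# A single hypothetical counterexample is enough to retract the rule.
revised = induce(cases + [("hypothetical case", {"man"})], "man", "mortal")
print(revised.holds_so_far)  # False
```

Note that the tally says nothing about whether the three cases are representative of the population; that judgment has to come from how the data were collected.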

Perhaps the most interesting case of reasoning for problem solving arises when there is a paucity of data. In such cases, problem solving requires that we make hypotheses and test them as we obtain more information. The painstaking gathering of information, leading to incrementally improving hypotheses, leads scientists to causal models such as the one developed by scientists working on KFD. The causal models, often represented as directed graphs, show the current state of knowledge based on whatever information is available.

That is:

Causal law: IF all < migratory birds from Russia > have < encephalitis > .

Observation: < KFD > has same origins as < encephalitis > .

Abduction: THEN < KFD > will be in < migratory birds from Russia > .

But < KFD > could be indigenous! In fact, this was the logic used in the quest to find KFD, and it was found to rest on an erroneous assumption.

Abductive rules are represented by the famous syllogism that:

Causal law: IF all < men > are < mortal > .

Observation: < Socrates > is < mortal > .

Abduction: THEN < Socrates > is a < man > .

But < Socrates > could be a dog!

Abductive reasoning carries significant risk, and can lead to dangerous assumptions which can have subsequent knock-on effects. Furthermore, such hypothetical models carry the inherent risk of being disproved when additional information conflicts with the information gathered to date.
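
The risk can be made concrete with a small Python sketch. The function `abduce` is a hypothetical name, and the second law ("all dogs are mortal") is an assumption added here to mirror the remark that Socrates could be a dog.

```python
def abduce(laws, observation):
    """Return every category whose causal law would explain the observation."""
    return [cause for cause, effect in laws if effect == observation]


laws = [
    ("man", "mortal"),  # causal law from the syllogism
    ("dog", "mortal"),  # assumed extra law, for illustration only
]
candidates = abduce(laws, "mortal")
print(candidates)  # ['man', 'dog']
```

Abduction returns a list of candidate explanations, not a conclusion. Picking one of them without further evidence is the step that can go wrong.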

The scientific method essentially incorporates such “abductive” reasoning based on hypothesis testing, and it was on full display in the mystery of the KFD outbreaks which re-emerged after half a century as an ERID in India. Abductive reasoning was applied to develop the hypothesis that small mammals on the forest floor could be the reservoirs for KFD, and this, yet again, was proven wrong. Through a process of hypothesis testing, causal chains such as the ‘small mammal-Haemaphysalis-small mammal’ chain, the ‘small mammal-Ixodes-small mammal’ chain, and the ‘small mammal-Haemaphysalis-monkey’ chain were all eliminated. Before the development of data-intensive techniques like ML and NN, the science of AI cultivated sophisticated methods32 to enable building artifacts, i.e., in silico problem-solving knowledge bases, to emulate such reasoning and support incremental development of causal models.
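
One way to picture such a knowledge base is as a set of candidate causal chains whose status is revised as evidence arrives. The Python sketch below is an assumption-laden illustration, not a reconstruction of the methods cited: `KnowledgeBase`, `Hypothesis` and `Status` are hypothetical names, and the elimination note is a placeholder rather than actual field evidence.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ELIMINATED = "eliminated"


@dataclass
class Hypothesis:
    chain: tuple                       # e.g. (host, vector, host)
    status: Status = Status.OPEN
    notes: list = field(default_factory=list)


class KnowledgeBase:
    """Holds candidate causal chains and records why each was revised."""

    def __init__(self):
        self._hypotheses = {}

    def propose(self, *chain):
        # Re-proposing an existing chain returns the stored hypothesis.
        return self._hypotheses.setdefault(chain, Hypothesis(chain))

    def eliminate(self, chain, note):
        hypothesis = self._hypotheses[chain]
        hypothesis.status = Status.ELIMINATED
        hypothesis.notes.append(note)

    def open_hypotheses(self):
        return [h for h in self._hypotheses.values() if h.status is Status.OPEN]


kb = KnowledgeBase()
chains = [
    ("small mammal", "Haemaphysalis", "small mammal"),
    ("small mammal", "Ixodes", "small mammal"),
    ("small mammal", "Haemaphysalis", "monkey"),
]
for chain in chains:
    kb.propose(*chain)
print(len(kb.open_hypotheses()))  # 3

for chain in chains:
    kb.eliminate(chain, "placeholder: eliminated through hypothesis testing")
print(len(kb.open_hypotheses()))  # 0
```

Keeping eliminated chains with their notes, rather than deleting them, preserves the record of what was tried, so later evidence can be checked against it.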

5 Discussion

The current causal model (Fig. 2) traces the re-emergence of KFD to human interventions which reduce biodiversity and provide opportunities for the virus to infect species that it otherwise might not have. The important lesson from the KFD story is that, for different types of reasoning to be applied, it is important to develop tools which go beyond simple databases that store and retrieve datasets. It will be important to develop statistical approaches to enable the use of large datasets. But more realistically, it will be important to assist ecologists, field biologists, epidemiologists, and other scientists with systems which can represent the current state of knowledge and which can be changed as more information is obtained, so as to consolidate and revise the best known models of the time.

Figure 2: Current causal model for KFD (Credit: Ita Mehrotra, https://science.thewire.in/science/kyasanur-kfd-rajagopalan-boshell/).

Models based on incomplete information can be dangerous. They can set up societal trends that can influence societies in good and bad ways33. As the world responds to the COVID-19 crisis with emphasis on health financing34, it would behoove us to invest in technologies that actually assist One Health scientists in building not only databases, but also their knowledge bases toward prevention and management of zoonotic diseases. Investment in developing such comprehensive artifacts for One Health is the need of the day.