1 Introduction

At the beginning of the outbreak of COVID-19, the general perception was that the transmission of the disease occurred mostly through the infectious persons having influenza-like symptoms. One reason behind this view was the similarities between the SARS pandemic in 2003 caused by SARS-CoV-1 and the current threat of COVID-19 caused by SARS-CoV-2. Both SARS and COVID-19 patients show similar influenza-like symptoms, propagate infections through person-to-person contact via respiratory droplets generated when an infected person breathes, coughs and sneezes. Measures taken to successfully control the SARS in 2003 were majorly based on testing those having symptoms and isolating the positive cases. Nevertheless, a similar guideline has not been very effective in controlling the COVID-19 as the number of infected individuals has gone past 4 million in India and 27 million worldwide. Despite severe containment measures, these numbers are several orders of magnitude higher than SARS in 2003 that reported 8422 cases worldwide with a case fatality rate of 11%. In India, even when the nation was under lockdown over the past few months, a noticeable surge in the new COVID-19 positive cases was observed (Fig. 1). Initially, when the nationwide lockdown was enforced on March 23, the infection growth showed a gradual slowdown for a few weeks. However, as the days passed, the number of daily new confirmed cases increased significantly which prompted us to revisit our earlier work [1] on the COVID-19 outspread in India using the well-known SIRD model [2,3,4,5,6,7,8,9]. Normally, an individual exposed to the disease sufficient to catch the infection goes through an incubation period of several days (\(\sim\) 5 days [10]) before showing symptoms, i.e., presymptomatic. However, many reports [11,12,13,14,15,16] suggested that a large fraction of those who catch the disease do not show any symptoms at all, i.e., they remain asymptomatic (tested positive for SARS-CoV-2 RNA but show no signs of illness). The presence of the virus in presymptomatic or asymptomatic persons means that they can transmit the disease to others. In this study, we investigate whether the trajectory of the continuous upsurge of infections (Fig. 1) can be attributed to the following characteristics of this disease spread: (a) in addition to the symptomatic cases, the abundance of the infected individuals who do not develop any symptoms at all or remain mildly symptomatic, before recovery (asymptomatic) and (b) human-to-human transmission via those asymptomatic carriers. We use a modified version of the well-known SEIRD model in epidemiology [17, 18] as a working handle to account for the asymptomatic infections (Fig. 2). The model analysis as per the current data for India and few Indian states shows that the dynamics of the infection spreading is highly influenced by the asymptomatic population. The real data for India and various states within the country are acquired from the repository with an interactive handle hosted at https://www.covid19india.org. The purpose of this article is neither to influence policy decisions nor to put forward any quantitative projections that should be used to design public health guidelines.

Fig. 1
figure 1

Increment in the cumulative number of confirmed cases in India. The bars boxed in red highlight the surge of infections in the early stages of COVID-19 pandemic in India (color figure online)

2 Model

In the standard SEIRD model, the population N is divided into subpopulation of susceptible (S), exposed (E), infected (I), recovered (R) and dead (D) for all times t. Thus, \(N = S + E + I +R + D\). In our case, in order to account for the asymptomatic population, we introduced several new ‘subpopulation’ compartments and modified the corresponding flows between these compartments. The infected population I in SEIRD model is subdivided into two compartments: (a) infected symptomatic population \(I_{s}\) and (b) infected asymptomatic population \(I_{a}\) (Fig. 2A). As per the WHO and the ICMR, India reports, the COVID-19-infected individuals can be divided into two subgroups: (a) symptomatic carriers and (b) asymptomatic carriers, infected individuals without any symptoms. The asymptomatic population is reported to be several times larger in number compared to the symptomatic population (Fig. 2B). Similarly, the population of recovered R in the SEIRD model is also classified into two new compartments: (a) population recovered from symptomatic infection \(R_{s}\) and (b) population recovered from asymptomatic infection \(R_{a}\) (Fig. 2A). The population of dead (D) has inflow from \(I_{s}\) only (Fig. 2A); we assumed no death for the asymptomatic individuals due to the disease.

Fig. 2
figure 2

Schematic representation of the modified SEIRD model used for mapping the current COVID-19 trajectory in India’s context. (A) The arrows denote the flux direction between the compartmentalized subpopulations of susceptible (S), exposed (E), symptomatically infected (\(I_{s}\)), asymptomatically infected (\(I_{a}\)), recovered from symptomatic infection (\(R_{s}\)), recovered from asymptomatic infection (\(R_{a}\)) and dead (D). (B) The COVID-19-infected individuals can either develop symptoms or remain asymptomatic throughout before recovery. According to the reports by the WHO and the ICMR, India, the asymptomatic infections can be as large as 4–5 times of the total symptomatic infections

The following set of mean-field differential equations dictates the time-dependent dynamics of the population of susceptible (S), exposed (E), symptomatically infected (\(I_{s}\)), asymptomatically infected (\(I_{a}\)), recovered from symptomatic infection (\(R_{s}\)), recovered from asymptomatic infection (\(R_{a}\)) and dead (D), comprehensively capturing the time evolution of the outspread across the total population.

$$\begin{aligned}&{\frac{\mathrm{d}S(t)}{\mathrm{d}t}=-\beta _{s} S I_{s}/N - \beta _{a} S I_{a}/N} \end{aligned}$$
(1)
$$\begin{aligned}&{\frac{\mathrm{d}E(t)}{\mathrm{d}t}=\beta _{s} S I_{s}/N + \beta _{a} S I_{a}/N - \lambda E} \end{aligned}$$
(2)
$$\begin{aligned}&{\frac{\mathrm{d}I_{s}(t)}{\mathrm{d}t}= \phi \lambda E - (\gamma _{s} + \delta ) I_{s}} \end{aligned}$$
(3)
$$\begin{aligned}&{\frac{\mathrm{d}I_{a}(t)}{\mathrm{d}t}= (1-\phi ) \lambda E - \gamma _{a} I_{a}} \end{aligned}$$
(4)
$$\begin{aligned}&{\frac{\mathrm{d}R_{s}(t)}{\mathrm{d}t}=\gamma _{s} I_{s}} \end{aligned}$$
(5)
$$\begin{aligned}&{\frac{\mathrm{d}R_{a}(t)}{\mathrm{d}t}=\gamma _{a} I_{a}} \end{aligned}$$
(6)
$$\begin{aligned}&{\frac{\mathrm{d}D(t)}{\mathrm{d}t}=\delta I_{s}} \end{aligned}$$
(7)

The parameters \(\beta _{s}\), \(\beta _{a}\), \(\lambda\), \(\phi\), \(\gamma _{s}\), \(\gamma _{a}\) and \(\delta\) regulate the temporal flow between different subpopulations of infected, recovered and dead (Fig. 2A). Note that, in the above set of equations, the susceptible population (S), upon interactions with symptomatic and asymptomatic population, becomes exposed (E) to the disease with contact rates \(\beta _{s}\) and \(\beta _{a}\), respectively. The model construction alludes that the individuals belonging to the exposed population are not contagious; hence, they do not transmit the disease by infecting others. The disease transmission from one individual to another occurs when an exposed individual transforms into either a symptomatic or an asymptomatic carrier of the disease (Fig. 2A–2B). The time-dependent outflux from the exposed compartment (E) into symptomatic and asymptomatic compartments (\(I_{s}\) and \(I_{a}\)) is governed by the rate \(\lambda\). The outflux from the exposed compartment with the rate \(\lambda\) develops symptomatic infections with a probability \(\phi\) and asymptomatic infections with a probability \((1-\phi )\). The recovery of a symptomatic infected person and an asymptomatic infected are determined by the rates \(\gamma _{s}\) and \(\gamma _{a}\), respectively. An infected individual ceases to transmit the disease once s/he is recovered. Previous reports [10, 19, 20] indicate that, on average, contagiousness begins to pronounce from 2–3 days before the appearance of symptoms. The contagiousness (disease spreading capability) of a symptomatically infected individual mounts to its peak before the symptoms develop and stays for about 7–9 days after the peak infection is passed. Therefore, a symptomatic individual bears the ability to transmit the disease for nearly about 12 days on average before recovery. However, how long it takes for an asymptomatic individual to recover is hard to ascertain, as a large number of asymptomatic infections remain undetected due to the sheer lack of symptoms. For symptomatic recovery, the timescale of \(\sim\) 12 days provides an estimate of the magnitude of the symptomatic recovery rate \(\gamma _{s}\). In the preliminary model analysis, the magnitude of the asymptomatic recovery rate \(\gamma _{a}\) is chosen to be close to the magnitude of symptomatic recovery rate \(\gamma _{s}\). Deaths (D) within the symptomatic population occur with a rate \(\delta\). We assume that infected, yet asymptomatic individuals do not die due to this disease. Nevertheless, it is imperative to mention that all the relevant rate parameters are varied within a reasonably feasible range to obtain the best fit of the theoretical curves with the real data.

2.1 Estimating the spread of the disease, the reproduction number

The basic reproduction number denoted by \(R_{0}\) is a crucial parameter in epidemiology that can indicate the pandemic situation of an infectious disease. It is defined as the average number of secondary infections produced by a primary infection seeded into a sea of the susceptible population and its value is suggestive to implement a control intervention for containing the disease. However, in most of the practical situations, it is difficult to realize the single primary infection that has caused the outbreak; besides all contacts may not be susceptible to the infection. Therefore, in the present circumstances, instead of using the term basic reproduction number, we denote this by an effective reproductive number \(R_{e}\). The effective reproductive number is determined by various rates presented in Eqs. (1) to (7). In general, an epidemic will begin when \(R_{e}>1\), so the number of infections increases. Pandemic will become an endemic when \(R_{e}\) becomes 1 and \(R_{e}<1\) will eventually diminish the number of infections. The effective reproduction number \(R_{e}\) can be computed from Eqs. (1) to (7) using the next-generation matrix NGM [21]. An expression for \(R_{e}\) in the prelockdown situation is given by:

$$\begin{aligned} R_{e} = \frac{\phi \beta _{s}}{\delta + \gamma _{s}} + \frac{(1-\phi ) \beta _{a}}{\gamma _{a}} \end{aligned}$$
(8)

The mathematical treatment for obtaining the closed-form expression of \(R_{e}\) using NGM is similar to that given in [17].

2.2 Accounting for the effects of enforced containment measures

Before discussing the effect of containment measures on the infection dynamics, we need to highlight the subtle differences in the connotations of the following terms: (a) lockdown, (b) containment measures and (c) social distancing. We stress that a lockdown in the model indicates the containment measures enforced across the country. But, even without a countrywide lockdown, containment can be imposed locally in the hotspots of infections (as being implemented during the current unlock phase) to prevent further spread of the disease into a larger region. In a sense, the word ‘lockdown’ bears an essence of universality, whereas the implementation of containment can be both local and universal. However, social distancing is not an external norm to be enforced. It is rather a choice of lifestyle where one maintains a ‘good’ habit of physical distancing with other individuals in a public place/crowded environment. Ideally, even if there is no lockdown and/or containment measures in place, strict maintenance of social distancing can significantly reduce the chances of new infections. In the manuscript, in a general sense, where we mention the term ‘lockdown,’ we imply that containment measures and/or social distancing are in effect.

Under normal circumstances, the implementation of the containment measures would reduce the interaction of the susceptible population (S) with the infected population (\(I_{s}\) and \(I_{a}\)). As gleaned from Eq. (1), the susceptible population, while interacting with the symptomatically and asymptomatically infected population, becomes exposed to the disease with rates \(\beta _{s}\) and \(\beta _{a}\), respectively. We can consider that the direct impact of the enforced containment measures (lockdown in a nationwide sense) and social distancing would be reflected on \(\beta _{s}\) and \(\beta _{a}\) in Eqs. (1) and (2). The susceptible-to-exposed transition rates \(\beta _{s}\) and \(\beta _{a}\) should diminish with time as the lockdown is enforced and continued. Hence, as a simple choice, we set the susceptible-to-exposed transition rates \(\bar{\beta _{s}} (t)\) and \(\bar{\beta _{a}} (t)\) in such a way that these rates gradually decline with time as the containment measures are put in place [6, 7]. Prior to the lockdown, \(\beta _{s}\) and \(\beta _{a}\) are constant. During the lockdown, the time-dependent \(\bar{\beta _{s}}(t)\) and \(\bar{\beta _{a}}(t)\) are assumed to decrease exponentially as depicted in the following [6]. During the mixing between susceptible population (S) and symptomatically infected population (\(I_{s}\)),

$$\begin{aligned}&\bar{\beta _{s}}(t) = \beta _{s} (1 - \xi _{s}) e^{- [(t - \tau )/T]} + \xi _{s}\beta _{s} \ \text {for} \ t > \tau \end{aligned}$$
(9)
$$\begin{aligned}&\bar{\beta _{s}}(t) = \beta _{s} \ \text {for} \ t \le \tau \end{aligned}$$
(10)

Similarly, during the mixing between susceptible population (S) and asymptomatically infected population (\(I_{s}\)),

$$\begin{aligned}&\bar{\beta _{a}}(t) = \beta _{a} (1 - \xi _{a}) e^{- [(t - \tau )/T]} + \xi _{a}\beta _{a} \ \text {for} \ t > \tau \end{aligned}$$
(11)
$$\begin{aligned}&\bar{\beta _{a}}(t) = \beta _{a} \ \text {for} \ t \le \tau \end{aligned}$$
(12)

Here, the lockdown (enforcement of the containment measures) begins on the day \(\tau\) which is calculated from the day 0 (start date or initial time \(t = 0\)) set in the simulation. The parameters \(\xi _{s}\) and \(\xi _{a} \in [0,1]\) are measures of interaction between the susceptible and infected population (symptomatic and asymptomatic, respectively); T denotes the timescale that determines how fast the effect of lockdown on infection transmission becomes prominent. \(\xi _{s} = \xi _{a} = 1\) means that there is no lockdown; the infected population is freely interacting with the susceptible population and exposing the susceptible individuals to the disease at rates \(\beta _{s}\) and \(\beta _{a}\). \(\xi _{s} = \xi _{a} = 0\) is the idealized scenario of lockdown where all the infected individuals (both symptomatic and asymptomatic) are isolated/quarantined. Hence, any sort of interaction of the infected individuals with the susceptible population is ruled out. The factors \(1- \xi _{s}\) and \(1- \xi _{a}\) characterize the asymptotic reduction of \(\bar{\beta _{s}}(t)\) and \(\bar{\beta _{a}}(t)\) due to intervention/containment measures.

The time-dependent form of \(\bar{\beta _{s}}(t)\) and \(\bar{\beta _{a}}(t)\) is a non-unique choice consistent with the overall trend of the disease progression and tuned reasonably to arrive at the best fit with the available data. In the current form of \(\bar{\beta _{s}}(t)\) and \(\bar{\beta _{a}}(t)\), when T is considerably large compared to \(t-\tau\) (and \(\phi = 0\) ), we revert to the classic SEIRD model for symptomatic growth with a constant \(S \rightarrow E\) transition rate \(\beta _{s}\). Smaller values of T mark faster mitigation of \(\bar{\beta _{s}}(t)\) and \(\bar{\beta _{a}}(t)\). Although the choice of exponential decay is arbitrary, it has been exploited in prior COVID-19 modeling studies [1, 6, 7] and HIV/AIDS models [22, 23]. We also tested other predetermined, time-dependent functional forms of \(\bar{\beta _{s}}(t)\) and \(\bar{\beta _{a}}(t)\) such as compressed exponential and linearly decaying functions. The model outcomes/features qualitatively remain unchanged subjected to these choices. To determine \(\bar{\beta _{s}}(t)\) (and \(\bar{\beta _{a}}(t)\)), a more effective choice would be to compute/extract a dynamic \(\bar{\beta _{s}}(t)\) (and \(\bar{\beta _{a}}(t)\)) from the reported infection data instead of regulating the time evolution via any predetermined functional form. A recent study on COVID-19 pandemic using SIRD model proposes a novel algorithm of dynamically evaluating the transmission rate from reported deaths [7]. Adopting a similar protocol in our SEIRD model and exploring the disease progression in the Indian context would be an interesting future endeavor.

During the lockdown, \(\beta _{s}\) and \(\beta _{a}\) are modified as depicted in Eqs. (9) to (12). Hence, the mathematical form of \(R_{e}\) during the lockdown period stands as

$$\begin{aligned} R_{e} = \frac{\phi \bar{\beta _{s}}}{\delta + \gamma _{s}} + \frac{(1-\phi ) \bar{\beta _{a}}}{\gamma _{a}} \end{aligned}$$
(13)

In longer time limit, \(R_{e}\) becomes,

$$\begin{aligned} R_{e} = \frac{ \xi _{s}\phi \beta _{s}}{\gamma _{s} + \delta } + \frac{ \xi _{a} (1-\phi ) \beta _{a}}{\gamma _{a}} \end{aligned}$$
(14)

It is evident from Eq. (14) that the first term in the expression of \(R_{e}\) concerns about the disease transmission via the symptomatic carriers, whereas the second term deals with the transmission through asymptomatic carriers. In a population under lockdown, it is plausible to mitigate \(R_{e}\) through the gradual decrease in \(\xi _{s}\) and \(\xi _{a}\), the interaction parameters. In essence, this is consistent with the default definition of \(R_{e}\); diminishing the interaction between the susceptible and the infected reduces \(R_{e}\) and vice versa.

Since the constituting compartments of the model are S, E, \(I_{s}\), \(I_{a}\), \(R_{s}\), \(R_{a}\) and D (Fig. 2), from now on, we can call the model adopted here, with the acronym of \(SEI_{s}I_{a}R_{s}R_{a}D\) model.

All the relevant parameter values are listed in Tables 12.

3 Results

We investigated the role of asymptomatic population in propagating the COVID-19 infection in India and various states within the country using the \(SEI_{s}I_{a}R_{s}R_{a}D\) model. The results are elucidated in the following and represented in Figs. 36. To begin with, we varied the initial susceptible population \(S_{0}\) within a range of 100–300 million for India (Fig. 3) and explored how the infected population is getting subdivided into two categories: (a) symptomatic population and (b) asymptomatic population; moreover, how the asymptomatic population is contributing to the recent surge of infections in India (Fig. 1, Figs. 34). Next, we explored how the symptomatic and asymptomatic infection peaks are correlated and whether detecting and quarantining the symptomatic population is sufficient enough to contain the disease spread or the ‘hidden’ carriers without any symptoms make the disease transmission difficult to contain (Fig. 4A–4B). Furthermore, we briefly demonstrate how the effective reproduction number \(R_{e}\) computed from the current model behaves under the effect of lockdown/containment measures (Fig. 5). Next, we explore the symptomatic and asymptomatic infection growth curves for a few Indian states using the \(SEI_{s}I_{a}R_{s}R_{a}D\) model (Fig. 6). Lastly, we conclude by examining the COVID-19 data for cumulative infections and deaths in Indian context in the light of Benford’s law [24,25,26,27] projections (Fig. 7).

Fig. 3
figure 3

Time evolution of the population of symptomatically infected (solid curve), asymptomatically infected (dotted curve) and dead (dashed curve) in India. The initial susceptible population is varied within a range 100–300 million. The color shades encasing the curves indicate the variation in the initial susceptible population \(S_{0}\). (inset) Time evolution of the cumulative population of symptomatic and asymptomatic infections. The cumulative population of symptomatically infected is given by \(I_{s} + R_{s} + D\). Similarly, cumulative asymptomatic population is given by \(I_{a} + R_{a}\). The real data (plotted with points) are considered from March 2, 2020, up to September 1, 2020 (color figure online)

3.1 Asymptomatic population and the recent surge of infections: a worrisome affair for India

We first investigated how the infection growth curves would look like in India’s context if we consider asymptomatic infections. We start with an effective reproduction number \(R_{e} \sim 4\) at the onset of the simulation (Fig. 3). As the lockdown is enforced during the initial stages of the outbreak, \(R_{e}\) reduces to value \(\sim\) 1.32. The initial susceptible population is varied within a range of 100–300 million. During the lockdown period, the interaction parameters are chosen as \(\xi _{s} \sim 0.27\) and \(\xi _{a} \sim 0.36\) for best fit. \(\xi _{s} \sim 0.27\) and \(\xi _{a} \sim 0.36\) imply that the interaction between the susceptible and the symptomatically infected population is reduced to \(\sim 27 \%\) during the lockdown; similarly, the interaction between the susceptible and the asymptomatically infected population is reduced to \(\sim 36 \%\). The model analysis shows that the total number of asymptomatic infections is several-fold higher than the symptomatic infections (which is of course subjected to the choice of fitting parameters). In the following section, we discuss the relative fraction of symptomatic and asymptomatic population from the model perspective.

As expected, with an increase in the initial susceptible population \(S_{0}\), the infection peaks also rise to greater heights. The symptomatic and asymptomatic infection growth curves appear to reach the respective peaks in the middle of November 2020 (Fig. 3). With the increase in \(S_{0}\), the peaks shift to a later time point (Fig. 3). Note that, both the infection peaks may occur at the same time point as depicted in Fig. 3. The cumulative infection curves (inset, Fig. 3) show a flattening signature from the end of December.

The question that intrigues next is: How robustly do the model predictions zero in on the relative numbers of symptomatic and asymptomatic infections?

Fig. 4
figure 4

Additional parameter dependence of symptomatic and asymptomatic infection peaks. (A) Symptomatic and asymptomatic infection peak heights (and tentative time window around which the peaks occur) depend upon the choices of probability \(\phi\) and rate parameter \(\lambda\). (B) Non-unique choices of symptomatic and asymptomatic interaction parameters \(\xi _{s}\) and \(\xi _a\) may yield identical infection curves (not shown) and identical peak heights at the same time. The peak heights of the infection curves are obtained from the best fit with the real data for different values of \(\xi _{s}\) and \(\xi _{a}\), keeping other parameters fixed at base values. The last set of bars shaded in grey indicates that the infection peaks would be much lower than the projected trend if the asymptomatic patients are detected and quarantined (manifested through the lower value of \(\xi _{a}\)). For all cases, the initial susceptible population \(S_{0}\) is chosen to be about 200 million. The real data for fitting are considered from March 2

3.1.1 Sensitivity of the model projections on the relative sizes of symptomatic and asymptomatic populations

From various public health bulletins and media reports, we gather that the number of asymptomatic infections is possibly much higher than that of the symptomatic. Since the asymptomatic individuals do not develop symptoms (or have mild symptoms), they are harder to detect. In the Indian context, even though the testing capacity has been significantly increased during the past few months, the testing strategy is largely concentrated on individuals showing typical symptoms. To that effect, attempting to find asymptomatic cases by contact tracing and random testing raises the possibility that many of the asymptomatic carriers will go undetected. Hence, there exists a significant paucity of data about the exact number of asymptomatic cases compared to the symptomatic ones. This uncertainty leaves us with a finite window of variability in the parameter space concerning the model analysis.

If we inspect Eqs. (1) to (4) and Fig. 2 closely, we find that the exposed population is ‘fed’ into the compartments of the symptomatic and asymptomatic population with probability rates \(\phi \lambda\) and \((1-\phi )\lambda\), respectively. The choices for the numerical value of \(\phi\) (\(\phi \in [0,1]\)) regulates the relative size of the symptomatic and asymptomatic populations (since \(\gamma _{s} \sim \gamma _{a}\), the recovery time for symptomatic and asymptomatic populations are similar). If \(\phi \sim 0.5\), the peaks for the symptomatic and asymptomatic infections are likely to have the same height, as per the default model construction (Fig. 4A). The fact that the number of asymptomatic infections likely to be several-fold higher than the symptomatic infections [16] prompted us to make the current choice of the numerical value of \(\phi\). The greater size of the asymptomatic population, as suggested in various reports, alludes that it is reasonable to restrict \(\phi\) in the range 0.2–0.4. The best fit with real data shows that, as \(\phi\) increases, the peak of the symptomatic infections consistently climbs up to a certain value determined by the current trend (Fig. 4A). In Indian context, we have chosen \(\phi = 0.25\), unless otherwise mentioned (Tables 12). Note that in the default model construction, what fraction of the net outflux from the ‘exposed’ compartment (E), would go into the symptomatic (\(I_{s}\)) and asymptomatic (\(I_{a}\)) compartments depends on the factors \(\phi\) and \(1-\phi ,\) respectively. Therefore, when \(\phi = 0.25\), we expect that the peak of the asymptomatic growth would be \(\sim\) threefold higher than that of symptomatic growth. This is evident from Figs. 34A.

The next question is whether one can attribute the recent surge in the number of symptomatic infections to the largely undetected asymptomatic population.

3.1.2 The recent surge in symptomatic infections: Does the onus fall on the asymptomatic carriers?

To address this question from the current model perspective, we visit Eqs. (1) and (2), and Fig. 2. Imagine that all the active cases with symptoms are detected and quarantined. Then, the interaction among the susceptible and the symptomatic population becomes almost null as \(\bar{\beta _{s}}\) effectively falls close to zero. But, even in that scenario, the ‘exposed’ (E) compartment can still be ‘fed’ through the interaction between susceptible and the asymptomatic individuals at a rate \(\bar{\beta _{a}}\). This exposed population, in turn, fluxes into symptomatic and asymptomatic compartments with probability rates \(\phi \lambda\) and \((1-\phi )\lambda\). Thus, it is clear that merely quarantining the symptomatic alone cannot prevent the surge in both symptomatic and asymptomatic infections. The mixing of asymptomatic individuals with the susceptible may lead to new infections which are both symptomatic and asymptomatic. In the Indian context, to date, the testing capacity is mostly dedicated to the detection of symptomatic candidates. Once a positive case with symptoms is detected, the infected person is isolated and quarantined along with others who are traced to come in contact with the infected person. But the cases that show no symptoms are largely undetected. In most of the asymptomatic cases, the person her/himself is unaware of the fact that s/he is transmitting the disease. Thus, even during the lockdown, the mixing of an asymptomatic person with the susceptible cannot be fully filtered and prevented. From the model perspective, this may be reflected in the numerical values of symptomatic and asymptomatic interaction parameters \(\xi _{s}\) and \(\xi _{a}\), during the lockdown. Extensive detection and quarantining of the cases with symptoms may reduce \(\xi _{s}\) significantly. However, since the asymptomatic population remains largely untapped, \(\xi _{a}\) may not decrease that much compared to \(\xi _{s}\) during the lockdown. In brief, the infection curves will continue to surge if only the symptomatic population is put into quarantine (reflected in significantly suppressed \(\xi _{s}\)), but asymptomatic individuals roam freely (gleaned from the relatively marginal reduction in \(\xi _{a}\)) during the lockdown. Interestingly, we notice that non-unique choices of \(\xi _{s}\) and \(\xi _{a}\) (while other relevant parameters are kept fixed at base values) may lead to identical best fits with the real data. If \(\xi _{s}\) is decreased and simultaneously \(\xi _{a}\) is increased, the combinatorial effect of these two interaction parameters results in identical best-fit curves with precisely the same infection peak height and peak location (Fig. 4B). From a physical perspective, we can argue that even if all the symptomatic carriers are detected and quarantined, it is not sufficient to suppress the infection growth. A reasonably higher value of \(\xi _{a}\) (asymptomatic carriers mixing with susceptible) can compensate for the isolated symptomatic population (lower \(\xi _{s}\), symptomatic infections barred to mix with susceptible) and single-handedly keep on fuelling the infection growth at a significant rate (see the first four set of bars, Fig. 4B). It is also evident that if \(\xi _{a}\) is significantly reduced by extensive and/or randomized testing and quarantining patients without symptoms, the infections may be contained at a much lower number (see the last set of bars shaded in grey, Fig. 4B).

During the initial stages of the pandemic (April–May, 2020), once the lockdown was enforced, India had observed a surge of reverse migration of migrant workers from one part of the country to another. Preliminary thermal screening at the destination of their journey, e.g., at rail/bus stations, could diagnose symptomatic patients only. However, a large number of these people were possibly asymptomatically infected. Due to a lack of social distancing during their plighted journey, the chance of infection propagation increased manifold. The rapid increase in infections visible in Fig. 1 (bars enclosed in dashed box) might be linked with the reverse migration. Several recent studies highlight the effect of human mobility on the COVID-19 progression in detail [28,29,30].

Fig. 5
figure 5

Best fit of the symptomatic infection and death curves along with the projected asymptomatic infection curve when the prelockdown value of the effective reproduction number \(R_{e}\) is tuned within a range 3.8–4.2. The color shades encasing the curves indicate the variation in the effective reproduction number \(R_{e}\). The initial susceptible population is chosen to be about 200 million. The real data (represented as points) are plotted from March 2 (color figure online)

Fig. 6
figure 6

Best fit of the symptomatic infection and death curves along with the projected asymptomatic infection curves for Indian states Maharashtra (A), Delhi (B), West Bengal (C) and Tamil Nadu (D)

Next, we discuss how the effective reproduction number \(R_{e}\) computed from the current \(SEI_{s}I_{a}R_{s}R_{a}D\) model evolves during the lockdown.

3.1.3 How does the lockdown affect the effective reproduction number \(R_{e}\)?

During the lockdown, the interaction parameters \(\xi _{s}\) and \(\xi _{a}\) are suppressed because the enforcement of containment measures and social distancing reduces the mixing of infected and susceptible. It is evident from Eq. (14) that, as \(\xi _{s}\) and \(\xi _{a}\) decrease, the prelockdown \(R_{e}\) should also plummet down to a lower value indicating a relatively slower speed of infection.

In Fig. 5, we observe that the prelockdown \(R_{e}\) decreases significantly during the lockdown. Also, in consonance with the definition of \(R_{e}\) [1], we find that the higher the value of \(R_{e}\), the greater the number of infections as gleaned from the peak heights (Fig. 5). However, in order to convincingly flatten the infection curve, \(R_{e}\) needs to be taken down to a value below 1. This is a challenge that we have to tackle with utmost priority and sincere policy-making.

Next, we explore the COVID-19 progression in a few Indian states and the influence of the ‘hidden’ asymptomatic population on the infection growth using this \(SEI_{s}I_{a}R_{s}R_{a}D\) model.

3.1.4 Asymptomatic carriers and COVID-19 propagation in few Indian States

What we observe from the best fit of the infection curves for Indian states Maharashtra, Delhi, West Bengal and Tamil Nadu is that in most of the states (except Maharashtra), the infections curves have already attained the respective peaks 6A–6D. For Maharashtra, the active infection continues to surge. The infection peak is expected toward the end of October/beginning of November 2020, if the current trend (mapped from the real data) continues (Fig. 6A). In the case of Delhi, toward the end of July 2020, the number of active infections was significantly reduced due to strict intervening measures (Fig. 6B). However, with the gradual ‘unlock’ and resuming the social and economic activities, interventions effectively relaxed over time. To that effect, the active infections have started to increase again, lately (Fig. 6B). To account for the resurgence as visible from the real data into the theoretical infection curve, we resorted to the following: (a) after the lockdown is enforced, \({\bar{\beta }}_{s}\) and \({\bar{\beta }}_{a}\) follow the exponential decline as noted in Eqs. (9) to (11); (b) \({\bar{\beta }}_{s}\) and \({\bar{\beta }}_{a}\) are governed by Eqs. (9) to (11), till \(\sim\) 1 month after the peak infection is attained. The \(\sim\) 1-month time window is set by the fact that after attaining the peak, the number of active infections was still on the decline. However, 5–6 weeks after the peak infection, active infections started to rise again gradually. To match the trend evident from the real data, we relaxed \({\bar{\beta }}_{s}\) and \({\bar{\beta }}_{a}\) in the model, mimicking the effective lifting of intervening measures during the ‘unlock’ phases. Once the active infections began to resurge, \({\bar{\beta }}_{s}\) and \({\bar{\beta }}_{a}\) were exponentially increased as opposed to the pattern set by Eqs. (9) to (11). Upon these considerations, we observe that the theoretical curve is in reasonable agreement with the available real data (Fig. 6B, inset).

Notably, another consistent feature is that the predominantly ‘hidden’ asymptomatic infections may be several-fold higher than the symptomatic infections in all the cases. This, of course, in the model, depends on the fraction determined by \(\phi\) as discussed previously (Fig. 4A, 2).

4 Summary and discussion

The spread of COVID-19 via presymptomatic and asymptomatic cases has been a big concern in recent times as the mobility of people is increasing, while lockdown is being relaxed across the country. Since presymptomatic and asymptomatic persons, having no influenza-like symptoms, are not aware of their potential of infecting others, a relaxing lockdown gives ample opportunity to expose a contained population to the virus that effectively increases the number of susceptible. Therefore, the projected trajectories of infections (particularly the infection peaks) largely depend upon the number of susceptible (\(S_{0}\)) in the model. The choice of the initial size of the susceptible population \(S_{0}\) is a tricky business. Theoretically, in the absence of immunity (acquired or otherwise), an entire population may become susceptible to the COVID-19 outspread. However, in reality, all people will rarely be susceptible to the disease no matter how rapidly it spreads. Various factors like geography, socioeconomic characteristics and the demographic landscape can significantly regulate \(S_{0}\). The estimation of a ‘ballpark number’ for \(S_{0}\) is crucial to initiate the modeling analysis. To estimate a plausible value of \(S_{0}\), first, we look at the infection curves for the countries where the cumulative positive cases have already moved past the peak infection and attained a plateau. Dividing the number of infections at the plateau by the total population of the country under consideration yields a fraction that reflects the average percentage of the population infected by the virus. This fraction turns out to be \(\sim 10^{-3}\) for large countries like Germany, USA, Spain, Italy [1]. Thus, an estimate of susceptible can be computed by multiplying this fraction with the total population of a country. For India, the total population is in the order of \(\sim 10^{9}\). Hence, a rough ‘ballpark’ number for the initial susceptible population \(S_{0}\) would be in the range of \(10^{9} \times 10^{-3} \sim 10^{6}\), i.e., in the order of millions. This estimation for \(S_{0}\) gives us a lower bound. The reason is, in a hugely populated and large country like India, even if the testing capacity is increased, due to limited resources, the testing strategy probes individuals having symptoms in the first place. Thus, many mildly symptomatic and asymptomatic cases remain undetected. Therefore, the numbers gauged from the cumulative infection plateau allude to an underestimation up to a certain degree. An alternative approach finding an approximate \(S_{0}\) would be to compute the ratio between the total number of positive cases (i.e., the number of cumulative infections) and overall tests carried out. The ratio essentially is the test positivity rate. Multiplying this ratio with the total population of a country gives another ‘ballpark’ estimation of \(S_{0}\) or approximately the order of magnitude for \(S_{0}\) [1]. For example, in the case of India, the estimation from test positivity rate yields an order of magnitude \(\sim 10^{8}\) for \(S_{0}\). Normally, in these kinds of compartmentalized epidemiological models, \(S_{0} \sim N\). To attain the best fit with the reported data, we varied \(S_{0}\) within a range gleaned from the above estimations in a context-dependent manner for India and a few Indian states.

In the context of COVID-19 infection propagation, a few interesting aspects to ponder over are: (a) whether there is any bias in people getting infected from asymptomatic patients to be asymptomatic and (b) whether the rates of infection from symptomatic and asymptomatic patients are the same. About (a), it may seem intuitive that a person getting infected from an asymptomatic would turn out to be asymptomatic as well. But, in reality, the immune response of an infected individual can be very different. Presently, to our knowledge, there are no conclusive data to assess this point. About (b), we discuss a few key features of infection propagation through asymptomatic and symptomatic carriers in the following. An asymptomatic individual is likely to carry less viral loads compared to a symptomatic individual. The droplet transmission (via coughing, sneezing) is a dominant mode of infection propagation. Thus, the chance of infection via an asymptomatic individual is less likely than a symptomatically infected person as an asymptomatic carrier hardly develops conspicuous symptoms like coughing, sneezing, etc. However, asymptomatic individuals lead ‘normal’ social life of human interactions without being aware that they are ‘silent’ carriers, whereas symptomatic individuals are more likely to be under restriction mixing with people. Therefore, within a population, chances of disease spread via asymptomatic carriers may be higher.

In the model analysis, for simplicity, we have chosen the values of \(S \rightarrow E\) transition rates \({\bar{\beta }}_{s}\) and \({\bar{\beta }}_{a}\) to be the same (Tables 12). But what fraction of E emerges out as symptomatic or asymptomatic is regulated by \(\phi\). Note that, in the model construction, there is no bias in people getting infected from a/symptomatic patients to be a/symptomatic. In the ‘exposed’ compartment E, the population is a mixed due to (a) \(S-I_{s}\) interaction and (b) \(S-I_{a}\) interaction. But no information about the mode of infection via (a) and (b) is book-kept in the total exposed population E. Therefore, individually, neither (a) nor (b) can influence the division in the outflux from E into \(I_{s}\) and \(I_{a}\) (Fig. 1, Eqs. (1) to (4)).

We have varied \(S_{0}\) for India within a range 100–300 million (Table 1, Fig. 2). Note here that this estimation is a simplistic approximation. After all, it is difficult to comment on the percentage of the population infected by the virus until the pandemic is over. Further details regarding the estimation of \(S_{0}\) is described in our earlier study ([1]).

In brief, as per the current trend of the symptomatic cases, India is expected to see a peak in the active infection (symptomatic) in November 2020 (Fig. 3), tentatively. The number of active infections (symptomatic) at the peak may lie somewhere between 1.5 and 2.5 million. The total infection (cumulative) would then reach around 10–20 million (mid-November) as shown in Fig. 3, inset. Note that the COVID-19 pandemic is an ongoing phenomenon with its time evolution depending upon various local and global factors. Therefore, quantitative predictions from a simplistic model investigated in this article are expected to change as the pandemic progresses.

Fig. 7
figure 7

Data for cumulative infection and death due to COVID-19 pandemic in few Indian states and India as a whole in the light of Benford’s law (BL) projections. (A–B) The probability distributions of first significant digits n (\(n \in [1, 2,\ldots ,9]\)) appearing in the data chart of cumulative infection (A) and death (B) are plotted as histograms. The probability of n, according to BL is given by \(P(n) = log_{10} (1 + \frac{1}{n})\). The overall trend as observed from the reported data for India reasonably follows the projections from BL

The quality of the COVID-19 data (e.g., numbers representing cumulative infection, death, etc.) for various countries, provinces and local regions made available in the public domain is a crucial determinant for the study of the pandemic progression. A frequently used tool for testing the reliability of an empirical data set is Benford’s law (also known as Newcomb–Benford law) [24,25,26, 31]. The phenomenological Benford’s law (BL) states that in diverse sets of ‘real-world’/‘naturally occurring’ numerical data, the first significant digit n appears with probability \(P(n) = log_{10} (1 + \frac{1}{n})\) [24,25,26, 31]. In this study, we compared the data of cumulative infections and deaths for India and few Indian provinces (e.g., Delhi, Maharashtra, Tamil Nadu, West Bengal) in the light of BL and tested the compliance (Fig. 7A–7B). It is important to note that while many numerical data sets comply with BL, several collections of numbers do not. Therefore, only checking the compliance (or deviation) of a distribution of the first significant digit with P(n) is not a sufficient condition to ascertain the reliability of a data set. Furthermore, it is also discussed that in case of a certain number of individual distributions that do not have significant digit frequencies close to those determined by BL, random sampling from those distributions may result in a combined collection of numbers with significant digit frequencies obeying BL [26]. Recent studies on the quality of COVID-19 data suggest that better compliance with BL occurs when the cases are steadily on the rise during the infection progression in a region [27, 32]. At later stages, when the implementation of containment/intervening measures flattens the infection curves, the data sets are unlikely to follow BL.

In this current study, we have included the gradual relaxation of lockdown measures (‘unlock 1.0’ plan starting from June 1, laid out by Govt. of India and at present it is unlock 4.0), only in the case of Delhi, where the active infection data shows a resurgence after the attainment of the peak. In the Indian context as a whole, as the ‘unlock’ phases are initiated, we expect a considerable surge in the active cases in the upcoming days. We assessed the COVID-19 progression in the presence of ‘one-time enforcement of containment measures’ where the lockdown and other social distancing norms remain in place for an indefinite time. However, a more realistic reconstruction of the pandemic situation would be to impose ‘intermittent lockdown’ in specific regions where the containment and other social distancing measures are enforced once the number of active infections crosses a threshold determined by the capacity of the regional healthcare system. Afterward, the lockdown measures are relaxed/lifted as the active infections fall below a certain threshold (‘unlocking’ the lockdown). This shuttling between the phases of ‘lockdown’ and ‘unlock’ continues until the contagion comes totally under ‘control’ or the threat of an ‘out-of-bound’ infection is eliminated. The nature of intermittent intervention depends on various controlling factors like acquiring ‘herd immunity,’ availability of proper vaccines, the capacity of public health facilities where all the patients can be accommodated and treated, etc. A previous study ([16]) in the context of the USA, has discussed these aspects. In the Indian context, it would be interesting to explore further along this avenue as a future venture.

Like the SIRD model used in our previous study [1], this \(SEI_{s}I_{a}R_{s}R_{a}D\) model does not accommodate any spatial information about the infection spread. In other words, there is no spatial degree of freedom in the governing equations of \(SEI_{s}I_{a}R_{s}R_{a}D\) model. The spatial dependence of the disease progression can be executed via simulating this model on a lattice or connected network. The network may feature the following attributes: (a) every node on the network virtually carries the territorial/regional information collected from the map of India; (b) the network connectivity bears the details of human mobility, transmission spread from one region to another. The network nodes also can account for the topology/geography of India in a coarse-grained manner. For example, the nodes representing Himalayan highlands would likely to be attributed to lesser human mobility and transmission spread compared to the nodes representing Indian metro cities. This proposed model would enable us to study how the infection spreads from a few initial local pockets to a larger region, thereby rendering a broader picture of the dynamics of the pandemic. A previous study investigating the COVID-19 progression in Italy describes a novel approach using a spatially explicit SEIR model [18]. Future investigation with similar spatial/topological detailing in India’s context would be useful to delve into. Also, note that the current \(SEI_{s}I_{a}R_{s}R_{a}D\) model is a generalized model. It does not contain any specific biological/clinical features of the COVID-19 disease. In this model, COVID-19 enters through the rates and governing parameters in Eqs. (1) to (12) that are tuned to obtain the best fit of the theoretical curves with the available COVID-19 data.

In the currently explored model parameter regime, we observed that the number of asymptomatic cases is several-fold higher than the symptomatic cases. In reality, even though the number of tests has been increased significantly, individuals showing symptoms are given priority to undergo tests. In a hugely populated country like India, this leaves a large fraction of the asymptomatic population untraceable. Therefore, to determine the exact ratio of symptomatic to asymptomatic population, in reality, we need extensive data curation across the country and further research in that direction. The infectiousness of an asymptomatic individual may be similar to a person having symptoms. The asymptomatically infectious population, mixing freely with the healthy susceptible population, keeps spreading the infection causing a surge in the number of symptomatic as well as asymptomatic cases. To reduce the infection, it is important to trace the source of infection and isolate it from the healthy susceptible population. Currently, there are merely any data available to validate our prediction of the asymptomatically infected population. To assess the community spreading and prevalence of asymptomatic cases, India conducts a serology-based (antibody test) survey in select districts. The test aims to find the presence of a specific antibody developed by the immune system of the infected person in response to the viral infection. If the data from the survey are made available in the public domain, the model prediction can be assessed. Irrespective of the testing policy, maintaining social distancing in tandem with prolonged or intermittent containment measures would be crucial. Besides, it is also necessary to use cloth face coverings or mask across the population.

To conclude, ‘indefinite lockdown’ is not a solution to put an end to the COVID-19 outspread. The broader purpose, the lockdown in India served, is the opportunity to buy ‘time.’ In a nutshell, the lockdown slows down the infection and the lockdown time window provides us the chance to ramp up the healthcare facilities, testing capacity, etc. so that when the lockdown is lifted, the COVID-19 does not catch us off guard. During the gradual ‘unlock’ phases, the effect of lockdown can still prevail if social distancing, sanitation protocols and proper mask wearing in public places are adopted by every individual as the ‘new normal’ of lifestyle.

Table 1 List of parameters chosen for the best fit with the real data in Indian context
Table 2 List of parameters chosen for the best fit with the real data in few states within India