1 Introduction

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus of the family Coronaviridae. It was first reported in Wuhan, China in December 2019 [104], and continues to wreak havoc, with more than 47.9 million cases detected and more than 1.2 million fatalities reported globally at the time of this writing. The number of new confirmed cases increased steadily from February to November, as illustrated in Fig. 1. Compared with the SARS outbreak of 2003, which infected 8422 people and caused 916 fatalities [23], this pandemic has proved highly infectious and far more widespread. Age and co-morbidity are the greatest risk factors associated with COVID-19 [81]. According to some reports in the UK, about 20% of patients who were already being treated for another illness also tested positive for COVID-19 [34]. The disease is commonly characterized by fever, dry cough and fatigue [80]. Symptoms of critical stages of the infection include bluish lips or face, hypoxia, coughing up blood and acute respiratory distress syndrome (ARDS) [73].

Fig. 1

Global Epidemic Curve, February to November 2020 (adapted from [103])

COVID-19 is believed to have arisen through zoonotic transmission [2], i.e., animal-to-human transmission, which is usually slow. The subsequent human-to-human spread, through either community transmission or nosocomial transmission, is comparatively much faster. Nosocomial (hospital-acquired) infections have always posed a challenge to the biomedical community [42, 43], and COVID-19 is no exception. The categories of transmission and their inter-relation are depicted in Fig. 2. Scientists all over the world are currently focusing on preventive and treatment therapies for COVID-19. Hundreds of clinical trials testing various drugs and vaccines are ongoing globally, but so far only a few vaccines have been approved for clinical use. This leaves “stay home, stay safe” as the most effective strategy against the ongoing pandemic, at least until the approved vaccines reach a majority of the population.

A great deal of progress is being made in COVID-19 diagnosis. Two types of diagnostic test have been developed: molecular and serological [46]. For molecular tests, a sample of mucus and saliva is collected with a nasopharyngeal swab and tested for the presence of the virus using reverse transcription polymerase chain reaction (RT-PCR) [52]. Depending on the assay kit, results take from a few minutes to a few days. Serological tests detect specific antibodies in the blood [56], which is usually collected through a finger prick; these tests are usually faster than molecular tests. Although the diagnostic tests have proven effective, the major hurdle at present is the shortage of testing kits. With the exception of a few nations, the healthcare systems of most countries are under-prepared for a pandemic of this magnitude. In particular, many countries have struggled with low testing rates [90], leaving a large number of people untested. Many governments have therefore adopted a strategy (outlined in Fig. 3) that prioritizes testing for high-risk groups, defined by the WHO as people older than 60 years or with health conditions such as lung or heart disease, diabetes or conditions that affect the immune system. Poor adherence to quarantine has been observed in many cases, which burdens the testing system with more candidates to trace and test. In the current scenario, it is of utmost importance to ‘flatten the curve’ so that the number of cases remains within the capacity of the healthcare systems.

Fig. 2

Different categories of transmissions

Machine learning (ML) and image processing techniques can help in this scenario through remote monitoring of the vital signs of infected and suspected individuals. This work discusses the application of Artificial Intelligence (AI) to contactless screening and monitoring of the various symptoms of COVID-19. We demonstrate the potential of machine learning and image/signal processing techniques, many of which can be deployed using simple cameras without specialized equipment, for monitoring several vital signs and symptoms such as heart rate, respiratory rate, blood pressure, oxygen saturation and cough. This approach could make patients much more aware of their vital signs, contributing to an improved overall quality of life. Although these techniques are not fully mature and have certain limitations, this work aims to highlight the considerable potential of ML-enabled remote monitoring and to call for greater research effort in this direction. The main contributions of the paper are as follows:

  • Demonstrating how ML-based remote monitoring can augment the doctor’s expertise for focused and timely patient care.

  • Comparing conventional monitoring methods with ML-based remote monitoring, and illustrating how the latter can be a viable approach in the context of the COVID-19 pandemic.

  • Drawing a perspective on how effective prioritization based on vital signs can help hospitals and doctors reduce the burden on health infrastructure.

  • Discussing the major challenges in monitoring vital signs and outlining directions for future work.

Fig. 3

A flowchart detailing COVID-19 testing policy based on symptoms and known exposure. Testing policy is laid down at the local level by the respective health and governing bodies. Although the procedure shown is not universally followed, it is representative of a typical policy

2 Background

Monitoring vital signs is an important aspect of patient care, because these signs usually give first-hand information about abnormal physiology. During hospitalization, they also enable care providers to monitor the patient’s prognosis and to track recovery as well as any adverse conditions. Traditionally, four signs are monitored: temperature, blood pressure, respiratory rate and pulse rate [17]. Oxygen saturation is often included as well. Although these values vary with several factors, including age, gender, weight and time of day, the following are considered normal for an average adult:

  • Temperature: 97–99 \(^{\circ }\)F

  • Blood pressure: 120/80 mmHg

  • Respiratory rate: 12–20 breaths per minute at rest

  • Pulse rate: 60–100 beats per minute at rest

  • Oxygen saturation: 95–100%
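The ranges above can be turned into a simple screening check. The sketch below is illustrative only: the dictionary keys and the function name `out_of_range` are hypothetical, and blood pressure is left out because the list gives a single reference value rather than a range.

```python
# Resting adult ranges from the list above (inclusive bounds).
# Keys are hypothetical names chosen for this sketch.
NORMAL_RANGES = {
    "temperature_f": (97.0, 99.0),      # degrees Fahrenheit
    "respiratory_rate": (12.0, 20.0),   # breaths per minute
    "pulse_rate": (60.0, 100.0),        # beats per minute
    "spo2_percent": (95.0, 100.0),      # oxygen saturation, %
}


def out_of_range(vitals):
    """Return the names of measured vitals that fall outside NORMAL_RANGES."""
    flagged = []
    for name, value in vitals.items():
        low, high = NORMAL_RANGES[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged


# A reading with fever, fast breathing and low oxygen saturation.
print(out_of_range({"temperature_f": 101.2, "respiratory_rate": 24,
                    "pulse_rate": 88, "spo2_percent": 93}))
# ['temperature_f', 'respiratory_rate', 'spo2_percent']
```

A real screening rule would need age- and context-specific thresholds; the point is only that deviations from the normal ranges are cheap to flag once the measurements exist.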

Vital signs are often the first to raise the alarm in several disease conditions, including COVID-19. Studies currently estimate an incubation period of 1–14 days for SARS-CoV-2 in the human body [54]; only about one in a hundred infected patients develops symptoms after 14 days. After incubation, disruption of specific vital signs is noted. One of the first symptoms observed is fever, defined as a body temperature greater than 100.4 \(^{\circ }\)F. Since COVID-19 is associated with inflammation of the lungs [61], the oxygen uptake capacity is reduced. This leads to decreased oxygen saturation [81], which in turn reduces the oxygen supplied to body cells. To meet the oxygen demands of the cells, the heart pumps blood at an increased pace, which is observed as an increase in heart rate. At the same time, to restore normal oxygen levels, the body increases the number of breaths per minute, leading to an increased respiratory rate. In this way, SARS-CoV-2 infection disrupts the vital signs of the body.

Currently, most healthcare systems are not sufficiently equipped to carry out large-scale diagnostic testing, so the criteria for prioritizing testing can be based on observation of vital signs. However, hospital visits for vital sign monitoring can increase nosocomial transmission, and it is particularly important to keep frontline healthcare workers safe. Furthermore, hospital visits by uninfected people increase their own probability of contracting the infection. To break the chain of transmission, home isolation is being advised for quarantined individuals as well as for those infected with COVID-19 but at low or medium risk. The COVID-19 pandemic therefore creates an extraordinary need to improve the technology for remote vital sign monitoring. The widespread use and availability of smartphones makes camera-based solutions immediately available at no extra cost. Hence, this work focuses on ML-based solutions for remote monitoring of vital signs that can be run using the simple cameras available on a smartphone or laptop.

3 The potential of ML for contactless COVID-19 screening

Artificial intelligence has come a long way from being effective in limited applications [14, 47, 48] to having a transformative impact across numerous domains [1, 20, 26, 28, 38, 49, 53, 57, 58], including drones [24], VANETs [36] and IoT security [35]. In particular, its utility in addressing the challenges of healthcare services is being vigorously pursued by both industry and government agencies. AI and ML have great potential in disease prediction using healthcare data [96]. The role of AI and other technologies in combating COVID-19 has also been surveyed [21, 22], explainable AI has been proposed for monitoring COVID-19-like pandemics [40], and B5G (beyond-5G) networks combined with deep learning have been applied to the efficient management of epidemics [76].

The use of AI/ML for remote monitoring of multiple vital signs together merits further detailed study. The ground-breaking performance improvements of deep learning techniques in vision-based analysis [11, 13, 16, 75, 79, 86, 105, 106] have opened possibilities for developing non-intrusive, contactless diagnosis systems. Such systems can be highly useful for early screening and monitoring of easily transmitted diseases such as COVID-19. Blockchain-based approaches have also been proposed for edge-based healthcare monitoring [7]. Deep learning based image analysis has had a tremendous impact on classic medical problems such as brain tumor classification [66] and endoscopy [67]. In approaches that take video as input, vital sign extraction involves multiple operations, which are summarized as a basic framework in Fig. 4.

Fig. 4

A basic framework for vital signs extraction using video as input

The COVID-19 pandemic is overwhelming the healthcare facilities in the affected countries. The many unknown factors about COVID-19 make it dangerous for the healthcare workers treating infected patients. For example, the complete list of COVID-19 symptoms is not yet known, comorbidities increase the uncertainty of the disease course, and asymptomatic carriers put the people around them at risk of contracting the disease. To ensure the safety of healthcare professionals, contactless screening for COVID-19 offers a promising alternative.

Contactless screening may involve visual inspection through video cameras, X-ray or MRI imaging, physical check-ups, remote heart rate monitoring, cough analysis, respiratory rate monitoring, oxygen saturation monitoring and blood pressure measurement. Recent efforts have also explored humanoid robots and affective systems for remote healthcare [93]. ML-based algorithms can perform many of these tasks through remote analysis, although certain tasks may require specialized equipment.

Especially in this pandemic, remote monitoring can greatly augment the doctor’s expertise. Vital signs can be monitored continuously using contactless techniques, which enables effective prioritization of patients for doctor visits. Doctors can decide the order of visits depending on the resources currently available, so that special care and supervision are given to the patients who need them most. In a pandemic, there are many instances in which a person’s health deteriorates rapidly without the person being aware of it until it becomes serious. Here, continuous monitoring can be a game-changer and help save lives by providing accurate feedback on changing vital signs, so that medical help can be sought well in advance. The following sections discuss several of these tasks that are particularly relevant to COVID-19 and can be achieved with AI and image/signal processing techniques without specialized equipment.
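To make the idea of prioritization concrete, the sketch below ranks monitored patients by how many vital signs fall outside the normal adult ranges listed in Section 2, breaking ties by lower oxygen saturation. The scoring rule, function names and patient data are hypothetical illustrations, not a validated triage protocol.

```python
# Resting adult ranges from Section 2 (inclusive); keys are hypothetical names.
NORMAL_RANGES = {
    "temperature_f": (97.0, 99.0),
    "respiratory_rate": (12.0, 20.0),
    "pulse_rate": (60.0, 100.0),
    "spo2_percent": (95.0, 100.0),
}


def abnormal_count(vitals):
    """Number of monitored vitals lying outside their normal range."""
    return sum(
        not (low <= vitals[name] <= high)
        for name, (low, high) in NORMAL_RANGES.items()
    )


def prioritize(patients):
    """Order patient IDs for review: most abnormal vitals first,
    then lower oxygen saturation first as a tie-breaker."""
    return sorted(
        patients,
        key=lambda pid: (-abnormal_count(patients[pid]),
                         patients[pid]["spo2_percent"]),
    )


patients = {
    "A": {"temperature_f": 98.6, "respiratory_rate": 16, "pulse_rate": 72, "spo2_percent": 98},
    "B": {"temperature_f": 101.0, "respiratory_rate": 24, "pulse_rate": 110, "spo2_percent": 91},
    "C": {"temperature_f": 100.2, "respiratory_rate": 22, "pulse_rate": 95, "spo2_percent": 96},
}
print(prioritize(patients))  # ['B', 'C', 'A']
```

Counting abnormal signs is deliberately crude; a deployed system would weight signs by clinical severity, but the ordering step itself stays the same.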

4 Remote heart rate monitoring

The human heart is a mechanical pump that circulates blood throughout the body. Each heartbeat produces minor color changes and subtle motion variations. The color changes as blood flows into and out of the vessels beneath the skin, while the motion arises from the reaction force of the ejected blood, in accordance with Newton’s third law of motion [9]. Vision-based heart rate monitoring relies mainly on remote photoplethysmography (rPPG) signals extracted from the color changes and/or the subtle motions. Light that re-emerges from the skin after absorption and scattering in the tissue, and that varies as blood volume changes, is known as diffuse reflection; light reflected directly from the skin surface is known as specular reflection. To analyze the subtle color variations of the skin, changes in the reflected red, green and blue light are measured, separating the diffuse component from the specular one. The resulting changes in light absorption are analyzed and the heart rate is estimated. This technique has far-reaching use cases in human-computer interaction and health monitoring. However, relative motion and changes in lighting directly affect its performance.
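The color-based principle can be illustrated with a minimal sketch: average the green channel over a skin region in each frame, then take the dominant frequency of that trace within a plausible heart rate band. This assumes the face or skin ROI has already been located upstream, and the 0.7–4.0 Hz band (42–240 bpm) is an assumed search range. It is a basic green-channel variant, not the method of any specific work cited here.

```python
import numpy as np


def estimate_heart_rate(green_trace, fps, low_hz=0.7, high_hz=4.0):
    """Estimate heart rate (bpm) from a per-frame mean green-channel trace.

    green_trace: 1-D sequence, mean green intensity of a skin ROI per frame.
    fps: camera frame rate in Hz.
    """
    x = np.asarray(green_trace, dtype=float)
    x = x - x.mean()                      # drop the static (DC) skin colour
    x = x * np.hanning(len(x))            # taper the ends to limit spectral leakage
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)   # plausible heart rate bins only
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz


# Synthetic 20 s trace at 30 fps: a 72 bpm pulse plus a slower breathing component.
fps = 30
t = np.arange(20 * fps) / fps
trace = 120 + 0.5 * np.sin(2 * np.pi * 1.2 * t) + 0.2 * np.sin(2 * np.pi * 0.25 * t)
print(f"{estimate_heart_rate(trace, fps):.1f} bpm")  # 72.0 bpm
```

With 20 s of video the spectral bins are 0.05 Hz (3 bpm) apart, so longer windows give finer estimates; motion and lighting changes, noted above, show up as extra in-band peaks that this simple version does not handle.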

4.1 Conventional workflow for heart rate measurement

Most heart rate measurements in hospitals use contact-based methods, requiring patches stuck to the skin or fingers clipped to a sensor. Such a contact-based workflow may not be ideal for COVID-19 monitoring because of the highly contagious nature of the disease. Heart rate can be extracted from two information sources: periodic color changes due to blood flow [72], and ballistic forces generated by the heart [9]. Because ballistocardiography required expensive equipment, while electrocardiography (ECG) was easy and cost-effective, the use of ballistocardiography declined; it has, however, regained traction in recent years owing to its contactless nature [32]. Currently, heart rate is measured in hospitals using ECG and photoplethysmography (PPG) devices. For ECG, electrodes are placed at specific positions on the body, such as the arms, chest and legs. Cardiac activity periodically changes the blood volume in the micro-vascular tissues, so peripheral body tissues (such as the palm or fingertip) are used to measure the blood volume pulse (BVP). PPG works by illuminating the skin with a light-emitting diode (LED) and measuring the amount of light reflected or transmitted to a photodiode, as depicted in Fig. 5. Because the amount of light absorbed depends directly on the blood volume, BVP can be measured directly [98].
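Once a BVP waveform is available, whether from a contact PPG sensor or from rPPG, heart rate can be read off the spacing of its systolic peaks. The sketch below uses a deliberately simple peak detector; the function name, the mean-based threshold and the 0.33 s refractory period are assumptions for illustration.

```python
import numpy as np


def heart_rate_from_ppg(ppg, fs, min_interval_s=0.33):
    """Estimate heart rate (bpm) from a PPG/BVP waveform sampled at fs Hz.

    A sample is taken as a beat if it is a local maximum above the signal mean
    and lies at least min_interval_s after the previously accepted beat.
    """
    x = np.asarray(ppg, dtype=float)
    threshold = x.mean()
    min_gap = int(min_interval_s * fs)
    peaks = []
    for i in range(1, len(x) - 1):
        is_local_max = x[i] >= x[i - 1] and x[i] > x[i + 1]
        if is_local_max and x[i] > threshold:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    if len(peaks) < 2:
        raise ValueError("need at least two beats to estimate heart rate")
    intervals_s = np.diff(peaks) / fs     # inter-beat intervals in seconds
    return 60.0 / intervals_s.mean()


fs = 100
t = np.arange(10 * fs) / fs
print(f"{heart_rate_from_ppg(np.sin(2 * np.pi * 1.25 * t), fs):.1f}")  # 75.0
```

Real PPG pulses have a dicrotic notch and baseline wander, so practical detectors usually band-pass filter the signal first; the refractory period is what stops a secondary bump from being counted as a second beat.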

Fig. 5

An illustration of the photoplethysmography (PPG) technique, which uses the contrast between emitted and reflected light, indicative of blood volume, to estimate heart rate. Remote photoplethysmography (rPPG) measures the changes in red, green and blue light reflected from the skin, contrasting specular and diffuse reflection, in a contactless manner. Image attribution: Marcus.vollmer / CC BY-SA (Color figure online)

4.2 ML-enabled workflow for remote heart rate monitoring

ML-based heart rate monitoring approaches use either of two information sources: pulse-induced micro-motions, or micro color variations due to blood flow under the skin. Balakrishnan et al. [9] proposed using the subtle head oscillations caused by blood flow through the head arteries during the cardiac cycle to extract heart rate from videos. This can be especially useful in the COVID-19 scenario, as head motion cues are available even when the face is covered by a mask. For capturing micro color variations, Fukunishi et al. [30] introduced an experimental protocol to predict heart rate variability based on a skin optics model. The skin was modeled in two layers, epidermis and dermis; the Beer–Lambert law was used to model the internal reflection, and movement was compensated using the LEAR (Local Evidence Aggregation for Regression) feature detector [59]. These feature points were used to determine the final region of interest (ROI). After the haemoglobin component is calculated in the ROIs of each frame, the heart rate is extracted from the resulting waveform.
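For reference, the textbook single-absorber form of the Beer–Lambert law shows why pulsatile blood volume changes appear as intensity changes in the skin. The symbols below are introduced for illustration only, and the two-layer epidermis/dermis model of Fukunishi et al. [30] is more detailed than this form:

\[
I(t) = I_0 \, e^{-\varepsilon \, c(t) \, d}
\quad\Longrightarrow\quad
A(t) = \ln\frac{I_0}{I(t)} = \varepsilon \, c(t) \, d,
\qquad
\Delta A(t) = \varepsilon \, d \, \Delta c(t),
\]

where \(I_0\) is the incident light intensity, \(I(t)\) the detected intensity, \(\varepsilon\) the absorption coefficient of haemoglobin at the chosen wavelength (natural-logarithm convention), \(c(t)\) the haemoglobin concentration along the light path and \(d\) the optical path length, assumed constant over a cardiac cycle. The absorbance therefore rises and falls in proportion to the pulsatile change \(\Delta c(t)\), and the period of \(\Delta A(t)\) corresponds to the heartbeat.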

In another study, Alghoul et al. [4] applied independent component analysis (ICA) to the color channels of video recordings to obtain PPG signals; Eulerian Video Magnification (EVM), by contrast, was found to perform better when motion is involved or when high-frequency components are not important. More recently, Wang et al. [99] devised a two-stream convolutional neural network (CNN) for heart rate estimation using rPPG. The two streams are fused: one extracts features under a low-rank constraint, while the other focuses on extracting reliable rPPG signals from facial videos.
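The ICA idea can be sketched as follows: treat the mean R, G and B traces of a skin ROI as linear mixtures of independent sources, unmix them, and keep the component with the strongest peak in a plausible heart rate band. The minimal FastICA below and the component-selection rule are assumptions for illustration, not a reproduction of the pipeline of Alghoul et al. [4]; in practice a library implementation would normally be used.

```python
import numpy as np


def _sym_decorrelate(W):
    """Symmetric orthogonalisation W <- (W W^T)^(-1/2) W."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W


def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Minimal symmetric FastICA with a tanh nonlinearity.

    X: array (n_channels, n_samples), e.g. mean R, G, B traces of a face ROI.
    Returns estimated sources (order, sign and scale are arbitrary).
    """
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    Z = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X      # whitened channels
    n, m = Z.shape
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once.
        W_new = _sym_decorrelate((G @ Z.T) / m - np.diag((1 - G ** 2).mean(axis=1)) @ W)
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if done:
            break
    return W @ Z


def heart_rate_from_rgb(rgb_traces, fps, low_hz=0.7, high_hz=4.0):
    """Pick the ICA component with the strongest in-band peak; return bpm."""
    sources = fast_ica(np.asarray(rgb_traces, dtype=float))
    freqs = np.fft.rfftfreq(sources.shape[1], d=1.0 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    best_power, best_hz = -1.0, 0.0
    for s in sources:
        power = np.abs(np.fft.rfft(s - s.mean()))[band] ** 2
        k = np.argmax(power)
        if power[k] > best_power:
            best_power, best_hz = power[k], freqs[band][k]
    return 60.0 * best_hz


# Synthetic example: a 72 bpm pulse, slow lighting drift and noise, mixed into R, G, B.
fps = 30
t = np.arange(20 * fps) / fps
rng = np.random.default_rng(1)
true_sources = np.vstack([np.sin(2 * np.pi * 1.2 * t),
                          np.sin(2 * np.pi * 0.15 * t),
                          rng.uniform(-1, 1, t.size)])
mixing = np.array([[0.4, 0.9, 0.3], [0.9, 0.3, 0.4], [0.3, 0.5, 0.8]])
rgb = mixing @ true_sources + np.array([[120.0], [100.0], [90.0]])
print(f"{heart_rate_from_rgb(rgb, fps):.1f} bpm")  # 72.0 bpm
```

ICA leaves the order and sign of the components undefined, which is why a frequency-based rule is needed afterwards to decide which component carries the pulse.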

Similarly, Špetlík et al. [89] introduced a two-step CNN for extracting heart rate from a sequence of facial images. A feature extractor is first trained on temporal sequences of face images; the extracted features are then passed to a heart rate estimator module, which predicts the heart rate. They also introduced ECG-Fitness, a challenging dataset of 204 fitness-themed videos in which ground-truth heart rate is measured by ECG. Tsou et al. [94] proposed a siamese rPPG network for heart rate estimation from face videos. To analyze the temporal periodicity of rPPG signals, a 3D CNN is constructed, and the two-branch model is trained jointly under a negative Pearson loss function. The weight-sharing (siamese) architecture is used to learn distinctive, robust and complementary features from multiple facial regions.
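The negative Pearson loss mentioned above rewards a predicted rPPG waveform for having the same shape as the reference signal, regardless of its scale or offset. A NumPy sketch of the common 1 - r form follows; a training implementation would express the same arithmetic with the deep learning framework's tensor operations so that gradients can flow.

```python
import numpy as np


def negative_pearson_loss(predicted, target):
    """Return 1 - Pearson correlation between predicted and reference signals.

    0 means the waveforms have identical shape; 2 means one is the inverse of the other.
    """
    x = np.asarray(predicted, dtype=float)
    y = np.asarray(target, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    r = (x @ y) / np.sqrt((x @ x) * (y @ y))
    return 1.0 - r


t = np.linspace(0.0, 10.0, 300)
reference = np.sin(2 * np.pi * 1.2 * t)
print(negative_pearson_loss(3 * reference + 5, reference))  # close to 0
print(negative_pearson_loss(-reference, reference))         # close to 2
```

Because the loss ignores amplitude and offset, the network only has to match the timing of the pulse, which is what heart rate estimation ultimately depends on.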

4.3 Applications of heart rate monitoring in COVID-19

In modern medical diagnosis, heart rate is one of the most important vital signs. Remote heart rate monitoring has far-reaching consequences, and the relationship between cardiovascular disease and COVID-19 is still under active research [29, 44]. Cardiovascular disease has been found to increase both the fatality rate of COVID-19 and the severity of symptoms in patients [44]. Contactless heart rate monitoring with a smartphone camera can help identify heart rate anomalies early, so that a cardiologist can be consulted. In general, heart rate measured by photoplethysmography (PPG) has been found to correlate highly [102] with that measured by an ECG machine or a contact-based vital signs monitor.

Conventional PPG uses a contact-based light source and detector, whereas remote PPG (rPPG) uses ambient light and a video camera in place of the light sensor. Although the conventional sensor-based approach provides greater signal quality than rPPG [95], it requires more expensive dedicated instruments, whereas rPPG offers the convenience and accessibility of monitoring with only a smartphone or laptop camera. Furthermore, continuous monitoring with a conventional instrument requires it to remain attached to the person, whereas with rPPG the entire process can be carried out with a camera and no physical contact.

Remote PPG signals captured with a video camera are affected by factors such as facial structure, lighting differences and variation in skin tone among individuals. Because datasets covering these factors are lacking, a method trained on one particular dataset may not generalize to a larger population. Self-supervised methods that adapt quickly to a varied population would therefore be more useful. Lee et al. [50] proposed a transductive meta-learner that takes unlabeled samples at test time and performs self-supervised weight adjustment.

There is thus an urgent need to design unsupervised or self-supervised architectures. Measuring heart rate with an ECG or another contact-based method is sometimes infeasible, a situation that has become more common in the COVID-19 context. With a video-camera-based approach, heart rate variability can be studied using rPPG, and any anomaly can be addressed by a doctor.

5 Monitoring of respiratory rate

Respiratory rate (RR) is one of the primary vital signs of the human body and a direct indicator of potential respiratory dysfunction. It is usually measured as the number of breaths a person takes per minute while at rest. Respiration maintains oxygen delivery to the tissues of the body, and one of the first signs of deteriorating oxygen delivery is a change in respiratory rate. Continuous respiratory rate monitoring is therefore essential, particularly in the context of COVID-19. Approaches to respiration monitoring can typically be classified as either contact or contactless.

5.1 Conventional workflow for respiratory rate measurement

Traditionally, respiratory rate is measured using professional medical equipment in hospitals and research facilities; this equipment tends to be complex, heavy and expensive, putting it out of reach of most individuals. It also tends to rely on contact-based measurement, in which sensors are placed on the subject’s body. Contact-based methods include acoustic methods, fiber optic sensors, chest/abdominal movement detection (using mercury strain gauges or impedance methods), airflow-based methods, transcutaneous CO\(_2\) monitoring, oximetry probe (SpO\(_2\)) based methods and ECG-derived methods [3]. The spirometer (Fig. 6), used to measure the volume of air taken in by a patient, is an example of such a contact-based device. The concern with these methods is that they could be an avenue for COVID-19 transmission, both through the congregation of patients with lung disease and through contact between sensors and patients.

Contactless methods offer the advantage of not requiring any physical contact with sensors. Current contactless methods include radar-based, thermal imaging based and optical imaging based approaches [3]. Of these, only optical imaging can be performed directly with equipment accessible to most patients, such as a laptop camera. Because it enables remote monitoring, it also avoids the congregation of patients and transmission through coughs and droplets. Video-camera-based techniques post-process pixel data, for example through image subtraction or optical flow analysis. For instance, Massaroni et al. [60] used input from a single notebook computer RGB camera, analyzing the intensity of reflected light at the level of the pit of the neck; this region has to be defined manually by the user.
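Assuming the per-frame mean intensity of such a neck ROI has already been extracted, the breathing rate can be estimated by timing the upward zero crossings of the smoothed, mean-centred trace. This simple estimator, its function name and the 0.5 s smoothing window are illustrative choices rather than the processing used by Massaroni et al. [60]; it also assumes the subject stays still and the lighting is stable.

```python
import numpy as np


def respiratory_rate(intensity, fps, smooth_s=0.5):
    """Breaths per minute from a per-frame mean ROI intensity trace."""
    x = np.asarray(intensity, dtype=float)
    x = x - x.mean()                                   # centre the breathing oscillation on zero
    k = max(1, int(smooth_s * fps))
    x = np.convolve(x, np.ones(k) / k, mode="valid")   # moving average suppresses fast noise
    # Indices where the trace crosses zero going upwards: one per breath cycle.
    up = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    if len(up) < 2:
        raise ValueError("trace too short: fewer than two breath cycles")
    mean_period_s = np.diff(up).mean() / fps
    return 60.0 / mean_period_s


# Synthetic 60 s trace at 30 fps: 15 breaths/min plus a small 5 Hz flicker.
fps = 30
t = np.arange(60 * fps) / fps
trace = 50 + 2 * np.sin(2 * np.pi * 0.25 * t + 0.3) + 0.05 * np.sin(2 * np.pi * 5 * t)
print(f"{respiratory_rate(trace, fps):.1f} breaths/min")  # 15.0 breaths/min
```

Timing intervals between crossings, rather than counting crossings in a fixed window, avoids an off-by-one error when the recording does not start or end exactly at a breath.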

Fig. 6

Spirometer, used for measuring breathing rate. An example of a conventional, contact-based device that measures the volume and/or the rate of air intake by the lungs. Source: National Heart Lung and Blood Institute (NIH) / Public domain

5.2 ML-enabled workflow for respiratory rate monitoring

In the above techniques, detecting the patient in the video frame and selecting the region of interest (ROI) are tasks that could be automated with ML/DL. An example workflow employs two models: one that identifies and selects the ROI, and a second that focuses on the selected region and classifies the breathing pattern as normal or abnormal. Accurate patient detection and ROI selection are essential for successfully estimating vital signs from a video camera [19]. Chaichulee proposed a multi-task convolutional neural network (CNN) model that automatically detects the presence or absence of a patient and, if a patient is found in front of the camera, segments the patient’s skin regions. In this work, skin annotation was performed semi-automatically to reduce the effort required to label the skin regions.
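The two-model workflow described above can be outlined in code. In the sketch below both models are replaced by deliberately simple, hypothetical stand-ins: the ROI is the grid cell whose mean intensity varies most over time (in place of a segmentation CNN), and the breathing pattern is labelled from its dominant frequency against the 12–20 breaths per minute range of Section 2 (in place of a learned classifier).

```python
import numpy as np


def select_roi(video, cell=8):
    """Stage 1 stand-in: (row, col) of the grid cell whose mean intensity
    varies most over time. video has shape (frames, height, width)."""
    _, h, w = video.shape
    best, best_var = (0, 0), -1.0
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            trace = video[:, r:r + cell, c:c + cell].mean(axis=(1, 2))
            if trace.var() > best_var:
                best, best_var = (r, c), trace.var()
    return best


def classify_breathing(video, fps, cell=8, normal_bpm=(12.0, 20.0)):
    """Stage 2 stand-in: estimate the rate in the selected ROI and label it."""
    r, c = select_roi(video, cell)
    trace = video[:, r:r + cell, c:c + cell].mean(axis=(1, 2))
    trace = trace - trace.mean()
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fps)
    power = np.abs(np.fft.rfft(trace)) ** 2
    band = (freqs >= 0.1) & (freqs <= 1.0)        # assumed 6-60 breaths/min search range
    bpm = 60.0 * freqs[band][np.argmax(power[band])]
    label = "normal" if normal_bpm[0] <= bpm <= normal_bpm[1] else "abnormal"
    return (r, c), bpm, label


# Synthetic 60 s, 10 fps, 32x32 video: sensor noise plus chest motion in one cell.
fps = 10
t = np.arange(60 * fps) / fps
video = 100 + np.random.default_rng(0).normal(0, 0.01, (t.size, 32, 32))
video[:, 8:16, 16:24] += np.sin(2 * np.pi * 0.4 * t)[:, None, None]   # 24 breaths/min
roi, bpm, label = classify_breathing(video, fps)
print(roi, f"{bpm:.1f}", label)  # (8, 16) 24.0 abnormal
```

Keeping the two stages as separate functions mirrors the workflow: either stand-in could be swapped for a trained network without changing the other.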

Cho et al. [25] propose DeepBreath, a CNN-based deep learning model that automatically recognises people’s psychological stress level (mental overload) from their breathing patterns. To avoid the privacy issues of RGB cameras, they use a low-cost thermal camera to track a person’s breathing patterns as temperature changes around the nostrils. DeepFilter (Liu et al. [55]) uses a bidirectional recurrent neural network (RNN) for fine-grained breathing rate monitoring on a smartphone; its main idea is to infer breathing events from low signal-to-noise ratio (SNR) recordings. It currently has applications in sleep apnoea detection, asthma treatment and sleep stage detection.

ML-based detection of deterioration in vital signs faces challenges such as small datasets and domain complexity. Small datasets lead to overfitting of neural networks, while the domain complexity of the disease can lead to inaccurate mapping of its causes and stages. Alloghani et al. [5] assess several machine learning algorithms for clustering and prediction of vital signs.

5.3 Applications of respiratory rate monitoring in COVID-19

Many DL-based techniques suffer from overfitting and hence require special techniques to improve training. The two main challenges for video-based monitoring are patient detection and ROI selection. The existing literature describes artificial neural network (ANN) architectures for ROI segmentation tailored to specific applications, such as monitoring neonatal infants.

In the particular case of COVID-19, there is a growing need for remote monitoring solutions that are accessible to as many people as possible. Since the symptoms of COVID-19 are predominantly respiratory, ML in this domain has great potential to aid quick diagnosis. As hospitals come under immense stress from the influx of patients during the pandemic, techniques that enable remote monitoring and diagnosis can help relieve some of the pressure on the medical system. To this end, the methods outlined above that use a single RGB camera are particularly useful, as such cameras are easily accessible in the form of a smartphone or laptop webcam. However, because patients are not in a controlled environment, this poses two challenges: first, the patient may not be present in every video frame; second, lighting conditions may vary, making skin-color-based ROI detection difficult. Hence, research that yields fully automatic ROI selection using ML techniques would be invaluable.

6 Cough analysis

Coughing is a sudden expulsion of air from the airways that is characterized by a distinctive sound. Dry cough, a characteristic symptom of COVID-19, is a cough that produces no mucus or phlegm. Prior studies have shown that coughs from different respiratory syndromes have distinct latent features [10]. Recent medical findings [31, 51] have shown that the pathological alterations caused by COVID-19 are distinct from those caused by other common non-COVID-19 respiratory diseases.

6.1 Conventional workflow for cough analysis

Traditionally, conditions that affect breathing, such as asthma and chronic obstructive pulmonary disease (COPD), are analyzed using a spirometer. The spirometer measures the forced vital capacity (FVC), which is the total volume of air exhaled from the lungs after the deepest possible inhalation, and the forced expiratory volume in the first second (FEV1) [62]. The spirometry results are then compared against reference or predicted values. The FEV1/FVC ratio (Tiffeneau-Pinelli index), FVC and the forced expiratory ratio help differentiate restrictive, obstructive and normal breathing patterns [84]. Although spirometry is non-invasive, it requires a large amount of patient coordination and skin contact, as shown in Fig. 6. To obtain consistent results, the test has to be repeated multiple times, which can be inconvenient and difficult for subjects such as the elderly or those with significant lung disease. Furthermore, the cost of equipment and health care professionals’ time is an additional overhead. Additionally, the assessment of cough severity with the Leicester cough questionnaire (LCQ) and visual analogue scales (VAS) is currently subjective and requires supervision by health care professionals [87].
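How FEV1 and FVC separate the three patterns can be sketched as below. The fixed cut-offs (FEV1/FVC below 0.70 for obstruction, FVC below 80% of predicted for a possible restrictive pattern) are common simplifications assumed here; clinical interpretation relies on reference equations and further tests, and the function name is hypothetical.

```python
def spirometry_pattern(fev1_l, fvc_l, fvc_predicted_l,
                       ratio_cutoff=0.70, fvc_cutoff=0.80):
    """Coarse spirometry pattern from FEV1, FVC and predicted FVC (all in litres).

    Returns (pattern, FEV1/FVC ratio). The fixed cut-offs are simplifying
    assumptions, not a clinical standard applied by the cited works.
    """
    ratio = fev1_l / fvc_l                  # Tiffeneau-Pinelli index
    fvc_fraction = fvc_l / fvc_predicted_l  # measured FVC relative to predicted
    if ratio < ratio_cutoff:
        return "obstructive", ratio
    if fvc_fraction < fvc_cutoff:
        return "restrictive", ratio         # suggests, but does not prove, restriction
    return "normal", ratio


print(spirometry_pattern(2.0, 4.0, 4.2))       # ('obstructive', 0.5)
print(spirometry_pattern(2.4, 2.8, 4.5)[0])    # restrictive
```

The ratio is checked first because airflow obstruction can also lower FVC, so a low FVC alone does not distinguish the two patterns.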

Existing cough frequency monitors fall into two categories: those that use only an audio signal, and those that combine the audio signal with other signals. Among the audio-only methods, the Hull Automatic Cough Counter [12] uses a free-field microphone to record audio signals throughout the day. Coughs are detected automatically, but human supervision is required for counting them. The Pulmo Track-CC [87] takes multiple signals as input to obtain cough information: it combines a piezoelectric belt that measures chest-wall motion, two contact microphones placed on the trachea and the thorax, and a lapel microphone that records the audio signal. However, Turner et al. [87] found that this device had a sensitivity of only 26% for detecting coughs compared with those heard by ear. Most existing cough analysis solutions therefore either require professional medical assistance or have very low sensitivity. Additionally, no standard equipment is available across different geographical locations, so the tests are extremely expensive and scarcely available. For optimal performance, most of the above methods have to be carried out in person at a hospital or clinic. This inevitably requires suspected patients to break isolation, and such a visit exposes more members of the public to COVID-19 while the patient travels to the test facility.
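Sensitivity here is the share of reference coughs (those identified by ear) that the device also detects. The arithmetic is shown below with hypothetical counts chosen only to reproduce a 26% figure; they are not data from Turner et al. [87].

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of reference events the detector found: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: listeners hear 200 coughs and the device flags 52 of them.
print(sensitivity(52, 148))  # 0.26
```

Sensitivity says nothing about false alarms; a complete evaluation would also report specificity or the false-positive rate.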

In comparison, the recording of cough sounds requires minimal patient cooperation. Recording of spontaneous coughs is straightforward. Adults can also be asked to produce voluntary coughs, which can be recorded via a mobile device. These approaches require no physical contact.

6.2 ML-enabled workflow for cough analysis

Analyzing cough parameters for quick detection of COVID-19 is a task that can be automated with machine learning. Sharan et al. [84] propose extracting Mel-frequency cepstral coefficients (MFCC) and passing the extracted features to a support vector regression (SVR) model. Sequential backward feature selection, driven by the root mean squared error (RMSE) of the predicted output, is then used to remove irrelevant features and select the subset of features that minimizes the RMSE. Figure 7 shows an example workflow of an ML classifier that identifies cough patterns and classifies them as indicating a COVID-positive or COVID-negative diagnosis.
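The sequential backward selection step can be sketched as below. To keep the example dependency-free, ordinary least squares stands in for the SVR model used by Sharan et al. [84], and RMSE is measured on a held-out split; both choices, and the function names, are assumptions. In practice the feature matrix would hold MFCC-derived features.

```python
import numpy as np


def subset_rmse(X_tr, y_tr, X_va, y_va, cols):
    """Fit least squares on the given feature columns; return validation RMSE.

    Ordinary least squares is a stand-in here for the SVR model described above.
    """
    A_tr = np.column_stack([X_tr[:, cols], np.ones(len(X_tr))])   # add intercept column
    A_va = np.column_stack([X_va[:, cols], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((A_va @ coef - y_va) ** 2)))


def backward_selection(X_tr, y_tr, X_va, y_va):
    """Sequential backward selection: repeatedly drop the feature whose removal
    lowers validation RMSE the most, and stop when no removal helps."""
    selected = list(range(X_tr.shape[1]))
    best = subset_rmse(X_tr, y_tr, X_va, y_va, selected)
    while len(selected) > 1:
        trials = [
            (subset_rmse(X_tr, y_tr, X_va, y_va, [c for c in selected if c != drop]), drop)
            for drop in selected
        ]
        score, drop = min(trials)
        if score >= best:
            break
        best = score
        selected.remove(drop)
    return selected, best


# Synthetic data: 6 candidate features, of which only 0 and 2 drive the target.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.1 * rng.standard_normal(200)
selected, best = backward_selection(X[:100], y[:100], X[100:], y[100:])
print(selected, round(best, 3))
```

The greedy search evaluates one removal at a time, so it is cheap but can miss feature combinations that only help jointly.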

Rao et al. [77] propose extracting MFCCs from short overlapping segments of a cough sound sample, which yields a sequence of MFCC vectors for each cough recording. This sequence is converted into a single vector by averaging each MFCC coefficient over the sequence. An SVR model performs the regression, and the final spirometry reading is computed as the median of the predicted values across all instances. Kosasih et al. [45] propose using features such as formant frequencies (FF) and MFCCs alongside wavelet features extracted from each cough sample, followed by feature normalization to remove the effects of variations in sound intensity. A logistic regression model (LRM) and leave-one-out validation (LOOV) are used for model design and evaluation. This model is also used for pneumonia detection.
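The MFCC-then-average step described for Rao et al. [77] can be illustrated with a from-scratch NumPy sketch. The frame length (25 ms), hop (10 ms), FFT size (512, which must be at least the frame length in samples), number of mel filters (26) and number of coefficients (13) are common defaults assumed here, not values from [77]; the signal must be at least one frame long, and the SVR itself is omitted, so `aggregate_predictions` only illustrates the final median step.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft//2+1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb


def mfcc(signal, sr, frame_len=0.025, hop=0.010, n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC sequence over short overlapping frames."""
    signal = np.asarray(signal, dtype=float)
    flen, fhop = int(round(frame_len * sr)), int(round(hop * sr))
    n_frames = 1 + max(0, (len(signal) - flen) // fhop)
    idx = np.arange(flen)[None, :] + fhop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_e = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II over the filter axis, keeping the first n_coeffs coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_coeffs)[:, None])
    return log_e @ dct.T


def recording_vector(signal, sr):
    """Average the MFCC sequence over time: one fixed-length vector per recording."""
    return mfcc(signal, sr).mean(axis=0)


def aggregate_predictions(per_instance_preds):
    """Combine per-instance regression outputs into one reading via the median."""
    return float(np.median(per_instance_preds))


sr = 16000
t = np.arange(sr // 2) / sr
cough_like = np.sin(2 * np.pi * 440 * t) * np.exp(-6 * t)  # toy decaying tone
vec = recording_vector(cough_like, sr)                     # shape (13,)
reading = aggregate_predictions([2.1, 2.5, 9.0])           # median: 2.5
```

Taking the median rather than the mean across instances keeps a single outlying prediction (such as the 9.0 above) from dominating the final reading.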

Sharan et al. [85] propose using gammatone filters, which mimic the human auditory system, augmented with linear MFCCs to extract the frequency components of a cough sound sample. Support vector machine (SVM) and artificial neural network (ANN) models are then used to classify whether or not the cough indicates croup (a respiratory disease found in children).
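A gammatone filterbank front end can be sketched by building each filter's impulse response \(g(t) = t^{n-1} e^{-2\pi b t} \cos(2\pi f_c t)\) and convolving it with the signal. The 4th-order filter, the bandwidth \(b = 1.019 \cdot \mathrm{ERB}(f_c)\) with the Glasberg and Moore ERB formula, the 50 ms impulse-response length and the centre frequencies in the example are common choices assumed here, not details taken from [85].

```python
import numpy as np


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def gammatone_ir(fc, sr, duration=0.05, order=4):
    """Impulse response t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), peak-normalised."""
    t = np.arange(int(duration * sr)) / sr
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))


def gammatone_filterbank(signal, sr, centre_freqs):
    """Filter the signal with one gammatone filter per centre frequency.
    Returns an array of shape (len(centre_freqs), len(signal))."""
    signal = np.asarray(signal, dtype=float)
    return np.stack([np.convolve(signal, gammatone_ir(fc, sr), mode="same")
                     for fc in centre_freqs])


sr = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr // 4) / sr)
bands = gammatone_filterbank(tone, sr, [250.0, 1000.0, 4000.0])
band_energy = (bands ** 2).sum(axis=1)  # the 1000 Hz channel dominates
```

Log band energies computed this way could then be combined with cepstral features before the SVM or ANN stage; how [85] combines them is not reproduced here.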

Fig. 7

A general workflow for an ML classifier system that can identify a cough and appropriately flag it if symptomatic of COVID-19. The classifier may use deep learning models to make a prediction based on recordings from a smartphone, thus enabling a user to perform a quick check-up using no specialised equipment other than a smartphone

6.3 Applications in COVID-19 for cough analysis

The existing solutions mentioned above face major problems in COVID-19 detection because medical findings [51] have shown that the pathological alterations caused by COVID-19 are distinct from those caused by other common respiratory diseases. Therefore, ML models developed for other respiratory diseases such as pneumonia and croup are not optimal for COVID-19 detection and require special techniques to boost their effectiveness. Researchers at the University of Cambridge aim to create a crowd-sourced dataset from recordings provided by participants through the COVID-19 Sounds App [15]. This dataset can then be used to build models that classify patients as COVID or non-COVID based on their voice recordings. Brown et al. [15] describe how handcrafted features such as RMS energy, together with features obtained through transfer learning (using a pre-trained VGGish network), can be combined into a feature vector for the classification task.
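The fusion of handcrafted and transfer-learning features described by Brown et al. [15] amounts to concatenating two vectors. In this sketch the handcrafted set (RMS energy, zero-crossing rate and spectral centroid, each summarised by its mean and standard deviation over frames) is an assumption, the input must be at least one frame long, and the embedding is a placeholder passed in from a hypothetical upstream VGGish-style model rather than computed here (VGGish produces 128-dimensional embeddings).

```python
import numpy as np


def handcrafted_features(signal, sr, frame=1024, hop=512):
    """Mean and std over time of frame-level RMS energy, zero-crossing rate
    and spectral centroid (6 values). Assumes len(signal) >= frame."""
    signal = np.asarray(signal, dtype=float)
    n = 1 + max(0, (len(signal) - frame) // hop)
    frames = np.stack([signal[i * hop:i * hop + frame] for i in range(n)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame) / frame)  # periodic Hann
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    centroid = (mag @ freqs) / (mag.sum(axis=1) + 1e-12)
    return np.array([stat(x) for x in (rms, zcr, centroid) for stat in (np.mean, np.std)])


def fuse(handcrafted, embedding):
    """Concatenate handcrafted statistics with a transfer-learning embedding."""
    return np.concatenate([handcrafted, np.asarray(embedding, dtype=float)])


sr = 16000
tone = np.sin(2 * np.pi * 500 * np.arange(sr) / sr)
hand = handcrafted_features(tone, sr)
embedding = np.zeros(128)       # hypothetical placeholder for a VGGish-style embedding
feature_vector = fuse(hand, embedding)  # length 6 + 128 = 134
```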

Further, existing solutions must be adapted to prioritize minimizing false negatives, which are a more serious problem than false positives in the medical domain. Some preliminary studies [41] applying CNN-based models to Mel spectrogram images of cough sound samples have shown impressive results for COVID-19 testing. Parallel, independent classifier systems based on deep learning and classical machine learning can be particularly useful in COVID-19 testing. Such an architecture helps minimize misdiagnosis by giving each classifier a veto: a “COVID negative” result is returned only when all the classifiers agree on it. If even one classifier disagrees with the others, a result such as “inconclusive” can be returned, which effectively minimizes false negatives. The accuracy of these models is expected to increase over time as the quality of the data improves.
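The veto rule described above can be expressed directly. The sketch assumes each classifier emits a boolean (True meaning positive); returning “COVID positive” when all classifiers agree on a positive result is an assumption, since the text only specifies the negative and inconclusive cases.

```python
from typing import Sequence


def veto_decision(predictions: Sequence[bool]) -> str:
    """Combine independent classifier outputs (True = 'COVID positive')."""
    if len(predictions) == 0:
        raise ValueError("need at least one classifier output")
    if not any(predictions):
        return "COVID negative"  # unanimous negative: the only way to clear a case
    if all(predictions):
        return "COVID positive"  # assumption: unanimous positive reported as positive
    return "inconclusive"        # any disagreement is escalated rather than cleared
```

Because a single dissenting positive vote blocks a negative result, the combined system's false-negative rate can be no higher than that of its most sensitive member, at the cost of more “inconclusive” outcomes.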

7 Monitoring of oxygen saturation

Oxygen saturation, also known as Sp\(O_2\), is a measure of the amount of oxygen-carrying hemoglobin in the blood relative to the total amount of hemoglobin. Normal oxygen saturation is between 96% and 98%. Low Sp\(O_2\) levels indicate hypoxemia, which may warrant immediate oxygen supplementation. The methods for monitoring oxygen saturation can be classified as contact or contactless.
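Written as a formula, Sp\(O_2\) is the oxygenated fraction of the total hemoglobin. The concentration symbols \([\mathrm{HbO_2}]\) (oxygenated hemoglobin) and \([\mathrm{Hb}]\) (reduced hemoglobin) are notation introduced here for illustration:

\[
\mathrm{Sp}O_2 = \frac{[\mathrm{HbO_2}]}{[\mathrm{HbO_2}] + [\mathrm{Hb}]} \times 100\%
\]

For example, if 97 of every 100 hemoglobin molecules carry oxygen, \(\mathrm{Sp}O_2 = 97/(97+3) \times 100\% = 97\%\), which lies within the normal range quoted above.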

7.1 Conventional workflow of oxygen saturation measurement

Traditionally, oxygen saturation is measured using arterial blood gas analysis or pulse oximetry. Arterial blood gas analysis involves drawing a blood sample from an artery, such as the radial artery in the wrist or the femoral artery in the groin; the partial pressures of the gases, measured in millimeters of mercury, represent how effectively the body is exchanging oxygen and carbon dioxide. Pulse oximetry, in contrast, relies on the fact that oxygenated and reduced hemoglobin absorb particular wavelengths of light differently: a sensor placed on a finger or earlobe measures light at these wavelengths after it has passed through or been reflected by the tissue (Fig. 8).
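The principle behind pulse oximetry is often summarised as the “ratio of ratios”: the pulsatile (AC) component of each wavelength's signal is normalised by its steady (DC) component, and the ratio \(R\) of the red and infrared values is mapped to Sp\(O_2\) by an empirical calibration. In this sketch AC is taken as the peak-to-peak amplitude and DC as the mean, and the linear calibration \(110 - 25R\) is an often-quoted approximation used only for illustration; real devices rely on their own calibration curves.

```python
import numpy as np


def ac_dc(ppg):
    """AC as peak-to-peak pulsatile amplitude, DC as mean level (simplifying assumption)."""
    ppg = np.asarray(ppg, dtype=float)
    return ppg.max() - ppg.min(), ppg.mean()


def ratio_of_ratios(red, infrared):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir)."""
    ac_r, dc_r = ac_dc(red)
    ac_ir, dc_ir = ac_dc(infrared)
    return (ac_r / dc_r) / (ac_ir / dc_ir)


def spo2_estimate(red, infrared, a=110.0, b=25.0):
    """Linear calibration SpO2 = a - b*R; a and b are illustrative, not clinical values."""
    return a - b * ratio_of_ratios(red, infrared)


t = np.linspace(0.0, 1.0, 1001)
red = 1.0 + 0.01 * np.sin(2 * np.pi * t)       # AC/DC = 0.02
infrared = 2.0 + 0.04 * np.sin(2 * np.pi * t)  # AC/DC = 0.04
estimate = spo2_estimate(red, infrared)         # R = 0.5, so about 97.5
```

Normalising by DC is what makes the method insensitive to overall light intensity and skin thickness, since those scale AC and DC together.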

Current contactless methods are based on optical imaging: a convolutional neural network (CNN) with an encoder-decoder architecture segments the regions of interest in the image, and heart rate and oxygen saturation are then extracted separately from the signal of the segmented region [92]. Estimating oxygen saturation with RGB cameras is much more difficult than contact-based pulse oximetry. RGB camera-based oximeters rely on ambient light of unknown intensity falling on a large tissue volume, with background noise, whereas contact-based pulse oximeters use monochromatic light of known intensity on a small tissue volume. This challenge must be addressed algorithmically for Sp\(O_2\) to be estimated remotely with good accuracy. The next section describes how ML-based approaches are currently used for oxygen saturation estimation.

Fig. 8

A fingertip oximeter—a non-invasive but contact based method of measuring oxygen saturation by using a sensor placed on the fingertip. A variant of this technique is also becoming increasingly common in smartwatches such as devices from Fitbit and Garmin. Source: ©Teutotechnik, Med. Produktions- und Vertriebs-GmbH, Niedersachsenstr. 7,49186 Bad Iburg / Wikimedia Commons / CC-BY-SA-3.0 / GFDL

7.2 ML-enabled workflow for oxygen saturation monitoring

Detecting the presence of the patient and accurately segmenting the region of interest (ROI), while handling movement of the ROI within the frame, can be automated through ML methods. These steps are paramount for accurately determining the patient's blood Sp\(O_2\) level. The main approach for calculating oxygen saturation from a camera uses PPG signals at two different wavelengths. Herrmann et al. [39] proposed an apparatus consisting of three separate monochrome cameras, each with a suitable optical filter, focusing on spectral regions where the extinction coefficients of oxygenated and reduced hemoglobin differ. The hand is chosen as the region because peripheral cyanosis can be observed distinctly there. To capture signals only from the tissue part of the image, the hand is first segmented using a fully convolutional network (FCN-8s) [37]. After heart rate and oxygen saturation are separated from the signal, the “ratio of ratios” method [92] is applied to estimate Sp\(O_2\) levels.
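Once a segmentation mask is available, turning a video into a raw PPG trace amounts to averaging the pixels inside the ROI in each frame. The sketch assumes the mask (for example, the output of a hand segmentation network) is already available as a boolean array; the normalisation to a signal relative to its mean level is a commonly used extra step added here as an assumption, not a detail from [39].

```python
import numpy as np


def roi_ppg(frames, mask):
    """Spatially average the pixels inside a boolean ROI mask in every frame.

    frames: (T, H, W, C) video; mask: (H, W) bool, e.g. a hand segmentation.
    Returns a (T, C) array: one raw PPG trace per colour or filter channel.
    """
    frames = np.asarray(frames, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("empty ROI mask")
    return frames[:, mask, :].mean(axis=1)


def normalise(trace):
    """Express each channel relative to its mean (DC) level."""
    base = trace.mean(axis=0)
    return (trace - base) / base
```

Averaging over many pixels suppresses per-pixel sensor noise, which is why segmentation quality (keeping background pixels out of the mask) matters so much for the final estimate.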

Recently, smartphone cameras have been proposed for monitoring oxygen saturation levels. Ding et al. [27] used singular value decomposition (SVD) to remove motion artifacts, which are prevalent in the seconds before the saturation level reaches its lowest point, making the original SVD-based artifact reduction [78] more robust. While the signal matrix is being constructed, if motion artifacts are detected in an incoming cycle, that cycle is not added to the SVD matrix, which makes the reconstructed signal look natural. The RGB frames are converted to photoplethysmography (PPG) signals by averaging pixel values over candidate regions of interest. Because reduced hemoglobin absorbs more red light, less red light reaches the camera and the PPG signal decreases; as this is a valuable feature, the signal is decomposed into bandpass- and lowpass-filtered versions. These signals are then fed into a novel 1D convolutional neural network that regresses oxygen saturation and outperformed the previous state-of-the-art ratio-of-ratios model. The CNN model is trained for a particular type of smartphone camera; generalization across cameras from different manufacturers remains to be investigated.
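The idea of excluding artifact-laden cycles when building the SVD matrix can be illustrated with a generic low-rank reconstruction. This is a sketch in the spirit of [27, 78], not their exact algorithm: cycle segmentation, resampling to a common length, the motion detector that produces the flags, and the chosen rank are all assumed to exist or be set upstream.

```python
import numpy as np


def svd_clean_cycles(cycles, artifact_flags, rank=2):
    """Low-rank reconstruction of PPG cycles.

    cycles: (n_cycles, L) array of equal-length cardiac cycles.
    artifact_flags: (n_cycles,) bool, True where motion was detected.
    Flagged cycles are left out when the basis is estimated; every cycle is
    then projected onto the top-`rank` right singular vectors.
    """
    cycles = np.asarray(cycles, dtype=float)
    clean = cycles[~np.asarray(artifact_flags, dtype=bool)]
    if len(clean) < rank:
        raise ValueError("not enough artifact-free cycles")
    mean = clean.mean(axis=0)
    _, _, vt = np.linalg.svd(clean - mean, full_matrices=False)
    basis = vt[:rank]                   # (rank, L) dominant cycle shapes
    coeffs = (cycles - mean) @ basis.T  # project every cycle, flagged or not
    return mean + coeffs @ basis
```

Keeping flagged cycles out of the basis means a motion spike cannot become one of the dominant shapes, so projecting that cycle onto the clean basis largely discards the spike.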

Gabriella [18] used a see-through mirror equipped with a camera to acquire video frames. They used the color of the patient’s lips to detect Sp\(O_2\) levels, as lips turning purple or bluish may indicate low oxygen levels in the blood. A pre-trained face detection model detects the patient’s face in the video stream, and facial landmarks are used to detect and isolate the lip ROI. The ROI is preprocessed to identify its dominant color, using K-means clustering with \(k=3\), and the color is classified as “regular”, “altered” or “purplish”. A fuzzy-based intelligent system then uses the lip color, which represents the Sp\(O_2\) level, as a linguistic variable along with other vital signs to decide the risk level of the patient. Compared with a standard fingertip oximeter, the mean absolute error (MAE) of this approach was 1.83, with a standard deviation of 2.43, on their dataset of 10 individuals. Note, however, that because the datasets used in remote oxygen saturation monitoring are not standardized, it is difficult to compare the error rates of different methods.
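The dominant-color step can be sketched with a plain K-means in NumPy. The sketch assumes the lip ROI is given as an array of RGB pixels and that “dominant” means the centroid of the most populated cluster (an assumption); the mapping to “regular”, “altered” or “purplish” and the fuzzy risk system of [18] are not reproduced.

```python
import numpy as np


def dominant_colour(pixels, k=3, iters=20, seed=0):
    """K-means on RGB pixels; return the centroid of the most populated cluster."""
    x = np.asarray(pixels, dtype=float).reshape(-1, 3)
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest centroid (squared Euclidean distance).
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = x[labels == j].mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    return centroids[counts.argmax()]
```

Using the largest cluster rather than the plain mean color keeps specular highlights and shadowed pixels at the edge of the ROI from shifting the estimate.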

7.3 Applications in COVID-19 for oxygen saturation

The advances made in this field suggest that it is growing at a fast pace, as contactless methods to determine Sp\(O_2\) levels are increasingly desired in the fight against the COVID-19 pandemic. Pulse oximeters, despite being inexpensive, are produced in limited quantities, and COVID-19 has greatly increased demand for them. The widespread reach of smartphones makes them an ideal medium for detecting Sp\(O_2\) levels in the current global scenario. Work by Herrmann et al. [39] suggests that three separate monochrome cameras can exploit two spectral regions where the extinction coefficients of oxygenated and reduced hemoglobin differ, compared with only one in a standard RGB-based approach. If these three monochromatic filters can be replicated using one or more smartphones, the results of the smartphone-based approach will be more reliable. Artifact removal through a generative adversarial network (GAN) can be employed once an appropriate dataset is available; this is expected to perform better than SVD and hence further improve smartphone-based Sp\(O_2\) measurement. Further, since the domain and characteristics of the input images are the same, pre-trained networks such as VGG-16 may be surpassed by an attention-based encoder-decoder approach for segmentation.

Another fast-developing area is blood oxygen monitoring by means of oximeters built into many smartwatches, such as the Fitbit Versa 2 and the Garmin Forerunner 245. While some oximeter apps do exist for smartphones, questions have been raised about their accuracy, measurement range and the training datasets used [91].

8 Monitoring of blood pressure

The human heart circulates blood to the rest of the body by pumping it into the large blood vessels of the circulatory system. The blood flow exerts pressure on the walls of the vessels, which can be classified into two types. The pressure exerted when the heart muscle contracts and blood flows into the vessels is called systolic blood pressure, whereas the pressure exerted when the heart muscle relaxes is called diastolic blood pressure. Systolic blood pressure is always higher, as more blood flows through the vessels during systole. COVID-19 can damage the heart directly. Moreover, high blood pressure has been found to be a precursor of many diseases, and people with high blood pressure are more likely to get infected, have worse symptoms and die from the infection [69, 88]. Thus, a safe and accurate blood pressure measurement workflow is very important.

8.1 Conventional workflow for blood pressure measurement

There are two approaches to measuring blood pressure: invasive and non-invasive. Invasive methods are more accurate and are used to monitor blood pressure continuously in critical patients. They involve direct measurement of arterial pressure by inserting a cannula needle into a suitable artery. Two instruments are mainly used to measure blood pressure non-invasively. The one used at home for self-monitoring and reporting is the digital blood pressure monitor, which can be placed on a finger, the wrist, or the arm. It measures blood pressure automatically based on variations in the volume of blood flowing through the vessels.

This device, however, is often inaccurate and unreliable, especially for people with blocked arteries or heart rhythm problems. Hospitals therefore generally use a far more accurate device known as a sphygmomanometer (Fig. 9). Its cuff is wrapped around the patient’s arm and inflated until blood stops flowing through the brachial artery, and the corresponding pressure is read on the pressure gauge. A stethoscope is also required to listen for the sounds of blood flow in the artery as the cuff pressure changes [74]. A trained professional is required to take the measurement, which adds to the workload. Because this procedure involves multiple points of contact, it is not a safe method of measuring blood pressure in suspected COVID-19 patients. Moreover, neither of these non-invasive instruments can be used for continuous monitoring.

Blood pressure can also be derived from other vital signs rather than measured directly. Photoplethysmography (PPG), which is also used to estimate heart rate, detects changes in the blood volume of microvascular tissue during each heartbeat. PPG readings, together with ECG readings, can be used to calculate the pulse transit time (PTT), from which blood pressure can be calculated [82]. ECG measurements, however, require contact with sensors, which adds to the risk of infection.
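Computing PTT from the two signals can be sketched as measuring, for each ECG R-peak, the delay to the next PPG peak, followed by a per-subject calibration against a few cuff readings. Peak detection is assumed to have been done already; the `max_delay` cut-off and the model \(\mathrm{BP} = a/\mathrm{PTT} + b\) are illustrative assumptions, not necessarily what [82] uses. A delay measured from the ECG R-peak also includes the pre-ejection period, which is why some authors call it pulse arrival time.

```python
import numpy as np


def pulse_transit_times(r_peaks_s, ppg_peaks_s, max_delay=0.6):
    """For each ECG R-peak (seconds), the delay to the next PPG peak.
    Beats with no PPG peak within max_delay seconds are skipped."""
    ppg = np.sort(np.asarray(ppg_peaks_s, dtype=float))
    ptt = []
    for r in np.asarray(r_peaks_s, dtype=float):
        i = np.searchsorted(ppg, r, side="right")  # first PPG peak after r
        if i < len(ppg) and ppg[i] - r <= max_delay:
            ptt.append(ppg[i] - r)
    return np.array(ptt)


def fit_bp_model(ptt, bp):
    """Least-squares fit of the illustrative model BP = a / PTT + b; returns (a, b)."""
    A = np.column_stack([1.0 / np.asarray(ptt, dtype=float), np.ones(len(ptt))])
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(bp, dtype=float), rcond=None)
    return a, b
```

The inverse form reflects the physiological intuition that stiffer, more pressurised arteries carry the pulse wave faster, so a shorter transit time maps to a higher pressure.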

Fig. 9

Sphygmomanometer, a device conventionally used for blood pressure measurement. It consists of an inflatable cuff which is placed on the upper arm of the patient, and is thus a contact-based system of measurement. Source: OpenStax/CC BY Version 8.25 from the Textbook OpenStax Anatomy and Physiology, published May 18, 2016

8.2 ML-enabled workflow for blood pressure monitoring

Various researchers have attempted to derive blood pressure readings solely from PPG signals, which reduces the contact points in measurement as well as the human effort required in critical situations. PPG signals can be obtained continuously using non-invasive, simple and low-cost methods. In the approach of [82], these signals are divided into 30-second intervals and passed through a CNN architecture that extracts relevant features, which then pass through fully connected (FC) layers to obtain the final BP readings. A siamese network, which takes the first available 30-second PPG window together with the corresponding BP reading, is used to calibrate the measurements to a particular patient for more accurate results; even without calibration, however, the results are reported to be highly accurate for diastolic blood pressure and slightly less accurate, but still good, for systolic blood pressure [82].
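Only the windowing step described for [82] is sketched here; the CNN, FC layers and siamese calibration are not reproduced. Non-overlapping windows, the 125 Hz sampling rate in the example and the per-window z-scoring are assumptions rather than details from [82].

```python
import numpy as np


def ppg_windows(ppg, fs, window_s=30.0):
    """Split a PPG recording into non-overlapping windows of window_s seconds.
    Returns shape (n_windows, samples_per_window); a trailing partial window is dropped."""
    n = int(round(window_s * fs))
    ppg = np.asarray(ppg, dtype=float)
    usable = (len(ppg) // n) * n
    return ppg[:usable].reshape(-1, n)


def standardise(windows):
    """Z-score each window independently, guarding against flat windows."""
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    return (windows - mean) / np.where(std > 0, std, 1.0)


fs = 125  # assumed sampling rate in Hz
recording = np.random.default_rng(0).normal(size=95 * fs)  # 95 s of toy signal
batch = standardise(ppg_windows(recording, fs))            # 3 windows of 3750 samples
```

Each row of `batch` would then be one input example for the CNN, paired with the BP reading for the same 30-second interval.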

As in remote heart rate estimation, PPG measurements for blood pressure can also be made contactless. Poh et al. [72] proposed a robust method for estimating the blood volume pulse from videos of the human face captured with a low-cost RGB camera, which records the changes in light absorption caused by changes in blood volume during each heartbeat. After the face is detected using the Viola-Jones algorithm, the forehead is selected as the region of interest because it is least affected by disturbances. The ground-truth blood pressure and the extracted PPG signals are fed into a feed-forward neural network. The approach currently uses a single-layer network to achieve real-time performance on a standard smartphone camera. Although the results are affected by ambient light, because data were collected at different times of the day, this algorithm shows a satisfactory accuracy of more than 85% [71]. Multiple research works aim to further improve the accuracy of BP values derived from PPG signals [33, 82, 83, 97].

Table 1 Potential of ML-based techniques for mitigating workload in hospitals

8.3 Applications of blood pressure in COVID-19

The current research in blood pressure measurement focuses mainly on PPG signals. Reliability is extremely important for any method to be useful, and while there has been some work on contactless measurement via low cost video devices, reliable PPG readings can currently be taken only via contact based methods like sensors on fingers. This is a major bottleneck in the establishment of a safer workflow, and thus should be the focus of any research that aims to work towards the problem of blood pressure measurement, keeping COVID-19 in mind.

One of the problems faced by researchers is the unreliable quality of the data available for training models. Blood pressure measurements are erratic and unstable in most critical patients (in whom blood pressure is measured continuously), which leaves only a small portion of the data usable. Thus, there is a need for techniques that either refine currently unusable data or create more trainable data using architectures such as generative adversarial networks.

Moreover, a new workflow can only be accurate if it has multiple sources of information, so depending solely on the PPG signal can be dangerous. A viable additional source is ECG readings, which can currently be measured only via contact, through sensors placed on the chest and other areas. BP readings can be derived from ECG and PPG signals combined. Since ECG detects electrical signals in the body, it will be difficult to obtain this signal solely from video-based inputs; hence, research on contactless ECG is required [70]. Meanwhile, research should also focus on calculating BP directly from ECG using suitable deep learning models.

9 Conclusion

Machine learning and deep learning techniques have tremendous potential and offer clear advantages over traditional approaches to vital signs monitoring. In addition to the advantages listed in Table 1, these methods are extremely useful for continuous monitoring without any contact.

Remote monitoring of vital signs has the potential to be extremely useful during a pandemic. The patient’s vital signs can be monitored regularly by medical practitioners, who can intervene in an emergency. This work presents a perspective on how remote monitoring based on ML and imaging can be leveraged to estimate vital signs for early detection of COVID-19 and for regular monitoring of remote patients. Currently, hospitals employ contact-based monitoring instruments, which are difficult to use for continuous monitoring. Many remote monitoring techniques can alleviate this issue and facilitate continuous, non-invasive monitoring. At present, the lack of datasets for training such models is the biggest limitation. Since most of these techniques rely on images or videos, the lighting of the room needs to be optimal to attain good accuracy. Moreover, since most of the techniques are based on standard machine learning and signal processing methods, they can easily be deployed on a smartphone. This helps alleviate privacy concerns, as all of the processing happens on the smartphone itself. Models that require more compute power can be pruned, and a lighter version can be used for inference at the edge. There have been efforts on Software Defined Networks (SDNs) for securing the edge-cloud interplay [8], which help alleviate privacy concerns. Work has also been done on preserving anonymity [6] and authentication [101] for the deployment of medical devices. Moreover, secure mobile protocols for healthcare [100] have been proposed, which can be deployed alongside these techniques.

Through this work, we hope to draw attention to this promising field of research, which holds great importance in a COVID-affected world. As the influx of COVID-19 patients increases, hospitals will have fewer trained professionals than needed. It is therefore important that any new methods are automated and require little intervention from medical professionals apart from infrequent monitoring. All the vital signs can be monitored through a centralised intelligent system, as demonstrated by Muhammad et al. for fire scenes [68] and smoke [65]. In future work, effective video summarization could provide key insights, as done in surveillance networks [63, 64]. Once fully deployed, these methods will be especially helpful for susceptible people in a pandemic such as COVID-19, as they can get an indication of their deteriorating health and seek medical attention. These methods will also help in the post-COVID world, as they are affordable, more accessible than conventional instruments and, most importantly, able to perform continuous contactless monitoring.