Forecasting the dynamics of cumulative COVID-19 cases (confirmed, recovered and deaths) for top-16 countries using statistical machine learning models: Auto-Regressive Integrated Moving Average (ARIMA) and Seasonal Auto-Regressive Integrated Moving Average (SARIMA)
Applied Soft Computing (IF 7.2), Pub Date: 2021-02-08, DOI: 10.1016/j.asoc.2021.107161
K E ArunKumar 1, Dinesh V Kalaga 2, Ch Mohan Sai Kumar 3, Govinda Chilkoor 4, Masahiro Kawaji 2, Timothy M Brenza 1, 5

Most countries are reopening or considering lifting stringent prevention policies such as lockdowns; consequently, daily coronavirus disease (COVID-19) cases (confirmed, recovered and deaths) are increasing significantly. As of July 25th, there are 16.5 million global cumulative confirmed cases, 9.4 million cumulative recovered cases and 0.65 million deaths. There is a pressing need to monitor and estimate future COVID-19 cases in order to control the spread and help countries prepare their healthcare systems. In this study, the time-series models Auto-Regressive Integrated Moving Average (ARIMA) and Seasonal Auto-Regressive Integrated Moving Average (SARIMA) are used to forecast the epidemiological trends of the COVID-19 pandemic for the top-16 countries, which account for 70%–80% of global cumulative cases. Initial combinations of the model parameters were selected using the auto-ARIMA model; the optimized model parameters were then found based on the best fit between the predictions and the test data. The analytical tools Auto-Correlation Function (ACF), Partial Auto-Correlation Function (PACF), Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were used to assess the reliability of the models. The evaluation metrics Mean Absolute Error (MAE), Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Percent Error (MAPE) were used as criteria for selecting the best model. A case study for the USA is presented in which the statistical methodology for model selection and the procedure for forecasting COVID-19 cases are discussed in detail. The best ARIMA and SARIMA model parameters for each country are selected manually, and the optimized parameters are then used to forecast the COVID-19 cases. Forecasted trends for confirmed and recovered cases showed an exponential rise for countries such as the United States, Brazil, South Africa, Colombia, Bangladesh, India, Mexico and Pakistan. Similarly, trends for cumulative deaths showed an exponential rise for Brazil, South Africa, Chile, Colombia, Bangladesh, India, Mexico, Iran, Peru and Russia. SARIMA model predictions are more realistic than those of the ARIMA model, confirming the existence of seasonality in COVID-19 data. The results of this study not only shed light on the future trends of the COVID-19 outbreak in the top-16 countries but also guide these countries in preparing their health care policies for the ongoing pandemic. The data used in this work were obtained from the publicly available Johns Hopkins University COVID-19 database.
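The four evaluation metrics named in the abstract have standard textbook definitions, and a minimal sketch of how they might be computed on a held-out test window is given below. This is an illustration only, not the authors' code: the function name `forecast_errors` and the sample numbers are hypothetical, and MAPE is assumed to be expressed in percent with all actual values nonzero (which holds for cumulative case counts once an outbreak has started).

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MAE, MSE, RMSE and MAPE (in percent) for a forecast.

    Hypothetical helper for illustration; assumes every actual value is nonzero,
    since MAPE divides each error by the corresponding actual value.
    """
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    mse = sum(e * e for e in errors) / n             # mean square error
    rmse = math.sqrt(mse)                            # root mean square error
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Made-up cumulative counts, used only to exercise the function.
scores = forecast_errors(actual=[100, 200, 400], predicted=[110, 190, 380])
print(scores)
```

When two candidate parameter sets are compared this way, the one with the lower error on the test window would be preferred. Because MAE, MSE and RMSE scale with the magnitude of the series, MAPE is the more convenient of the four for comparing fit quality across countries with very different case counts.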




Updated: 2021-02-12