Estimation of COVID-19 Epidemiology Curve of the United States Using Genetic Programming Algorithm
International Journal of Environmental Research and Public Health (IF 4.614), Pub Date: 2021-01-22, DOI: 10.3390/ijerph18030959
Nikola Anđelić 1 , Sandi Baressi Šegota 1 , Ivan Lorencin 1 , Zdravko Jurilj 2 , Tijana Šušteršič 3, 4 , Anđela Blagojević 3, 4 , Alen Protić 2, 5 , Tomislav Ćabov 6 , Nenad Filipović 3, 4 , Zlatan Car 1

Estimating the epidemiology curve for the COVID-19 pandemic can be computationally challenging. Several artificial intelligence (AI) methods have so far been applied to develop an epidemiology curve for a specific country. However, most of these methods produce models that are almost impossible to translate into a mathematical equation. In this paper, an AI method called the genetic programming (GP) algorithm is used to develop a symbolic expression (mathematical equation) that estimates the epidemiology curve for the entire U.S. with high accuracy. The GP algorithm is applied to a publicly available dataset containing the number of confirmed, deceased and recovered patients for each U.S. state, in order to obtain symbolic expressions for estimating the number of patients in each of these groups. The dataset consists of the latitude and longitude of the central location of each state and the number of patients in each group for each day from 22 January 2020 to 3 December 2020. For each state, the obtained symbolic expressions for the number of confirmed, deceased and recovered patients achieved R² scores in the ranges 0.9406–0.9992, 0.9404–0.9998 and 0.9797–0.99955, respectively. The per-state expressions are summed to form symbolic expressions for the number of confirmed, deceased and recovered patients in the entire U.S., which achieved R² scores of 0.9992, 0.9997 and 0.9996, respectively. These three expressions are then combined into an equation for the epidemiology curve of the entire U.S., which achieved an R² score of 0.9933. The investigation showed that the GP algorithm can produce symbolic expressions that estimate the number of confirmed, recovered and deceased patients, as well as the epidemiology curve, with very high accuracy, not only for individual states but also for the entire U.S.
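
The aggregation and scoring steps described in the abstract can be illustrated with a minimal sketch. It assumes each state's symbolic expression is available as a plain function of the day index; the two expressions below (`state_a`, `state_b`), the helper names and the observed values are hypothetical placeholders, not the equations or data from the paper, whose expressions also take latitude and longitude as inputs.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-state symbolic expressions (placeholders, not the
# paper's equations). Each maps a day index to an estimated patient count.
state_models = {
    "state_a": lambda day: 2.0 * day + 1.0,
    "state_b": lambda day: 0.5 * day ** 2,
}


def national_estimate(day):
    """Sum the per-state estimates to get a country-level estimate."""
    return sum(model(day) for model in state_models.values())


days = list(range(10))
predicted = [national_estimate(d) for d in days]

# Synthetic "observed" counts: the prediction plus a small alternating
# offset, used only so the score below is not trivially 1.0.
observed = [p + (1.0 if d % 2 == 0 else -1.0) for d, p in zip(days, predicted)]

score = r2_score(observed, predicted)
print(f"R2 of the summed expression on synthetic data: {score:.4f}")
```

An R² score of 1.0 means the expression reproduces the observed counts exactly, while a score of 0.0 means it does no better than always predicting the mean of the observations.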

Updated: 2021-01-22