Machine learning for prediction of euploidy in human embryos: in search of the best-performing model and predictive features, Fertility and Sterility

Fertility and Sterility (IF 6.6), Pub Date: 2022-01-19, DOI: 10.1016/j.fertnstert.2021.11.029
Stefanie De Gheselle, Céline Jacques, Jérôme Chambost, Celine Blank, Klaas Declerck, Ilse De Croo, Cristina Hickman, Kelly Tilleman

Objective

To assess the best-performing machine learning (ML) model and features to predict euploidy in human embryos.

Design

Retrospective cohort analysis.

Setting

Department for reproductive medicine in a university hospital.

Patient(s)

One hundred twenty-eight infertile couples treated between January 2016 and December 2019. Demographic and clinical data and embryonic developmental and morphokinetic data from 539 embryos (45% euploid, 55% aneuploid) were analyzed.

Intervention(s)

Five ML models (random forest classifier [RFC], scikit-learn gradient boosting classifier, support vector machine, multivariate logistic regression, and naïve Bayes) were trained and evaluated on 9 datasets. Datasets A1, A2, and A3 contained 26 morphokinetic features alone, expressed as absolute times (A1), interim times (A2), or both combined (A3). Datasets B1, B2, and B3 added 19 standard development features, and datasets C1, C2, and C3 further added 40 demographic and clinical characteristics. Feature selection and model retraining were executed for the best-performing combination of model and dataset.
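The comparison described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' pipeline: the data are synthetic (539 rows and 85 columns, assuming the 26, 19, and 40 feature groups simply add up), and the 5-fold cross-validation, the scaling steps, and all hyperparameters are assumptions that the abstract does not state.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one dataset (NOT the study's data):
# 539 samples, 85 features, roughly 55% class 0 and 45% class 1.
X, y = make_classification(
    n_samples=539,
    n_features=85,
    n_informative=10,
    weights=[0.55, 0.45],
    random_state=0,
)

# The five model families named in the abstract, with assumed settings.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "naive_bayes": GaussianNB(),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated ROC AUC for each model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="roc_auc").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean AUC = {auc:.2f}")
```

Repeating the same loop over each of the 9 datasets would give the model-by-dataset grid from which a best-performing combination could be picked. With several embryos per couple, a grouped split (for example scikit-learn's `GroupKFold`) may be a more careful choice than plain folds.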

Main Outcome Measure(s)

The main outcome measures were the overall accuracy, precision, recall (sensitivity), F1 score (the harmonic mean of precision and recall), and area under the receiver operating characteristic curve (AUC) of each ML model on each dataset. The secondary outcome measure was the ranking of feature importance for the best-performing combination of model and dataset.
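For reference, the threshold-based measures listed above follow directly from the counts in a confusion matrix. The sketch below uses hypothetical counts chosen only for illustration; they are not taken from the study. AUC is left out because it is computed from ranked prediction scores rather than from a single confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives, with "euploid" treated as the positive class.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for 100 embryos (illustrative only).
metrics = classification_metrics(tp=40, fp=20, fn=10, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```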

Result(s)

The RFC model had the highest accuracy (71%) and AUC (0.75) when trained and evaluated on dataset C1, with a precision of 66%, recall (sensitivity) of 86%, and F1 score of 75%. After feature selection and retraining, the accuracy, recall, and F1 score increased to 72%, 88%, and 76%, respectively. Morphokinetic features had the highest relative predictive weight.
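As a quick consistency check, the reported F1 score for dataset C1 can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The inputs below are the rounded percentages from the abstract, so the result is only approximate. The same check cannot be made for the retrained model, because its precision is not reported.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for the RFC model on dataset C1.
reported = {"precision": 0.66, "recall": 0.86, "f1": 0.75}

f1 = f1_from(reported["precision"], reported["recall"])
consistent = round(f1, 2) == reported["f1"]

print(f"Recomputed F1 = {f1:.4f}")  # prints "Recomputed F1 = 0.7468"
print(f"Matches reported F1 after rounding: {consistent}")
```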

Conclusion(s)

The RFC model can predict euploidy with acceptable accuracy (>70%) using a dataset that combines embryo morphokinetics, standard embryonic development features, and the subjects’ demographic and clinical features.
