Embryologist agreement when assessing blastocyst implantation probability: is data-driven prediction the solution to embryo assessment subjectivity?
Human Reproduction (IF 6.1), Pub Date: 2022-08-09, DOI: 10.1093/humrep/deac171
Daniel E Fordham 1, Dror Rosentraub 1, Avital L Polsky 1, Talia Aviram 1, Yotam Wolf 1, Oriel Perl 1, Asnat Devir 1, Shahar Rosentraub 1, David H Silver 1, Yael Gold Zamir 1, Alex M Bronstein 1, 2, Miguel Lara Lara 3, Jara Ben Nagi 4, Adrian Alvarez 5, Santiago Munné 5

STUDY QUESTION: What is the accuracy and agreement of embryologists when assessing the implantation probability of blastocysts using time-lapse imaging (TLI), and can it be improved with a data-driven algorithm?

SUMMARY ANSWER: The overall interobserver agreement of a large panel of embryologists was moderate and prediction accuracy was modest, while the purpose-built artificial intelligence model generally resulted in higher performance metrics.

WHAT IS KNOWN ALREADY: Previous studies have demonstrated significant interobserver variability amongst embryologists when assessing embryo quality. However, data concerning embryologists’ ability to predict implantation probability using TLI are still lacking. Emerging technologies based on data-driven tools have shown great promise for improving embryo selection and predicting clinical outcomes.

STUDY DESIGN, SIZE, DURATION: TLI video files of 136 embryos with known implantation data were retrospectively collected from two clinical sites between 2018 and 2019 for the performance assessment of 36 embryologists and comparison with a deep neural network (DNN).

PARTICIPANTS/MATERIALS, SETTING, METHODS: We recruited 39 embryologists from 13 different countries. All participants were blinded to clinical outcomes. A total of 136 TLI videos of embryos that reached the blastocyst stage were used for this experiment. Each embryo’s likelihood of successfully implanting was assessed by 36 embryologists, providing implantation probability grades (IPGs) from 1 to 5, where 1 indicates a very low likelihood of implantation and 5 indicates a very high likelihood. Subsequently, three embryologists with over 5 years of experience provided Gardner scores. All 136 blastocysts were categorized into three quality groups based on their Gardner scores. Embryologist predictions were then converted into predictions of implantation (IPG ≥ 3) and no implantation (IPG ≤ 2). Embryologists’ performance and agreement were assessed using the Fleiss kappa coefficient. A 10-fold cross-validation DNN was developed to provide IPGs for TLI video files. The model’s performance was compared to that of the embryologists.

MAIN RESULTS AND THE ROLE OF CHANCE: Logistic regression was employed for the following confounding variables: country of residence, academic level, embryo scoring system, log years of experience and experience using TLI. None were found to have a statistically significant impact on embryologist performance at α = 0.05. The average implantation prediction accuracy for the embryologists was 51.9% for all embryos (N = 136). The average accuracy of the embryologists when assessing top quality and poor quality embryos (according to the Gardner score categorizations) was 57.5% and 57.4%, respectively, and 44.6% for fair quality embryos. Overall interobserver agreement was moderate (κ = 0.56, N = 136). The best agreement was achieved in the poor + top quality group (κ = 0.65, N = 77), while the agreement in the fair quality group was lower (κ = 0.25, N = 59). The DNN showed an overall accuracy rate of 62.5%, with accuracies of 62.2%, 61% and 65.6% for the poor, fair and top quality groups, respectively. The AUC for the DNN was higher than that of the embryologists overall (0.70 DNN vs 0.61 embryologists) as well as in all of the Gardner groups (DNN vs embryologists: poor, 0.69 vs 0.62; fair, 0.67 vs 0.53; top, 0.77 vs 0.54).

LIMITATIONS, REASONS FOR CAUTION: Blastocyst assessment was performed using video files acquired from time-lapse incubators, where each video contained data from a single focal plane. Clinical data regarding the underlying cause of infertility and endometrial thickness before the transfer were not available, yet may explain implantation failure and lower accuracy of IPGs. Implantation was defined as the presence of a gestational sac, whereas the detection of fetal heartbeat is a more robust marker of embryo viability. The raw data were anonymized to the extent that it was not possible to quantify the number of unique patients and cycles included in the study, potentially masking the effect of bias from a limited patient pool. Furthermore, the lack of demographic data makes it difficult to draw conclusions on how representative the dataset was of the wider population. Finally, embryologists were required to assess the implantation potential, not embryo quality. Although this is not the traditional approach to embryo evaluation, morphology/morphokinetics as a means of assessing embryo quality is believed to be strongly correlated with viability and, for some methods, implantation potential.

WIDER IMPLICATIONS OF THE FINDINGS: Embryo selection is a key element in IVF success and continues to be a challenge. Improving the predictive ability could assist in optimizing implantation success rates and other clinical outcomes and could minimize the financial and emotional burden on the patient. This study demonstrates moderate agreement rates between embryologists, likely due to the subjective nature of embryo assessment. In particular, we found that average embryologist accuracy and agreement were significantly lower for fair quality embryos when compared with those for top and poor quality embryos. Using data-driven algorithms as an assistive tool may help IVF professionals increase success rates and promote much needed standardization in the IVF clinic. Our results indicate a need for further research regarding technological advancement in this field.

STUDY FUNDING/COMPETING INTEREST(S): Embryonics Ltd is an Israel-based company. Funding for the study was partially provided by the Israeli Innovation Authority, grant #74556.

TRIAL REGISTRATION NUMBER: N/A.
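
The evaluation steps described in the methods (dichotomizing IPGs at the stated thresholds, scoring accuracy against known outcomes, and measuring agreement with Fleiss' kappa) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`dichotomize`, `accuracy`, `fleiss_kappa`) are hypothetical, and the grades and outcomes below are invented toy values for four embryos and three raters, not data from the study.

```python
from collections import Counter


def dichotomize(ipg):
    """Map an implantation probability grade (1-5) to a binary prediction.

    Thresholds follow the abstract: IPG >= 3 predicts implantation (1),
    IPG <= 2 predicts no implantation (0).
    """
    if not 1 <= ipg <= 5:
        raise ValueError(f"IPG must be between 1 and 5, got {ipg}")
    return 1 if ipg >= 3 else 0


def accuracy(predictions, outcomes):
    """Fraction of binary predictions that match the known outcomes."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have equal length")
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i] holds the labels all raters gave subject i."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    category_totals = Counter()
    observed = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Share of rater pairs that agree on this subject.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed += agreeing_pairs / (n_raters * (n_raters - 1))
    observed /= n_subjects

    # Chance agreement from the overall proportion of each category.
    total = n_subjects * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return (observed - expected) / (1.0 - expected)


# Toy data (invented): rows are embryos, columns are raters.
ipg_grades = [
    [5, 4, 4],
    [1, 2, 1],
    [3, 2, 4],
    [2, 3, 1],
]
outcomes = [1, 0, 0, 1]  # 1 = implanted, 0 = did not implant (invented)

binary = [[dichotomize(g) for g in row] for row in ipg_grades]
per_rater = [
    accuracy([row[r] for row in binary], outcomes)
    for r in range(len(binary[0]))
]
mean_accuracy = sum(per_rater) / len(per_rater)
kappa = fleiss_kappa(binary)

print(f"mean accuracy: {mean_accuracy:.3f}")
print(f"Fleiss kappa:  {kappa:.3f}")
```

On this toy input the raters agree fully on two embryos and split two-to-one on the other two, which gives a kappa of about 0.33. The sketch applies kappa to the dichotomized predictions; the abstract does not state whether the reported κ values were computed on the binary predictions or on the original five-level grades, so that choice is an assumption here.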

Updated: 2022-08-09