Generative artificial intelligence: synthetic datasets in dentistry
BDJ Open, Pub Date: 2024-03-01, DOI: 10.1038/s41405-024-00198-4
Fahad Umer, Niha Adnan

Introduction

Artificial Intelligence (AI) algorithms, particularly Deep Learning (DL) models, are known to be data intensive. This has increased the demand for digital data in all domains of healthcare, including dentistry. The main hindrance to the progress of AI is limited access to diverse datasets that can train DL models to perform optimally, at a level comparable to subject experts. However, administering these traditionally acquired datasets is challenging because of privacy regulations and the extensive manual annotation required from subject experts. Biases, such as ethical and socioeconomic biases and class imbalances, are also introduced during the curation of these datasets, limiting their overall generalizability. These challenges prevent such datasets from being accrued at the scale needed for training DL models.

Methods

Generative AI techniques can be used to produce Synthetic Datasets (SDs) that overcome the issues affecting traditionally acquired datasets. Variational autoencoders, generative adversarial networks and diffusion models have been used to generate SDs. This text reviews these generative AI techniques and how they operate. It discusses the opportunities offered by SDs, as well as their challenges and potential solutions, to improve the understanding of healthcare professionals working in AI research.

Conclusion

Synthetic data customized to the needs of researchers can be produced to train robust AI models. Having been trained on such diverse datasets, these models will be suitable for dissemination across countries. However, the limitations associated with SDs need to be better understood, and attempts made to overcome them, before SDs are used widely.



Updated: 2024-03-04