On the Utility of Parents' Historical Data to Investigate the Causes of Autism Spectrum Disorder: A Data Mining-Based Framework, IRBM


IRBM (IF 4.8), Pub Date: 2023-04-07, DOI: 10.1016/j.irbm.2023.100780
Zahid Halim, Gohar Khan, Babar Shah, Rabia Naseer, Sajid Anwar, Ahsan Shah

Objective

Autism Spectrum Disorder (ASD) is recognized as a condition that affects the learning ability of adolescents and also negatively impacts their families. Autism may be caused by environmental exposure or by a genetically inherited disorder; however, no definitive or universally accepted causes are known, which makes the problem particularly challenging.

Material and methods

This work focuses on identifying the causes of ASD using computational methods. To this end, data on parental history are collected to find triggering features by reviewing antenatal, perinatal, and infant risk factors of ASD. Machine learning (ML) techniques are then applied to the collected instances to develop predictive models and identify the causes of ASD. Samples are collected from both ASD and non-ASD individuals, with a total of 115 features obtained from each subject. Subjects with ASD make up 47% of the collected dataset. Dimensionality reduction and four feature selection methods are applied to the data to eliminate noise and the least informative features. The data is verified using two clustering techniques, namely k-means and k-medoid, and five clustering validation indices are used to validate the clustering results. Three classifiers, namely k-nearest neighbor (k-NN), Support Vector Machine (SVM), and Artificial Neural Network (ANN), are then trained to predict ASD cases. Frequent itemset mining and a descriptive analysis of the clustered data are used to identify the factors that may cause ASD.
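The abstract does not say which frequent itemset mining algorithm is used. As a rough illustration of the idea only, the sketch below implements a small Apriori-style search in plain Python. The function name `frequent_itemsets`, the `min_count` threshold, and the records with placeholder names `factor_a` to `factor_c` are all assumptions made up for this example; they are not the study's features or data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Return {itemset: count} for itemsets present in at least min_count records.

    Apriori-style level-wise search (an illustrative choice, not necessarily
    the algorithm used in the paper).
    """
    transactions = [frozenset(t) for t in transactions]
    result = {}

    # Level 1: every distinct item is a candidate.
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    size = 1

    while current and size <= max_size:
        # Count how many records contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)

        # Build candidates one item larger by joining frequent itemsets,
        # keeping only those whose smaller subsets are all frequent.
        size += 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        ]

    return result


# Made-up records: each set lists the placeholder factors reported for one subject.
records = [
    {"factor_a", "factor_b"},
    {"factor_a", "factor_b", "factor_c"},
    {"factor_a"},
    {"factor_b", "factor_c"},
    {"factor_a", "factor_b"},
]

for itemset, count in frequent_itemsets(records, min_count=3).items():
    print(sorted(itemset), count)
```

With these toy records and `min_count=3`, only `factor_a`, `factor_b`, and the pair of the two reach the threshold; `factor_c` appears in just two records and is pruned.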

Results

The proposed framework makes it possible to identify features that may contribute to ASD. For the classification part, the SVM classifier outperforms the others, with an average accuracy of 98.34% in predicting ASD cases.
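To make the notion of classification accuracy concrete, here is a minimal sketch of k-NN, the simplest of the three classifiers named in the methods, written in plain Python. The two-dimensional points, the 0/1 labels, and the helper names `knn_predict` and `accuracy` are assumptions invented for illustration; the accuracy it prints has no relation to the figures reported above.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    neighbors = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Made-up, well-separated toy data (hypothetical labels: 0 and 1).
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 0.9), (0.9, 1.1), (1.2, 1.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.1, 0.1), (1.0, 1.0), (0.3, 0.2), (0.8, 0.9)]
test_y = [0, 1, 0, 1]

print(accuracy(train_X, train_y, test_X, test_y, k=3))
```

An average accuracy such as the one reported would typically come from repeating this kind of evaluation over several train/test splits, although the abstract does not state the exact protocol.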

Conclusion

The results identify stress as the dominant feature, along with environmental factors, such as frequent use of canned food and plastic/steel bottles during the fertilization period, that may contribute to ASD.
