Intelligent medical heterogeneous big data set balanced clustering using deep learning
Pattern Recognition Letters (IF 3.9), Pub Date: 2020-09-02, DOI: 10.1016/j.patrec.2020.08.027
Xiaofeng Li, Hongshuang Jiao, Dong Li

Traditional methods for clustering intelligent medical data do not preprocess the data sets, which leads to heavy computation, low efficiency, and a large offset distance of the data cluster centers. To address this, we propose a balanced clustering algorithm for intelligent medical heterogeneous big data sets using deep learning. First, a deep neural network model based on incremental updating is constructed and adaptively trained and adjusted according to the data scale, in order to perform multi-layer feature learning on the heterogeneous big data sets of intelligent medical care. Second, under-sampling preprocessing is applied to the data set so that the heterogeneous big data set is in a balanced state, and clustering is computed on this basis. Then, the cluster centers are set according to the kernel density estimation results and updated iteratively until convergence, combining the data features obtained from deep learning with Euclidean distance calculation, which completes the balanced clustering of the intelligent medical heterogeneous big data set. The results show that the proposed algorithm achieves a small data cluster center offset distance, short clustering time, low energy consumption, and high Macro-F1 and NMI values; the clustering accuracy can reach 95%, and the computational cost is low.

2020 Elsevier Ltd. All rights reserved.
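
The abstract does not give implementation details, so the following is only a minimal sketch of the last two stages it describes (under-sampling to balance the data, then kernel-density-based center selection with iterative Euclidean updates), not the authors' method. It assumes the deep features have already been extracted into a NumPy array, that provisional group labels are available for under-sampling, and that a Gaussian kernel is used. The function names and the `bandwidth` and `min_sep` parameters are hypothetical.

```python
import numpy as np

def undersample(X, groups, rng):
    """Randomly drop points so each provisional group matches the smallest one."""
    labels, counts = np.unique(groups, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(groups == g), size=n_min, replace=False)
        for g in labels
    ])
    return X[keep], groups[keep]

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff * diff, axis=2)

def kde_seed_centers(X, k, bandwidth, min_sep):
    """Choose k seeds in order of decreasing Gaussian kernel density,
    skipping candidates closer than min_sep to an already chosen seed."""
    d2 = pairwise_sq_dists(X, X)
    density = np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)
    order = np.argsort(-density)
    chosen = [order[0]]
    for i in order[1:]:
        if len(chosen) == k:
            break
        if np.all(d2[i, chosen] > min_sep ** 2):
            chosen.append(i)
    if len(chosen) < k:
        raise ValueError("min_sep is too large to place k seeds")
    return X[chosen]

def cluster(X, centers, max_iter=100, tol=1e-6):
    """Assign each point to its nearest center and recompute the centers
    as cluster means until the centers stop moving."""
    for _ in range(max_iter):
        assign = np.argmin(pairwise_sq_dists(X, centers), axis=1)
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(len(centers))
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    assign = np.argmin(pairwise_sq_dists(X, centers), axis=1)
    return centers, assign

# Synthetic stand-in for learned features: two imbalanced groups (200 vs 40).
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    rng.normal(5.0, 0.5, size=(40, 2)),
])
groups = np.array([0] * 200 + [1] * 40)

X_bal, g_bal = undersample(features, groups, rng)
seeds = kde_seed_centers(X_bal, k=2, bandwidth=0.5, min_sep=3.0)
centers, assign = cluster(X_bal, seeds)
print(centers)
```

Seeding from high-density points keeps the initial centers inside populated regions, which is one plausible reason a density-based start could reduce center offset compared with random initialization; the abstract itself does not state how the density estimate is turned into centers.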




Updated: 2020-09-09