Journal of Parallel and Distributed Computing (IF 3.4), Pub Date: 2021-03-17, DOI: 10.1016/j.jpdc.2021.02.025
Jia Zhao, Yan Ding, Yunan Zhai, Yuqiang Jiang, Yujuan Zhai, Ming Hu
Proactive fault management is an important problem in many areas of data management, including cloud computing, big data, vision, and machine learning, and especially in cross-domain research spanning distributed computing and artificial intelligence (AI). Failure prediction is naturally a supervised learning problem, yet most real-world online failure prediction must work with data that are difficult to label. We observe that, in many cases, large-scale unlabeled data can be grouped through feature extraction and clustering well enough to support prediction, so ideas from combining the two can be brought to bear. Based on this observation, we propose UDFP (Unlabeled Data based online Failure Prediction), a framework approach for online failure prediction. It introduces a clustering analysis method that combines KNN (k-nearest neighbor) with the idea of modularity to build the prediction model. We show analytically that UDFP can mitigate, to some extent, the supervised learning problem that failure prediction poses in our setting. Experimental results demonstrate that UDFP, as a framework approach, avoids the workload and difficulty of manual labeling, improves predictive accuracy, and reduces the cost of data management in safety-aware distributed cloud data centers, while enhancing fault tolerance and robustness.
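The abstract does not spell out how KNN and modularity are combined, so the following is only a minimal sketch of one common reading: connect each sample to its k nearest neighbours to form a graph, then score a candidate clustering of that graph with Newman's modularity Q. The function names `knn_graph` and `modularity`, the toy feature vectors, and the choice k = 2 are all illustrative assumptions, not taken from the paper.

```python
import math


def knn_graph(points, k):
    """Build an undirected kNN graph (assumed construction, not the paper's).

    An edge (i, j) exists if j is among the k nearest neighbours of i
    or i is among the k nearest neighbours of j (Euclidean distance).
    """
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def modularity(edges, labels):
    """Newman modularity: Q = sum over clusters c of e_c/m - (d_c/(2m))^2.

    e_c is the number of edges inside cluster c, d_c is the total degree
    of the nodes in c, and m is the number of edges in the graph.
    """
    m = len(edges)
    intra = {}    # edges with both endpoints in the same cluster
    deg_sum = {}  # summed node degree per cluster
    for u, v in edges:
        deg_sum[labels[u]] = deg_sum.get(labels[u], 0) + 1
        deg_sum[labels[v]] = deg_sum.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            intra[labels[u]] = intra.get(labels[u], 0) + 1
    return sum(
        intra.get(c, 0) / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum
    )


# Hypothetical 2-D feature vectors: two well-separated groups of four samples.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
edges = knn_graph(points, k=2)

q_good = modularity(edges, [0, 0, 0, 0, 1, 1, 1, 1])  # split along the gap
q_bad = modularity(edges, [0, 1, 0, 1, 0, 1, 0, 1])   # split ignoring the gap
print(q_good, q_bad)  # 0.5 0.0
```

A clustering that follows the structure of the kNN graph scores a higher Q than one that cuts across it, which is what makes modularity usable as an objective when no labels are available. The resulting cluster assignments could then serve as pseudo-labels for a downstream predictor, though whether UDFP does this is not stated in the abstract.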
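Title (translated from Chinese): Exploring unlabeled big data learning techniques for online failure prediction in safety-aware cloud environments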