A Distributed Storage and Computation k-Nearest Neighbor Algorithm Based Cloud-Edge Computing for Cyber-Physical-Social Systems
IEEE Access (IF 3.4), Pub Date: 2020-01-01, DOI: 10.1109/access.2020.2974764
Wei Zhang, Xiaohui Chen, Yueqi Liu, Qian Xi

The k-nearest neighbor (kNN) algorithm is a classic supervised machine learning algorithm. It is widely used in cyber-physical-social systems (CPSS) to analyze and mine data. However, in practical CPSS applications, the standard linear kNN algorithm struggles to process massive data sets efficiently. This paper proposes a distributed storage and computation k-nearest neighbor (D-kNN) algorithm. The D-kNN algorithm has the following advantages. First, it introduces the concept of k-nearest neighbor boundaries; restricting the k-nearest neighbor search to the region inside these boundaries reduces the time complexity of kNN. Second, based on the k-nearest neighbor boundaries, massive data sets that exceed main storage are stored on distributed storage nodes. Third, the algorithm performs the k-nearest neighbor search efficiently by running distributed computations at each storage node. Finally, a series of experiments was performed to verify the effectiveness of the D-kNN algorithm. The experimental results show that the D-kNN algorithm, based on distributed storage and computation, improves the efficiency of k-nearest neighbor search. The algorithm can be deployed easily and flexibly in a cloud-edge computing environment to process massive data sets in CPSS.
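
As a rough illustration of the "compute at each storage node, then combine" idea mentioned in the abstract, the following Python sketch shows a generic partition-and-merge kNN search. It is not the paper's D-kNN algorithm: the abstract does not define how k-nearest neighbor boundaries are constructed, so this sketch simply splits the data into arbitrary shards. The function names (`local_knn`, `distributed_knn`, `classify`) and the in-process loop standing in for separate nodes are assumptions made for illustration only.

```python
import heapq
import math


def local_knn(shard, query, k):
    """Return up to k (distance, label) pairs nearest to `query` within one shard.

    `shard` is a list of (point, label) pairs held by one hypothetical storage node.
    """
    scored = ((math.dist(point, query), label) for point, label in shard)
    return heapq.nsmallest(k, scored)


def distributed_knn(shards, query, k):
    """Merge per-shard candidates into the global k nearest neighbors.

    Each shard contributes at most k candidates, so the merge step only
    examines k * len(shards) items instead of the whole data set.
    """
    candidates = []
    for shard in shards:  # assumption: in a real deployment each call runs on its own node
        candidates.extend(local_knn(shard, query, k))
    return heapq.nsmallest(k, candidates)


def classify(shards, query, k):
    """Majority vote over the labels of the global k nearest neighbors."""
    votes = {}
    for _, label in distributed_knn(shards, query, k):
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Toy data split across two hypothetical storage nodes.
shards = [
    [((0.0, 0.0), "a"), ((1.0, 1.0), "a")],
    [((5.0, 5.0), "b"), ((0.5, 0.5), "a"), ((6.0, 6.0), "b")],
]
predicted = classify(shards, (0.2, 0.2), 3)
```

Because every shard returns its own k best candidates, the merged result is exact for any partitioning. A boundary-based partition such as the one the abstract describes would presumably let the search skip most shards entirely, which this sketch does not attempt.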

Updated: 2020-01-01