DBSCAN-like clustering method for various data densities
Pattern Analysis and Applications (IF 3.9), Pub Date: 2019-04-05, DOI: 10.1007/s10044-019-00809-z
Rudolf Scitovski, Kristian Sabo

In this paper, we propose a modification of the well-known DBSCAN algorithm that recognizes clusters with various data densities in a given set of data points \({\mathcal {A}}=\{a^i\in {\mathbb {R}}^n:i=1,\dots ,m\}\). First, we define the parameter \(MinPts=\lfloor \ln |{\mathcal {A}}|\rfloor\); then, using a standard procedure from the DBSCAN algorithm, for each \(a\in {\mathcal {A}}\) we determine the radius \(\epsilon _a\) of the circle centered at \(a\) that contains MinPts elements of the set \({\mathcal {A}}\). We group the set of all these radii into the most appropriate number \(t\) of clusters by using the Least Squares distance-like function and applying the SymDIRECT or SepDIRECT algorithm. In this way, we obtain parameters \(\epsilon _1>\dots >\epsilon _t\). Next, for the parameters \(\{MinPts,\epsilon _1\}\) we construct a partition starting with one cluster and then add new clusters for as long as there exist isolated groups of at least MinPts data points within some circle of radius \(\epsilon _1\). We follow a similar procedure for the other parameters \(\epsilon _2,\dots ,\epsilon _t\). After the algorithm has run, more clusters appear than can be expected in the optimal partition. According to defined criteria, some of them are then merged by a merging process, for which a detailed algorithm is given. Compared with the standard DBSCAN algorithm, we show a clear advantage for data with various densities.
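The first steps of the method can be sketched in Python, as a minimal illustration under several assumptions rather than the authors' implementation. `min_pts` follows the stated rule \(MinPts=\lfloor \ln |{\mathcal {A}}|\rfloor\). `eps_radii` takes \(\epsilon _a\) as the distance from \(a\) to its MinPts-th nearest other point; whether \(a\) itself is counted among the MinPts elements is an assumption. The paper groups the radii with the Least Squares distance-like function and SymDIRECT or SepDIRECT; `ls_levels` swaps in a different optimizer, an exact dynamic programme for one-dimensional least-squares clustering, takes \(t\) as an input instead of selecting it, and reports each group's mean as its \(\epsilon\) level (also an assumption). `dbscan_pass` is the standard DBSCAN core-point expansion for a single pair \(\{MinPts,\epsilon\}\), not necessarily the authors' exact partition-building step; how the passes for \(\epsilon _1,\dots ,\epsilon _t\) are combined and the merging process are not sketched. All function names and the toy data are hypothetical.

```python
import math
from collections import deque


def min_pts(m):
    """MinPts = floor(ln m), as defined in the abstract."""
    return math.floor(math.log(m))


def eps_radii(points, k):
    """Radius eps_a for every point a: the distance from a to its k-th
    nearest other point (excluding a itself is an assumption)."""
    radii = []
    for i, a in enumerate(points):
        d = sorted(math.dist(a, b) for j, b in enumerate(points) if j != i)
        radii.append(d[k - 1])
    return radii


def ls_levels(values, t):
    """Split 1-D values into t contiguous groups minimising the total squared
    deviation from the group means (exact dynamic programme, a stand-in for
    SymDIRECT/SepDIRECT). Returns the group means, largest first."""
    x = sorted(values)
    n = len(x)
    s, s2 = [0.0], [0.0]                      # prefix sums of x and x**2
    for v in x:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):                           # squared deviation of x[i:j]
        c = s[j] - s[i]
        return (s2[j] - s2[i]) - c * c / (j - i)

    inf = float("inf")
    dp = [[inf] * (n + 1) for _ in range(t + 1)]  # dp[g][j]: best cost of x[:j] in g groups
    cut = [[0] * (n + 1) for _ in range(t + 1)]   # start index of the last group
    dp[0][0] = 0.0
    for g in range(1, t + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                v = dp[g - 1][i] + cost(i, j)
                if v < dp[g][j]:
                    dp[g][j], cut[g][j] = v, i
    means, j = [], n
    for g in range(t, 0, -1):                 # walk the cuts back to the start
        i = cut[g][j]
        means.append((s[j] - s[i]) / (j - i))
        j = i
    return sorted(means, reverse=True)


def dbscan_pass(points, min_pts, eps):
    """Standard DBSCAN expansion for one pair (MinPts, eps).
    Returns a cluster id per point; -1 marks noise."""
    n = len(points)
    # closed eps-ball around each point, the point itself included
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n                       # None = not visited yet
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:
            labels[i] = -1                    # noise unless later reached as a border point
            continue
        labels[i] = cid
        queue = deque(nbrs[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cid               # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cid
            if len(nbrs[j]) >= min_pts:       # only core points expand the cluster
                queue.extend(nbrs[j])
        cid += 1
    return labels


# Toy data (hypothetical): a dense run (spacing 0.1) and a sparse run (spacing 2.0).
A = [(0.1 * i, 0.0) for i in range(10)] + [(10.0 + 2.0 * i, 0.0) for i in range(10)]
k = min_pts(len(A))                           # floor(ln 20) = 2
levels = ls_levels(eps_radii(A, k), t=2)      # approximately [2.4, 0.12]
for eps in levels:
    print(round(eps, 2), dbscan_pass(A, k, eps))
```

On this toy set, the larger level separates the dense and sparse runs into two clusters, while the smaller level keeps the dense run as one cluster and marks every sparse point as noise. This is the situation the multi-level \(\epsilon\) parameters are meant to handle: no single radius suits both densities equally well.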

Updated: 2019-04-05