DLCSS: A new similarity measure for time series data mining
Engineering Applications of Artificial Intelligence (IF 7.5), Pub Date: 2020-04-27, DOI: 10.1016/j.engappai.2020.103664
Gholamreza Soleimani, Masoud Abessi

The Longest Common Subsequence (LCSS) is a classic problem in computer science. Most studies on time series data mining describe LCSS as the best and most usable similarity measure. Because LCSS scores similarity on a zero-one basis (two points either match or do not), the results of time series data mining under LCSS depend strongly on the similarity threshold. When no prior knowledge about the data is available, choosing an appropriate threshold is very difficult, so LCSS can lead to poor results in practice. This research proposes a new LCSS-based similarity measure for time series data mining, named Developed Longest Common Subsequence (DLCSS). DLCSS defines two similarity thresholds and determines their values, which eliminates this shortcoming of LCSS. The performance of DLCSS was compared with that of LCSS and Dynamic Time Warping (DTW) using 1-nearest-neighbor (1-NN) classification and k-medoids clustering on 63 time series datasets from the UCR collection. The results indicate that 1-NN accuracy and clustering accuracy under DLCSS are better than under LCSS and DTW with at least 99.5% and 99% confidence, respectively. DLCSS is also more effective than LCSS and DTW at correctly predicting the number of clusters. In addition, DLCSS is more effective than LCSS and DTW at determining better cluster representatives, with at least 99.95% confidence.
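
The threshold sensitivity described above can be illustrated with a minimal Python sketch. The first function is the standard threshold-based LCSS for real-valued series, normalized by the shorter length. The second is NOT the DLCSS definition from the paper: the abstract only says that DLCSS uses two similarity thresholds, so the linear ramp between a lower and an upper threshold (the names lo, hi, hypothetical_graded_match and graded_similarity) is purely an assumption, chosen to show how a second threshold could soften the zero-one decision.

```python
from typing import Callable, Sequence


def _lcs_score(a: Sequence[float], b: Sequence[float],
               match: Callable[[float, float], float]) -> float:
    """LCS-style dynamic program; `match` scores one pair of points in [0, 1]."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match(a[i - 1], b[j - 1])
            # Either align a[i-1] with b[j-1], or skip one of the two points.
            dp[i][j] = max(dp[i - 1][j - 1] + s, dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_similarity(a: Sequence[float], b: Sequence[float], eps: float) -> float:
    """Classic LCSS: two points match (score 1) iff |x - y| <= eps, else 0."""
    if not a or not b:
        raise ValueError("both series must be non-empty")
    score = _lcs_score(a, b, lambda x, y: 1.0 if abs(x - y) <= eps else 0.0)
    return score / min(len(a), len(b))


def hypothetical_graded_match(x: float, y: float, lo: float, hi: float) -> float:
    """ASSUMPTION, not from the paper: full match up to `lo`, no match from
    `hi` onward, and a linear ramp in between."""
    d = abs(x - y)
    if d <= lo:
        return 1.0
    if d >= hi:
        return 0.0
    return (hi - d) / (hi - lo)


def graded_similarity(a: Sequence[float], b: Sequence[float],
                      lo: float, hi: float) -> float:
    """Two-threshold variant built on the hypothetical ramp above."""
    if not a or not b:
        raise ValueError("both series must be non-empty")
    if not lo < hi:
        raise ValueError("lo must be smaller than hi")
    score = _lcs_score(a, b, lambda x, y: hypothetical_graded_match(x, y, lo, hi))
    return score / min(len(a), len(b))


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [1.1, 2.1, 3.6, 4.1]
    # The same pair of series looks unrelated, similar, or identical
    # depending only on the single LCSS threshold.
    for eps in (0.05, 0.2, 0.7):
        print(eps, lcss_similarity(a, b, eps))  # 0.0, then 0.75, then 1.0
    print(graded_similarity(a, b, lo=0.2, hi=1.0))
```

With the single threshold, the point pair (3.0, 3.6) flips from "no match" to "match" as eps crosses 0.6, which changes the whole score in one step. Under the assumed ramp the same pair contributes a partial score, so the result changes gradually with the threshold values.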


Updated: 2020-04-27