Special issue on “advances in visual analytics and mining visual data”
Expert Systems ( IF 3.3 ) Pub Date : 2020-06-29 , DOI: 10.1111/exsy.12607
Victor Chang 1 , Shadi A. Aljawarneh 2 , Chung‐Sheng Li 3

Visual and multimedia analytics is an emerging field of research that combines strengths from information analytics, geospatial analytics, scientific analytics, statistical analytics, knowledge discovery, data management and knowledge representation, presentation, production and dissemination, cognition, perception, and interaction (Chen, Chiang and Storey, 2012). The aim is to gain insight into homogeneous, contradictory, and incomplete data by combining automatic analysis methods with human background knowledge and intuition.

While the scope of visual analytics is broad, one principle that has emerged over the years is that visual analytics systems need to leverage computational methods from data mining, knowledge discovery, and machine learning for large‐scale data analysis. In these systems, the human operator works alongside the computational processes in an integrated fashion: computing systems or services sift through large amounts of data and identify the relevant information, while the human interactively explores the reduced data space to discover trends and patterns and make informed decisions. These two components operate in coordination, allowing for a continuous and cooperative analytical loop (Cybulski et al., 2015; Valdez et al., 2016). For this special issue, top papers from Data 2018 in Madrid, Spain, and the best paper from FEMIB 2019 in Crete, Greece, were invited. Six papers were selected through a robust and competitive review process. Their contributions are summarized below.

Arhipova et al. (2020) propose an approach for aggregating mobile phone data at 15‐min intervals in the area of each cellular base station. The case study covers all of Latvia's municipalities and compares the level of economic activity in each municipality with mobile phone activity over three periods: 2015–2016, 2017, and 2018. The authors conclude that economic activity in municipalities can be estimated from these data, and they detect positive dynamics in regional development. Such data and analytics, which show how economic activity evolves in real time at particular locations and economic activity centres, can improve regional development planning and plan implementation. To identify the centres of economic activity in each municipality and their spheres of influence, the authors determined the patterns of human commuting and the fluctuations of internal activity on workdays and on weekends/holidays in 2017–2018. Reliable data on human commuting within Latvia and its regions are generally scarce; the method therefore gives regional governments a practical tool for tracking strategy implementation and for strategic gap analysis.
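
As a rough illustration of what aggregation at 15‐min intervals per base station can look like, the following minimal Python sketch counts events per station and interval. It is not the authors' implementation: the record layout, the station identifiers, and the function names are assumptions made for this example, and the rounding shown only works for interval lengths that divide 60.

```python
from collections import Counter
from datetime import datetime


def bucket_start(ts: datetime, minutes: int = 15) -> datetime:
    """Round a timestamp down to the start of its interval.

    Assumes `minutes` divides 60 evenly (e.g. 5, 10, 15, 30).
    """
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def aggregate_activity(events, minutes: int = 15) -> Counter:
    """Count events per (station_id, interval start)."""
    counts = Counter()
    for station_id, ts in events:
        counts[(station_id, bucket_start(ts, minutes))] += 1
    return counts


# Hypothetical records: (base station id, event time).
events = [
    ("BS-001", datetime(2018, 3, 5, 8, 2)),
    ("BS-001", datetime(2018, 3, 5, 8, 14)),
    ("BS-001", datetime(2018, 3, 5, 8, 16)),
    ("BS-002", datetime(2018, 3, 5, 8, 59)),
]
activity = aggregate_activity(events)
```

Counts keyed this way can then be compared across workdays and weekends/holidays, or summed per municipality, depending on how stations are mapped to areas.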

Khamayseh et al. (2019) investigate friend management in the Social Internet of Things and propose a framework for managing friend requests. The framework consists of friend selection, friendship removal, and update modules. For the selection component, the authors develop weight‐based and Naïve Bayes classifier‐based algorithms. They also propose a random service allocation model to construct a service‐specific network model, which is then used in the simulation setup to compare the performance of different friend management algorithms. The framework is evaluated by simulation under different scenarios. The simulation results show improvement over other strategies in terms of average degree of connections, average path length, local clustering coefficients, and throughput.
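
The network metrics named above have standard graph definitions. The sketch below computes three of them on a tiny friendship graph. It is illustrative only: the graph, the node names, and the helper functions are assumptions for this example, not part of the simulation setup used in the paper.

```python
from collections import deque
from itertools import combinations


def average_degree(graph) -> float:
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def average_path_length(graph) -> float:
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def local_clustering(graph, node) -> float:
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


# Hypothetical undirected friendship graph (adjacency sets).
friends = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

For this graph, node "c" has three neighbours of which only one pair ("a", "b") is connected, so its local clustering coefficient is 1/3.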

Lara et al. (2019) propose an outlier detection method based on a clustering process. Their aim is to overcome the specificity of many existing outlier detection techniques, which fail to take into account the inherent dispersion of domain objects. The method is based on four criteria designed to represent how human beings (experts in each domain) visually identify outliers within a set of objects after analysing the clusters. This is an advantage over other clustering‐based outlier detection techniques, which rely on purely numerical analysis of clusters. To validate the proposed method, the authors studied its outlier detection performance and its efficiency in terms of runtime. The results of regression analyses confirm that their proposal is useful for detecting outlier data in different domains, with a false positive rate of less than 2% and reliability greater than 99%.

Park et al. (2019) demonstrate an online principal component analysis methodology in which the eigenvectors are transformed online using the moving average of the data stream, so that the model can reflect concept drift. They compared network intrusion detection performance based on online transformation of eigenvectors with that of offline methods by applying three machine learning algorithms. Both online and offline methods demonstrated excellent performance in terms of precision. In terms of recall, however, the proposed methodology with integrated online eigenvector transformation performed better; consequently, the F1‐measure also indicated better performance. The visualization of the principal component scores shows the effectiveness of their method.
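
Because the F1‐measure is the harmonic mean of precision and recall, a gain in recall at similar precision raises F1. The short sketch below shows this with the standard definitions. The confusion counts are hypothetical and are not results from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts chosen so that precision is similar in both cases
# while recall differs; they are NOT taken from the paper.
offline = precision_recall_f1(tp=80, fp=2, fn=20)
online = precision_recall_f1(tp=95, fp=2, fn=5)
```

With these made-up counts both precisions are close to 0.98, while recall rises from 0.80 to 0.95 and F1 rises with it.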

Hawashin et al. (2019) propose a new efficient hybrid similarity measure for recommender systems based on a combination of the user interest‐user interest similarity measure and the user interest‐item similarity measure. This hybrid similarity measure improves the existing work in three aspects. First, it improves the current recommender systems by using actual user interests. Second, it provides a comprehensive evaluation of an efficient solution to the cold start problem. Third, this method works well even when no co‐rated items exist between two users. They demonstrate that their proposal is efficient in terms of accuracy, execution time, and applicability. Their proposed similarity measure achieves a mean absolute error (MAE) as low as 0.42, with 64% applicability and execution time as low as 0.03 seconds, while the existing similarity measures from the literature achieve an MAE of 0.88 at their best. These results demonstrate the superiority of their proposed similarity measure in terms of accuracy, as well as having a high applicability percentage and a very short execution time.
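
For readers unfamiliar with the metric, mean absolute error is the average absolute difference between predicted and actual ratings, so lower values mean more accurate predictions. A minimal sketch follows. The ratings are invented for illustration and are unrelated to the datasets or results reported in the paper.

```python
def mean_absolute_error(actual, predicted) -> float:
    """Average absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical ratings on a 1-5 scale.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]
mae = mean_absolute_error(actual, predicted)
```

Here the absolute errors are 0.5, 0, 1, and 0.5, which gives an MAE of 0.5.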

Vangipuram et al. (2020) propose (a) a novel technique for imputing missing data values; (b) a classifier based on feature transformation; and (c) an imputation measure for computing the similarity between any two instances, which can also be used as a general similarity measure. The performance of the proposed classifier is studied on imputed datasets obtained by applying K‐Means, F‐Kmeans, and the proposed imputation method. Experiments are also conducted by applying existing and proposed classifiers to the dataset imputed with the proposed technique. For the experiments, the authors used an open‐source dataset named distributed smart space orchestration system, which is publicly available from Kaggle. The experimental results are validated using the Wilcoxon non‐parametric statistical test. The proposed classifier performs better than existing classifiers when imputation is performed using the F‐Kmeans and K‐Means techniques. When both the proposed imputation and classification techniques are applied, the accuracies for the attack classes scan, malicious operation, denial of service, spying, data type probing, and wrong setup are 100%, while the accuracy for the malicious control attack class is 99%.

We thank the Expert Systems Office and Editor‐in‐Chief for their support and kind assistance in the completion of this special issue.




Updated: 2020-06-29