Factor analysis of mixed data for anomaly detection, Statistical Analysis and Data Mining


Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2022-05-02, DOI: 10.1002/sam.11585
Matthew Davidow, David S. Matteson


Anomaly detection aims to identify observations that deviate from the typical pattern of the data. In practice, anomalous observations may correspond to financial fraud, health risks, or incorrectly measured data. We focus on unsupervised detection in the mixed case, where variables are both continuous and categorical. We show that anomaly detection in mixed data improves when the data are first embedded and an anomaly scoring scheme is then applied to the embedding. We propose a kurtosis-weighted Factor Analysis of Mixed Data to obtain a continuous embedding for anomaly scoring. We illustrate that anomalies are highly separable in the first and last few ordered dimensions of this space, and we run various anomaly scoring experiments within this subspace. Results are presented for both simulated and real datasets, and the proposed approach is highly accurate for mixed data across these diverse scenarios.
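The embed-then-score idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses plain (unweighted) Factor Analysis of Mixed Data computed by SVD, because the abstract does not specify how the kurtosis weights are built. The function names `famd_embed` and `edge_score`, the parameter `k`, and the scoring rule (sum of squared standardized coordinates in the first and last `k` dimensions) are all assumptions made for this example.

```python
import numpy as np

def famd_embed(X_num, X_cat):
    """Plain FAMD embedding via SVD (assumption: no kurtosis weighting)."""
    # Continuous columns: center and scale to unit variance.
    blocks = [(X_num - X_num.mean(axis=0)) / X_num.std(axis=0)]
    # Categorical columns: one-hot encode, center, scale by 1/sqrt(level proportion).
    for j in range(X_cat.shape[1]):
        levels, codes = np.unique(X_cat[:, j], return_inverse=True)
        onehot = np.eye(len(levels))[codes]
        p = onehot.mean(axis=0)
        blocks.append((onehot - p) / np.sqrt(p))
    Z = np.hstack(blocks)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    # Drop numerically null directions caused by one-hot redundancy.
    keep = s > 1e-8 * s[0]
    return U[:, keep] * s[keep], s[keep]

def edge_score(F, s, k=2):
    """Hypothetical score: squared standardized coordinates in the
    first k and last k ordered dimensions of the embedding."""
    n, d = F.shape
    Zs = F / (s / np.sqrt(n))  # each column now has unit variance
    idx = np.unique(np.r_[0:min(k, d), max(d - k, 0):d])
    return (Zs[:, idx] ** 2).sum(axis=1)

# Synthetic mixed data, for illustration only.
rng = np.random.default_rng(0)
X_num = rng.normal(size=(200, 3))
X_cat = rng.integers(0, 3, size=(200, 2))

F, s = famd_embed(X_num, X_cat)
scores = edge_score(F, s, k=2)
ranking = np.argsort(scores)[::-1]  # most anomalous candidates first
```

Larger scores mark observations that are extreme along either the leading or the trailing components. Any other scoring rule could be applied to the same subspace of `F`.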


Updated: 2022-05-02
