Clustering and automatic labelling within time series of categorical observations—with an application to marine log messages
The Journal of the Royal Statistical Society: Series C (Applied Statistics) (IF 1.0), Pub Date: 2021-05-03, DOI: 10.1111/rssc.12483
Emanuele Gramuglia, Geir Storvik, Morten Stakkeland

System logs or log files containing textual messages with associated time stamps are generated by many technologies and systems. The clustering technique proposed in this paper provides a tool to discover and identify patterns or macrolevel events in this data. The motivating application is logs generated by frequency converters in the propulsion system on a ship, while the general setting is fault identification and classification in complex industrial systems. The paper introduces an offline approach for dividing a time series of log messages into a series of discrete segments of random lengths. These segments are clustered into a limited set of states. A state is assumed to correspond to a specific operation or condition of the system, and can be a fault mode or a normal operation. Each of the states can be associated with a specific, limited set of messages, where messages appear in a random or semi-structured order within the segments. These structures are in general not defined a priori. We propose a Bayesian hierarchical model where the states are characterised both by the temporal frequency and the type of messages within each segment. An algorithm for inference based on reversible jump MCMC is proposed. The performance of the method is assessed by both simulations and operational data.
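
To make the setting in the abstract concrete, the following is a minimal simulation sketch of the kind of data it describes: a timeline split into segments of random length, each segment assigned to a state, and each state characterised by how often messages occur and which message types appear. Everything specific here is an assumption for illustration and is not taken from the paper: the state names, message names, rates, the exponential segment lengths, and the Poisson message arrivals are all hypothetical, and the sketch only generates data; it does not implement the reversible jump MCMC inference.

```python
import random

# Hypothetical states (illustrative values only, not from the paper).
# Each state has a message rate (messages per time unit) and a
# probability distribution over the message types it can emit.
STATES = {
    "normal": {"rate": 0.2, "messages": {"STATUS_OK": 0.9, "WARN_TEMP": 0.1}},
    "fault": {
        "rate": 2.0,
        "messages": {"OVERCURRENT": 0.6, "TRIP": 0.3, "WARN_TEMP": 0.1},
    },
}


def simulate_log(n_segments, mean_length=50.0, seed=0):
    """Simulate a time-stamped categorical log made of state-labelled segments.

    Returns (log, segments), where log is a list of (time, message) pairs and
    segments is a list of (start, end, state) triples covering the timeline.
    """
    rng = random.Random(seed)
    names = list(STATES)
    log, segments = [], []
    t = 0.0
    for _ in range(n_segments):
        # Assumption: states are drawn uniformly and independently.
        state = rng.choice(names)
        # Assumption: segment lengths are exponentially distributed.
        end = t + rng.expovariate(1.0 / mean_length)
        segments.append((t, end, state))

        spec = STATES[state]
        types = list(spec["messages"])
        weights = list(spec["messages"].values())

        # Assumption: within a segment, messages arrive as a homogeneous
        # Poisson process, i.e. with exponential gaps between messages.
        s = t + rng.expovariate(spec["rate"])
        while s < end:
            log.append((s, rng.choices(types, weights=weights)[0]))
            s += rng.expovariate(spec["rate"])
        t = end
    return log, segments


if __name__ == "__main__":
    log, segments = simulate_log(n_segments=4, seed=1)
    for start, end, state in segments:
        count = sum(1 for time, _ in log if start <= time < end)
        print(f"{state:>6}: [{start:8.2f}, {end:8.2f}) with {count} messages")
```

The inference problem addressed in the paper runs in the opposite direction: given only the list of time-stamped messages, recover the segment boundaries and the state of each segment.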

Updated: 2021-06-05