Understanding Model Drift in a Large Cellular Network
arXiv - CS - Performance, Pub Date: 2021-09-07, DOI: arxiv-2109.03011
Shinan Liu, Francesco Bronzino, Paul Schmitt, Nick Feamster, Ricardo Borges, Hector Garcia Crespo, Brian Ward

Operational networks are increasingly using machine learning models for a variety of tasks, including detecting anomalies, inferring application performance, and forecasting demand. Accurate models are important, yet accuracy can degrade over time due to concept drift, whereby either the characteristics of the data change over time (data drift) or the relationship between the features and the target predictor changes over time (model drift). Drift is important to detect because changes in properties of the underlying data or relationships to the target prediction can require model retraining, which can be time-consuming and expensive. Concept drift occurs in operational networks for a variety of reasons, ranging from software upgrades to seasonality to changes in user behavior. Yet, despite the prevalence of drift in networks, its extent and effects on prediction accuracy have not been extensively studied. This paper presents an initial exploration into concept drift in a large cellular network in the United States for a major metropolitan area in the context of demand forecasting. We find that concept drift arises largely due to data drift, and it appears across different key performance indicators (KPIs), models, training set sizes, and time intervals. We identify the sources of concept drift for the particular problem of forecasting downlink volume. Weekly and seasonal patterns introduce both high- and low-frequency model drift, while disasters and upgrades result in sudden drift due to exogenous shocks. Regions with high population density, lower traffic volumes, and higher speeds also tend to correlate with more concept drift. The features that contribute most significantly to concept drift are User Equipment (UE) downlink packets, UE uplink packets, and Real-time Transport Protocol (RTP) total received packets.
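
To make the notion of data drift in the abstract concrete, the following is a minimal sketch that compares the distribution of one feature in a training window against a later window using a two-sample Kolmogorov-Smirnov statistic. This is an illustrative assumption, not the method used in the paper: the function name `ks_statistic`, the window names, and the synthetic Gaussian values standing in for a KPI such as downlink volume are all hypothetical.

```python
from bisect import bisect_right
import random


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical empirical distributions, 1.0 means
    the samples do not overlap at all.
    """
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The largest gap between two step functions occurs at a data point,
    # so it is enough to evaluate both CDFs at every observed value.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


# Synthetic stand-ins for one feature (hypothetical values, not real data).
rng = random.Random(0)
train_window = [rng.gauss(100.0, 10.0) for _ in range(500)]
same_regime = [rng.gauss(100.0, 10.0) for _ in range(500)]
shifted_regime = [rng.gauss(130.0, 10.0) for _ in range(500)]  # mean has moved

ks_same = ks_statistic(train_window, same_regime)
ks_shifted = ks_statistic(train_window, shifted_regime)

print(f"KS vs. same regime:    {ks_same:.3f}")
print(f"KS vs. shifted regime: {ks_shifted:.3f}")
```

A larger statistic for the shifted window than for the same-regime window suggests that the feature's distribution has changed since training. Such a check only addresses data drift; detecting a change in the feature-to-target relationship would additionally require tracking the model's prediction error over time.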

Updated: 2021-09-08