Leveraging fine-grained mobile data for churn detection through Essence Random Forest
Journal of Big Data (IF 8.6), Pub Date: 2021-04-29, DOI: 10.1186/s40537-021-00451-9
Christian Colot, Philippe Baecke, Isabelle Linden

The rise of unstructured data creates unprecedented opportunities for marketing applications, along with new methodological challenges in leveraging such data. In particular, redundancy among the features extracted from this data deserves special attention, as it might prevent current methods from benefiting from it. In this study, we investigate the value of multiple fine-grained data sources, i.e. web surfing, application usage, and geospatial mobility, for churn detection within telephone companies. This value is analysed both as a substitute for and as a complement to the well-known communication network. Moreover, we propose an adaptation of the Random Forest algorithm, called Essence Random Forest, designed to better address redundancy among extracted features. Analysing fine-grained data from a telephone company, we first find that geospatial mobility data might be a good long-term alternative to the classical communication network, which might become obsolete due to competition from digital communications. We then show that, in the short term, these alternative fine-grained data might complement the communication network for improved churn detection. In addition, compared to Random Forest and Extremely Randomized Trees, Essence Random Forest better leverages the value of unstructured data by offering enhanced churn detection regardless of the perspective addressed, i.e. substitution or complement. Finally, Essence Random Forest converges faster to stable results, which is a salient property in a resource-constrained environment.




Updated: 2021-04-29