Paper Title

Profiling Television Watching Behaviour Using Bayesian Hierarchical Joint Models for Time-to-Event and Count Data

Authors

Moral, Rafael A.; Chen, Zhi; Zhang, Shuai; McClean, Sally; Palma, Gabriel R.; Allan, Brahim; Kegel, Ian

Abstract

Customer churn prediction is a valuable task in many industries. In telecommunications it presents great challenges, given the high dimensionality of the data, and how difficult it is to identify underlying frustration signatures, which may represent an important driver regarding future churn behaviour. Here, we propose a novel Bayesian hierarchical joint model that is able to characterise customer profiles based on how many events take place within different television watching journeys, and how long it takes between events. The model drastically reduces the dimensionality of the data from thousands of observations per customer to 11 customer-level parameter estimates and random effects. We test our methodology using data from 40 BT customers (20 active and 20 who eventually cancelled their subscription) whose TV watching behaviours were recorded from October to December 2019, totalling approximately half a million observations. Employing different machine learning techniques using the parameter estimates and random effects from the Bayesian hierarchical model as features yielded up to 92% accuracy predicting churn, associated with 100% true positive rates and false positive rates as low as 14% on a validation set. Our proposed methodology represents an efficient way of reducing the dimensionality of the data, while at the same time maintaining high descriptive and predictive capabilities. We provide code to implement the Bayesian model at https://github.com/rafamoral/profiling_tv_watching_behaviour.
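
The abstract reports three validation metrics: accuracy, true positive rate, and false positive rate. As a minimal sketch of how these quantities relate, the Python snippet below computes them from binary churn labels. The function name `churn_metrics` and the label vectors are hypothetical: the labels are invented for illustration and are not the paper's data, and the abstract does not state the size of the validation set.

```python
def churn_metrics(y_true, y_pred):
    """Return (accuracy, true positive rate, false positive rate).

    Labels are binary: 1 = customer churned, 0 = customer stayed active.
    Assumes both classes appear in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners flagged as churners
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # active customers left unflagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # active customers wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed

    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn)  # share of actual churners that were detected
    fpr = fp / (fp + tn)  # share of active customers wrongly flagged
    return accuracy, tpr, fpr


# Invented labels (NOT from the paper): 6 churners, 7 active customers,
# with every churner detected and one active customer misclassified.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

acc, tpr, fpr = churn_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f}, TPR={tpr:.3f}, FPR={fpr:.3f}")
```

In this invented example a classifier detects every churner (true positive rate of 1) while still misclassifying some active customers, which is why a perfect true positive rate can coexist with a nonzero false positive rate.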
