Are You Failing to Identify Clumpiness Signals?
Netflix binge-watching became a thing around 2013 when the company started releasing full seasons of its original series all at once, allowing viewers to watch multiple episodes or entire seasons in a single sitting. The first major Netflix original series to be released this way was “House of Cards,” which premiered on February 1, 2013. This approach to content release, combined with the growing library of TV shows and movies on the platform, contributed to the popularity of binge-watching on Netflix and other streaming services.
Binge behavior is measured using “clumpiness” metrics, which have been applied to financial market microstructure, criminology, seismology, and digital media consumption. Clumpiness metrics are calculated from inter-event times (IETs), the gaps between consecutive events. The more variable the IETs, the greater the clumpiness. These metrics can be powerful: published research has demonstrated that clumpiness “adds to the predictive power, above and beyond RFM and firm marketing action, of both the churn, incidence, and monetary value parts” of customer lifetime value.
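To make the idea concrete, here is a minimal sketch of computing IETs and their variability with NumPy. The function names and the toy event times (days on which a viewer watched something) are made up for illustration and are not part of any particular library.

```python
import numpy as np

def inter_event_times(timestamps):
    """Gaps between consecutive events, after sorting the timestamps."""
    ts = np.sort(np.asarray(timestamps, dtype=float))
    return np.diff(ts)

def coefficient_of_variation(iets):
    """Standard deviation of the IETs divided by their mean."""
    iets = np.asarray(iets, dtype=float)
    if iets.size < 2 or iets.mean() == 0:
        return float("nan")  # not enough gaps to measure variability
    return float(iets.std() / iets.mean())

# Hypothetical viewing days for two users with the same number of events.
regular = [0, 7, 14, 21, 28, 35]  # one episode a week
binge = [0, 1, 2, 3, 34, 35]      # a burst, a long pause, then another burst

cv_regular = coefficient_of_variation(inter_event_times(regular))
cv_binge = coefficient_of_variation(inter_event_times(binge))
print(cv_regular, cv_binge)
```

Both users have six events over the same 35 days and the same mean gap of 7 days, but the weekly viewer's gaps are all identical (a coefficient of variation of 0), while the binge viewer's gaps are far more variable.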
For successful feature engineering of clumpiness signals, follow these tips:
- Use a metric that standardizes for the number of events, e.g., the coefficient of variation of IETs rather than the variance
- Research suggests that the most effective clumpiness metric is the entropy of IETs
- Choose a time window that is long enough to capture at least two cycles of binge behavior
- Calculating IETs can be difficult and computationally intensive, so use a tool that has optimized this transformation (more on this below)
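As a rough illustration of an entropy-style metric, the sketch below scores a sequence of events inside a fixed observation window. It assumes one normalized-entropy formulation: the gaps, including the time before the first event and after the last one, are scaled to sum to 1, and the score is 1 plus their summed `x * log(x)` divided by the log of the number of gaps. The function name, the window convention, and the sample data are assumptions for illustration, not the definitive definition or any library's API.

```python
import numpy as np

def entropy_clumpiness(event_times, window_start, window_end):
    """Score in [0, 1]: 0 for evenly spaced events, higher for tighter clumps."""
    t = np.sort(np.asarray(event_times, dtype=float))
    if t.size == 0:
        return float("nan")  # no events, nothing to measure
    if t[0] < window_start or t[-1] > window_end:
        raise ValueError("all events must fall inside the window")
    # Gaps include the lead-in before the first event and the tail after the last.
    edges = np.concatenate(([window_start], t, [window_end]))
    x = np.diff(edges) / (window_end - window_start)  # scaled gaps, sum to 1
    nonzero = x[x > 0]  # treat 0 * log(0) as 0
    return float(1.0 + np.sum(nonzero * np.log(nonzero)) / np.log(x.size))

# Hypothetical per-customer event days within a 40-day window.
events_by_customer = {
    "even_viewer": [10, 20, 30],
    "binge_viewer": [18, 19, 20],
}
scores = {
    customer: entropy_clumpiness(days, window_start=0, window_end=40)
    for customer, days in events_by_customer.items()
}
print(scores)
```

Because the score is normalized by the number of gaps, customers with different event counts can be compared on the same scale. The evenly spaced viewer scores approximately 0, and the binge viewer scores noticeably higher.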
Here at FeatureByte, we’ve built an open-source feature engineering library that makes it easy to create and serve clumpiness features. Click here to access our open-source SDK, with worked examples in Python.