In this talk, we will present a general multi-armed bandit framework for recommending titles to our 117M+ members on the Netflix homepage. A key aspect of our framework is closed-loop attribution, which links each member response back to the recommendation that was shown. Our framework frequently updates its policies using user feedback collected over a past time window.
We will take a deeper look at the system architecture. We will illustrate the use of the framework by focusing on two example policies: a greedy exploit policy, which maximizes the probability that a user will play a title, and an incrementality-based policy. The latter is a novel online learning approach that takes the causal effect of a recommendation into account. An incrementality-based policy recommends the titles that bring about the maximum increase in a specific quantity of interest, such as engagement. This discounts the effect of a recommendation when the user would have played the title anyway. We describe offline experiments and online A/B test results for both of these example policies.
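As a rough illustration of how the two example policies differ, here is a minimal Python sketch. It is not the framework presented in the talk. The function names, the title identifiers, and the probability values are all hypothetical, and it assumes that estimates of the play probability with and without a recommendation are already available for each title.

```python
from typing import Dict


def greedy_policy(p_play_if_shown: Dict[str, float]) -> str:
    """Pick the title with the highest estimated play probability when shown."""
    return max(p_play_if_shown, key=p_play_if_shown.get)


def incrementality_policy(
    p_play_if_shown: Dict[str, float],
    p_play_if_not_shown: Dict[str, float],
) -> str:
    """Pick the title whose recommendation adds the most play probability.

    The lift is the estimated play probability when the title is recommended
    minus the estimated play probability when it is not (assumed 0.0 if no
    estimate is available).
    """
    lift = {
        title: p_shown - p_play_if_not_shown.get(title, 0.0)
        for title, p_shown in p_play_if_shown.items()
    }
    return max(lift, key=lift.get)


# Hypothetical estimates for two made-up titles.
p_shown = {"title_a": 0.30, "title_b": 0.20}
p_not_shown = {"title_a": 0.28, "title_b": 0.05}

greedy_choice = greedy_policy(p_shown)
incremental_choice = incrementality_policy(p_shown, p_not_shown)
```

With these made-up numbers, the greedy policy picks `title_a` because it has the highest play probability, while the incrementality-based policy picks `title_b` because recommending it changes the outcome the most. The member would probably have played `title_a` anyway.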
Jaya is currently a Research Scientist at Netflix, a member of the influential Women in Tech, a strong ML enthusiast, and a newly minted mom.
Elliot is a software engineer at Netflix on the Personalization Infrastructure team. He graduated from UC Berkeley (B.S.) and Stanford (M.S.) and has previously worked at eBay and Apple. Currently, he builds big data systems using a variety of technologies including Scala, Spark (Streaming), Kafka, and Cassandra.