At Dstillery we are experts in using machine learning to predict user behavior and build accurate, high-performing audiences for digital advertising. We convert a stream of billions of events per day into a large, sparse feature set containing millions of behavioral attributes tied to hundreds of millions of devices. We train thousands of models daily, producing thousands of audiences, and each device is scored into or out of each audience every day.
A core challenge in predicting user behavior from behavioral attributes such as website visitation is the extremely sparse, extremely high-dimensional feature space: roughly 10 million features in its most naive representation. Here we present a novel approach to both reducing the dimensionality of the problem and extracting maximal information from each feature, even those rarely or never seen in the training data. The result is an embedding of the data that captures the behavioral meaning behind each website visit, allowing us to improve our model building even for problems where training data are scarce.
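The abstract does not specify how the embedding replaces the naive one-hot representation, but the core idea can be illustrated with a minimal sketch: each visited website maps to a low-dimensional learned vector, and a device is summarized by pooling the vectors of the sites it visited. All sizes, names, and the use of PyTorch below are illustrative assumptions, not the production implementation.

```python
# Hypothetical sketch: summarizing a device's sparse site-visitation history
# as one dense vector instead of a ~10-million-dimensional one-hot vector.
import torch
import torch.nn as nn

NUM_SITES = 1_000_000   # illustrative; the naive feature space is ~10M sites
EMBED_DIM = 64          # hypothetical embedding size

# EmbeddingBag averages the embeddings of all sites a device visited,
# so each device is represented by a single 64-dimensional vector.
site_embeddings = nn.EmbeddingBag(NUM_SITES, EMBED_DIM, mode="mean")

# Example: one device visited sites 12, 9_876 and 314_159.
visited = torch.tensor([12, 9_876, 314_159])
offsets = torch.tensor([0])                         # one device in this batch
device_vector = site_embeddings(visited, offsets)   # shape: (1, 64)
print(device_vector.shape)
```

Downstream audience models can then be trained on these dense device vectors rather than on the raw sparse features, which is what makes modeling feasible when training data are scarce.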
In this talk we will discuss how we use online learning to learn website embeddings at scale, improving our model building with scarce data. We will dive into the neural network architecture and production pipeline that allow us to achieve online learning of embeddings. We will also demonstrate the use of these embeddings by presenting our partnership with a digital survey company, for which we applied our modeling with embeddings to their survey data, scaling a small set of survey respondents to a high-performing audience for use in digital advertising.
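The talk covers the actual architecture and pipeline; as one rough, hypothetical picture of what online learning of site embeddings can look like, the toy loop below applies skip-gram-style updates incrementally as fresh co-visitation pairs arrive from the event stream. The model class, sizes, and training details are assumptions for illustration only.

```python
# Hypothetical sketch of online embedding updates: a toy skip-gram-style model
# whose weights are refreshed incrementally from a stream of co-visitation pairs.
import torch
import torch.nn as nn

NUM_SITES, EMBED_DIM = 100_000, 64   # illustrative sizes

class SiteSkipGram(nn.Module):
    def __init__(self, num_sites, dim):
        super().__init__()
        self.center = nn.Embedding(num_sites, dim)
        self.context = nn.Embedding(num_sites, dim)

    def forward(self, center_ids, context_ids):
        # Score site pairs with a dot product of their embeddings.
        return (self.center(center_ids) * self.context(context_ids)).sum(dim=-1)

model = SiteSkipGram(NUM_SITES, EMBED_DIM)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()

def online_update(center_ids, context_ids, labels):
    """One incremental gradient step on a fresh mini-batch from the stream."""
    opt.zero_grad()
    loss = loss_fn(model(center_ids, context_ids), labels)
    loss.backward()
    opt.step()
    return loss.item()

# Simulated stream: batches of site pairs labeled 1 (co-visited) or 0 (negative).
for _ in range(3):
    centers = torch.randint(0, NUM_SITES, (256,))
    contexts = torch.randint(0, NUM_SITES, (256,))
    labels = torch.randint(0, 2, (256,)).float()
    print(online_update(centers, contexts, labels))
```

The point of the online setup is that embeddings stay current as new events arrive, rather than being retrained from scratch in batch.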
Amelia White is a lead data scientist at Dstillery. She leads a team of data science researchers focused on developing novel, impactful solutions to advance Dstillery's offerings as an audience solutions partner.
Before joining Dstillery, Amelia worked in computational biology, applying computer vision to high-throughput imaging screens of C. elegans. Amelia holds a Ph.D. in Computational Biology from Rutgers University.