Building data products at scale spans a spectrum of work: at one end are data analysis and model prototyping, and at the other are data engineering pipelines. Tools such as scikit-learn and TensorFlow have made the former accessible, while Spark and other big data stacks have addressed needs at the latter end of the spectrum.
Somewhere in the middle of this spectrum is the challenge of operationalizing machine learning (ML) models. In this talk, we will share practical lessons and patterns for building ML models in production, based on our experience with search ranking and recommendation systems at Instacart. As part of this, we will include a detailed discussion of the technical challenges in building an ML features pipeline, one that is now shared across multiple data products at Instacart.