Feature engineering can produce a never-ending set of gotchas, from bugs in features that aren't defined the same way (or even in the same language) in training and in production, to mistakes in recording when a feature or label was actually available, which lead to unrealistically predictive models.
We will discuss the system we built at Stripe, which leverages evented data to let model developers quickly define (and test!) complex, highly predictive features in a single place in code and make them available for both training and real-time scoring, eliminating some of these common classes of feature generation errors.
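To make the idea concrete, here is a minimal Python sketch of a feature defined once over timestamped events and reused for both training and scoring. It is purely illustrative and is not Stripe's system or API: the `Event` type, the `dispute_count_7d` feature, the helper names, and the sample data are all hypothetical. The point it tries to show is that a single definition, evaluated "as of" a given time, avoids both training/production skew and leakage of events that had not yet happened.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are assumptions."""
    entity_id: str
    kind: str              # e.g. "charge" or "dispute"
    occurred_at: datetime


def dispute_count_7d(events: List[Event], entity_id: str, as_of: datetime) -> int:
    """Count disputes for one entity in the 7 days strictly before `as_of`.

    Only events with occurred_at < as_of are visible, so the feature
    cannot see information that was unavailable at decision time.
    """
    window_start = as_of - timedelta(days=7)
    return sum(
        1
        for e in events
        if e.entity_id == entity_id
        and e.kind == "dispute"
        and window_start <= e.occurred_at < as_of
    )


def training_rows(
    events: List[Event], labels: List[Tuple[str, datetime, int]]
) -> List[Tuple[int, int]]:
    """Build (feature, label) rows, evaluating the feature at each label's decision time."""
    return [(dispute_count_7d(events, eid, t), y) for eid, t, y in labels]


def scoring_feature(events: List[Event], entity_id: str, now: datetime) -> int:
    """Real-time path: the same definition, evaluated as of 'now'."""
    return dispute_count_7d(events, entity_id, now)


# Made-up sample data for illustration only.
events = [
    Event("m1", "dispute", datetime(2020, 1, 5)),
    Event("m1", "charge", datetime(2020, 1, 8)),
    Event("m1", "dispute", datetime(2020, 1, 9)),
    Event("m1", "dispute", datetime(2020, 1, 11)),  # after the training decision time
    Event("m2", "dispute", datetime(2020, 1, 9)),
]
decision_time = datetime(2020, 1, 10)

rows = training_rows(events, [("m1", decision_time, 1)])
live_value = scoring_feature(events, "m1", datetime(2020, 1, 12))
```

Here the training row for `m1` ignores the dispute on January 11 because it occurred after the decision time, while the later scoring call sees it. Both paths go through the same function, so there is no second implementation to drift out of sync.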
Kelley Rivoire is an engineering manager at Stripe, where she leads the data infrastructure group, encompassing the storage systems for Stripe's data, the platforms for batch and streaming computation and machine learning that enhance Stripe's products and internal operations, and the core data pipelines powering analytics. As an engineer, she built Stripe’s first real-time machine learning evaluation of user risk. Previously, she worked on nanophotonics and 3D imaging as a researcher at HP Labs after receiving a PhD at Stanford.