When building a data pipeline, we need to decide whether to strictly validate incoming data and discard anything we don't support, or to be flexible and accept anything so we can analyze it later. In this talk, I'll discuss the compromise we've reached at Bluecore, where we record both the "raw" data, so we can recover from bugs or mistakes, and the strictly validated data. I'll also talk about why we think that validating up front is the better choice when building data-intensive applications.
Evan Jones is a software engineer at Bluecore in New York. He previously fixed interesting bugs at Twitter and taught a database class at Columbia as an adjunct. Evan was a co-founder and CTO of Mitro, a password manager for groups and organizations. Before that, he earned a Ph.D. from MIT, researching distributed OLTP databases. Even earlier in his life, he worked at Google in New York for a bit more than a year, and he was a graduate and undergraduate student at the University of Waterloo.