We all know how hard Big Data stacks can be to build, use and maintain. Gartner estimates that 85% of big data projects are killed before production release. In this talk, engineering leaders from Criteo's Data Reliability Engineering team will show how widespread use of SQL addressed the two biggest issues in data engineering: system efficiency and developer productivity.
Criteo has hundreds of PBs of data under management, with over 100K cores and 1PB+ of main memory available for processing it. Beyond the sheer scale of the system, 500+ developers from around the world interact with it directly, the vast majority of whom have at one point or another pushed data transformation code into production.
The unique challenges of truly huge scale, highly concurrent workloads and geographic distribution of users required an equally unique approach (and quite a lot of serious engineering and good old-fashioned elbow grease).
One doesn't have to look very far back to realize that the RDBMS paradigm of a referentially transparent, lazily evaluated, declarative (and highly expressive) language executing on top of a separately optimizable and easily abstracted-away runtime could reap huge benefits. With the advent of technologies like Hive, Spark-SQL and Presto we are clearly not the first engineers to think of the problem in these terms, but we decided to see just how far we could push SQL by leveraging it in every nook and cranny of our data infrastructure.
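The core of that paradigm can be seen in miniature with any SQL engine: the query states what result is wanted, and the runtime is free to plan, optimize and execute it however it likes. A minimal sketch using Python's built-in sqlite3 (an illustrative stand-in only; the table and values are invented, and engines like Hive, Spark-SQL or Presto apply the same idea at cluster scale):

```python
import sqlite3

# A tiny in-memory database standing in for a distributed warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [(1, 0.5), (1, 1.5), (2, 3.0)],
)

# Declarative: we describe the aggregation we want, not how to scan,
# group or sort. The same text could run unchanged on a single-node
# library or a multi-PB cluster, because the runtime is abstracted away.
rows = conn.execute(
    "SELECT user_id, SUM(revenue) AS total "
    "FROM clicks GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)  # → [(1, 2.0), (2, 3.0)]
```

The separation matters precisely because the engine underneath can be swapped or tuned without touching a line of the query.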
Justin Coffey is an Engineering Director in SRE at Criteo, in charge of the Data Reliability Engineering team comprising over 30 engineers. The DRE team is responsible for the many PBs of data Criteo needs to run its predictive engine and day-to-day business operations, as well as the tools and systems that enable engineers and analysts to build scalable solutions on top of it.