At Schibsted, billions of events are stored in their raw format in S3 buckets every day. Our analysts and data scientists have struggled to get hold of this data and use it to gain insights, perform analyses, and build models. Exploring this data is complicated because of its evolving schema, its size, and the lack of supporting tooling.
We have worked on democratizing access to data by providing tooling that reduces time to data and time to insights. We started with Jupyter, offering a serverless solution with some extra features and zero infrastructure work: as easy as clicking a button on your SSO dashboard. But this wasn't enough, so we later started offering an alternative based on SQL and JDBC connectivity.
After a beta version with Athena and a small amount of data, we moved to Presto with our own patched version. We are contributing some of these features to the open-source community and exploring ways to offer the others (such as per-user data access authorization) to our data engineering colleagues outside Schibsted. We will talk about this journey and go deeper into the Presto chapter: how we achieved a continuous delivery pipeline combining Travis, Spinnaker, CloudFormation, and AWS.
We will also cover the downsides of maintaining your own patched Presto version, the cost of keeping it up to date, and what you should take into account before choosing a query engine for your company.
Iker is a software engineer with a passion for data. He has worked for 10 years across several industries, such as farming, banking, and the web. A self-declared Data-Democrat at Schibsted Media Group, he dedicates his time to making data reach people, either by transforming complex data to meet the audience's expectations or by building tools so his colleagues can make the best use of it.
Albert Franzi is a software engineer who fell so in love with data that he ended up as a Data Engineer at Alpha Health. He believes in a world where distributed organizations can work together to build common, reusable tools that empower their users and projects. Albert cares deeply about unified data and models as well as data quality and enrichment. He also has a secret plan to conquer the world with data, insights, and penguins.