An interesting difference between classic software engineering and data engineering is how we think about time. In software engineering we can often get away with assuming that things happen instantly, but in data engineering we rarely can: data is often not available until (long) after events happen, freshness often differs between sources within the same system, and processing the data itself often takes (significant) time.
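As a toy illustration (not taken from the talk), here is a minimal Python sketch of how freshness can differ between sources in the same system; the source names, timestamps, and load cadences are made up:

```python
from datetime import datetime, timezone

# Hypothetical per-source timestamps of the latest ingested event;
# in a real platform these would come from ingestion metadata.
latest_event_time = {
    "payments_db": datetime(2024, 5, 1, 8, 55, tzinfo=timezone.utc),  # batch, loaded hourly
    "clickstream": datetime(2024, 5, 1, 9, 29, tzinfo=timezone.utc),  # streaming, near real time
}

now = datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)

# Freshness = how far each source lags behind "now"; the same system
# can easily mix sources that are seconds and hours behind.
for source, ts in latest_event_time.items():
    lag = now - ts
    print(f"{source}: {lag} behind")
```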
When you mix batch processing and low-latency processing of data, you start to encounter challenges such as maintaining explicit lineage and ensuring that different processing jobs see a consistent view of the data.
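Again as an illustrative sketch rather than anything from the talk, one simple way to give downstream jobs a consistent view is to cut both batch and streaming inputs at the slowest source's high-water mark; the names and data below are hypothetical:

```python
from datetime import datetime, timezone

# Hypothetical high-water marks: the point in event time up to which
# each upstream source is known to be complete.
high_water_marks = {
    "payments_db": datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc),
    "clickstream": datetime(2024, 5, 1, 9, 29, tzinfo=timezone.utc),
}

# A job that joins both sources only processes events up to the slowest
# source's watermark, so every consumer sees the same consistent cut.
consistent_cutoff = min(high_water_marks.values())

def events_up_to(events, cutoff):
    """Keep only events whose event_time falls at or before the cutoff."""
    return [e for e in events if e["event_time"] <= cutoff]

payments = [{"id": 1, "event_time": datetime(2024, 5, 1, 7, 45, tzinfo=timezone.utc)}]
clicks = [{"id": 2, "event_time": datetime(2024, 5, 1, 9, 10, tzinfo=timezone.utc)}]

print(events_up_to(payments, consistent_cutoff))  # included: before the 08:00 cutoff
print(events_up_to(clicks, consistent_cutoff))    # excluded: newer than the slowest source
```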
In this talk, Will will share lessons learned from building mixed-latency data platforms at startups like Better. Expect a deep dive into time horizons, data freshness, pipeline bottlenecks, and the functionality your data platform needs in order to keep your data products reliable and consistent.
With over a decade of experience in data and software, I've recently embarked on a new journey as a growth strategist at Twirl. From tinkering with startup concepts during my MBA to honing my skills in data-driven approaches, I bring a unique blend of expertise to the table.