Data Council Blog

November 2024 Top 10 (by Monte Carlo)

Written by Data Council | 20/11/24 15:56

Hey Data Council-ers!

This month, Lindsay MacDonald from Monte Carlo asks a critical question: Is data ready for GenAI? While AI seems ready to take off, are our data foundations really prepared? Let’s find out with this month’s roundup.

📣 Quick Reminder: Data Council 2025 CFP is open until Friday, Nov 22nd. Submit your talk (or idea) before time runs out!

01 / MISSION LANE TECH BLOG
Continuous Compliance Monitoring
Compliance isn’t something to be taken lightly—and that’s especially true in financial services. This fascinating article provides an overview of Mission Lane's newly developed always-on compliance testing strategy. They share the rationale for pursuing the project, the architecture leveraged to continuously test for compliance, and a few helpful tips to get started on a similar AI-driven strategy in your own organization.

02 / CONFLUENT
Shift Left: Bad Data in Event Streams, Part 1
This is the first post in a two-part series, and I highly recommend reading both articles. In this piece, Adam Bellemare looks at bad data in event streams to better understand 1) how bad data ends up in an event stream, 2) the impact on downstream consumers, 3) what data orgs can do about it, and 4) how to fix it. This article focuses on handling bad data in batch processing, while the follow-up explores leveraging event design. Great reads!

03 / YELP
Loading Data into Redshift with DBT
Christopher Arnold, Software Engineer at Yelp, shares a technical deep dive into how his team uses DBT with Redshift Spectrum to read data from their data lake into Redshift. The approach eliminates the forking of data flows, reduces runtime, resolves data quality issues, and improves developer productivity.

04 / MARTIN FOWLER
Governing Data Products Using Fitness Functions
One of the biggest topics of conversation among enterprise data organizations recently—and one we expect to trend upward—is the data product marketplace. Kiran Prakash shares a thoughtful outline of how automated governance fitness functions can help scale the governance of data products in a data mesh, along with implementation strategies.

05 / MIKKEL DENGSØE
How Top Data Teams Are Structured
It goes without saying—but I’ll say it anyway: building the right data team matters. When it comes to efficiently delivering on new business use cases, the right mix of roles and responsibilities is key to unlocking operational excellence. In his latest article, Mikkel Dengsøe analyzes the distribution of data roles within 40 top data teams in the US and Europe to understand how teams balance science and engineering—and how those structures evolve over time.

06 / FELICIS
The Rise of AI Data Infrastructure
If getting your data team’s structure right is the first step, getting your data infrastructure right is a close second. And that’s never been more apparent than it is now in the AI development race. In this piece, Astasia Myers and Eric Flaningam break down the AI data infrastructure landscape, including the trends they’re observing and the innovations they’re betting on.

07 / JACK VANLIGHTLY
Incremental Jobs and Data Quality Are On a Collision Course - Part 1 - The Problem
If most queries are actually run on smaller datasets, is incremental processing the answer to using less compute? Jack Vanlightly outlines a commonly felt paradox: incremental, lower-latency analytics workloads are more cost-efficient, but demand for these workloads continues to increase. He shares his perspective on making incremental computation more successful for enterprises.

08 / LAK LAKSHMANAN
What Goes into Bronze, Silver and Gold Layers of a Medallion Data Architecture?
What does a four-layer data engineering architecture look like? In this article, Lak outlines his recommendation for a medallion architecture that enterprise data teams should leverage. He also introduces a “platinum layer” between the silver and gold layers to address data governance, separation of responsibility, and cost efficiency.

09 / BARR MOSES
Survey Says: Data Quality Management Isn’t Evolving Fast Enough for AI
In a recent survey of 200 data professionals, a staggering 91% said they were actively building AI applications, but 2 out of 3 admitted they don’t completely trust the data these applications are built on. Think that sounds like a recipe for disaster? You’re not alone. Check out the survey highlights to see where data quality practices are falling behind.

10 / GRADIENT FLOW
Lessons From the Frontlines of AI Training
Whether we’re talking about dashboards or AI agents, garbage in will ALWAYS mean garbage out. That’s why every AI strategy has to start with the quality of the data. Ben Lorica shares how top AI labs are tackling data quality, along with a few insights for teams looking to level up their AI game.

Thanks to Monte Carlo for curating this month’s newsletter!
Team Data Council