Data Council Blog

December 2024 Top Ten (by Dagster Labs)

Hey Data Council-ers!

I'm Pedram Navid, Chief Dashboard Officer at Dagster Labs, the modern data orchestrator for data engineers building data platforms. I'm excited to share some recent articles I've had my eye on these past few weeks.

Newsletter: Your December Dose of Data & AI

Ready to be part of something extraordinary? Data Council 2025 is coming back to The Bay, and trust us – you won't want to miss this! Next year will be our biggest Data Council yet with real-life insights, breakthrough discussions and new connections that could shape your next big move.

Data Council 2025: Meet the Track Hosts

Hey there, data geeks! We're going all-in on technical depth for Data Council 2025. Meet the industry leaders who'll be curating the most cutting-edge tracks in data & AI. No fluff, no marketing talks – just pure technical content from the trenches. Here's who's crafting your learning experience at our SF Bay Area event this April 22-24.

November 2024 Top 10 (by Monte Carlo)

Hey Data Council-ers!

This month, Lindsay MacDonald from Monte Carlo asks a critical question: Is data ready for GenAI? While AI seems ready to take off, are our data foundations really prepared? Let’s find out with this month’s roundup.

Open Source Highlight: Apache Superset

Apache Superset is a very popular open-source project that provides users with an exploration and visualization platform for their (big or not-so-big) data. For instance, it can be used to create line charts, but also advanced geospatial charts and dashboards that support queries via SQL Lab.

Community, Metadata Management, and More: Top 10 Links From Across the Web

Here's our March 2021 roundup of links from across the web that we selected for you:

1. How to Build a Community (Fishtown Analytics)

Claire Carroll's first personal blog post on community-building is a must-read. As Fishtown Analytics' community manager for the last 2.5 years, she's arguably behind the success of the dbt community and its best-in-class practices, so we expected good advice… but she really hit it out of the park with this one! The key takeaway is that you should start by asking why you want to build a community. Make sure to read the full post to understand why it received so much praise.

Open Source Highlight: PostHog

PostHog provides open-source product analytics that users can deploy on their own infrastructure to collect every event on their website or app without sending the data to third parties, a growing concern in the era of GDPR and CCPA.

dbt at Shopify, Active Learning, and More: Top 10 Links From Across the Web

Here's our February 2021 roundup of links from across the web that we picked for you:

1. dbt at Shopify (Data Engineering Podcast)

The Data Engineering Podcast recently featured a very interesting discussion about dbt at Shopify. Engineering manager Zeeshan Qureshi and senior data engineer Michelle Ark explained how dbt answered Shopify’s need for an SQL-based solution that its data scientists could use autonomously. They also mentioned some of the best practices they followed for staging, and cost considerations related to BigQuery. Last but not least, they touched on some extensions they are considering, such as implementing Great Expectations for data quality control.

Open Source Highlight: OpenLineage

OpenLineage is an API for collecting data lineage and metadata at runtime. Although it was initiated by Datakin, the company behind Marquez, it was developed with the aim of creating an open standard. As Datakin's CTO Julien Le Dem explained in a blog post announcing the launch, OpenLineage is meant to answer the industry-wide need for data lineage, while making sure efforts in that direction aren't fragmented or duplicated.

Storing Cold Metadata, Snowflake Data Cloud, and More: Top 10 Links From Across the Web

Here's our January 2021 roundup of links from across the web that could be relevant to you:

1. Storing Cold Metadata with Alki (Dropbox)

Dropbox shared insights into Alki, the petabyte-scale metadata store it designed for infrequently accessed metadata (“cold data”). The post details how Edgestore, its one-size-fits-all database, was reaching capacity limits, and why audit logs were a good candidate to be moved off costly SSDs. After considering off-the-shelf options, the team settled on building its own solution on top of AWS services: Alki, with DynamoDB as the hot store and S3 as the cold store. Like HBase or Cassandra, Alki is based on log-structured merge-trees (LSM trees), but is better suited to handle hot-then-cold audit logs, as well as future use cases at Dropbox.
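
To make the hot-then-cold idea more concrete, here is a minimal Python sketch of an LSM-style tiered store. It is not Dropbox's implementation: the `TieredStore` class, its method names, and the `hot_limit` parameter are hypothetical, and plain in-memory structures stand in for DynamoDB and S3.

```python
from bisect import bisect_left


class TieredStore:
    """Toy hot/cold store (hypothetical, for illustration only).

    Writes land in a small hot buffer. When the buffer fills up, it is
    flushed to an immutable sorted run in the cold tier, which is the
    basic write path of a log-structured merge-tree.
    """

    def __init__(self, hot_limit=3):
        self.hot_limit = hot_limit
        self.hot = {}        # stand-in for the hot store (DynamoDB in the post)
        self.cold_runs = []  # stand-in for cold objects (S3 in the post)

    def put(self, key, value):
        self.hot[key] = value
        if len(self.hot) >= self.hot_limit:
            self.flush()

    def flush(self):
        """Move the hot buffer to the cold tier as one sorted run."""
        if not self.hot:
            return
        self.cold_runs.append(sorted(self.hot.items()))
        self.hot = {}

    def get(self, key):
        # The hot tier always holds the most recent writes.
        if key in self.hot:
            return self.hot[key]
        # Otherwise search cold runs from newest to oldest.
        for run in reversed(self.cold_runs):
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Reads check the hot tier first and then walk the cold runs from newest to oldest, so a later write to the same key shadows an older flushed value. A real LSM-based system would also compact runs in the background, which this sketch leaves out.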