Data Council Blog

Recent Posts:

Apache Airflow, Beyond Spreadsheets, and More: Top 10 Links From Across the Web

Here's our July 2020 roundup of relevant links for data professionals, from blog posts to podcast episodes:

1. The State of Airflow

Software Engineering Daily recently invited Apache Airflow's creator Maxime Beauchemin and Astronomer engineers Vikram Koka and Ash Berlin-Taylor to discuss the state of Airflow. Listen to the podcast episode or read the transcript to hear their comments on Airflow's use cases, its purpose, the open source ecosystem, and more.

Open Source Highlight: Apache Iceberg

Apache Iceberg is an open table format for very large analytic datasets. You can use it with Presto or Spark to add tables that use a high-performance format that vows to work just like a SQL table.

AGI, Dask, Feature Stores, and More: Top 10 Links From Across the Web

Here's our June 2020 roundup of relevant links for data professionals, from blog posts to podcast episodes:

1. Self-Supervised Learning vs. AGI

"AGI does not exist — there is no such thing as general intelligence. We can talk about rat-level intelligence, cat-level intelligence, dog-level intelligence, or human-level intelligence, but not artificial general intelligence," Yann LeCun declared during an online session of the International Conference on Learning Representations (ICLR) 2020, which VentureBeat wrote about. Together with fellow Turing Award winner Yoshua Bengio, he advocated for pursuing humanlike AI through "self-supervised learning."

Open Source Highlight: Cube.js

Cube.js is an open source analytics framework meant to answer the "lack of tools for software engineers who are building production, customer-facing applications and need to embed analytics features into these applications," its co-founder and CEO Artyom Keydunov explained in a blog post.

What Data Tools DON’T Do, CD4ML and NoSQL: Top 10 Links from Across the Web

Here's our monthly roundup of relevant links for data professionals, from blog posts and tutorials to podcast episodes:

1. Product Management for AI

Peter Skomoroch and Mike Loukides co-authored a very interesting post on what makes product management different in the context of AI. Based on the specificities of AI software development, they make a series of recommendations for a process that also takes business priorities into account. Their post also ends with a list of relevant resources, so it is worth checking out.

Open Source Highlight: Streamlit

Streamlit officially launched out of beta on October 1st, 2019 with the promise to "turn Python scripts into beautiful ML tools." On the same day, Google's AI-focused venture fund Gradient Ventures announced its investment in the startup, which has since attracted a considerable amount of attention despite its young age.

Data Science, Data Analytics, Data Engineering and Artificial Intelligence: 11 Online Courses You Should Check Out

With COVID-19 forcing almost one billion people to shelter in place around the world, many people have turned to new activities, such as drawing, baking, gardening… or online learning. If that doesn't sound like you, don't feel guilty by any means – sometimes, surviving is enough! But if you want to get more knowledgeable about data science, data engineering and artificial intelligence, we are here for you.

This is why we came up with this list of courses that can help you prepare for a future job in the data field, upgrade your existing skills, or just satisfy your personal curiosity. From free entry-level courses to full-time bootcamps, here's our selection for you to check out:

PyTorch Lightning, ksqlDB and More: Top 10 Links from Across the Web

Here are 10 recent relevant links for data professionals, from blog posts and tutorials to podcast episodes:

1. PyTorch Lightning: a gentle introduction

Former Data Council speaker Will Falcon published an interesting post on PyTorch Lightning, the lightweight PyTorch wrapper born out of his Ph.D. AI research at NYU CILVR and Facebook AI Research (FAIR). Framed as "a gentle introduction", it includes a side-by-side comparison of building a simple MNIST classifier in PyTorch and in PyTorch Lightning, in order to illustrate how to refactor one into the other. This is highly recommended reading if you are working on AI/ML, whether as a professional researcher, as a student, or in production.

Data Engineer Salaries Around The World (2019)

Your potential salary as a data engineer depends heavily on where you are based, but the cost of living also varies around the world. Wondering where you can actually earn more? Let's take a closer look at the United States, Europe and Asia to compare and benchmark data engineering salaries.