Data Council Blog

Halo Tech - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Halo Tech, an early-stage startup that analyzes complex data to accelerate medical advancements.

PipelineAI - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with PipelineAI, a startup helping you to continuously train, optimize and host deep learning models at scale.

Instrumental - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Instrumental, an early-stage company building data systems to monitor and improve manufacturing line performance.

Pachyderm - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Pachyderm, an early-stage company building a data platform for data science.

Redshift versus Snowflake versus BigQuery / Part 1: Performance

Fivetran is a data pipeline that syncs data from apps, databases and file stores into our customers’ data warehouses. The question we get asked most often is “what data warehouse should I choose?” In order to better answer this question, we’ve performed a benchmark comparing the speed and cost of three of the most popular data warehouses — Amazon Redshift, Google BigQuery, and Snowflake.

Functional Data Engineering — a modern paradigm for batch data processing


Batch data processing — historically known as ETL — is extremely challenging. It’s time-consuming, brittle, and often unrewarding. Not only that, it’s hard to operate, evolve, and troubleshoot.

In this post, we’ll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process. This post distills fragments of wisdom accumulated while working at Yahoo, Facebook, Airbnb and Lyft, with the perspective of well over a decade of data warehousing and data engineering experience.
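As a rough, hypothetical illustration (not taken from the post itself) of what that functional framing tends to mean in practice, a batch task can be written as a pure transform plus an idempotent partition overwrite, so reruns and backfills are reproducible:

```python
# Hypothetical sketch of the kind of pattern usually meant by "functional"
# batch processing: each task is a pure function of its inputs and a date
# partition, and it idempotently overwrites that partition on every run.
from pathlib import Path
import json


def build_daily_summary(events: list, ds: str) -> dict:
    """Pure transform: output depends only on the inputs, never on current state."""
    return {
        "ds": ds,
        "events": len(events),
        "revenue": sum(e["amount"] for e in events),
    }


def write_partition(summary: dict, warehouse: Path, ds: str) -> None:
    """Idempotent load: re-running for the same date replaces the partition wholesale."""
    partition = warehouse / f"ds={ds}"
    partition.mkdir(parents=True, exist_ok=True)
    (partition / "summary.json").write_text(json.dumps(summary))


if __name__ == "__main__":
    events = [{"amount": 9.99}, {"amount": 4.50}]
    summary = build_daily_summary(events, "2018-04-17")
    write_partition(summary, Path("/tmp/warehouse"), "2018-04-17")
```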

ETL and the Question of Happiness

No one is happy with fragile ETL pipelines. But it doesn't need to be that way.

One might surmise that data "analysis" is, first and foremost, about data "access." It goes without saying that someone in the analyst's role must first obtain access to the data they wish to analyze. And with data spread all over the inside, and now the outside, of the enterprise (think of your on-premises data stores plus all the cloud and SaaS vendors you're currently using), modern-day analysts face deeper challenges than ever before in obtaining access to the data they need.

And of course, techno-philosophical concepts like "democratizing access to data" do nothing at all to help one overcome any of the actual technical integration challenges required to practically enable such unfettered access to one's data.

Data Science in the Media

The past few years have been an interesting time for data science everywhere, and the media in particular! We’ve seen some incredible new technologies emerge, like open-source machine learning platforms, as well as machine learning services. These developments have opened the door for new consumer products, like conversational AIs, and new technologies in the media and advertising industries.

How Data Has Evolved at The New York Times

Whether you love or hate their paywall, the Times successfully balances competing business pressures using a deep view of its data.

Since our initial DataEngConf in 2015, The New York Times has been a key supporter of the conference. The very first DataEngConf talk was a keynote given by Chris Wiggins, the Times' Chief Data Scientist, who presented a broad yet fascinating perspective on "Data Science at The New York Times" (video here).

In the years since, we've had deeply technical talks from both data engineers and data scientists at the Times, and I'm excited that their involvement in DataEngConf this year is as large as it's ever been.

How Dremio Uses Apache Arrow to Increase Performance

(Image source: http://arrow.apache.org/)

What if all the best open-source data platforms could easily share ("ahem") data with each other?

As data has proliferated and open-source software (OSS) has continued to dominate both the stacks and the business models of the top tech companies in the world, the number of different types of data platforms and tools we've seen emerge has accelerated.

Having a hard time keeping up with the differences between Kudu, Parquet, Cassandra, HBase, Spark, Drill and Impala? You're not alone, and this is one of the reasons we bring top OSS contributors to these platforms together to share their work at DataEngConf.

But there's one new innovation that attempts to bind all the above projects together by enabling them to share a common memory format. It's a new top-level Apache project called Arrow that aims to dramatically decrease the amount of wasted computation that occurs when serializing and deserializing in-memory objects. That serialization pattern is common in analytics applications that pass data between systems, each of which has its own internal memory representation.
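To make the idea concrete, here's a minimal, hypothetical sketch (not from the Dremio post itself) using Python's pyarrow library: a table is built in Arrow's columnar format, written to the Arrow IPC stream format, and read back without rebuilding the rows in a different in-memory representation.

```python
# Minimal sketch (assumes the pyarrow package is installed; pandas is needed
# only for the final to_pandas() call). Illustrates Arrow's core idea: data in
# a common columnar layout can be handed between systems without a costly
# row-by-row serialize/deserialize step.
import pyarrow as pa

# Build a table directly in Arrow's columnar memory format.
table = pa.table({
    "user_id": [1, 2, 3],
    "score": [0.9, 0.4, 0.7],
})

# Write it to the Arrow IPC stream format -- the same byte layout that other
# Arrow-aware engines can consume directly.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
buf = sink.getvalue()

# Reading back is largely a matter of reinterpreting those same buffers,
# not reparsing the data into a different internal structure.
reader = pa.ipc.open_stream(buf)
roundtripped = reader.read_all()
print(roundtripped.to_pandas())
```

The payoff grows with data volume: because the IPC payload is essentially the columnar buffers themselves, an Arrow-aware reader can map them directly instead of paying a conversion cost on every exchange.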