Data Council Blog

17/10/17 10:20 | by Pete Soderling | in Data Science, Data Engineering, Event Updates

How Data Has Evolved at The New York Times

Whether you love or hate their paywall, the Times successfully balances competing business frictions using a deep view of data.

Since our initial DataEngConf in 2015, The New York Times has been a key supporter of the conference. The very first ever DataEngConf talk was a keynote given by Chris Wiggins, the Times' Chief Data Scientist, who presented a broad yet fascinating perspective on "Data Science at The New York Times" (video here).

In the years since, we've had deeply technical talks from both data engineers and data scientists at the Times, and I'm excited that their involvement in DataEngConf this year is as large as it's ever been.

16/10/17 12:37 | by Pete Soderling | in Data Science, Data Engineering, Event Updates, Startups, Apache Arrow

How Dremio Uses Apache Arrow to Increase the Performance

(Image source: http://arrow.apache.org/)

What if all the best open-source data platforms could easily share (ahem) data with each other?

As data has proliferated and open-source software (OSS) has continued to dominate both the stacks and the business models of the top tech companies in the world, the number of different data platforms and tools has grown at an accelerating pace.

Having a hard time keeping up with the differences between Kudu, Parquet, Cassandra, HBase, Spark, Drill and Impala? You're not alone, and obviously this is one of the reasons we bring together top OSS contributors to these platforms to share at DataEngConf.

But one new innovation attempts to bind all of the above projects together by letting them share a common memory format. It's a new top-level Apache project called Arrow that aims to dramatically reduce the computation wasted on serializing and deserializing in-memory objects. That serialization pattern is common in analytics applications that move data between systems, each with its own internal memory representation.
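
To make the cost concrete, here is a minimal sketch of the pattern described above. It uses only the Python standard library and is not Arrow's API: the `rows` data, the JSON wire format and the `array`/`memoryview` buffer are all assumptions chosen for illustration. The first half pays for a serialize/deserialize round trip; the second half has both sides agree on one contiguous column layout so the consumer can read the producer's memory directly.

```python
import json
from array import array

# Hypothetical "system A" data: a list of row dicts (illustrative only).
rows = [{"id": i, "price": i * 1.5} for i in range(5)]

# Without a shared format: A serializes to bytes, B parses them and
# rebuilds the column it needs. Every value is encoded and decoded.
wire_bytes = json.dumps(rows).encode("utf-8")
decoded = json.loads(wire_bytes.decode("utf-8"))
prices_via_json = [r["price"] for r in decoded]

# With an agreed columnar layout: a contiguous buffer of C doubles.
# memoryview exposes that buffer to a consumer without copying it.
price_column = array("d", (r["price"] for r in rows))
shared_view = memoryview(price_column)

# Both paths yield the same values; tolist() copies only for comparison.
prices_via_view = shared_view.tolist()

# The view aliases the producer's memory: a write on one side is
# visible on the other, with no serialization step in between.
price_column[0] = 99.0
first_price_seen_by_consumer = shared_view[0]
```

The point of the sketch is only the contrast between the two paths; a real shared format also has to pin down types, nulls and nested data, which this toy buffer ignores.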

08/10/17 07:32 | by Pete Soderling | in Data Science, Data Engineering, Event Updates, Startups

Introducing our Data Startups Track

Machine Learning, Neural Nets, "AI" and Computer Vision are changing the world. Discover the data startups that matter.

As an engineer turned founder, I've been passionate for years about helping other technical founders succeed. Founders face a unique set of challenges, and building support communities that help them overcome those obstacles moves innovation forward.

More broadly speaking, I'm also a proponent of bringing engineers together - hence our efforts in the data community via meetups, our conference series and via organizing other, smaller, events for engineers, data scientists and CTOs through Hakka Labs for the past 5 years.

This is why I'm so excited to be introducing the intersection of these two efforts - supporting startups and supporting the data community - into our upcoming DataEngConf NYC.

03/10/17 12:27 | by Pete Soderling | in Data Engineering, Event Updates, Databases, sharding, nosql, postgresql

To Shard or Not to Shard (PostgreSQL)

Wouldn't the world be a simpler place if we could easily scale our RDBMS? (gasp!)

What do you do when you need to scale out your RDBMS to support greater data volumes than you originally anticipated? Traditionally, you would either scale vertically by putting your database on more powerful (costlier) machines, or shard your data across multiple workers.
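
As a rough illustration of the second option, the sketch below routes rows to shards with a stable hash of a key. It is a generic, minimal example and not how PostgreSQL or any particular extension implements sharding; the `shard_for` and `distribute` helpers, the `customer_id` key and the shard count are all hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(rows, num_shards):
    """Group rows into per-shard lists based on their customer_id."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row["customer_id"], num_shards)].append(row)
    return shards


# Hypothetical rows; in practice each shard would live on its own worker.
rows = [{"customer_id": f"cust-{i}", "amount": i * 10} for i in range(12)]
shards = distribute(rows, num_shards=4)
```

All rows for a given customer land on the same shard, which keeps single-customer queries local. The trade-off is that changing `num_shards` remaps most keys, one of the operational costs that makes the decision to shard a real question.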

30/09/17 01:51 | by Pete Soderling | in Data Engineering, Event Updates, Databases

Rolling Your Own Distributed Column Store

When solving your customers' technical challenges pushes you to break the rules

A re-wording of one of the key maxims for startup success could be "KISS" - "keep it simple, stupid." If you've ever run your own startup, you also know the mantras of "focus" and "fail fast," and the critical reminder of how your product should be a "pain-killer not a vitamin."

14/09/17 08:00 | by Pete Soderling | in Data Science, big data, Data Visualization, disaster management

How Big Data Can Help Improve the Meteorological Risk Models That Are Out of Date

According to a recent article published in The New York Times, water damage from Hurricane Harvey extended far beyond flood zones. Now that the rescue efforts are underway, it’s clear that much of the damage occurred outside of the typical boundaries drawn on official FEMA flood maps.

17/04/17 09:07 | by Pete Soderling | in Data Science, Data Engineering, Speaker Spotlight

A Day in the Life: What's it like Being an Engineer at Stripe?

Alyssa Frazee tells us about the unicorn data skills she's honed on the job.

One thing that Alyssa Frazee loves about her work at Stripe is that, like someone with traditional data science skills, she gets to build machine learning models. "Oh, the rapture," cries Alyssa the data scientist!

12/04/17 12:21 | by Pete Soderling | in Data Engineering, Event Updates

Rebuilding Open Source Analytics @ Airbnb

How open source allowed Airbnb to rebuild their expensive BI tool in less than one developer year

Granted, Maxime Beauchemin isn't your average data engineer. As any Bay Area engineer worth their salt knows, anyone who worked on data at Facebook receives (deserves) a certain outsized respect from their peers.

06/04/17 12:45 | by Pete Soderling | in Data Engineering, Speaker Spotlight

Pushing Kafka to the Limit at Heroku

How Everyone's Favorite PaaS Operates Kafka at Scale

Scale presents unique challenges for engineers, particularly at companies whose many users throw off the most data exhaust, producing the fattest data pipelines with the gnarliest problems. Take Heroku, arguably the most popular platform as a service (PaaS). Last year it decided to offer Apache Kafka to its customers as a hosted service, and quickly realized it would need to support a large number of distinct users, each with varying use cases. That put the team on a challenging path: minimizing the operational headaches inherent in running this kind of infrastructure at scale.

28/03/17 14:05 | by Pete Soderling | in Data Science, Speaker Spotlight

Fighting Fraud in Cryptocurrency using Machine Learning

Coinbase is on the front lines of discovering advanced cryptocurrency and payment fraud techniques. Hear about how they use machine learning to help them fight the war.
