Data Council Blog

Open Source Highlight: Apache Superset

Apache Superset is a very popular open-source project that provides users with an exploration and visualization platform for their (big or not-so-big) data. For instance, it can be used to create line charts as well as advanced geospatial charts and dashboards that support queries via SQL Lab.

Open Source Highlight: PostHog

PostHog provides open-source product analytics, which users can deploy on their own infrastructure to collect every event on their website or app without having to send the data to third parties - an increasing source of concern in times of GDPR and CCPA.

Open Source Highlight: OpenLineage

OpenLineage is an API for collecting data lineage and metadata at runtime. While initiated by Datakin, the company behind Marquez, it was developed with the aim of creating an open standard. As Datakin’s CTO Julien Le Dem explained in a blog post announcing the launch, OpenLineage is meant to answer the industry-wide need for data lineage, while making sure efforts in that direction aren’t fragmented or duplicated.

Open Source Highlight: Orchest

Orchest is an open-source tool for creating data science pipelines. Its core value proposition is to make it easy to combine notebooks and scripts with a visual pipeline editor (“build”); to make your notebooks executable (“run”); and to facilitate experiments (“discover”).

Open Source Highlight: Klio

Klio is a framework for easy large-scale processing and ML research on binary files, such as audio files -- its original use case. As a matter of fact, it was developed for audio intelligence at Spotify, which open-sourced it earlier this year at the 2020 International Society for Music Information Retrieval Conference.

Open Source Highlight: DataHub

DataHub is a generalized metadata search & discovery tool. Originally created at LinkedIn, it was open sourced in February of this year and has been adopted by other companies such as Expedia and Typeform. Its ambition is to help connect employees to the data that matters to them.
 

How Big Data Can Help Improve the Meteorological Risk Models That Are Out of Date

According to a recent article published in The New York Times, water damage from Hurricane Harvey extended far beyond flood zones. Now that the rescue efforts are underway, it’s clear that much of the damage occurred outside the typical boundaries drawn on official FEMA flood maps.