Building end-to-end data science solutions is a complex task that goes beyond simply winning prizes on Kaggle. Applying advanced machine learning techniques to real-world scenarios requires rigorous cleaning, preparation, and feature engineering of the data before we even get to discussions of algorithms. We then need to test various ML models, explore diverse configurations, and finally productize our results. Doing this for massive quantities of data creates challenges of scale, which become more complex when we also factor in the implications of bias and fairness in data representation.
In this talk, we’ll architect a production-grade ML pipeline using feature engineering, model training, and model management tools from Apache Spark. We’ll see demos that showcase these pipelines in action, using Microsoft Azure services such as Azure Databricks, Event Hubs, and Cognitive Services. Finally, we’ll explore relevant research, tools, and best practices for crafting responsible AI solutions, with a focus on issues like bias and fairness in data representation.