Kafka is today the essential reference point whenever the task is to write an event streaming application, build a streaming architecture, or move software toward real-time processing. In parallel, artificial intelligence is unlocking new data-driven value for businesses and becoming a key process for modern companies. However, several aspects of machine learning do not fit easily into a streaming scenario, because they have traditionally been fed by historical data and supported by batch architectures; model training is a typical example. Luckily, progress can be made in that direction using the Kafka ecosystem.
After a brief introduction to the "stream-phobic" critical topics in artificial intelligence, the talk will describe how machine learning can be applied in real time by naturally exploiting the Kafka architecture. We will then detail the proposed technical solutions for online machine learning training and dynamic model scoring with modern OSS ML systems such as H2O. We will also show how this effort is justified by the resulting value, such as enabling the Kappa Architecture and allowing users to manage end-to-end machine learning applications in a pure streaming context using a single framework.
Andrea Spina currently works as Head of R&D and Data Engineer Team Lead at Radicalbit, Milan. His work has mostly focused on streaming technologies, machine learning, and performance optimization. Andrea co-authored the "flink-jpmml" project and loves to spread the word on how to manage end-to-end machine learning lifecycles and streaming applications. He also co-authored the scientific paper "Benchmarking Data Flow Systems for Scalable Machine Learning" at the DIMA Group, TU Berlin.