Because of the scope of Heroku's challenges, Jeff Chao, a software engineer on Heroku's Data Infrastructure team, experienced a multitude of fascinating Kafka failure scenarios that most of us would likely never see in our typical implementations. For example, Jeff discovered a variety of situations in which brokers can enter cascading failure and eventually render a cluster completely unavailable. He learned that the key to preventing these failures is ensuring that when one broker fails, the remaining brokers have enough headroom to take on the downed broker's partitions. Although this might seem obvious in theory, Jeff found that in practice there are many details that are easily overlooked, which he will cover in depth in his talk.
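To make that capacity argument concrete, here is a minimal back-of-the-envelope sketch in Python. It is not from Jeff's talk; the broker names, partition counts, and per-broker limit are all illustrative assumptions. Given a hypothetical cluster state, it checks whether the survivors of any single broker failure could absorb the orphaned partitions without exceeding an assumed limit:

```python
# Hedged sketch: can the surviving brokers absorb a failed broker's
# partitions? All names, counts, and the cap below are hypothetical.

# Current partition count per broker (assumed cluster state).
partitions_per_broker = {"broker-1": 40, "broker-2": 40, "broker-3": 40}

# Assumed limit before a broker degrades; in practice this would be
# derived from disk, network, and heap headroom.
MAX_PARTITIONS_PER_BROKER = 70

def survives_failure(loads: dict, failed: str, cap: int) -> bool:
    """Return True if the remaining brokers can absorb the failed
    broker's partitions without any of them exceeding the cap."""
    orphaned = loads[failed]
    survivors = [n for b, n in loads.items() if b != failed]
    # Assume orphaned partitions are spread evenly across survivors;
    # pessimistically give the extra remainder to the busiest broker.
    extra, remainder = divmod(orphaned, len(survivors))
    worst_case = max(survivors) + extra + (1 if remainder else 0)
    return worst_case <= cap

for broker in partitions_per_broker:
    ok = survives_failure(partitions_per_broker, broker,
                          MAX_PARTITIONS_PER_BROKER)
    print(f"Losing {broker}: {'OK' if ok else 'cascading-failure risk'}")
```

With the numbers above, each survivor goes from 40 to 60 partitions, under the assumed cap of 70; shrink the headroom and the same check flags the cascading-failure risk the talk describes.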
To up your data pipeline game and learn how Jeff and the data team pushed the limits of Kafka at Heroku, check out the full talk at DataEngConf SF '17.