Maximize value and minimize the burden of your Kafka data – a summary of Kafka Summit 2024

Introduction

As we gradually move our data from living a quiet life in databases to flowing, spreading, and creating opportunities on a Kafka platform, we have taken the first step on our journey toward becoming more data-driven and smarter in our services. The next step is finding the techniques and patterns that maximize the value of the data while keeping costs under control for the technologies we use and the solutions we implement. This post describes our thoughts and reflections from Kafka Summit 2024 in London.

Reflections

One thing is certain after two intense days of keynotes and breakout sessions covering everything related to streaming data: a data platform based on streaming data has become a foundation for many organizations and is here to stay. Kafka as a technology has been, and remains, a ‘safe bet’ for realizing this, and it is clear that the ecosystem around both the use and the management of the platform is flourishing. Getting our data into Kafka, regardless of where it resides today, has never been easier. So the next step is: how can we use the data in the best way?

Universal Data Products

The talks at Kafka Summit highlighted the gap between the operational and analytical sides, which still live separate lives in today’s organizations. Shortcomings in organizations, tools, and processes create friction and complexity that keep these two worlds from coexisting effectively. For example, carelessly designed data models or a lack of schema management can make things blow up downstream, and simply swapping file-based BI exports for Kafka Connect does not amount to real-time analytics. “Universal data products”, equally universal and well-managed regardless of which world we find ourselves in, was a recurring concept that we believe has a lot going for it.
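
As a minimal sketch of what schema management can look like in practice, the Java snippet below produces Avro records through Confluent’s Schema Registry, which validates schema compatibility at the producer so that breaking changes fail fast instead of blowing up downstream. The topic name, record fields, and addresses are hypothetical placeholders, not anything prescribed by the talks.

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SchemaManagedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // KafkaAvroSerializer registers and validates the schema against Schema Registry,
        // so an incompatible change fails at the producer rather than downstream.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder address

        // Hypothetical Order schema; in practice this would live in version control.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[" +
            "{\"name\":\"orderId\",\"type\":\"string\"}," +
            "{\"name\":\"amount\",\"type\":\"double\"}]}");

        GenericRecord order = new GenericData.Record(schema);
        order.put("orderId", "o-1001");
        order.put("amount", 42.0);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "o-1001", order));
        }
    }
}
```

With compatibility rules enforced in the registry, consumers on both the operational and analytical sides can rely on the same contract, which is much of what makes a data product “universal”.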

Apache Flink

One of the bigger talking points at Kafka Summit was, of course, Apache Flink, a distributed stream processing engine for processing and analyzing large volumes of data in real time. Apart from performance characteristics such as low latency and high throughput, Flink offers, among other things:

  • Event-time processing, which facilitates handling event-based data streams and is especially useful for time-based analysis involving time windows (a minimal sketch follows this list).
  • Stateful processing, which enables state to be preserved and reused across time boundaries, necessary for complex computations.
  • More on Flink in a later blog post…
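
To make the first two points concrete, here is a minimal sketch using the Flink 1.x DataStream API in Java. The hardcoded (user, timestamp) click events stand in for a real Kafka source, and the job counts clicks per user in one-minute tumbling event-time windows:

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ClickCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hardcoded (userId, epoch-millis) click events stand in for a Kafka source.
        env.fromElements(
                Tuple2.of("alice", 1_700_000_000_000L),
                Tuple2.of("alice", 1_700_000_030_000L),
                Tuple2.of("bob",   1_700_000_010_000L))
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            // Event time: take timestamps from the events themselves and
            // tolerate events arriving up to 5 seconds out of order.
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, ts) -> event.f1))
            .map(event -> Tuple2.of(event.f0, 1L))
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            // Count clicks per user in one-minute tumbling windows of event time.
            .keyBy(event -> event.f0)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .sum(1)
            .print();

        env.execute("event-time click counts");
    }
}
```

The watermark strategy is what lets Flink reason about when an event happened rather than when it arrived, and the per-key running count inside each window is held in Flink-managed state, illustrating both bullet points above.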

This was yet another topic that underscored the importance of building solid analytics on top of streaming data.

Managed Kafka

Talking to companies at Kafka Summit, one thing we picked up on was the growing tendency to buy Kafka capabilities as a service rather than run the platform in-house. This is understandable, as Kafka, with its distributed and business-critical nature, is not the easiest system to operate (though it is becoming easier with the shift from ZooKeeper to KRaft). However, not all companies have this option, for cost, security, or lock-in reasons. Fortunately, Irori has long experience and deep knowledge of self-hosted Kafka for these types of companies, covering everything from developer experience to observability and security.
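
To illustrate the ZooKeeper-to-KRaft shift, here is a minimal single-node, combined-mode broker configuration of the kind that ships with the Kafka distribution; the node id, ports, and log directory are placeholder values:

```properties
# Minimal single-node KRaft setup (combined broker + controller), no ZooKeeper.
# Initialize storage first: bin/kafka-storage.sh format -t <uuid> -c server.properties
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
log.dirs=/tmp/kraft-combined-logs
```

With KRaft, cluster metadata is managed by Kafka’s own controller quorum, so there is no separate ZooKeeper ensemble to provision, secure, and monitor, which removes a fair share of the operational burden for self-hosted clusters.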

Conclusion

Universal Data Products, Apache Flink, and Managed Kafka are three of the many takeaways we bring home from Kafka Summit 2024. The key takeaway: a data platform based on streaming data has become a foundation for many organizations, and the focus is now shifting to how to manage the data within the platform more effectively.