Innovation Through Learning: Streaming Analytics Takes Center Stage at Irori

Continuous Learning and Exploring Streaming Analytics

Introduction

We follow our passion for innovation by continuously exploring promising technologies. To support this exploration and learning, our competence group sets aside one day each month for the Irori team to work hands-on with new technologies. We call these days firestarters: a time for everyone at Irori to come together, have fun, learn something new, and hopefully start a “fire” that leads to innovation.

Here is a short list of the different competence iterations that we have done since we started:

  • Real-time integration with Kafka
  • Platform and infrastructure, with a focus on Kubernetes
  • Security: blue team vs. red team
  • Streaming analytics (current)

In May of 2023, we started our latest firestarter iteration, which is all about streaming analytics!

Current firestarter

The current firestarter encompasses the development and application of streaming analytics in a practice case centered on Kafka and real-time data. The objective is to leverage new technologies to build a proof of concept for a real-time analytics platform, delving into aspects such as pattern matching and aggregations to provide quick insights into the streaming data.
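To give a flavor of the kind of aggregation involved, here is a minimal sketch in plain Python. It is purely illustrative (the event shape, field names, and window size are our own assumptions, not any team's actual implementation): it counts events per user within tumbling time windows, the most basic building block of streaming aggregation.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count events per user within tumbling time windows.

    `events` is an iterable of (epoch_seconds, user_id) tuples;
    the tuple layout and window size are illustrative only.
    Returns {window_start: {user_id: count}} sorted by window.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, user in events:
        # Align each event to the start of its window
        window_start = ts - (ts % window_seconds)
        windows[window_start][user] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}

# Hypothetical interaction events: (timestamp, user)
events = [(0, "alice"), (10, "alice"), (70, "alice"), (75, "bob")]
# tumbling_window_counts(events)
# → {0: {"alice": 2}, 60: {"alice": 1, "bob": 1}}
```

A real streaming engine such as Flink handles the same idea continuously and at scale, with watermarks for late data; this batch-style version only shows the shape of the computation.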

All of Irori is divided into four teams. Each team receives the same challenges from the competence group, but is responsible for designing and implementing the infrastructure and applications needed to solve them. For example, the teams must be able to identify patterns within the streaming data and effectively visualize their findings.

In addition to the specific firestarter challenges, recognition is also given for unique technical implementations and for the performance of the solution. This arrangement encourages teams either to adopt cutting-edge technologies or to apply more established methods with an emphasis on optimizing performance. Through this approach, we aim to nurture a diverse range of solutions, each highlighting different strengths and innovations.

The teams and their approach

This firestarter involves analyzing streaming data from a social media platform that captures user interactions and activities. Examples of the challenges for this case include identifying how users engage with one another, determining groups, and distinguishing potential bots among the users.
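One simple heuristic for the bot-detection challenge can be sketched in a few lines of plain Python. This is our own illustrative example, not any team's actual solution, and the thresholds are made up: it flags accounts whose posting intervals are suspiciously regular, since humans tend to post at irregular times while naive bots often post on a fixed schedule.

```python
from statistics import pstdev

def likely_bots(post_times_by_user, max_jitter=1.0, min_posts=5):
    """Flag users whose posting intervals are suspiciously regular.

    `post_times_by_user` maps user_id -> sorted epoch-second timestamps.
    A user with at least `min_posts` posts whose gaps vary by less than
    `max_jitter` seconds (population std dev) is flagged. All names and
    thresholds are illustrative assumptions.
    """
    flagged = []
    for user, times in post_times_by_user.items():
        if len(times) < min_posts:
            continue
        gaps = [b - a for a, b in zip(times, times[1:])]
        if pstdev(gaps) < max_jitter:
            flagged.append(user)
    return flagged

# A hypothetical bot posting every 30 s vs. a human with irregular gaps
activity = {
    "bot42": [0, 30, 60, 90, 120, 150],
    "human": [0, 45, 200, 260, 500, 510],
}
# likely_bots(activity) → ["bot42"]
```

In practice a heuristic like this would run as a windowed computation over the stream, and would be combined with other signals (interaction graphs, content similarity) rather than used alone.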

In addressing the analytics challenges of this firestarter, each team has chosen a distinct technology to leverage. Team Yellow is utilizing Neo4j, a graph database that excels at structuring data through relationships, complemented by its GUI for visually mapping user interactions. Team Blue investigates the synergy between Kafka and Apache Flink, an established stream processing framework, aiming to create a real-time data processing pipeline that highlights the strengths of both technologies. Team Green focuses on BigQuery to manage datasets and perform efficient querying, integrating it with Kafka through a BigQuery sink connector deployed on Google Cloud Run for streamlined data analysis. Meanwhile, Team Red employs Materialize, a streaming database designed for real-time analytics, to generate live, queryable views of streaming data, aiming for immediate insights into user behaviors. Each team’s approach reflects a strategic choice in technology to optimize real-time data analysis and insights.

Conclusion

Fostering a culture of continuous learning allows us to stay ahead in the dynamic field of tech and enables us to recognize trends early on. Our current observation of trends suggests that streaming analytics will become increasingly important as organizations transition towards event-driven architectures and incorporate more streaming data. The ability to process and rapidly analyze high loads of data will play a key role in effective decision-making and will give an advantage in the competitive landscape. Our latest project in streaming analytics has prompted the teams to leverage technologies like Neo4j, Apache Flink, BigQuery, and Materialize, and we will see what comes out of this iteration when we finish it in April.