DLQ in Kafka

DLQman is an open source application that simplifies error handling in your Kafka integrations by handling messages that could not be processed by a consumer.

Limitations in the DLQ pattern

You might be using a DLQ pattern and are facing a scenario where something happens in your environment that causes thousands of messages to pile up on the DLQ.

The most obvious way to solve that is perhaps to let the consuming application itself (the one sending to DLQ) contain logic to re-process messages from the DLQ. That would likely need some logic to decide on what to do depending on the error or contents of the message. When the number of applications grows this quickly becomes a challenge to maintain with a lot of duplicated code.

Filter the DLQ messages

The type of exception causing the message to be sent to the DLQ could many times be used as the primary piece of information to decide on how to handle the message. In a streaming architecture, a majority of DLQ messages are likely caused by connectivity issues or time related problems such as “data is not yet available”. By making it possible to filter on message traits such as exception headers and message content DLQman makes it easy to quickly address the most urgent problems.

Design and architecture of DLQman

For simplicity, DLQman does not require any client-side changes. Instead it assumes that standard dead letter error handling is used by either Spring Cloud Streams, Smallrye Messaging or Kafka Connect Sink. Each message is stored together with a lifecycle state. This state is central to the solution to track if messages are new, resent, dismissed or in other ways already handled. The state is then used in the filters in order to select the most relevant data.

When messages arrive they are processed by the indexing module and any indexing logic is applied before passing the data to the persistence module.

The API module serves as the interface for managing the received messages. It enables us to list, filter and search for messages based on their indexed traits.

The Workflow definition of the source DLQ tells the handling module where to send a message and what other logic to apply before sending it. Since the complete original message and all its metadata is stored, it is possible to resend the exact same contents again if needed.