Processing Kafka in Vert.x with flow control

When you read asynchronously from Kafka in Vert.x (by subscribing a KafkaConsumer), the ingestion rate is capped essentially by the Kafka broker and the network bandwidth (take this as a simplification). What I really mean is that this setup can achieve very high ingestion rates without much effort.

If you opt to process those messages synchronously (provided that you have one KafkaConsumer per partition), your performance will be roughly that of Kafka Streams.

But if you do it asynchronously (sending those messages to the event bus and setting up a pipeline to process them in parallel), your processing performance can be a lot higher. Materialize uses neither Java nor Vert.x, but its processing model should be quite similar to that setup. Usually, Kafka processing parallelism happens only at the partition level (and on the broker side it always will).

BTW, it would be really lovely to have a “parallel Kafka Streams” able to process in parallel even within a single partition while keeping its great expressiveness and simplicity.

You start out wondering whether you could read X million messages per second from Kafka, and after a while you begin wondering how you can handle that ingestion rate (in most cases, if your messages don’t start piling up in front of your backend, it’s the GC that clogs up).

Then I usually set up a WriteStream proxy (with a Pump) that uses event bus metrics (`messages.pending`) to flag the write queue as full.
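
As a rough illustration of that idea, here is a minimal, dependency-free Java sketch. It does not use the Vert.x API: `PendingGate` and `MiniPump` are hypothetical stand-ins for the WriteStream proxy and the Pump, the in-memory queue stands in for the `messages.pending` metric, and the threshold of 4 and the resume-at-half rule are assumptions picked for the example.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

public class BackpressureSketch {

    /** Hypothetical stand-in for the WriteStream proxy: "full" is derived from a pending count. */
    static final class PendingGate {
        private final int maxPending;                 // assumed threshold on pending messages
        private final Queue<String> pending = new ArrayDeque<>();
        private Runnable drainHandler;
        private boolean wasFull;

        PendingGate(int maxPending) { this.maxPending = maxPending; }

        void write(String message) {
            pending.add(message);
            if (writeQueueFull()) wasFull = true;
        }

        boolean writeQueueFull() { return pending.size() >= maxPending; }

        void drainHandler(Runnable handler) { this.drainHandler = handler; }

        int pendingCount() { return pending.size(); }

        /** Simulates the downstream pipeline finishing one message. */
        void completeOne() {
            if (pending.poll() == null) return;
            // Assumption: resume only once the backlog has fallen to half the threshold.
            if (wasFull && pending.size() <= maxPending / 2) {
                wasFull = false;
                if (drainHandler != null) drainHandler.run();
            }
        }
    }

    /** Hypothetical stand-in for the Pump: stops reading while the gate reports full. */
    static final class MiniPump {
        private final Iterator<String> source;
        private final PendingGate gate;
        private boolean paused;
        private int pauses;

        MiniPump(Iterator<String> source, PendingGate gate) {
            this.source = source;
            this.gate = gate;
            gate.drainHandler(this::resume);          // resume reading when the gate drains
        }

        void start() { resume(); }

        private void resume() {
            paused = false;
            while (!paused && source.hasNext()) {
                gate.write(source.next());
                if (gate.writeQueueFull()) { paused = true; pauses++; }
            }
        }

        boolean isPaused() { return paused; }
        int pauses() { return pauses; }
    }

    public static void main(String[] args) {
        PendingGate gate = new PendingGate(4);
        MiniPump pump = new MiniPump(
                List.of("m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8").iterator(), gate);
        pump.start();
        System.out.println("pending=" + gate.pendingCount() + " paused=" + pump.isPaused());
        gate.completeOne();
        gate.completeOne();                           // backlog falls to half, pump resumes
        System.out.println("pending=" + gate.pendingCount() + " pauses=" + pump.pauses());
    }
}
```

The point of the sketch is only the shape of the feedback loop: the reader stops as soon as the pending count crosses the threshold and restarts when the backlog shrinks, so messages wait in Kafka rather than on the heap.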

I intentionally keep these notes as code-free as I can.