Rockset

If you do stateful processing in Kafka Streams, you have already heard of Rocksdb. Rocksdb is a persistent key value store focused on performance. Despite its terse functionality set (put/get/range get and prefix get) is a powerful tool.

One of the members of the original team that developed Rocksdb at facebook founded Rockset. Rockset indexes automatically structured information (json, avro, etc..) on top of Rocksdb and allows to query that info using SQL.

As many more today it’s trying to market itselft as real-time analytics (who doesn’t). It’s really easy to use: just point your instance to a Kafka topic (there are many more available sources) and that’s it. Rockset will autodiscover the schema and index your data. You can provide simple transformations on ingestion.

Underneath rocksdb will provide the speed. Any query that hit the sweet spots of rocksdb (query by key, prefix search, etc..) will work well. Aside from there you will probably need some tweaking.

In my use cases, Rockset lacks the ability to build real-time pipelines, incremental materialized views, etc… But if you are planning to build a Kafka Streams topology just to index an business entity, Rockset could spare you the effort (not that much anyway). If you are going to build a business entity from a few CDC streams in real time, I’ll stick to other solutions.