Message Brokers and Distributed Logs

Apr 25, 2017 00:00 · 463 words · 3 minutes read distributed systems brokers logs

To connect systems using asynchronous messages, there are quite a few solutions available. That said, they can be separated in two main types: message brokers and distributed logs. And even though they solve the same problem, it’s important to consider the concepts that they’ve been built on to understand in which situations they perform better.

Message Brokers

Messages brokers are the classic messaging alternative. They consist of a broker that accepts messages from producers, puts them on a queue and routes them to the consumers. They’re usually quite sophisticated, providing features like:

Message transformation and encoding
Routing and filtering based on the contents of the message
Consumers can subscribe to queues with names that follow a pattern
Message transactions and rollbacks
Queues can be configured to have exclusive consumers (task queues) or to send the same message to a set of consumers

It’s also worth noticing that, unlike distributed logs, once a message has been delivered to a consumer it cannot be replayed. If the need arises, the producer will have to republish the message.

Some of the most used message brokers are ActiveMQ, RabbitMQ and Amazon SQS.

Distributed Logs

Distributed logs are much simpler in comparison: producers can only write at the end of the log, and consumers can only read messages sequentially from an offset. But by having less features, they’re able to achieve higher throughputs. This is because sequential reads and writes are extremely fast, and the broker is not concerned about routing or transforming the messages. In fact, the broker doesn’t even store the offset of the consumers in the log, each consumer has to remember it.

Because logs are persistent, consumers can replay the same log entries whenever they want, just by providing the correct offset. This is quite handy if you discover a bug in a consumer, because the messages can be replayed to produce the correct output.

The way to distribute the load across nodes is by using partition keys, which determine the log partition where events are stored. It’s important to choose a correct partition key, because events are only guaranteed to be ordered within the same partition. For example, if an application requires to process the events for a given user in order, you could use the user IP as partition key.

Because of their performance, distributed logs are often used as a source or destination of batch and near real-time data processing architectures. They are also commonly used to intake data from large scale sources, like behaviour tracking or connected devices. In a sense, this simplification to achieve greater performance and scalability it’s similar to what NoSQL databases did with traditional RDBMS.

The first and most well-known distributed log is LinkedIn’s Kafka, but Amazon’s Kinesis and Microsoft’s Event Hubs follow the same architecture.