There are quite a few solutions for connecting systems through asynchronous messages, but they can be divided into two main types: message brokers and distributed logs. Even though both solve the same problem, it’s important to understand the concepts each one is built on in order to see in which situations each performs better.
Message brokers are the classic messaging option. A broker accepts messages from producers, puts them on queues and routes them to consumers. Brokers are usually quite sophisticated, providing features like:
- Message transformation and encoding
- Routing and filtering based on the contents of the message
- Subscriptions to queues whose names follow a pattern
- Message transactions and rollbacks
- Queues configured either so that each message is handled by only one of their consumers (task queues) or so that the same message is sent to a set of consumers
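Several of these features can be illustrated with a toy, in-memory broker. This is a minimal sketch, not any real broker’s API: `ToyBroker`, `bind`, `subscribe`, `publish` and `dispatch` are hypothetical names. Pattern matching uses Python’s glob-style `fnmatchcase`, which is an assumption made for brevity. RabbitMQ’s topic exchanges, for example, use a different syntax, where `*` matches one word and `#` matches zero or more. Following the common broker model, a task queue is one queue with several competing consumers. Fan-out is achieved by binding several queues to the same pattern.

```python
from collections import defaultdict, deque
from fnmatch import fnmatchcase


class ToyBroker:
    """Toy in-memory broker; class and method names are hypothetical."""

    def __init__(self):
        self.queues = defaultdict(deque)    # queue name -> pending messages
        self.bindings = []                  # (topic pattern, content filter, queue name)
        self.consumers = defaultdict(list)  # queue name -> consumer names
        self.cursor = defaultdict(int)      # queue name -> round-robin position

    def bind(self, queue, pattern, content_filter=None):
        # Route messages whose topic matches a glob-style pattern and,
        # optionally, whose body passes a content-based filter.
        self.bindings.append((pattern, content_filter, queue))

    def subscribe(self, queue, consumer):
        self.consumers[queue].append(consumer)

    def publish(self, topic, body):
        for pattern, content_filter, queue in self.bindings:
            if fnmatchcase(topic, pattern) and (content_filter is None or content_filter(body)):
                self.queues[queue].append((topic, body))

    def dispatch(self):
        """Deliver all pending messages as (consumer, topic, body) tuples."""
        deliveries = []
        for queue, pending in self.queues.items():
            consumers = self.consumers[queue]
            if not consumers:
                continue  # keep messages until a consumer subscribes
            while pending:
                topic, body = pending.popleft()  # removed from the queue on delivery
                # Competing consumers: each message goes to exactly one of them.
                consumer = consumers[self.cursor[queue] % len(consumers)]
                self.cursor[queue] += 1
                deliveries.append((consumer, topic, body))
        return deliveries


broker = ToyBroker()
# Task queue: two workers compete for the messages on "emails".
broker.bind("emails", "orders.*")
broker.subscribe("emails", "worker-1")
broker.subscribe("emails", "worker-2")
# Fan-out: "audit" is bound to the same pattern and gets its own copy,
# but only of large orders (content-based filtering).
broker.bind("audit", "orders.*", content_filter=lambda body: body["total"] > 100)
broker.subscribe("audit", "auditor")

broker.publish("orders.created", {"id": 1, "total": 50})
broker.publish("orders.created", {"id": 2, "total": 500})
broker.publish("users.created", {"id": 3})  # matches no binding, so it is dropped

deliveries = broker.dispatch()
for delivery in deliveries:
    print(delivery)
```

Here `worker-1` receives order 1 and `worker-2` receives order 2, while `auditor` gets its own copy of order 2 only. Calling `dispatch()` again delivers nothing, because the queues were emptied on delivery.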
It’s also worth noting that, unlike with distributed logs, once a message has been delivered to and acknowledged by a consumer, it cannot be replayed. If it needs to be processed again, the producer has to republish it.
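The difference between redelivery and replay can be seen in a toy queue with acknowledgements. This is a minimal sketch under stated assumptions. `AckQueue`, `get`, `ack` and `nack` are hypothetical names, loosely modelled on how brokers usually treat acknowledged and unacknowledged messages. A message that is rejected before being acknowledged goes back to the queue. Once it is acknowledged, it is deleted, and the only way to process it again is to publish it again.

```python
from collections import deque


class AckQueue:
    """Toy broker queue: a message is deleted once the consumer acknowledges it."""

    def __init__(self):
        self.ready = deque()  # messages waiting for delivery
        self.unacked = {}     # delivery tag -> message handed to a consumer
        self.next_tag = 0

    def publish(self, message):
        self.ready.append(message)

    def get(self):
        """Hand the next message to a consumer, or return None if the queue is empty."""
        if not self.ready:
            return None
        self.next_tag += 1
        self.unacked[self.next_tag] = self.ready.popleft()
        return self.next_tag, self.unacked[self.next_tag]

    def ack(self, tag):
        # Acknowledged messages are discarded: there is nothing left to replay.
        del self.unacked[tag]

    def nack(self, tag):
        # A rejected, unacknowledged message goes back to the front of the queue.
        self.ready.appendleft(self.unacked.pop(tag))


queue = AckQueue()
queue.publish("invoice #1")
tag, message = queue.get()
queue.ack(tag)
print(queue.get())  # None: the message is gone for good

# The only way to process it again is for the producer to republish it.
queue.publish("invoice #1")
print(queue.get())  # (2, 'invoice #1')
```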
Some of the most widely used message brokers are ActiveMQ, RabbitMQ and Amazon SQS.
Distributed logs are much simpler in comparison: producers can only append to the end of the log, and consumers can only read messages sequentially, starting from an offset. By offering fewer features, they are able to achieve higher throughput. Sequential reads and writes are extremely fast, and the broker spends no work on routing or transforming messages. In fact, the broker doesn’t even need to track each consumer’s offset in the log. Each consumer is responsible for remembering its own position, although Kafka, for instance, can store committed offsets on the consumers’ behalf.
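The core of the model fits in a few lines. This is a minimal, single-node sketch with hypothetical names (`Log`, `Consumer`, `poll`), not the API of Kafka or any other product. The log only supports appending and reading a range starting at an offset. The offset lives in each consumer, so two consumers can read the same log at different speeds without the log knowing about either of them.

```python
class Log:
    """Toy append-only log: writes go to the end, reads are sequential from an offset."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)
        return len(self._entries) - 1  # offset assigned to the new entry

    def read(self, offset, max_entries=100):
        # The log keeps no per-consumer state; the caller supplies the offset.
        return self._entries[offset:offset + max_entries]


class Consumer:
    """Each consumer remembers its own position in the log."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_entries=100):
        batch = self.log.read(self.offset, max_entries)
        self.offset += len(batch)
        return batch


log = Log()
for event in ["signup", "login", "purchase"]:
    log.append(event)

fast, slow = Consumer(log), Consumer(log)
print(fast.poll())               # ['signup', 'login', 'purchase']
print(slow.poll(max_entries=1))  # ['signup']
print(fast.offset, slow.offset)  # 3 1
```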
Because logs are persistent, consumers can replay the same entries whenever they want, as long as those entries are still within the log’s retention period, just by starting again from the right offset. This is quite handy if you discover a bug in a consumer: once the bug is fixed, the messages can be replayed to produce the correct output.
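The bug-fix scenario can be sketched as follows. This is a minimal illustration under stated assumptions: a plain Python list stands in for a persistent log whose entries are still retained. The handler names and event fields are hypothetical. The buggy consumer overwrites each user’s total instead of adding to it. After the fix, the consumer simply rewinds to offset 0 and reprocesses the same entries.

```python
# A plain list stands in for a persistent, append-only log.
log = [
    {"user": "ana", "amount": 30},
    {"user": "ben", "amount": 20},
    {"user": "ana", "amount": 50},
]


def buggy_handler(totals, event):
    totals[event["user"]] = event["amount"]  # bug: overwrites instead of adding


def fixed_handler(totals, event):
    totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]


def consume(log, handler, offset=0):
    """Read sequentially from `offset`, rebuilding the output from scratch."""
    totals = {}
    for event in log[offset:]:
        handler(totals, event)
    return totals


print(consume(log, buggy_handler))  # {'ana': 50, 'ben': 20}
# After fixing the bug, rewind to offset 0 and replay the same entries.
print(consume(log, fixed_handler))  # {'ana': 80, 'ben': 20}
```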
Load is distributed across nodes by splitting the log into partitions, and a partition key determines which partition each event is stored in. Choosing the partition key carefully is important, because events are only guaranteed to be ordered within the same partition. For example, if an application needs to process each user’s events in order, you could use the user ID as the partition key.
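Key-based partitioning can be sketched as a stable hash of the key modulo the number of partitions. This is a minimal sketch: CRC32 from `zlib` is used only as a convenient stable hash, and the partition count is an arbitrary illustrative value. Kafka’s default partitioner, for comparison, hashes keyed records with murmur2. Python’s built-in `hash()` is deliberately avoided, because it is randomized per process for strings. Because every event for a given user ID lands in the same partition, that user’s events keep their relative order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # illustrative value


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash modulo the partition count: the same key always maps to the
    # same partition. CRC32 is a stand-in for whatever hash a real system uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)  # partition number -> events, in append order
events = [
    ("user-1", "add_to_cart"),
    ("user-2", "login"),
    ("user-1", "checkout"),
    ("user-2", "logout"),
    ("user-1", "payment"),
]
for user_id, action in events:
    partitions[partition_for(user_id)].append((user_id, action))

# Within a partition, each user's events keep the order they were produced in.
for number, entries in sorted(partitions.items()):
    print(number, entries)
```

Which partition each user lands in depends on the hash. Events for different users that share a partition may interleave, but no ordering is promised across partitions anyway.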
Because of their performance, distributed logs are often used as the source or destination of batch and near real-time data processing pipelines. They are also commonly used to ingest data from large-scale sources, like behaviour tracking or connected devices. In a sense, this trade of features for greater performance and scalability is similar to what NoSQL databases did relative to traditional RDBMSs.
The best-known distributed log, and one of the first, is Apache Kafka, originally developed at LinkedIn. Amazon Kinesis and Microsoft’s Azure Event Hubs follow the same architecture.