TransWikia.com

Subscribe to topics or to messages on those topics?

Software Engineering Asked by Sinaesthetic on September 9, 2020

I’ve encountered a couple of different techniques for dealing with service bus messaging (queues and topics) and I’m just looking for some input on best practices.

I’ve encountered a couple of different techniques for consuming messages off of a topic, for instance:

  • The first technique is that a subscriptions is made per message type that might appear on that topic (using filters or routing keys) and then the consumer service listens on each of those subscriptions separately and dispatches accordingly.

  • The other technique is to have a single subscription per consumer (like an identity) and then the consumer will inspect the message to see what the message type is and then dispatch it to its handler.

I feel like the first one could potentially offer some benefits such as making it easier to separate the messages and get them to their respective handlers, or depending on your tooling, maybe make it easier to maintain. For example, in the Azure service bus, you can use explorer to see your subscriptions’ "queues" and get directly to the message types you’re looking for. But I also feel like there’s going to be a performance impact on this technique because you’re creating a lot of subscription objects, and you’re potentially creating one connection to the bus per message type, or at least a thread for each. Also now you have to ensure your subscriptions account for consumers in a different context so that you’re not inadvertently creating a competing consumer model. At scale, I can see this being problematic in terms of performance and ultimately cost.

As for the second technique, it seems that depending on your chosen platform (i.e. RabbitMQ, AzureServiceBus, Kafka, etc.), you should still be able to enable parallelism by grabbing more than one message at a time, but you could have significantly fewer subscriptions, no competing consumer model issue, and the only scaling issue would be normal (i.e. how long does it take a single instance of the consumer to clear a queue vs. your SLO/SLA).The downside is that you don’t have the message separation in your debugging tools, so you’d have to sift through them all to find what you’re looking for.

What are you guys seeing out there? What have you done, what have you run into? Is this a matter of "there’s no truly right answer just do it case by case and pray?"

enter image description here

3 Answers

There is a subtle difference in the technologies that you have mentioned: Regardless whether call their things "topics" or not, the subscription patterns offered from most complex service buses are not just publish/subscribe, but mainly "producer/consumer", so the concept here becomes slightly different

A topic (or routing key) is a label that identifies a particular kind of message that some process will publish. Subscribing to topics means asking your broker to put a copy of that message in a particular queue: this could be implemented with "binding" semantic (AMQP) or with consumer groups (Kafka). In all cases every consumer, along with its replicas, can declare an "inbox" where to receive messages. There will be as many queues (or offsets or whatever) as many consumers (here intended as applications, not processes) are interested on the same topic. Brokers are designed to support huge amounts of consumers: for instance, an average VM will let RabbitMQ to manage ~10k queues or more. The real bottleneck, often, comes from tcp connections. So, unless you are not doing crazy things or you don't have to manage a fleet of IoT devices, having few queue/consumer group per app is quite enough

Notice that I've written "few" and not "one". Why? Because messages in the same queue are obviously processed sequentially, or at least they have a "happened-before" semantics. This is good when you need to rely on an order (eg. you never want to receive a "objectXDeleted" event before "objectXCreated") but in some other cases you can optimize doing things in parallel

The point now is: what we send over topics? A topic should be, precisely, a topic: here in softwareengineering.stackexchange you will find many people that talk about how to design software, but any of them won't talk about football. The messages that flow on a topic should be coherent and this leads you, as producer app developer, to think about

If I published this message on this topic, would ALL consumers be interested in it?

So you can work with few patterns. In IoT, for instance, is quite common to have a REST-style topic, something like

devices/1234/commands

So that the single device is able to receive all and the only messages that it should, while some component (I don't know....a metric collector?) could subscribe to devices/+/commands to listen command towards all devices

domainevents.order.created

This is how you could broadcast domain events in a microservices architecture. It would be very surprising if a microservice would be interested in reacting to changes of Order1 aggregate while is ignoring updates from Order2

So, even if there is never a one-size-fits-all answer (luckly, otherwise we would be unemployed) is clear that we should and we can provide ourselves the right means to reach the optimum goal: one copy of each message for each interested application and anything else

Answered by Carmine Ingaldi on September 9, 2020

The main thing that strikes out to me in your question is the following:

But I also feel like there's going to be a performance impact on this technique because you're creating a lot of subscription objects, and you're potentially creating one connection to the bus per message type, or at least a thread for each.

Have you ever measured this? Normally, message buses are specifically built to service a large number of consumers efficiently. Don't assume problems which you never encountered just because you have a potential count n of a given object X.

As for the second technique, [...] you should still be able to enable parallelism by grabbing more than one message at a time, but you could have significantly fewer subscriptions, no competing consumer model issue [...]

Yes, but instead of using the software you have aquired to do that one job, you implement the job yourself, by grabbing multiple messages, inspecting them, and forwarding them to the correct consumer.

This is one more component that you have to develop, maintain and pay for.

Conclusion: use technique A. It is cheaper, faster to develop, easier to maintain. If at any point in the future, you measure an actual problem, and at the same time, you prove through a prototype that technique B solves this problem, you can still refactor. Basically you just have an additional dispatcher component in front of your existing consumers.

Answered by mtj on September 9, 2020

In general, you won't subscribe to garbage you're definitely not interested in: it wastes network capacity, CPU cycles for desalinization and promotes more frequent garbage collection in managed environments.

More than that: think of error boundaries. You're kind introducing single point of failure; even if you span multiple instances of you dispatcher, it is still much harder to support and improve (downtime, rollbacks, load-sharing, metrics, logging etc.) then separate dataflow not interacting with each other.. unless they do interact with each other.

If they do, the synchronization pain (pray God you won't have to guarantee data consistency at some high measure) between pseudo-independent dataflows would definitely overweight all the benefits.

P.S. As usual with software engineering: there are no best-practicies that work for all the cases. Let your business requirements guide you. And make sure you are designing with clear error boundaries, taking into consideration worst cases that happen sooner or lesser.

Answered by Sereja Bogolubov on September 9, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP