Subscribe to topics or to messages on those topics?

Software Engineering Asked by Sinaesthetic on September 9, 2020

I’ve encountered a couple of different techniques for dealing with service bus messaging (queues and topics) and I’m just looking for some input on best practices.

I’ve encountered a couple of different techniques for consuming messages off of a topic, for instance:

• The first technique is that a subscriptions is made per message type that might appear on that topic (using filters or routing keys) and then the consumer service listens on each of those subscriptions separately and dispatches accordingly.

• The other technique is to have a single subscription per consumer (like an identity) and then the consumer will inspect the message to see what the message type is and then dispatch it to its handler.

I feel like the first one could potentially offer some benefits such as making it easier to separate the messages and get them to their respective handlers, or depending on your tooling, maybe make it easier to maintain. For example, in the Azure service bus, you can use explorer to see your subscriptions’ "queues" and get directly to the message types you’re looking for. But I also feel like there’s going to be a performance impact on this technique because you’re creating a lot of subscription objects, and you’re potentially creating one connection to the bus per message type, or at least a thread for each. Also now you have to ensure your subscriptions account for consumers in a different context so that you’re not inadvertently creating a competing consumer model. At scale, I can see this being problematic in terms of performance and ultimately cost.

As for the second technique, it seems that depending on your chosen platform (i.e. RabbitMQ, AzureServiceBus, Kafka, etc.), you should still be able to enable parallelism by grabbing more than one message at a time, but you could have significantly fewer subscriptions, no competing consumer model issue, and the only scaling issue would be normal (i.e. how long does it take a single instance of the consumer to clear a queue vs. your SLO/SLA).The downside is that you don’t have the message separation in your debugging tools, so you’d have to sift through them all to find what you’re looking for.

What are you guys seeing out there? What have you done, what have you run into? Is this a matter of "there’s no truly right answer just do it case by case and pray?"

There is a subtle difference in the technologies that you have mentioned: Regardless whether call their things "topics" or not, the subscription patterns offered from most complex service buses are not just publish/subscribe, but mainly "producer/consumer", so the concept here becomes slightly different

A topic (or routing key) is a label that identifies a particular kind of message that some process will publish. Subscribing to topics means asking your broker to put a copy of that message in a particular queue: this could be implemented with "binding" semantic (AMQP) or with consumer groups (Kafka). In all cases every consumer, along with its replicas, can declare an "inbox" where to receive messages. There will be as many queues (or offsets or whatever) as many consumers (here intended as applications, not processes) are interested on the same topic. Brokers are designed to support huge amounts of consumers: for instance, an average VM will let RabbitMQ to manage ~10k queues or more. The real bottleneck, often, comes from tcp connections. So, unless you are not doing crazy things or you don't have to manage a fleet of IoT devices, having few queue/consumer group per app is quite enough

Notice that I've written "few" and not "one". Why? Because messages in the same queue are obviously processed sequentially, or at least they have a "happened-before" semantics. This is good when you need to rely on an order (eg. you never want to receive a "objectXDeleted" event before "objectXCreated") but in some other cases you can optimize doing things in parallel

The point now is: what we send over topics? A topic should be, precisely, a topic: here in softwareengineering.stackexchange you will find many people that talk about how to design software, but any of them won't talk about football. The messages that flow on a topic should be coherent and this leads you, as producer app developer, to think about

If I published this message on this topic, would ALL consumers be interested in it?

So you can work with few patterns. In IoT, for instance, is quite common to have a REST-style topic, something like

devices/1234/commands

So that the single device is able to receive all and the only messages that it should, while some component (I don't know....a metric collector?) could subscribe to devices/+/commands to listen command towards all devices

domainevents.order.created

This is how you could broadcast domain events in a microservices architecture. It would be very surprising if a microservice would be interested in reacting to changes of Order1 aggregate while is ignoring updates from Order2

So, even if there is never a one-size-fits-all answer (luckly, otherwise we would be unemployed) is clear that we should and we can provide ourselves the right means to reach the optimum goal: one copy of each message for each interested application and anything else

Answered by Carmine Ingaldi on September 9, 2020

The main thing that strikes out to me in your question is the following:

But I also feel like there's going to be a performance impact on this technique because you're creating a lot of subscription objects, and you're potentially creating one connection to the bus per message type, or at least a thread for each.

Have you ever measured this? Normally, message buses are specifically built to service a large number of consumers efficiently. Don't assume problems which you never encountered just because you have a potential count n of a given object X.

As for the second technique, [...] you should still be able to enable parallelism by grabbing more than one message at a time, but you could have significantly fewer subscriptions, no competing consumer model issue [...]

Yes, but instead of using the software you have aquired to do that one job, you implement the job yourself, by grabbing multiple messages, inspecting them, and forwarding them to the correct consumer.

This is one more component that you have to develop, maintain and pay for.

Conclusion: use technique A. It is cheaper, faster to develop, easier to maintain. If at any point in the future, you measure an actual problem, and at the same time, you prove through a prototype that technique B solves this problem, you can still refactor. Basically you just have an additional dispatcher component in front of your existing consumers.

Answered by mtj on September 9, 2020

In general, you won't subscribe to garbage you're definitely not interested in: it wastes network capacity, CPU cycles for desalinization and promotes more frequent garbage collection in managed environments.

More than that: think of error boundaries. You're kind introducing single point of failure; even if you span multiple instances of you dispatcher, it is still much harder to support and improve (downtime, rollbacks, load-sharing, metrics, logging etc.) then separate dataflow not interacting with each other.. unless they do interact with each other.

If they do, the synchronization pain (pray God you won't have to guarantee data consistency at some high measure) between pseudo-independent dataflows would definitely overweight all the benefits.

P.S. As usual with software engineering: there are no best-practicies that work for all the cases. Let your business requirements guide you. And make sure you are designing with clear error boundaries, taking into consideration worst cases that happen sooner or lesser.

Answered by Sereja Bogolubov on September 9, 2020

Related Questions

Handling Networking on Android

1  Asked on January 31, 2021 by derek

How do developers show progress when it’s a “slow week”?

5  Asked on January 28, 2021 by timmy-zhou

How does one-way data binding and MVC achieve loose coupling?

1  Asked on January 26, 2021 by szeb

Trying to figure out the optimal selection based on a set of rules

2  Asked on January 24, 2021 by envin

C# Why should i limit myself to List or Stack ? ( instead of having both)

4  Asked on January 21, 2021 by ran-keren

How to handle API token(s) that expires after time

1  Asked on January 19, 2021 by inx51

Using for_each instead of iterators to avoid iterator invalidation

1  Asked on January 18, 2021 by scttrbrn

Implementing a “remember me” mechanism using access token and refresh token in a web server

0  Asked on January 14, 2021 by two-horses

What’s the progress on Haskell records?

2  Asked on January 14, 2021 by mmh

How to use Guice for an effective API Design?

3  Asked on January 9, 2021

How is a reproducible build guaranteed with version ranges in NPM?

3  Asked on January 9, 2021 by finlay-weber

Single vs Multiple Technology Stacks

1  Asked on January 6, 2021 by daniel-voina

Why are some languages called platform dependent if I can always share the source code?

2  Asked on January 4, 2021 by buddygyan

Why Don’t We Add Attributes To Use Case Diagrams?

1  Asked on January 3, 2021 by saif-ul-islam

Name for unnecessary transcoding antipattern?

1  Asked on December 30, 2020 by bart-robinson

2  Asked on December 28, 2020 by keezar

What’s the opposite of primitive?

2  Asked on December 24, 2020 by dwjohnston

Choosing which operations to perform in the front-end or the back-end

1  Asked on December 22, 2020 by toivo-swn

What’s the use case of the combined fragment “strict”?

2  Asked on December 21, 2020 by timsch