TransWikia.com

What are the different messaging patterns for multiple producers to single consumer through a distributed queue?

Software Engineering Asked by kashive on December 3, 2021

We have multiple producers that publish messages to the same SQS queue. We have a single consumer that processes the messages. The producers do not care about the response. It is more like a broadcast message.

We are evaluating several options of how the producers are going to publish to the queue.

Evaluated solution proposals

I’ve come up with three options. Here are the pros/cons of each I can think of:

1. Producers directly connect to the SQS endpoint and publish the message.

Pros

  • Responsibility for availability and latency shifts to SQS.
  • SQS provides IAM authorization.
  • No need to maintain any infrastructure.

Cons

  • We cannot control the message content that goes into the queue. Say a bug in a producer puts an invalid message in the queue; the consumer then has to handle it. Options include isolating bad messages in a dead letter queue or simply ignoring them, since they are not valid messages in the first place.

  • If we wanted to switch the distributed queue provider, then we would have to change all the producers to publish to a different endpoint.
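To make the consumer-side handling in option 1 concrete, here is a minimal sketch in Python. The message contract (`event_type` and `payload` fields) and the `process` / `dead_letter` callbacks are assumptions made for illustration; they are not part of the setup described above. With SQS, a dead letter queue is usually attached through a redrive policy rather than written to by hand, so the explicit `dead_letter` callback only stands in for that mechanism.

```python
import json
from typing import Callable, Iterable, Tuple

# Hypothetical message contract: every message is a JSON object with these keys.
REQUIRED_FIELDS = {"event_type", "payload"}


def is_valid(body: str) -> bool:
    """Return True if body is a JSON object containing all required fields."""
    try:
        message = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(message, dict) and REQUIRED_FIELDS <= message.keys()


def consume(
    bodies: Iterable[str],
    process: Callable[[dict], None],
    dead_letter: Callable[[str], None],
) -> Tuple[int, int]:
    """Process valid messages and isolate invalid ones.

    Returns a (processed, rejected) pair of counts.
    """
    processed = rejected = 0
    for body in bodies:
        if is_valid(body):
            process(json.loads(body))
            processed += 1
        else:
            # Keep the raw body so the faulty producer can be diagnosed later.
            dead_letter(body)
            rejected += 1
    return processed, rejected
```

The trade-off is visible here: the consumer pays for every bad message, and the producer only learns about the problem after the fact.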

2. Create a REST service with an API that validates the request and forwards it to the queue. All producers call the API.

Pros

  • Can validate the message at the API layer before putting it in the queue.
  • Can switch to a different distributed queue or processing mechanism without having to change the producers.

Cons

  • The cost (e.g. infrastructure) of creating and maintaining a service that does something as trivial as validating a message and putting it on the queue.
  • Another layer of network indirection just for validation and a cleaner data contract.
  • Maintaining availability. It adds one more point of failure. Although this service has no dependencies besides SQS, we have to take responsibility for its availability.
    • In option 1, this concern would be handled by SQS.
  • Have to implement authz/authn on the API.
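As a rough illustration of how thin the service in option 2 could be, the following sketch shows only the request-handling logic, independent of any web framework. The function name `handle_publish`, the required `event_type` field, and the status codes chosen are assumptions; `enqueue` stands in for whatever call actually writes to the queue.

```python
import json
from typing import Callable, Tuple


def handle_publish(raw_body: bytes, enqueue: Callable[[str], None]) -> Tuple[int, str]:
    """Validate a request body and forward it to the queue.

    Returns an (HTTP status code, message) pair.
    """
    try:
        message = json.loads(raw_body)
    except ValueError:  # covers both malformed JSON and undecodable bytes
        return 400, "body is not valid JSON"

    # Hypothetical contract: a JSON object with an "event_type" field.
    if not isinstance(message, dict) or "event_type" not in message:
        return 400, "missing required field: event_type"

    try:
        enqueue(json.dumps(message))
    except Exception:
        # The queue is this service's only dependency; report it as unavailable.
        return 503, "queue unavailable"

    # 202: accepted for asynchronous processing, no result is returned.
    return 202, "accepted"
```

Unlike the consumer-side check, the producer gets immediate feedback here (a 400 response), at the price of running and securing one more service.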

3. Expose a client library to the producers that connects to the queue endpoint and publishes the message.

Pros

  • Gets the best of both option 1 and option 2. We can add validation logic in the client library and expose the appropriate interface in the code. We can also switch to a different distributed queue or processing mechanism by making code changes to the client library and getting the producers to use the new version.

Cons

  • The library is going to be programming-language specific. If we have producers in different languages, we may have to build a client for each language.
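A minimal sketch of what the client library in option 3 might look like, assuming Python producers. Everything named here (`Transport`, `InMemoryTransport`, `OrderEvent`, `EventPublisher`) is hypothetical. The idea is that producers only see a typed `publish` call, validation happens when the event is constructed, and the queue provider hides behind a small interface.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Protocol


class Transport(Protocol):
    """Anything that can deliver a serialized message to a queue."""

    def send(self, body: str) -> None: ...


class InMemoryTransport:
    """Stand-in transport for tests; a real one would wrap the queue SDK."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, body: str) -> None:
        self.sent.append(body)


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical event type; validation runs on construction."""

    order_id: str
    amount_cents: int

    def __post_init__(self) -> None:
        if not self.order_id:
            raise ValueError("order_id must be non-empty")
        if self.amount_cents < 0:
            raise ValueError("amount_cents must be non-negative")


class EventPublisher:
    """The only entry point producers use; the transport can be swapped."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def publish(self, event: OrderEvent) -> None:
        self._transport.send(json.dumps(asdict(event)))
```

Usage would be along the lines of `EventPublisher(InMemoryTransport()).publish(OrderEvent("o-1", 500))`. An SQS-backed transport could, for instance, wrap boto3's `send_message(QueueUrl=..., MessageBody=...)`. Switching providers would then mean shipping a new library version with a different transport, though producers still have to upgrade to pick it up.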

I am leaning towards option 3. It has the pros of options 1 and 2. Also, most of our producers are microservices written in a particular language and I don’t think we will be experimenting with newer languages anytime soon.

Questions

  • Am I missing some options or pros/cons?
  • Are there any best practices for multiple-producer, single-consumer communication?
  • Are there cases where Option 1 or Option 2 would be more appropriate?

One Answer

Concerns

According to my understanding these are your main concerns:

  • Producers publish data without checking it, which is why data sanitization is needed
  • The operational and maintenance cost
  • The underlying queuing mechanism might change in the future

Where to put data sanitization?

As you have listed, you can put this logic in one of three places:

  • On the producer side (Option #3)
  • On the consumer side (Option #1)
  • In an intermediate layer (Option #2)

This concern can be evaluated by asking the following questions:

  • What is the ratio of good to garbage messages?
    • For example: if there is a lot of garbage data, then filtering should be placed as close to the producers as possible.
  • What amount of data are we talking about?
    • For example: if the producers are pushing far more data than the consumer can process (the fast producer, slow consumer problem), then you might want to consider introducing throttling or sampling.
  • How dynamic is your filtering logic?
    • For example: if it uses a set of rules that are parameterized with database records, then a separate tier would make the most sense.
  • etc.
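For the fast producer, slow consumer case, one common way to throttle is a token bucket. The sketch below is only an illustration of that technique, not something prescribed above: the class name, the parameters, and the injectable `clock` (used to make it testable) are all assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `rate` messages per second, with bursts of up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def try_acquire(self) -> bool:
        """Return True if one message may be sent now, False if it should be throttled."""
        now = self._clock()
        # Refill in proportion to the elapsed time, capped at the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A producer (or the intermediate tier) would call `try_acquire()` before each publish and drop, delay, or sample the message when it returns `False`. Which of those is acceptable depends on whether losing messages is tolerable.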

Vendor lock-in vs. being technology agnostic

  • If you want to minimize maintenance cost, then you would choose a PaaS solution, which would make it really hard to change that decision later
  • If you choose to adopt an anti-corruption layer and introduce a thin proxy, then changing the underlying queuing mechanism might be an easier task
  • If you try to follow the smart endpoints and dumb pipes guidance, then you would want to minimize the usage of PaaS-specific functionality
  • etc.

Answered by Peter Csala on December 3, 2021
