TransWikia.com

How to deal with a hypothetical situation in which a pub/sub cycle gets into an unending recursive loop?

Software Engineering Asked by Aurlito on November 17, 2021

Let me explain what I mean. Imagine A subscribes to event b, and when it receives b it publishes event a. B subscribes to event a, and when it receives a it publishes event b. This is a full-blown circle. How does a pub/sub system deal with such a fringe case? I haven't tested it because I have yet to write my own pub/sub engine in Java or JavaScript. In fact, I'm not sure I understand it correctly, which may be why nothing came up on Google when I searched for the keywords "pubsub" + "loop" + "recursion".

Let me draw an ASCII diagram.

A <-----------> b
|               |
|               |
|               |
a <---------->  B

5 Answers

This scenario is only real in routing systems with a web topology. IP has a TTL (Time To Live) field to prevent just this: it is decremented with every forwarding operation. When it hits zero the message is not forwarded any further, even if it has not reached its destination yet. The sender gets no response and has to try again. Alternatively, the counter can be a timeout value.
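The TTL idea carries over to a pub/sub bus as a hop budget. Below is a minimal sketch, not a real library: `TtlBus`, `max_hops` and `make_cycle_bus` are hypothetical names, and the bus is a single-threaded in-process toy. Handlers are assumed to pass the remaining budget along when they publish in response to a message.

```python
from collections import defaultdict, deque


class TtlBus:
    """Toy in-process bus; every message carries a remaining hop budget."""

    def __init__(self, max_hops=8):
        self.max_hops = max_hops
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.dropped = []  # messages discarded because the budget ran out

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, payload, ttl=None):
        # A fresh message gets the full budget; a forwarded one passes ttl in.
        budget = self.max_hops if ttl is None else ttl
        self.queue.append((topic, payload, budget))

    def run(self):
        """Drain the queue and return the number of handler invocations."""
        delivered = 0
        while self.queue:
            topic, payload, ttl = self.queue.popleft()
            if ttl <= 0:
                self.dropped.append((topic, payload))
                continue
            for handler in self.handlers[topic]:
                delivered += 1
                handler(self, payload, ttl - 1)  # one hop used up
        return delivered


def make_cycle_bus(max_hops=8):
    """Wire up the A/B cycle from the question."""
    bus = TtlBus(max_hops)
    bus.subscribe("b", lambda bus, payload, ttl: bus.publish("a", payload, ttl))  # A
    bus.subscribe("a", lambda bus, payload, ttl: bus.publish("b", payload, ttl))  # B
    return bus
```

With `make_cycle_bus(4)`, publishing one `b` leads to four handler calls and then the message lands in `dropped` instead of circulating forever.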

On a service level this problem should not exist. If it does, there is a serious problem with the logic, which should be fixed immediately. In a complex distributed system you could still include this mechanism as a safety measure, so that you get notified if a loop is created by accident. Ideally you would get an email when a loop is detected, identifying the source and the route that was followed. Depending on the type of system this may mean a lot of overhead, so I would make it something you can turn on and off.

Answered by Martin Maat on November 17, 2021

One of the lessons I've learned the hard way with complex messaging systems is that you need to use some sort of common 'context' header that is attached to every message. One of the things that is helpful to add is a breadcrumb trail. Essentially, to start, when you have a message generated by the receipt of another, you record the source in the context. But take it one step further: you should also attach the history from the source event. As messages bounce around the pachinko machine, you now have a history of the path that was taken.

Once you have that, you can now do a simple check for cycles (this works for any graph-type situation by the way): check the history and if the current 'location' is already in the history, you have a cycle and you can route the message to a poison-message handling subsystem.
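A minimal sketch of that check, assuming the "location" is simply the topic name and the context header is a tuple of topics already visited. `Message`, `BreadcrumbBus` and the `poison` list are hypothetical names standing in for whatever your messaging library and poison-message subsystem actually provide.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    topic: str
    payload: str
    history: tuple = ()  # topics this chain of messages has already visited


class BreadcrumbBus:
    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.poison = []  # stand-in for a poison-message subsystem

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, payload, cause=None):
        # Copy the cause's trail and append the cause's own location to it.
        if cause is None:
            history = ()
        else:
            history = cause.history + (cause.topic,)
        self.queue.append(Message(topic, payload, history))

    def run(self):
        while self.queue:
            msg = self.queue.popleft()
            if msg.topic in msg.history:
                # Current location already appears in the trail: a cycle.
                self.poison.append(msg)
                continue
            for handler in self.handlers[msg.topic]:
                handler(self, msg)
```

Because `publish` builds the trail itself, handlers only have to pass the message they are reacting to as `cause`. That is the "bake it into your libraries" idea in miniature.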

Of course, if you've designed things to purposely loop through the same topics/queues (yes, I've seen this), you will have a much harder problem to solve. I would avoid such designs anyway because they are problematic in general.

The big challenge is making sure this hand-off happens reliably. The best way to ensure this is to bake it into your libraries and APIs for message consumption/production.

Answered by JimmyJames on November 17, 2021

The question you've posted here boils down to

How do I not cause recursion when I set up a recursive chain of events?

The straightforward answer is

Don't set it up then.

The validity of your situation entirely relies on context, which you've omitted from the question.

  • The supposition is that the situation is configured this way (intentionally) because it is necessary. That's the basis for any hypothetical: what if we need to do this?
  • The logical conclusion is that when you set up a recursive scenario, you obviously want recursion to take place.
  • The logical consequence is that your recursive loop has to have some sort of end condition which prevents the cycle from repeating any further.
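As an illustration of the last point, here is a small sketch of an intentional A/B loop whose end condition lives in the handler logic. The halving rule is an arbitrary assumption made up for the example, and `publish`, `on_a`, `on_b` and `log` are hypothetical names.

```python
from collections import defaultdict

subscribers = defaultdict(list)
log = []  # every (topic, value) that was published, in order


def publish(topic, value):
    log.append((topic, value))
    for handler in list(subscribers[topic]):
        handler(value)


def on_b(value):  # component A: reacts to b by publishing a
    if value > 1:  # end condition: stop once the value cannot be halved
        publish("a", value // 2)


def on_a(value):  # component B: reacts to a by publishing b
    publish("b", value)


subscribers["b"].append(on_b)
subscribers["a"].append(on_a)
```

Calling `publish("b", 8)` goes around the loop three times and then stops on its own, because `on_b` refuses to continue once the value reaches 1.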

That's just recursion 101. Whether or not this is pubsub is irrelevant.

Answered by Flater on November 17, 2021

Consider changing the design of your events:

Simplistic: add a publication number that you increment on each republication, and drop the event when it reaches some limit. This worked on a reactive client-server system I made years ago.

Better: keep the parent event a inside b when b is published. If A sees an event that was triggered by A, it stops processing that event.

This is commonly used in large-scale distributed systems. It also makes debugging what's going on a lot easier for whoever maintains A or B, especially if they're separate services, servers or microservices.
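The parent-event idea can be sketched as a linked chain that a subscriber walks before it does any work. `Event`, `caused_by` and `lineage` are hypothetical names, and the sketch assumes each event records which component published it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    name: str
    publisher: str
    parent: Optional["Event"] = None  # the event that triggered this one


def caused_by(event, component):
    """True if `component` published any ancestor of `event`."""
    node = event.parent
    while node is not None:
        if node.publisher == component:
            return True
        node = node.parent
    return False


def lineage(event):
    """Event names from the root down to `event`, handy in log lines."""
    names = []
    node = event
    while node is not None:
        names.append(node.name)
        node = node.parent
    return " -> ".join(reversed(names))
```

A would call `caused_by(incoming, "A")` at the top of its handler and return early when it is true. `lineage(incoming)` gives a one-line trace that someone debugging A or B can read in a log.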

Answered by Tim Williscroft on November 17, 2021

It depends on the tech involved, but in general, messages flood the hell out of each publisher and you quickly find out where the limitations of your system lie.

Subscribers that take the most time/cpu won’t keep up with the flood, and will either keel over due to the load or accumulate pending messages until something in the subscriber queue runs out of resources and breaks.

In the case of synchronous in-process events (like .NET's events), it will look like your standard-fare infinite loop, though harder to identify and harder to debug.
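To make the synchronous case concrete, here is a tiny sketch in Python rather than .NET. With handlers called directly inside `publish`, the A/B cycle from the question becomes unbounded recursion, which CPython reports as a `RecursionError` once the call stack gets too deep. The names `publish`, `subscribers` and `blows_the_stack` are hypothetical.

```python
from collections import defaultdict

subscribers = defaultdict(list)


def publish(topic):
    # Synchronous dispatch: each handler runs inside the publisher's call.
    for handler in subscribers[topic]:
        handler()


subscribers["b"].append(lambda: publish("a"))  # A: on b, publish a
subscribers["a"].append(lambda: publish("b"))  # B: on a, publish b


def blows_the_stack():
    """Publish one b and report whether the cycle exhausted the call stack."""
    try:
        publish("b")
    except RecursionError:
        return True
    return False
```

The traceback consists of the same two frames repeated over and over, which is at least a hint that two handlers are feeding each other.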

Instrumentation should catch the exponential growth, but probably not quickly enough. Depending on the situation, throttling can be put into place so the pub isn't flooded with messages - but there aren't great failure modes for that. And this sort of flood can look very similar to normal traffic spikes due to, say, a Black Friday sale.

For internal-only events, documentation and communication will create hierarchies of events so there aren't cycles. Humans will make mistakes, of course.
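Since humans make mistakes, the hierarchy can also be checked mechanically if each component declares what it listens to and what it may publish. The sketch below uses the standard library's `graphlib` (Python 3.9+). `find_event_cycle` and the two declaration dictionaries are hypothetical. The check is deliberately conservative: it assumes any event a component subscribes to may trigger any event that component publishes.

```python
from graphlib import CycleError, TopologicalSorter


def find_event_cycle(subscribes, publishes):
    """Return a list of events forming a cycle, or None if there is none.

    subscribes: component -> events it listens to
    publishes:  component -> events it may emit in response
    """
    # Edge "x triggers y": some component listens to x and may publish y.
    triggered_by = {}
    for component, inputs in subscribes.items():
        for out in publishes.get(component, ()):
            triggered_by.setdefault(out, set()).update(inputs)
    try:
        # prepare() raises CycleError when the graph is not a hierarchy.
        TopologicalSorter(triggered_by).prepare()
    except CycleError as err:
        return err.args[1]  # the cycle; first and last node are the same
    return None
```

Run as a unit test or at startup, this turns the question's A/B setup into a failing check instead of a production flood.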

Answered by Telastyn on November 17, 2021
