When would you need "hundreds of thousands" of threads?

Asked by user39019 on Software Engineering, November 6, 2021

Erlang, Go, and Rust all claim in one way or another that they support concurrent programming with cheap “threads”/coroutines. The Go FAQ states:

It is practical to create hundreds of thousands of goroutines in the same address space.

The Rust Tutorial says:

Because tasks are significantly cheaper to create than traditional threads, Rust can create hundreds of thousands of concurrent tasks on a typical 32-bit system.

Erlang’s documentation says:

The default initial heap size of 233 words is quite conservative in order to support Erlang systems with hundreds of thousands or even millions of processes.

My question: what sort of application requires so many concurrent threads of execution? Only the busiest of web servers receive even thousands of simultaneous visitors. The boss-worker/job-dispatching applications I've written hit diminishing returns when the number of threads/processes is much greater than the number of physical cores. I suppose it might make sense for numerical applications, but in reality most people delegate parallelism to third-party libraries written in Fortran/C/C++, not to these newer-generation languages.

8 Answers

A simple example for Erlang, which was designed for communication: transferring network packets. A single HTTP request might involve thousands of TCP/IP packets. Add to this that everyone connects at the same time, and you have your use case.

Consider the many applications used internally by any big company to handle their orders or whatever else they might need. Web servers are not the only things that need threads.

Answered by Florian Margaine on November 6, 2021

My question: what sort of application requires so many concurrent threads of execution?

1) The fact that a language "scales" means there is less chance you'll have to ditch that language when things get more complex down the road. (This is called the "Whole Product" concept.) Many people are ditching Apache for Nginx for this very reason. If you're anywhere close to the "hard limit" imposed by thread overhead, you'll get scared and start thinking about ways to get past it. Web sites can never predict how much traffic they will get, so spending a little time making things scalable is reasonable.

2) One goroutine per request is just the start. There are plenty of reasons to use goroutines internally.

  • Consider a web app with hundreds of simultaneous requests, where each request generates hundreds of back-end requests. The obvious example is a search-engine aggregator, but pretty much any app could create a goroutine for each "area" on screen and generate them independently instead of sequentially (see the sketch after this list). For example, every page on Amazon.com is assembled just for you from 150+ back-end requests. You don't notice because they run in parallel, not sequentially, and each "area" is its own web service.
  • Consider any app where reliability and latency are paramount. You probably want each incoming request to fire off a few back-end requests, and return whichever data comes back first.
  • Consider any "client join" done in your app. Instead of saying "for each element, get data", you can spin off a bunch of goroutines. If you have a bunch of replica databases to query, you will magically go N times faster. If you don't, it won't be any slower.
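
A very rough sketch of that fan-out idea in Go (the "area" names and the simulated latency below are invented): each back-end call runs in its own goroutine and the results are collected over a channel.

package main

import (
    "fmt"
    "sync"
    "time"
)

// fetchArea stands in for one back-end call (search shard, pricing,
// recommendations, ...); the name and latency are made up.
func fetchArea(name string) string {
    time.Sleep(50 * time.Millisecond) // simulated network latency
    return "data for " + name
}

func main() {
    areas := []string{"header", "search-results", "recommendations", "ads"}

    results := make(chan string, len(areas))
    var wg sync.WaitGroup

    // One goroutine per back-end request instead of a sequential loop.
    for _, area := range areas {
        wg.Add(1)
        go func(a string) {
            defer wg.Done()
            results <- fetchArea(a)
        }(area)
    }

    wg.Wait()
    close(results)

    for r := range results {
        fmt.Println(r)
    }
}

Because the calls overlap, total wall-clock time is roughly that of the slowest call rather than the sum, which is the "N times faster" effect from the last bullet.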

hit diminishing returns when the number of threads/processes is much greater than the number of physical cores

Performance isn't the only reason to break a program up into communicating sequential processes (CSP). It can actually make the program easier to understand, and some problems can be solved with a lot less code.

Having concurrency in your code is a way to organize the problem. Not having goroutines is like not having a Map/Dictionary/Hash data structure in your language. You can get by without it. But once you have it, you start using it everywhere, and it really simplifies your program.

In the past, this meant "roll your own" multithreaded programming. But this was complex and dangerous -- there still aren't a lot of tools to make sure you're not creating races. And how do you prevent a future maintainer from making a mistake? If you look at big/complex programs, you'll see they expend a LOT of resources in that direction.

Since concurrency isn't a first-class part of most languages, today's programmers have a blind spot for why it would be useful to them. This will only become more apparent as every phone and wristwatch heads toward 1000 cores. (Go, for its part, ships with a built-in race-detector tool, which helps with the safety problem above.)
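
For what it's worth, that race detector is enabled with the -race flag on the standard Go tool commands, for example:

go test -race ./...
go run -race main.go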

Answered by BraveNewCurrency on November 6, 2021

Convenience. Back when I started doing multi-threaded programming, I was doing a lot of simulation and game development on the side for fun. I found it very convenient to just spin off a thread for every single object and let it do its own thing rather than process each one through a loop. If your code isn't disturbed by non-deterministic behaviour and you don't have collisions, it can make coding easier. With the power available to us now, if I were to get back into that, I can easily imagine spinning off a couple thousand threads, since there is enough processing power and memory to handle that many discrete objects!
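
A rough sketch of that style in Go (the entity type, movement rule, tick interval, and object count below are all invented): each simulated object runs in its own goroutine and advances itself, instead of being stepped by one central loop.

package main

import (
    "fmt"
    "sync"
    "time"
)

// entity is a hypothetical simulation object that advances its own state.
type entity struct {
    id int
    x  float64
}

// run lets the object "do its own thing" on a private timer.
func (e *entity) run(wg *sync.WaitGroup, steps int) {
    defer wg.Done()
    for i := 0; i < steps; i++ {
        e.x += 1.0                        // made-up movement rule
        time.Sleep(10 * time.Millisecond) // simulated tick
    }
}

func main() {
    const n = 2000 // a couple thousand concurrent objects
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        e := &entity{id: i}
        go e.run(&wg, 5)
    }
    wg.Wait()
    fmt.Println("simulated", n, "objects")
}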

Answered by Brian Knoblauch on November 6, 2021

In a language where you're not allowed to modify variables, the simple act of maintaining state requires a separate execution context (which most people would call a thread and Erlang calls a process). Basically, everything is a worker.

Consider this Erlang function, which maintains a counter:

counter(Value) ->
    receive                               % Sit idle until a message is received
        increment -> counter(Value + 1);  % Restart with incremented value
        decrement -> counter(Value - 1);  % Restart with decremented value
        speak     ->
            io:fwrite("~B~n", [Value]),
            counter(Value);               % Restart with unaltered value
        _         -> counter(Value)       % Anything else?  Do nothing.
    end.

In a conventional OO language like C++ or Java, you'd accomplish this by having a class with a private class member, public methods to get or change its state and an instantiated object for each counter. Erlang replaces the notion of the instantiated object with a process, the notion of methods with messages and maintenance of state with tail calls that restart the function with whatever values make up the new state. The hidden benefit in this model -- and most of Erlang's raison d'être -- is that the language automatically serializes access to the counter value through the use of a message queue, making concurrent code very easy to implement with a high degree of safety.
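
For comparison only (this is not from the answer itself), a rough Go analogue of the same pattern: a single goroutine owns the counter, and a channel plays the role of the Erlang mailbox, so access to the value is serialized in the same way.

package main

import "fmt"

type msg int

const (
    increment msg = iota
    decrement
    speak
)

// counter owns its state; the only way to touch the value is to send a
// message, so all access is serialized through the mailbox channel.
func counter(mailbox <-chan msg, done chan<- struct{}) {
    value := 0
    for m := range mailbox { // handle one message at a time
        switch m {
        case increment:
            value++
        case decrement:
            value--
        case speak:
            fmt.Println(value)
        }
    }
    close(done)
}

func main() {
    mailbox := make(chan msg)
    done := make(chan struct{})
    go counter(mailbox, done)

    mailbox <- increment
    mailbox <- increment
    mailbox <- decrement
    mailbox <- speak // prints 1

    close(mailbox)
    <-done // wait for the counter goroutine to drain its mailbox
}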

You're probably used to the idea that context switches are expensive, which is still true from the perspective of the host OS. The Erlang runtime is itself a small operating system tuned so switching among its own processes is quick and efficient, all while keeping the number of context switches the OS does down to a minimum. For this reason, having many thousands of processes isn't an issue and is encouraged.

Answered by Blrfl on November 6, 2021

Some rendering tasks spring to mind here. If you're doing a long chain of ops on every pixel of an image, and those ops are parallelizable, then even a relatively small 1024x768 image (786,432 pixels) lands right in the "hundreds of thousands" bracket.
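
A minimal sketch of what that could look like with goroutines (the per-pixel "ops" are invented placeholders): one goroutine per pixel of a 1024x768 buffer, i.e. several hundred thousand of them.

package main

import (
    "fmt"
    "sync"
)

const width, height = 1024, 768 // 786,432 pixels

func main() {
    pixels := make([]float64, width*height)
    var wg sync.WaitGroup
    wg.Add(len(pixels))

    // One goroutine per pixel -- several hundred thousand of them. The
    // "ops" below stand in for a real filter chain; a production renderer
    // would more likely batch per row or per tile.
    for i := range pixels {
        go func(i int) {
            defer wg.Done()
            v := pixels[i]
            v = v*0.5 + 0.25 // op 1: fake exposure adjustment
            v = v * v        // op 2: fake gamma curve
            pixels[i] = v
        }(i)
    }

    wg.Wait()
    fmt.Println("processed", len(pixels), "pixels")
}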

Answered by Maximus Minimus on November 6, 2021

In Erlang it is common to have one process per connection or per task. For example, a streaming audio server might have one process per connected user.

The Erlang VM is optimised to handle thousands or even hundreds of thousands of processes by making context switches very cheap.

Answered by Zachary K on November 6, 2021

One use case: websockets.
Because websocket connections are long-lived compared to simple requests, a lot of them will accumulate on a busy server over time. Microthreads give you a good conceptual model and a relatively easy implementation.

More generally, any case in which numerous more or less autonomous units are waiting for certain events to occur is a good use case.
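
A rough sketch of that per-connection model in Go, with a plain TCP echo loop standing in for a real websocket handler (the port and the line-based protocol are placeholders): each long-lived connection gets its own goroutine, which spends most of its life parked waiting for the next event.

package main

import (
    "bufio"
    "fmt"
    "log"
    "net"
)

// handle owns one long-lived connection for its whole lifetime. Tens of
// thousands of mostly idle clients cost tens of thousands of cheap,
// mostly parked goroutines.
func handle(conn net.Conn) {
    defer conn.Close()
    scanner := bufio.NewScanner(conn)
    for scanner.Scan() { // block until the client sends a line
        if _, err := fmt.Fprintln(conn, scanner.Text()); err != nil {
            return
        }
    }
}

func main() {
    ln, err := net.Listen("tcp", ":8080")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            log.Fatal(err)
        }
        go handle(conn) // one goroutine per connection
    }
}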

Answered by kr1 on November 6, 2021

It might help to think of what Erlang was originally designed to do, which was to manage telecommunications. Activities like routing, switching, sensor collection/aggregation, etc.

Bringing this into the web world - consider a system like Twitter. The system probably wouldn't use microthreads in generating web pages, but it could use them in its collection/caching/distribution of tweets.

Answered by Dave Clausen on November 6, 2021
