
How are the CPU cores distributed to each kernel in parallelization calculation?

Mathematica Asked by millhu on December 25, 2020

Just to make sure I understand correctly before I ask my question: I have seen people say that some functions in Mathematica automatically use multiple cores (I am not referring to the ones we parallelize explicitly, but to built-in ones like NIntegrate), so with 2 cores such a function should run faster than on a single core. My question is about code like the following:
ParallelTable[NIntegrate[x, {x, 1, 3}], {loop, 1, 3}]

I think three kernels will be launched. If I have 4 cores, how are those four cores distributed among the kernels? (I assume each kernel can itself use multiple cores, since NIntegrate is supposed to do so.)
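For reference, a minimal way to see how many kernels get launched and which subkernel evaluates each iteration (a sketch assuming the default configuration, where LaunchKernels[] starts roughly one subkernel per licensed core):

$ProcessorCount  (* number of cores Mathematica detects *)
LaunchKernels[]  (* by default launches about one subkernel per core *)
ParallelTable[{loop, $KernelID}, {loop, 1, 3}]  (* shows which subkernel handled each iteration *)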

2 Answers

Welcome noo-b, m.se is a great community for infinite learning about M!

I think you have a few false assumptions:

First, even single-threaded operations can end up running on multiple cores. A good operating system tries to avoid that, but every so often it may migrate the work to another core, or it may split the load over multiple cores -- although the latter usually not for any extended time.

Second, you can't assume that NIntegrate will always parallelize for all inputs, and in particular you can't assume that it will parallelize for the entire computation time. It may parallelize only during initialization, at the end, or for selected tasks in between. For example,

Do[Do[NIntegrate[x, {x, 1, 3}], {3}], {100000}]

if you look at the core utilization (not process utilization, as in a simple task manager; on Linux you can run top and press 1), you will see that this spends 99% of the time on one core. The OS may switch to another core after some time, but then you see 99% on that core instead. So I don't see NIntegrate threading over multiple cores at all, at least not most of the time (perhaps for fractions of a second). This may differ for other NIntegrate inputs, but this simple example shows that NIntegrate doesn't always parallelize, and not for the whole duration of its computation.
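A rough way to check this from inside Mathematica is to compare CPU time with wall-clock time (a heuristic sketch: Timing reports the kernel process's CPU time and AbsoluteTiming the elapsed time, so a ratio close to 1 suggests the work effectively ran on one core):

cpu = First@Timing[Do[NIntegrate[x, {x, 1, 3}], {10000}]];
wall = First@AbsoluteTiming[Do[NIntegrate[x, {x, 1, 3}], {10000}]];
cpu/wall  (* near 1 suggests one core; well above 1 would indicate multithreading *)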

With the M parallelism framework this doesn't change; it's really an operating system matter. With ParallelTable (and brethren) you're just supplying processing tasks from more processes, and how the o/s schedules those onto cores is entirely up to the o/s. So you can't really "back out" the assignment to cores from an understanding of the parallel processes.
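Since each parallel subkernel is a separate operating-system process, you can watch that scheduling from the outside. A small sketch (which process IDs you get is entirely up to your o/s):

LaunchKernels[];
ParallelEvaluate[$ProcessID]  (* o/s process IDs of the subkernels *)
ParallelEvaluate[$KernelID]   (* Mathematica's own numbering of the subkernels *)
(* watch those PIDs in top or htop (press 1 in top for the per-core view) to see how the o/s moves them between cores over time *)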

somewhat of a tangent:

In Scala, Java or C# (or many other languages) you can schedule tasks at the thread level. But even then it's up to the o/s to schedule threads onto cores. With a JVM tool such as VisualVM you get a wonderful visualization of the threads (horizontal bars that grow over time, one per thread); I think what you're really interested in is how things work in the threads, not necessarily how the threads are assigned to cores. With that said, threads are a software concept, not a hardware concept; a core doesn't know what a thread is. But I think a thread analysis would tell you more about understanding concurrency than the assignment to cores, since core switching and the share of workload on each core are entirely up to the o/s.

Answered by Andreas Lauschke on December 25, 2020

There are some functions that automatically use multiple cores. How many cores they use is determined by some of the settings in SystemOptions["ParallelOptions"].

If you use such functions on subkernels, they will use only a single core. You can verify this by looking at ParallelEvaluate@SystemOptions["ParallelOptions"]. Notice that all thread counts are set to 1 on subkernels.
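A quick way to see the difference (a sketch; the exact option names and default values vary by version and machine, and "ParallelThreadNumber" and "MKLThreadNumber" are the thread-count entries you would typically look for):

SystemOptions["ParallelOptions"]  (* main kernel: thread counts typically equal the core count *)
LaunchKernels[];
ParallelEvaluate[SystemOptions["ParallelOptions"]]  (* subkernels: the thread counts are set to 1 *)

If you want to experiment, SetSystemOptions inside ParallelEvaluate can change these values on the subkernels, but oversubscribing the cores that way rarely pays off.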

Generally, explicit parallelization (such as ParallelTable) is not as efficient as the built-in parallelization of some functions. Thus, if your bottleneck is a function that already runs in parallel, then adding explicit parallelization with ParallelTable or related functions will slow it down (or at least it did slow it down in all the cases I checked).
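One way to see this effect is to time a function that is internally multithreaded on the main kernel against the same work done with ParallelMap on the single-threaded subkernels (a rough benchmark sketch; Eigenvalues on large dense matrices is assumed here only as an example of an internally parallel function, and the numbers will depend on your machine):

mats = Table[RandomReal[1, {1000, 1000}], {8}];
First@AbsoluteTiming[Eigenvalues /@ mats;]  (* built-in multithreading on the main kernel *)
LaunchKernels[];
First@AbsoluteTiming[ParallelMap[Eigenvalues, mats];]  (* explicit parallelism; each subkernel is limited to one thread *)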

Answered by Szabolcs on December 25, 2020
