TransWikia.com

OpenMP outputs incorrect answers

Stack Overflow Asked by Mohamed Rady on November 27, 2020

I have typed this simple code to calculate the number of prime numbers between 2 and 5,000,000.
The algorithm works fine and it outputs the correct answer, however when I try to use OpenMP to speedup the execution it outputs a different answer every time.

#include "time.h"
#include "stdio.h"
#include "omp.h"
int main()
{
    clock_t start = clock();
    int count = 1;
    int x;
    bool flag;
    #pragma omp parallel for schedule(static,1) num_threads(2) shared(count) private(x,flag)
    for (x = 3; x <= 5000000; x+=2)
    {
        flag = false;
        if (x == 2 || x == 3)
            count++;
        else if (x % 2 == 0 || x % 3 == 0)
            continue;
        else
        {
            for (int i = 5; i * i <= x; i += 6)
            {
                if (x % i == 0 || x % (i + 2) == 0)
                {
                    flag = true;
                    break;
                }
            }
            if (!flag)
                count++;
        }
    }
    clock_t end = clock();
    printf("The execution took %f msn", (double)end - start / CLOCKS_PER_SEC);
    printf("%dn", count);
}

The code doesn’t work for any number of threads, dynamic or static scheduling or different chunk sizes.
I have tried messing with private and shared variables but it still didn’t work and declaring x and flag inside the for loop didn’t work either.
I am using Visual Studio 2019 and I have OpenMP support enabled.
What’s the problem with my code ?

2 Answers

You have race conditions with your count variable where multiple threads can try to update it at the same time. The easy fix is to use an OpenMP reduction() clause to give each thread a private copy of the variable and have them all get added up properly:

#include <time.h>
#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    clock_t start = clock();
    int count = 1;
#pragma omp parallel for schedule(static,1) num_threads(2) reduction(+:count)
    for (int x = 3; x <= 5000000; x+=2)
    {
        bool flag = false;
        if (x == 2 || x == 3)
            count++;
        else if (x % 2 == 0 || x % 3 == 0)
            continue;
        else
        {
            for (int i = 5; i * i <= x; i += 6)
            {
                if (x % i == 0 || x % (i + 2) == 0)
                {
                    flag = true;
                    break;
                }
            }
            if (!flag)
                count++;
        }
    }
    clock_t end = clock();
    printf("The execution took %f msn", (double)end - start / CLOCKS_PER_SEC);
    printf("%dn", count);
}

This outputs 348513 (Verified as the right number through other software).


Also note cleaned up headers and moving some variable declarations around to avoid the need for a private() clause.

You could also make count an atomic int, but that's slower than using reduction() in my testing.

Correct answer by Shawn on November 27, 2020

Just to add to the answer provided by @Shawn, besides solving the count race condition using the reduction. You can also see if your code has load balancing issues, looking at the iterations of the loop that you are parallelizing it is clear that not all iterations have the same among of work. Since you are assigning work to threads in a static manner you might have one thread doing much more work that the other. Play around with the dynamic schedule to see if you notice any difference.

Besides that IMO you can simplified a lot your sequential code by remove all those conditional branching that negatively affects the performance of your parallel version.

First you do not need (x == 2), since int x = 3;. You do not need (x == 3) either, just remove it make count=2; (instead of count=1;), and int x = 5;, since you are increment in steps of 2 (i.e., x+=2). With this you can also remove this:

if (x == 2 || x == 3)
    count++;

Now because the loop starts at 5, and has an increment step of 2, you know that it will be iterating over odd numbers only, so we can remove also x % 2 == 0 . Now we only have an if( x % 3 == 0) continue; else{..}, which can be simplified into if(x % 3 != 0){..}.

You can rewrite the code also to remove that break:

#pragma omp parallel for schedule(static,1) num_threads(2) reduction(+:count)
for (int x = 5; x <= 5000000; x += 2) {
    boolean flag = false;
    if (x % 3 != 0) {
        for (i = 5; !flag && i * i <= x; i += 6) {
            flag = (x % i == 0 || x % (i + 2) == 0);
        }
        if (!flag) {
            count++;
        }
    }
}

because you are using C/C++ you can even remove that if as well:

    int count = 2;
    #pragma omp parallel for schedule(static,1) num_threads(2) reduction(+:count)
    for (int x = 5; x <= 5000000; x += 2) {
        if (x % 3 != 0) {
            int flag = 1;
            for (int i = 5; flag && i * i <= x; i += 6) {
                flag = x % i != 0 && x % (i + 2) != 0;
            }
            count += flag;
        }
    }
    printf("%dn", count);

IMO the code is more readable now, we could further improve by given a good name to the variable flag.

Answered by dreamcrash on November 27, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP