
Using for_each instead of iterators to avoid iterator invalidation

Software Engineering Asked by scttrbrn on January 18, 2021

I am writing a simple custom (special-purpose) container and would like to allow iteration over each element while avoiding iterators, because of the problem of iterator invalidation.

Instead of providing a pair of iterators (via begin() and end()), I was thinking of providing a for_each method that iterates over the elements and passes them to a functor. The method would increment a counter on entry and decrement it on exit. While the counter is non-zero, any method that would modify the container (and cause iterator invalidation) returns early, resulting in a no-op. See the following simple example:

#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <cassert>


struct FloatArray {

    float *data = nullptr;
    size_t count = 0;
    size_t niter = 0;


    float *begin() { return data; }
    float *end() { return data + count; }

    template<typename Fn>
    void
    for_each(Fn &&fn)
    {
        niter++;
        for (size_t k = 0; k < count; k++) {
            fn(data[k]);
        }
        niter--;
    }
    
    void
    push_back(float f)
    {
        if (niter) {
            /* Return early if the container is being iterated over. */
            return;
        }
        float *new_data = new float[count + 1];
        if (data) {
            /* memcpy with a null source pointer is undefined even for zero
               bytes, so only copy (and free) when there is existing data. */
            memcpy(new_data, data, sizeof(*data) * count);
            delete [] data;
        }
        new_data[count] = f;
        data = new_data;
        count++;
    }
};


int
main(int argc, char **argv)
{
    FloatArray arr;

    arr.push_back(1.0f);
    arr.push_back(2.0f);
    arr.push_back(3.0f);

    arr.for_each(
        [&arr](float f)
        { 
            if (f == 1.0f) {
                arr.push_back(42.0f); // no-op
            }
            printf("%fn", f); 
        }
    );
    for (float f : arr) {
        if (f == 1.0f) {
            arr.push_back(42.0f); // Undefined behaviour
        }
        printf("%fn", f);         
    }

    arr.push_back(4.0);
    return 0;
}

I guess similar behavior could be achieved using iterators that point back to the container, incrementing a counter on construction and decrementing it on destruction, but I think this (and iterators in general) is far more complex to implement (and understand) than the for_each method.
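
For comparison, here is a minimal sketch of that counting-iterator alternative, reusing the FloatArray from above. The type CountingIter and its members are hypothetical names I made up for illustration, not part of my actual code:

    // Sketch only: a counting iterator whose constructor/copy/destructor
    // maintain FloatArray::niter, so mutation is suppressed while any
    // iterator into the container is alive.
    struct CountingIter {
        FloatArray *owner;
        size_t idx;

        CountingIter(FloatArray *o, size_t i) : owner(o), idx(i) { ++owner->niter; }
        CountingIter(const CountingIter &other) : owner(other.owner), idx(other.idx) { ++owner->niter; }
        ~CountingIter() { --owner->niter; }

        CountingIter &operator=(const CountingIter &other) {
            /* Increment before decrement so the count stays correct even
               when both iterators share the same owner. */
            ++other.owner->niter;
            --owner->niter;
            owner = other.owner;
            idx = other.idx;
            return *this;
        }

        float &operator*() const { return owner->data[idx]; }
        CountingIter &operator++() { ++idx; return *this; }
        bool operator!=(const CountingIter &o) const { return idx != o.idx; }
    };

    // FloatArray::begin()/end() would then return CountingIter{this, 0} and
    // CountingIter{this, count} instead of raw pointers.
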

One concern would be that the "iterator counter" could overflow, but I am not expecting that many nested iterations.

I guess my question is: Would using a for_each method make more sense for special-purpose containers that do not need iterators for reasons other than iterating over their elements (e.g. specifying a range, or being passed by reference to a container method), since it is simpler to implement and understand (IMO) and (more easily) avoids complexities such as iterator invalidation? Also, are there better implementations than the one described above for this kind of use case?

One Answer

Would using a for_each method make more sense for special purpose containers [...]?

for_each makes a lot of sense to me for certain containers where the design of a standard iterator doesn't map so well, either for the CPU or the programmer, or both. Consider this very simple example of iterating efficiently through the set bits of a bitset:

// Calls 'func(n)' for each nth bit which is set.
template <class Func>
void Bitset::for_each(Func func)
{
    size_t start=0;
    for(uint64_t bits64: all_bits64) // all_bits64: member array of 64-bit words
    {
        if(bits64 == 0xffffffffffffffffull)
        {
             // Iterate through all 64 bits if they're all set.
             for(size_t j=0; j < 64; ++j)
                 func(start + j);
        }
        else if(bits64 != 0x0ull)
        {
             // Test each individual bit if some of the 64 bits are set.
             // Can make this even more elaborate and efficient in dense cases
             // using FFZ/FFS.
             for(size_t j=0; j < 64; ++j)
             {
                 if(bits64 & (1ull << j))
                     func(start + j);
             }
        }
        start += 64;
    }
}

Something like that can be very difficult or impossible to implement as efficiently with a flat forward iterator design without additional branching in operator++. I often wish range-based for loops were built on a generic for_each function of this sort (one we could overload) rather than on standard iterators and begin/end pairs, since I have cases (along with measurements) where such for_each methods are up to two or three times as fast when only light work is performed in each iteration.
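
To make that concrete, here is a rough sketch (the name SetBitIterator and its members are hypothetical, not part of the Bitset above) of what a flat forward iterator over the same word array might look like; note how operator++ ends up branching per bit on every call, which is exactly the work the word-at-a-time for_each avoids:

    #include <cstdint>
    #include <cstddef>

    // Sketch: forward iterator over the set bits of an array of 64-bit words.
    struct SetBitIterator {
        const uint64_t *words = nullptr; // backing 64-bit words
        size_t nbits = 0;                // total number of bits
        size_t pos = 0;                  // current absolute bit index

        size_t operator*() const { return pos; }

        SetBitIterator &operator++() {
            ++pos;
            advance_to_set_bit();
            return *this;
        }

        void advance_to_set_bit() {
            // A begin() would call this once after construction as well.
            while (pos < nbits) {
                const uint64_t w = words[pos / 64];
                if (w == 0) {                      // skip a whole empty word
                    pos = (pos / 64 + 1) * 64;
                    continue;
                }
                if (w & (1ull << (pos % 64)))      // per-bit branch on every step
                    return;
                ++pos;
            }
        }

        bool operator!=(const SetBitIterator &o) const { return pos != o.pos; }
    };
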

Iterator Invalidation

In your case though, I think silently ignoring mutation requests during traversal is even more prone to confusion and mistakes than dealing with (and documenting) iterator invalidation. At the very least you could assert(!"Trying to mutate the container while iterating!"); or something to that effect rather than simply ignoring such requests. I dealt with a monstrous codebase a couple of decades ago where the developers had a habit of turning erroneous requests (ones that violated required preconditions) into no-ops, and after years of that the result was a bug haven so difficult to maintain that many bugs had to be left in the software to avoid introducing several new ones. I'd strongly discourage ever coding in a way that simply ignores violations of design requirements. It can create monstrosities of the sort you get from a codebase that ignores endless failing unit and integration tests for years on end and actually comes to depend on most of those tests failing to produce the desired behavior.
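
As a sketch of that suggestion, the push_back from the question could fail loudly instead of silently, along these lines (just an illustration of the idea; <cassert> is already included in the question's example):

    void
    push_back(float f)
    {
        /* Fail fast in debug builds instead of silently ignoring the call. */
        assert(niter == 0 && "Trying to mutate the container while iterating!");

        float *new_data = new float[count + 1];
        if (data) {
            memcpy(new_data, data, sizeof(*data) * count);
            delete [] data;
        }
        new_data[count] = f;
        data = new_data;
        count++;
    }
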

If you are really interested in tackling the safety of such things (along with possible thread safety in cases where mutations would otherwise occur in parallel) and want to put in the elbow grease, I've gotten the most mileage out of persistent data structures for such cases. They only allow structural sharing of immutable data/memory blocks, so things like iterator invalidation become a non-concern, even in multithreaded cases, since shared data is never mutated in place. A request to make changes produces a new version of the data structure via a copy-on-write approach and is forbidden from mutating anything shared that already exists (such as what we're iterating over), so even iteration with iterators is protected from invalidation; the iterators themselves store lightweight, value-semantic copies of the persistent structure rather than pointers/references back into it.
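
Here's a minimal sketch of the copy-on-write idea applied to the question's use case. The type PersistentFloats is a made-up name, and there is no real structural sharing here (a proper persistent vector shares interior nodes instead of copying the whole block), but it shows why iteration cannot be invalidated:

    #include <cstdio>
    #include <memory>
    #include <vector>

    // Sketch: each "mutation" builds a new version; the version being
    // iterated is immutable and therefore can never be invalidated.
    class PersistentFloats {
        std::shared_ptr<const std::vector<float>> block =
            std::make_shared<const std::vector<float>>();

    public:
        // Returns a new version; the old one is untouched and still valid.
        PersistentFloats push_back(float f) const {
            auto copy = std::make_shared<std::vector<float>>(*block); // copy on write
            copy->push_back(f);
            PersistentFloats next;
            next.block = std::move(copy);
            return next;
        }

        // Iteration sees an immutable snapshot of the data.
        auto begin() const { return block->begin(); }
        auto end() const { return block->end(); }
    };

    int main() {
        PersistentFloats v1;
        PersistentFloats v2 = v1.push_back(1.0f).push_back(2.0f);
        for (float f : v2) {
            PersistentFloats v3 = v2.push_back(42.0f); // safe: v2 is unchanged
            (void)v3;
            printf("%f\n", f);
        }
    }
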

Such structures, revolving around value/copy semantics, do come with overhead, but I've found I can more than compensate for it with parallel execution, since a codebase using them makes thread safety a no-brainer. There is also a C++ library called immer which provides such structures already implemented for you, along with a very interesting presentation from the author here. I haven't looked into it in depth, but I suspect it also avoids iterator invalidation.
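
For illustration, based on immer's documented interface (treat the exact calls as an assumption to double-check against the library's docs), usage looks roughly like this:

    #include <cassert>
    #include <immer/vector.hpp>

    int main() {
        // Every "mutation" returns a new vector; existing versions, and any
        // iterators into them, are left untouched.
        const auto v0 = immer::vector<float>{};
        const auto v1 = v0.push_back(1.0f);
        const auto v2 = v1.push_back(2.0f);

        assert(v0.size() == 0);
        assert(v2.size() == 2);

        for (float f : v2) {
            // Creating a new version while iterating invalidates nothing,
            // because v2 itself is never modified.
            const auto v3 = v2.push_back(42.0f);
            (void)v3;
            (void)f;
        }
    }
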

Correct answer by user377672 on January 18, 2021
