TransWikia.com

C++ - checking a string for all values in an array

Stack Overflow Asked on January 5, 2022

I have some parsed text from the Vision API, and I’m filtering it using keywords, like so:

    if (finalTextRaw.find("File") != finalTextRaw.npos)
    {
        LogMsg("Found Menubar");
    }

In other words, if the keyword "File" is found anywhere within the string finalTextRaw, the function is interrupted and a log message is printed.

This method is very reliable. But I’ve inefficiently just made a bunch of if-else-if statements in this fashion, and as I’m finding more words that need filtering, I’d rather be a little more efficient. Instead, I’m now getting a string from a config file, and then parsing that string into an array:

    string filterWords = GetApp()->GetFilter();
    std::replace(filterWords.begin(), filterWords.end(), ',', ' ');  // replace ',' with ' '
    vector<string> array;  // the keywords are words, so store strings rather than ints
    stringstream ss(filterWords);
    string temp;
    while (ss >> temp)
        array.push_back(temp);  // build the array of filter words

And I’d like to have just one if statement for checking that string against the array, instead of many of them for checking the string against each keyword I’m having to manually specify in the code. Something like this:

    if (finalTextRaw.find(array) != finalTextRaw.npos)
    {
        LogMsg("Found filtered word");
    }

Of course, that syntax doesn’t work, and it’s surely more complicated than that, but hopefully you get the idea: if any words from my array appear anywhere in that string, that string should be ignored and a log message printed instead.

Any ideas how I might fashion such a function? I’m guessing it’s going to necessitate some kind of loop.

3 Answers

As pointed out by Thomas, the most efficient way is to split both texts into lists of words, then use std::set_intersection to find the words that occur in both lists. You can use std::vector as long as it is sorted. You end up with O(n*log(n)) (with n = max words), rather than O(n*m).

Split sentences to words:

auto split(std::string_view sentence) {
    auto result = std::vector<std::string>{};
    // Copy into a std::string first: string_view::data() is not
    // guaranteed to be null-terminated.
    auto stream = std::istringstream{std::string{sentence}};

    std::copy(std::istream_iterator<std::string>(stream),
              std::istream_iterator<std::string>(), std::back_inserter(result));

    return result;
}

Find words existing in both lists. This only works for sorted lists (like sets or manually sorted vectors).

auto intersect(std::vector<std::string> a, std::vector<std::string> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    auto result = std::vector<std::string>{};
    std::set_intersection(std::move_iterator{a.begin()},
                          std::move_iterator{a.end()}, 
                          b.cbegin(), b.cend(),
                          std::back_inserter(result));

    return result;
}

Example of how to use.

int main() {
    const auto result = intersect(split("hello my name is mister raw"),
                                  split("this is the final raw text"));

    for (const auto& word: result) {
      // do something with word
    }
}

Note that this makes sense when working with a large or unbounded number of words. If you know the limits, you might want to use one of the simpler solutions (provided by other answers).

Answered by local-ninja on January 5, 2022

Borrowing from Thomas's answer, a range-based for loop offers a neat solution (here words is your vector of filter words):

for (const auto &word : words)
{
    if (finalTextRaw.find(word) != std::string::npos)
    {
        // word is found.
        // do stuff here or call a function.
        break;  // stop the loop.
    }
}
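
If it is useful to know which keyword triggered the filter (for the log message, say), the same loop could be wrapped in a small helper. This is only a sketch: firstFilteredWord is a hypothetical name, it assumes the filter words are held in a std::vector<std::string>, and it needs C++17 for std::optional.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the first entry of `words` that occurs
// anywhere in `text` as a substring, or std::nullopt if none does.
std::optional<std::string> firstFilteredWord(const std::string& text,
                                             const std::vector<std::string>& words)
{
    for (const auto& word : words)
    {
        // Skip empty entries: find("") matches every string at position 0.
        if (!word.empty() && text.find(word) != std::string::npos)
        {
            return word;
        }
    }
    return std::nullopt;
}
```

The caller can then test the result with if (auto hit = firstFilteredWord(finalTextRaw, words)) and use *hit when building the log message.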

Answered by Paul Sanders on January 5, 2022

You could use a simple brute-force loop:

const std::size_t quantity_words = array.size();  // size() returns a size_t, not an unsigned int
for (std::size_t i = 0; i < quantity_words; ++i)
{
    const std::string& word = array[i];  // a reference avoids copying each word
    if (finalTextRaw.find(word) != std::string::npos)
    {
        // word is found.
        // do stuff here or call a function.
        break;  // stop the loop.
    }
}

The above loop takes each word in the array and searches the finalTextRaw for the word.

There are better methods using some std algorithms. I'll leave that for other answers.
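
As one possible illustration of the std-algorithm route, the loop can be expressed with std::any_of. Treat this as a sketch rather than the definitive version: containsAnyWord is a made-up name, and it assumes the filter words are stored in a std::vector<std::string>. It still does one substring search per filter word, so the cost is the same as the loop; only the notation changes.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if any non-empty entry of `words`
// occurs anywhere in `text` as a substring.
bool containsAnyWord(const std::string& text,
                     const std::vector<std::string>& words)
{
    return std::any_of(words.begin(), words.end(),
                       [&text](const std::string& word) {
                           // Ignore empty entries, which would always match.
                           return !word.empty() &&
                                  text.find(word) != std::string::npos;
                       });
}
```

This gives the single test the question asks for: if (containsAnyWord(finalTextRaw, array)) { LogMsg("Found filtered word"); }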

Edit 1: maps and association
The above code is bothering me because there are too many passes through the finalTextRaw string.

Here's another idea:

  1. Create a std::set using the words in finalTextRaw.
  2. For each word in your array, check for existence in the set. This reduces the quantity of searches (it's like searching a tree).

You should also investigate creating a set of the words in array and finding the intersection between the two sets.
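
The two steps above might look roughly like the sketch below. The name containsAnyWholeWord is hypothetical, and the code assumes that words in the text are separated by whitespace. One behavioural difference from the find-based loop is worth knowing: a set lookup matches whole words only, so the keyword "File" would no longer match the text word "Files".

```cpp
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if any entry of `filterWords` appears in `text`
// as a complete whitespace-separated word.
bool containsAnyWholeWord(const std::string& text,
                          const std::vector<std::string>& filterWords)
{
    // Step 1: one pass over the text to collect its words into a set.
    std::istringstream stream{text};
    const std::set<std::string> textWords(std::istream_iterator<std::string>{stream},
                                          std::istream_iterator<std::string>{});

    // Step 2: one logarithmic-time lookup per filter word.
    for (const auto& word : filterWords)
    {
        if (textWords.count(word) != 0)
        {
            return true;
        }
    }
    return false;
}
```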

Answered by Thomas Matthews on January 5, 2022
