AnswerBun.com

C++ - checking a string for all values in an array

Asked on Stack Overflow on January 5, 2022

I have some parsed text from the Vision API, and I’m filtering it using keywords, like so:

    if (finalTextRaw.find("File") != finalTextRaw.npos)
    {
        LogMsg("Found Menubar");
    }

That is, if the keyword "File" is found anywhere within the string finalTextRaw, the function stops there and a log message is printed.

This method is very reliable, but I’ve ended up with a long chain of if-else-if statements like this one, and as I find more words that need filtering, I’d like something more maintainable. So instead, I now read a comma-separated string from a config file and parse it into an array:

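    #include <algorithm>  // std::replace
    #include <sstream>    // std::stringstream
    #include <string>
    #include <vector>

    std::string filterWords = GetApp()->GetFilter();
    std::replace(filterWords.begin(), filterWords.end(), ',', ' ');  // replace ',' with ' '
    std::vector<std::string> array;  // the filter words are strings, so store strings
    std::stringstream ss(filterWords);
    std::string temp;
    while (ss >> temp)
        array.push_back(temp);  // build the list of filter words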
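One caveat with replacing commas by spaces: a multi-word keyword such as "Save As" would be split into two separate words. A minimal sketch that splits on commas directly with std::getline instead, assuming the config string looks like "File,Edit,Save As"; parseFilterWords is a hypothetical helper name, and GetApp()->GetFilter() is replaced by a plain parameter:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a comma-separated config string such as
// "File,Edit,Save As" into keywords. Spaces around each entry are trimmed
// and empty entries are skipped, so multi-word keywords stay intact.
std::vector<std::string> parseFilterWords(const std::string& config)
{
    std::vector<std::string> words;
    std::istringstream stream(config);
    std::string item;
    while (std::getline(stream, item, ','))
    {
        const auto first = item.find_first_not_of(' ');
        if (first == std::string::npos)
            continue;  // entry was empty or contained only spaces
        const auto last = item.find_last_not_of(' ');
        words.push_back(item.substr(first, last - first + 1));
    }
    return words;
}
```

In the question's code this would be called as `auto array = parseFilterWords(GetApp()->GetFilter());`. Skipping empty entries also matters later: an empty keyword is "found" by std::string::find in every string.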

I’d like to have just one if statement that checks the string against the whole array, instead of one statement per keyword that I have to specify manually in the code. Something like this:

    if (finalTextRaw.find(array) != finalTextRaw.npos)
    {
        LogMsg("Found filtered word");
    }

Of course, that syntax doesn’t work, and it’s surely more complicated than that, but hopefully you get the idea: if any words from my array appear anywhere in that string, that string should be ignored and a log message printed instead.

Any ideas on how I might write such a function? I’m guessing it will need some kind of loop.

3 Answers

As pointed out by Thomas, an efficient way is to split both texts into lists of words and then use std::set_intersection to find the words that occur in both lists. You can use std::vector as long as it is sorted. Because sorting dominates, you end up with O(n*log(n)) (with n = the larger number of words) rather than the O(n*m) of comparing every keyword against every word.

Split sentences into words:

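#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

auto split(std::string_view sentence) {
    auto result = std::vector<std::string>{};
    // Copy into a std::string: string_view::data() is not guaranteed to be null-terminated.
    auto stream = std::istringstream{std::string{sentence}};

    std::copy(std::istream_iterator<std::string>(stream),
              std::istream_iterator<std::string>(), std::back_inserter(result));

    return result;
}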

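Find the words that occur in both lists. std::set_intersection only works on sorted ranges (such as a std::set or a sorted std::vector), which is why the function takes its arguments by value and sorts them first.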

auto intersect(std::vector<std::string> a, std::vector<std::string> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    auto result = std::vector<std::string>{};
    std::set_intersection(std::move_iterator{a.begin()},
                          std::move_iterator{a.end()}, 
                          b.cbegin(), b.cend(),
                          std::back_inserter(result));

    return result;
}

Example of how to use it:

int main() {
    const auto result = intersect(split("hello my name is mister raw"),
                                  split("this is the final raw text"));

    for (const auto& word: result) {
      // do something with word
    }
}

Note that this approach pays off when working with a large or unknown number of words. If you know the word counts stay small, the simpler solutions in the other answers may be a better fit.
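One behavioral difference is worth checking before choosing an approach: splitting into words and intersecting matches whole words only, whereas std::string::find (used in the other answers) also matches a keyword inside a longer word. A small comparison sketch, assuming whitespace-separated text; containsAsSubstring and containsAsWholeWord are hypothetical names used only for illustration:

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>

// Substring semantics, as in the find()-based answers.
bool containsAsSubstring(const std::string& text, const std::string& keyword)
{
    return text.find(keyword) != std::string::npos;
}

// Whole-word semantics, as in the split-and-intersect answer:
// the text is split on whitespace and each token is compared exactly.
bool containsAsWholeWord(const std::string& text, const std::string& keyword)
{
    std::istringstream stream(text);
    return std::any_of(std::istream_iterator<std::string>(stream),
                       std::istream_iterator<std::string>(),
                       [&keyword](const std::string& token) { return token == keyword; });
}
```

With "Filename: report.txt" and the keyword "File", the substring check matches and the whole-word check does not. Punctuation attached to a token (e.g. "File,") also prevents a whole-word match, so which behavior is right depends on the OCR text being filtered.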

Answered by local-ninja on January 5, 2022

Borrowing from Thomas’s answer, a range-based for loop offers a neat solution (here words is the list of filter words, called array in the question):

for (const auto& word : words)
{
    if (finalTextRaw.find(word) != std::string::npos)
    {
        // word is found.
        // do stuff here or call a function.
        break;  // stop the loop.
    }
}
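To get the single if statement the question asks for, this loop can be wrapped in a function. A sketch assuming C++17 (for std::optional); findFilteredWord is a hypothetical name, and returning the matched keyword lets the caller log which word triggered the filter:

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the first keyword found anywhere in text
// (substring match, like the loop above), or std::nullopt if none occurs.
// Note: an empty keyword matches every string, so keep empty entries out of words.
std::optional<std::string> findFilteredWord(const std::string& text,
                                            const std::vector<std::string>& words)
{
    for (const auto& word : words)
    {
        if (text.find(word) != std::string::npos)
            return word;  // stop at the first match
    }
    return std::nullopt;
}
```

In the question's code the check then becomes `if (findFilteredWord(finalTextRaw, array)) { LogMsg("Found filtered word"); }`, and dereferencing the result gives the keyword itself if it should be logged.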

Answered by Paul Sanders on January 5, 2022

You could use a basic brute-force loop:

const std::size_t quantity_words = array.size();
for (std::size_t i = 0; i < quantity_words; ++i)
{
    const std::string& word = array[i];  // reference avoids copying each word
    if (finalTextRaw.find(word) != std::string::npos)
    {
        // word is found.
        // do stuff here or call a function.
        break;  // stop the loop.
    }
}

The loop above takes each word in the array and searches finalTextRaw for it, stopping at the first match.

There are better methods using standard-library algorithms; I'll leave those for other answers.
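One such standard algorithm is std::any_of from <algorithm>, which expresses "does any keyword occur in the text?" as a single expression. A minimal sketch; containsAnyKeyword is a hypothetical name, and it keeps the same substring semantics as the loop above:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if any keyword occurs anywhere in text.
// std::any_of stops at the first keyword for which the lambda returns true.
bool containsAnyKeyword(const std::string& text, const std::vector<std::string>& keywords)
{
    return std::any_of(keywords.begin(), keywords.end(),
                       [&text](const std::string& keyword) {
                           return text.find(keyword) != std::string::npos;
                       });
}
```

With it, the question's check becomes one statement: `if (containsAnyKeyword(finalTextRaw, array)) { LogMsg("Found filtered word"); }`.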

Edit 1: maps and association
The code above bothers me because it makes a separate pass through finalTextRaw for every word in the array.

Here's another idea:

  1. Create a std::set using the words in finalTextRaw.
  2. For each word in your array, check whether it exists in the set. A std::set lookup takes logarithmic time (it's like searching a tree), so you no longer rescan the whole string for every word.
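The two steps above might be sketched as follows, assuming the text is split on whitespace (so, unlike find, only whole words match); findKeywordsInText is a hypothetical name:

```cpp
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for the set-based idea:
// 1. build a std::set of the whitespace-separated words in text (one pass);
// 2. look up each keyword in the set (logarithmic time per lookup).
// Returns the keywords that occur as whole words, in keyword order.
std::vector<std::string> findKeywordsInText(const std::string& text,
                                            const std::vector<std::string>& keywords)
{
    std::istringstream stream(text);
    std::istream_iterator<std::string> begin(stream), end;
    const std::set<std::string> textWords(begin, end);

    std::vector<std::string> found;
    for (const auto& keyword : keywords)
    {
        if (textWords.count(keyword) != 0)
            found.push_back(keyword);
    }
    return found;
}
```

Named iterators are used to build the set because writing the two istream_iterator temporaries directly inside parentheses would be parsed as a function declaration (the "most vexing parse").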

You should also investigate creating a set of the words in array and finding the intersection between the two sets.

Answered by Thomas Matthews on January 5, 2022

