I need to be able to search for compound words like “menneskerettighedsforkæmperens” (the human rights activist’s) and find the words “menneske” (human), “rettighed” (right) and “forkæmper” (activist).
I have all the words (and hundreds of thousands of other words) listed in MySQL, all in separate rows with additional information about each word (like hyphenation, which is what I need the DB to return), but I need to search a lot of words at the same time, which can be painfully slow.
I’m currently using a MySQL database, but I’m willing to switch to something that better suits my needs, maybe some kind of NoSQL store or Elasticsearch. However, I haven’t been able to find any examples of how to accomplish what I’m looking for in any other type of database. So if anyone can help me out, I would really appreciate it.
I would look for a library that specializes in splitting compound words. Search for one in whatever language you are using (Java, PHP, Perl, etc.). SQL is not likely to be involved.
If you must use a database, then consider this for a piece of the algorithm:
SELECT word FROM words WHERE word < 'menneskerettighedsforkæmperens' ORDER BY word DESC LIMIT 5;
That is likely to return a few compound words that begin with 'menneske', and perhaps 'menneske' itself.
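To make the lookup concrete, here is a minimal sketch that runs the same query against an in-memory SQLite table from Python and keeps only the rows that are prefixes of the search word. The sample rows, the helper names `predecessors` and `prefix_candidates`, and the use of SQLite instead of MySQL are all assumptions for illustration; SQLite compares strings bytewise by default, so ordering may differ from a MySQL collation.

```python
import sqlite3

def predecessors(conn, target, limit=5):
    """Return up to `limit` words sorting immediately before `target`."""
    rows = conn.execute(
        "SELECT word FROM words WHERE word < ? ORDER BY word DESC LIMIT ?",
        (target, limit),
    ).fetchall()
    return [row[0] for row in rows]

def prefix_candidates(conn, target, limit=5):
    """Keep only the neighbouring words that `target` actually starts with."""
    return [w for w in predecessors(conn, target, limit) if target.startswith(w)]

# Hypothetical sample data; a real table would hold the full word list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")  # primary key gives an index on word
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [(w,) for w in ["abe", "forkæmper", "menneske", "menneskeret",
                    "menneskerettighed", "rettighed"]],
)

target = "menneskerettighedsforkæmperens"
print(prefix_candidates(conn, target))
```

With a large word list, five neighbours may not reach back as far as the shortest prefix, so the limit would probably need tuning or the query repeated with a shorter search string.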
In MySQL, that query will be quite fast (assuming an index on word). You would need some code in addition to a few SQL calls.
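As a rough idea of what that extra code could look like, the sketch below splits a compound by trying the longest known word first and backtracking when the remainder cannot be matched. It is not a substitute for a dedicated compound-splitting library. The in-memory `WORDS` set stands in for the database table, and the `LINKERS` and `ENDINGS` lists are simplified assumptions about Danish linking elements and inflectional endings, chosen only so that the example word works.

```python
from functools import lru_cache

# Hypothetical stand-in for the words table; "ret" is a distractor entry.
WORDS = {"menneske", "rettighed", "forkæmper", "ret"}
# Assumed linking elements between parts and assumed endings (illustrative only).
LINKERS = ("", "s", "e")
ENDINGS = ("", "s", "en", "ens", "er", "erne", "ernes")

def split_compound(word, words=WORDS):
    """Return the dictionary words that make up `word`, or None if no split is found."""

    @lru_cache(maxsize=None)
    def solve(start):
        # Done when only an allowed ending (or nothing) is left.
        if word[start:] in ENDINGS:
            return ()
        # Try the longest candidate part first, then shorter ones.
        for end in range(len(word), start, -1):
            part = word[start:end]
            if part not in words:
                continue
            for link in LINKERS:
                if word.startswith(link, end):
                    tail = solve(end + len(link))
                    if tail is not None:
                        return (part,) + tail
        return None

    result = solve(0)
    return list(result) if result else None

print(split_compound("menneskerettighedsforkæmperens"))
```

In a database-backed version, the `part not in words` check is where the SQL calls would go, ideally batched so that all candidate prefixes of the remaining string are looked up in one query.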
Answered by Rick James on January 4, 2022