TransWikia.com

Generate unique random strings considering collisions

Code Review Asked by user2652379 on December 30, 2020

I’m coding a random strings generator. It’s working, but I’m not sure if this is efficient.

/**
 * @param n          total number of strings to generate
 * @param length     length of generated strings
 * @param charsetStr charset to be used when generate
 */
public static List<String> generateRandomStrings(int n, int length, @NonNull String charsetStr) {
    if (n < 1 || length < 1 || charsetStr.length() < 1) {
        throw new IllegalArgumentException(String.format("Illegal argument(s): %d, %d, %s", n, length, charsetStr));
    }

    //remove duplicate chars
    Set<Character> charset = new HashSet<>(charsetStr.length());
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < charsetStr.length(); i++) {
        char c = charsetStr.charAt(i);
        if (charset.add(c)) {
            sb.append(c);
        }
    }
    charsetStr = sb.toString();

    BigDecimal caseNum = BigDecimal.valueOf(charsetStr.length()).pow(length);
    BigDecimal nBd = BigDecimal.valueOf(n);
    if (caseNum.compareTo(nBd) < 0) {
        throw new IllegalArgumentException(
                String.format("Number of possible strings cannot exceed the requested size: (length of '%s') ^ %d < %d", charsetStr, length, n));
    }

    if (nBd.divide(caseNum, 1, RoundingMode.HALF_UP).compareTo(BigDecimal.valueOf(5, 1)) < 0) {
        //when fill factor is below 50%
        //generate until target size is reached
        Set<String> results = new HashSet<>(n);
        while (results.size() < n) {
            results.add(RandomStringUtils.random(length, charsetStr));
        }
        return new ArrayList<>(results);
    }

    // when fill factor is above 50%
    // pre-generate all possible strings
    List<String> results = new ArrayList<>(caseNum.intValue());
    generateAllPossibleStrings(results, "", charsetStr, length);

    // then shuffle the collection and pick target sized sublist
    Collections.shuffle(results);
    return results.subList(0, n);
}

public static void generateAllPossibleStrings(@NonNull List<String> results, @NonNull String prefix, @NonNull String charset, int length) {
    if (prefix.length() >= length) {
        results.add(prefix);
        return;
    }

    for (int i = 0; i < charset.length(); i++) {
        generateAllPossibleStrings(results, prefix + charset.charAt(i), charset, length);
    }
}

Any advice would be grateful, but my main focus here is the collisions. I’m using org.apache.commons.lang3.RandomStringUtils to generate strings if n is less than 50% of the number of all possible strings(let’s call it "fill factor"). If not, to avoid unnecessary collisions, I generate all possible strings using generateAllPossibleStrings, then shuffle the result and get a sublist from it.

The reasons I used 2 methods for generating strings are:

  • If the fill factor is low(like 1~2%), using generateAllPossibleStrings seemed overkill. So I used org.apache.commons.lang3.RandomStringUtils, hoping collisions not matter much.
  • If the fill factor is high(like more than 90%), using org.apache.commons.lang3.RandomStringUtils seems inefficient since a lot of collisions will occur as time goes by. So I used generateAllPossibleStrings in this case, dropping without any collision looks more efficient than generating.

There is no big reason why I choose 50% as a diverging point; I just used my hunch. This too may need to be fixed.

How can I make this more time-efficient?

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP