Decoding New Jersey Driver's License Codes

Question

Driver's license numbers in New Jersey aren't random. They follow the format

Affff lllii mmyye

where A is the first letter of the person's last name, ffff is some mapping of the remaining letters of the last name to a four-digit number, lll is a mapping of the full first name to a three-digit number, and ii is a code representing the middle initial, according to the table below:

| | 6 | 7 | 8 |
|---|---|---|---|
| 1 | a | j | |
| 2 | b | k | s |
| 3 | c | l | t |
| 4 | d | m | u |
| 5 | e | n | v |
| 6 | f | o | w |
| 7 | g | p | x |
| 8 | h | q | y |
| 9 | i | r | z |

The number corresponding to the initial is 10 * column number + row number, so a is 61 and s is 82. mm corresponds to the month of birth and yy to the year of birth. e is the eye color (a value 1-8 corresponding to BRO, BLU, GRY, GRN, BLK, etc.).

The only thing I don't understand is how the names are mapped to the integer values. I only have five examples for the last-name mappings (ignoring the first letter, because it doesn't play into the mapping):

aab -> 0001
ackson -> 0062
eals -> 2024
eimel -> 2278
ounds -> 6810

For first names, I only have four:

Alexander -> 019
Richard -> 655
John -> 407
Matthew -> 529

Does anyone have any ideas how the implementation is done, or even a general mapping function that will hash a string of at most 25 characters to a four-digit or three-digit number while maintaining lexicographical order (<=, not <)?

Things I've Tried

Convert each letter to a number 1-26. Then, taking only the first four numbers, create the number by the rule 26^3 * first + 26^2 * second + 26 * third + fourth. Then divide this number by 26^4 + 26^3 + 26^2 + 26 and multiply by 10,000 to map the decimal into 0-9999. This produces the following mappings: aab -> 0000, ackson -> 0035, eals -> 1547, emiel -> 1722, ounds -> 5695.

Get a list of the top 10,000 most common surnames, order it by the second letter, and then check the index. This produces the following mappings: aab -> 0005, ackson -> 0128, eals -> 2813, emiel -> 3235, ounds -> 7588.

Let each letter subdivide the 10,000: the first letter (numbered 1-26) cuts the range into one of 26 pieces, the second cuts that piece into one of 26, and so on. This produces the following mappings: aab -> 0000, ackson -> 0028, eals -> 1536, emiel -> 1648, ounds -> 5656.

Convert each of the first four letters to 1-26, concatenate them, multiply the resulting number by 10,000, and divide by 26262626. This produces the following mappings: aab -> 0003, ackson -> 0392, eals -> 1908, emiel -> 1953, ounds -> 5792.

Do the same with 0-25 and divide by 25252525. This produces the following mappings: aab -> 0000, ackson -> 0008, eals -> 1584, emiel -> 1631, ounds -> 5623.

Additional Samples

While I believe all of the above samples are correct, I tried to track down more authentic data points. The ones I can guarantee are below:

Last names:
avis -> 0921
eals -> 2024
olff -> 6247
orello -> 6581

First names:
Alexander -> 019
Andrew -> 042
Gabriel -> 270
Lena -> 456
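The general order-preserving mapping asked for above can at least be sketched. This is not the NJ algorithm; it is a hypothetical baseline in the spirit of the "each letter subdivides the range" attempt, assuming letters only (a-z, case-insensitive) and a padding symbol that sorts before 'a' so that a prefix sorts first. Reading the name as a base-27 fraction and flooring it into 10^digits buckets gives a map that is non-decreasing (<=) in lexicographic order. The function name is made up for illustration.

```python
from fractions import Fraction
import string

ALPHABET = string.ascii_lowercase  # assumption: only a-z matter, case-insensitive


def order_preserving_code(name: str, digits: int = 4) -> int:
    """Map a name to 0 .. 10**digits - 1 so that s <= t implies code(s) <= code(t).

    Hypothetical sketch: the letters are read as a base-27 fraction 0.d1 d2 d3 ...
    with 'a' = 1 ... 'z' = 26, leaving digit 0 as the implicit padding that sorts
    before every letter. That fraction strictly increases in lexicographic order,
    and flooring it into buckets keeps the order non-strict (<=).
    """
    value = Fraction(0)
    scale = Fraction(1)
    for ch in name.lower():
        if ch not in ALPHABET:
            continue  # assumption: spaces, hyphens and apostrophes are ignored
        scale /= 27
        value += (ALPHABET.index(ch) + 1) * scale
    return int(value * 10**digits)  # value >= 0, so int() is a floor


samples = ["aab", "ackson", "eals", "eimel", "ounds"]
codes = [order_preserving_code(s) for s in samples]
print(codes)
print(order_preserving_code("Alexander", digits=3))
```

Because the range is split uniformly, "aab" lands at 385 rather than the observed 0001, which illustrates (as the attempts above also show) that the real scheme does not divide the range evenly by letter.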

BitShifter · Answer

Many states use something called Soundex to generate license numbers (sometimes you even see Soundex on government forms and/or computer screens when they ask for driver's license numbers).

The Soundex system was designed to phonetically map names that sound similar to close values, even though they might be spelled wildly differently (e.g. Pheiffer vs. Fifer).

See also algorithms such as Metaphone. Also, they may not use Soundex directly.

Wikipedia Soundex
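For reference, here is a minimal sketch of classic American Soundex: keep the first letter, map the remaining consonants to the digits 1-6, merge adjacent letters with the same digit, drop vowels, and pad or truncate to one letter plus three digits. The H/W rule used here is the commonly described one; implementations differ in such details, and nothing here is confirmed to be what NJ uses. Note that its output has only three digits, so on its own it cannot fill the four-digit ffff field.

```python
# Digit classes of classic American Soundex; a, e, i, o, u, y (and h, w) get no digit.
_SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the American Soundex code: first letter plus three digits."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # A second letter with the same digit as the first letter is not coded again.
    previous = _SOUNDEX_DIGITS.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != previous:
            digits.append(digit)
        previous = digit  # a vowel resets this, so a repeated digit after it counts again
    return (first.upper() + "".join(digits) + "000")[:4]


for name in ("Pheiffer", "Fifer", "Robert", "Rupert", "Tymczak"):
    print(name, soundex(name))
```

Pheiffer and Fifer share the digits 160 but keep different first letters (P160 vs. F160), which is a well-known limitation of plain Soundex.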

Edward · Answer

This is not yet a complete answer, but perhaps what I've found can be combined with other information to come up with the complete solution.

First name encoding

If we assume a linear encoding of the first four letters, lll = a*v1 + b*v2 + c*v3 + d*v4, where v1 ... v4 are the values of those letters, then we have everything needed to figure this out from your four samples. If we take letter values as a=0, b=1, ... regardless of whether they're uppercase or lowercase, your four samples turn into four linear equations:

a*0 +b*11+c*4 +d*23 =  19  (Alex)
a*12+b*0 +c*19+d*19 = 529  (Matt)
a*9 +b*14+c*7 +d*13 = 407  (John)
a*17+b*8 +c*2 +d*7  = 655  (Rich)

Since we have four equations and four unknowns, the system is easily solved, either with simple but tedious algebra or in matrix form using Gaussian elimination. (Sorry for the ugly-looking math, but unlike some other StackExchange sites, Reverse Engineering apparently doesn't support math markup, which is unfortunate.)

If you do so, you get the following values:

a = 83700 / 2279
b = 9484  / 2279
c = 16030 / 2279
d = −5441 / 2279
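The elimination can be checked with exact rational arithmetic. This sketch uses Python's standard fractions module (the helper names are hypothetical): it builds the four equations from the first four letters of each name with a=0, b=1, ..., and solves them by Gauss-Jordan elimination with row swaps, which are needed because the Alex row has a zero in the first column.

```python
from fractions import Fraction


def letter_values(name, n=4):
    """Values of the first n letters with a=0, b=1, ... (case-insensitive)."""
    return [ord(ch) - ord("a") for ch in name.lower()[:n]]


def solve_exact(matrix, rhs):
    """Solve a square linear system exactly with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augmented matrix [A | b] in exact rational arithmetic.
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in a row with a non-zero pivot (the Alex row has 0 in column 0).
        # next() raises StopIteration if the system is singular.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalise the pivot row, then clear this column from every other row.
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


samples = {"Alexander": 19, "Matthew": 529, "John": 407, "Richard": 655}
matrix = [letter_values(name) for name in samples]
weights = solve_exact(matrix, list(samples.values()))
for label, w in zip("abcd", weights):
    print(f"{label} = {w}")
```

It prints a = 83700/2279, b = 9484/2279, c = 16030/2279 and d = -5441/2279, matching the values above.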

All very neat and accurate, but there's a problem: any four independent samples would produce some answer. The question is whether it works for all possible names, and unfortunately, the answer is no.

Further samples

I did some searching on the internet and found a few more samples. Here's an image of a Russian spy's New Jersey license, and here is a police guide (see page 60). This pamphlet from the NJ MVC encodes "Dennis J. Driver" as D4047-16371.

If we try the first name equation above on these new samples, they fail, so it's not quite right.  The result suggests that the weighting is not quite so simple.  When searching, I also found that both Ontario and Québec licenses appear to use the same first and last name encodings.  So for example, this temporary Ontario permit verifies that "Dennis" is encoded as 163 in Ontario as well as in New Jersey.
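Applying the solved weights to names that were not used to fit them makes the failure concrete. This sketch hardcodes the four weights from above and uses Dennis (163) plus the three additional first names listed in the question (Andrew 042, Gabriel 270, Lena 456); the samples in the linked images are not included.

```python
from fractions import Fraction

# Weights a, b, c, d as solved above (letters valued a=0, b=1, ...).
WEIGHTS = (Fraction(83700, 2279), Fraction(9484, 2279),
           Fraction(16030, 2279), Fraction(-5441, 2279))


def predict(name):
    """Linear model on the first four letters of a first name."""
    values = [ord(ch) - ord("a") for ch in name.lower()[:4]]
    return sum(w * v for w, v in zip(WEIGHTS, values))


# First names quoted on this page that were not used to fit the weights.
held_out = {"Dennis": 163, "Andrew": 42, "Gabriel": 270, "Lena": 456}
for name, actual in held_out.items():
    print(f"{name}: predicted {float(predict(name)):.1f}, actual {actual:03d}")
```

The fitted names are reproduced exactly, but every held-out name misses; Dennis, for instance, comes out near 187 instead of 163.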

When I run a linear regression of all the first-name values against the first letter l (encoded as a=0, b=1, ...), I get lll = 32.42*l + 52.55 with an R^2 value of 0.986, which shows the code is highly linear in the first letter.
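A least-squares fit like this takes only a few lines. The sketch below uses only the eight distinct first-name codes quoted on this page, which may not be the same set behind the 32.42 / 0.986 figures, so its coefficients differ somewhat; it still shows a strongly linear trend in the first letter.

```python
def linear_fit(xs, ys):
    """Ordinary least squares y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)  # squared correlation, valid for simple regression
    return slope, intercept, r_squared


# First-name codes quoted on this page (not necessarily the full data set used above).
samples = {"Alexander": 19, "Andrew": 42, "Dennis": 163, "Gabriel": 270,
           "John": 407, "Lena": 456, "Matthew": 529, "Richard": 655}
xs = [ord(name[0].lower()) - ord("a") for name in samples]  # first letter, a=0
ys = list(samples.values())
slope, intercept, r2 = linear_fit(xs, ys)
print(f"lll ~ {slope:.2f}*l + {intercept:.2f}, R^2 = {r2:.3f}")
```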

Last name experiment

I tried a very simple experiment with the last-name encoding, using a method not mentioned in your list of things you have tried: simply treating each character as a base-26 digit. Using the four characters following the first, the encodings for "Baab" and "Jackson" are obtained correctly, but no others matched.

Other encoding schemes

I did some searching for existing encoding schemes. Soundex was both easily found and easily discounted, but there are many variations of it, and it's possible that some expanded variation was used. I was not able to locate a Soundex variant that produced these particular values, but I learned some interesting things along the way.

First, perhaps not surprisingly, there has long been a need to match up names in a database using some kind of encoding. Generically, the problem is called record linkage and is typically thought of as matching a possibly misspelled name to a subset of possible matches in a database. Soundex has been used for this purpose but found to be somewhat lacking in effectiveness.

Other schemes I have located, or at least found references to, include:

Levenshtein edit distance
Jaro record-linkage methodology
various phonetic algorithms
Cutter-Sanborn Four-Figure Table, used to encode author names for library call numbers
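Of these, Levenshtein edit distance is the simplest to show. It counts the single-character insertions, deletions, or substitutions needed to turn one name into another. It compares two names rather than encoding one, so it could help match misspellings but cannot by itself generate a license code. A standard dynamic-programming (Wagner-Fischer) sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via Wagner-Fischer dynamic programming, one row at a time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```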

This stringmetric project has what appears to be a nice collection of algorithm implementations, with links to the original papers describing them, but I haven't tried all of these.

Perhaps if someone does, they can report back here.

alexandra · Answer

I don't see this above, but male or female is encoded as well. In the last five digits, the first two are the month of birth. For males these run from 01 to 12; for females, 50 is added, so they run from 51 (January) to 62 (December).
Also, my name is Alexandra, which is also encoded as 019, the same as your example of Alexander.
The absence of a middle name is encoded as 00 in the ii field. Other examples I know of:
A friend whose middle name is Alexandra has ii = 61
Another, whose middle name is Serafina, has ii = 82
Another, whose middle name is Dorothy, has ii = 64
I would suggest collecting more name samples to compare.
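The ii and mm rules are simple enough to write down. This sketch encodes the middle-initial table from the question (column 8 starts at row 2, so s is 82 rather than 81) and decodes the month field with the +50 rule described above. It reproduces the 61, 82 and 64 examples, and also the 71 for "J" in "Dennis J. Driver" quoted in another answer. The function names are hypothetical.

```python
from typing import Optional, Tuple

# Middle-initial table from the question: code = 10 * column + row.
# Columns 6 and 7 start at row 1; column 8 has no entry in row 1, so 's' is 82.
_COLUMNS = {6: ("abcdefghi", 1), 7: ("jklmnopqr", 1), 8: ("stuvwxyz", 2)}
MIDDLE_CODE = {
    letter: 10 * column + first_row + offset
    for column, (letters, first_row) in _COLUMNS.items()
    for offset, letter in enumerate(letters)
}
MIDDLE_LETTER = {code: letter for letter, code in MIDDLE_CODE.items()}


def encode_middle(middle_name: Optional[str]) -> int:
    """ii field: 00 when there is no middle name, else the code of its initial."""
    if not middle_name:
        return 0
    return MIDDLE_CODE[middle_name[0].lower()]


def decode_middle(ii: int) -> Optional[str]:
    """Inverse of encode_middle; None means no middle name."""
    return None if ii == 0 else MIDDLE_LETTER[ii]


def decode_month(mm: int) -> Tuple[int, str]:
    """mm field as reported above: 01-12 for males, month + 50 (51-62) for females."""
    if 1 <= mm <= 12:
        return mm, "M"
    if 51 <= mm <= 62:
        return mm - 50, "F"
    raise ValueError(f"unexpected month field: {mm:02d}")


for middle in ("Alexandra", "Serafina", "Dorothy", "J", None):
    print(middle, f"{encode_middle(middle):02d}")
print(decode_month(62))
```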

boat · Answer

In case you're still trying to figure this out, I've made some progress, with assistance from u/jccool5000 on Reddit (post), who has a collection of over 900 samples, mostly from Ontario. AFAIK, Ontario and NJ share the same encoding; I'm not so sure about Quebec. I did some data manipulation to figure this out.

Starting with the last-name digits: the first of the four digits corresponds to the second letter of the last name, since the first letter is already encoded directly as the first character of the license number.

0 = A
1 = B C D
2 = E
3 = F G H
4 = I J K
5 = L M N
6 = O
7 = P Q R
8 = S T
9 = U V W X Y Z
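As a quick consistency check, the grouping can be written as a lookup and compared against the last-name samples in the question. Those samples are given without their first letter, so this sketch reads the first remaining letter. All eight distinct samples agree, and the groups are contiguous in alphabetical order, which fits the order-preserving behaviour described in the question. This only predicts the leading digit; the other three are not modelled.

```python
# Leading digit of ffff by second letter of the last name, per the table above.
GROUPS = {0: "A", 1: "BCD", 2: "E", 3: "FGH", 4: "IJK",
          5: "LMN", 6: "O", 7: "PQR", 8: "ST", 9: "UVWXYZ"}
DIGIT_FOR_LETTER = {ch: d for d, letters in GROUPS.items() for ch in letters}


def leading_digit(rest_of_surname: str) -> int:
    """rest_of_surname is the last name without its first letter."""
    return DIGIT_FOR_LETTER[rest_of_surname[0].upper()]


# Samples from the question (first letter already stripped).
samples = {"aab": "0001", "ackson": "0062", "eals": "2024", "eimel": "2278",
           "ounds": "6810", "avis": "0921", "olff": "6247", "orello": "6581"}
for rest, code in samples.items():
    print(rest, code, leading_digit(rest) == int(code[0]))
```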

The remaining three digits also encode the second letter of the last name, running from 000 to 999. However, each value of the first digit has its own 000-999 range. That is to say:

Hypothetical last name XA is X0001
Hypothetical last name XAZZZ is X0999, or something close to 999.
Hypothetical last name XB is X1001 
Hypothetical last name XDZZZ is X1999, or something close to 999.
Hypothetical last name XE is X2001
Hypothetical last name XEZZZ is X2999, or something close to 999.

You can refer to the above table to see when the 999 will reset back to 000. This is just the pattern I've found so far. I don't know how the numbers are distributed to the names.

The first-name code is a lot simpler, but it is also not evenly distributed. Unlike the last-name code, it only runs from 000 (Aaron) to probably 799 (Zoe is 796). By not evenly distributed, I mean that names starting with A range from 000 to 071, and 071 already includes some names that start with BA. Meanwhile, names that begin with Y are confined to a small range, no less than 785 and no more than 792.

bozz · Answer

I believe the name is represented using something called Soundex: the numbers represent the sounds of the last and first names. I can't quickly find the article I read breaking down the code for NJ licenses, but here is a general entry:
https://www.tutorialgateway.org/sql-soundex-function/

Alex Beals · Answer

I filed a FOIA request with the DMV. As I said many years ago, this was almost definitely protected information and likely to be rejected, but here's the formal response to that effect.
Request #W165699
Under the New Jersey Open Public Records Act, N.J.S.A. 47:1A-1 et seq., I am requesting an opportunity to inspect or obtain copies of public records that describe the algorithm for mapping first and last names to drivers license ID numbers (a Soundex-esque derivative). If there are any fees for searching or copying these records, please inform me if the cost will exceed $10. However, I would also like to request a waiver of all fees in that the disclosure of the requested information is in the public interest. This information is not being sought for commercial purposes. The New Jersey Open Public Records Act requires a response time of seven business days. If access to the records I am requesting will take longer than this amount of time, please contact me with information about when I might expect copies or the ability to inspect the requested records. Preferably I would like to receive all information through electronic records sent to my email address. If you deny any or all of this request, please cite each specific exemption you feel justifies the refusal to release the information and notify me of the appeal procedures available to me under the law. Thank you for considering my request.
Response (Excerpted)
The algorithm information you seek is exempt from disclosure by the Drivers' Privacy Protection Act, the Open Public Records Act, New Jersey Court Rules and Executive Order Number 21.
Further, N.J.S.A. 47:1A-1.1 provides:

A government record shall not include the following information which is deemed to be confidential for the purposes of P.L. 1963, c. 73 (C.47:1A-1 et seq.) as amended and supplemented:...trade secrets and proprietary commercial or financial information obtained from any source...
