TransWikia.com

XKCD #936: Short complex password, or long dictionary passphrase?

Information Security Asked on January 9, 2021

How accurate is this XKCD comic from August 10, 2011?

linked and maintained alt-text

I’ve always been an advocate of long rather than complex passwords, but most security people (at least the ones that I’ve talked to) are against me on that one. However, XKCD’s analysis seems spot on to me.

Am I missing something or is this armchair analysis sound?

22 Answers

In an empirical test, passphrases don't seem to help as much as XKCD would have you believe: dl.acm.org/citation.cfm?id=2335356.2335366

Users tend to create passwords that are easy to guess, while system-assigned passwords tend to be hard to remember. Passphrases, space-delimited sets of natural language words, have been suggested as both secure and usable for decades. In a 1,476-participant online study, we explored the usability of 3- and 4-word system-assigned passphrases in comparison to system-assigned passwords composed of 5 to 6 random characters, and 8-character system-assigned pronounceable passwords. Contrary to expectations, system-assigned passphrases performed similarly to system-assigned passwords of similar entropy across the usability metrics we examined. Passphrases and passwords were forgotten at similar rates, led to similar levels of user difficulty and annoyance, and were both written down by a majority of participants. However, passphrases took significantly longer for participants to enter, and appear to require error-correction to counteract entry mistakes. Passphrase usability did not seem to increase when we shrunk the dictionary from which words were chosen, reduced the number of words in a passphrase, or allowed users to change the order of words.

Answered by WBT on January 9, 2021

Most seem to agree that, as far as the maths goes, the Horse method is superior; to what extent seems to depend mostly on limitations such as how uniform the word choices are, or what counts as an "easy to remember" or "easy to type" phrase.

Fair enough, but I'll teach you a magic trick on how to make these limitations a "bit" less relevant:

  1. Use Horse method as platform for building new phrases. Use properly random method to choose the words. This may get you some hard words that you never heard and may find hard to remember, but...

That's for following the Horse method blindly. The magic trick is that you don't stop here. Unless you are a desperately boring and un-creative person, you can get a great advantage from the next steps.

  1. Make use of your brain.

    I mean, not just think, have your brain fart out completely new words for you, or completely new methods to distort the existing ones.

    You are allowed (or even encouraged) to replace some words in the phrase with these.

    Also, this rule also applies to everything I write from now on: just go ahead and change the methods arbitrarily ;)

  2. Make use of your other languages.

    Most of us know more than one language. Mix them as you see fit.

    (A special case of this is making use of a different keyboard layout used in your country. For example, in the Czech layout, letters with diacritics share keys with numbers, on the row above the alphabetic part. This, in fact, creates a mapping of letters and numbers that can supplement or replace the "traditional" L33T. Think about how you can benefit from it.)

  3. Invent your own methods.

    ...for further distorting the phrase. You can re-use a method for new passwords; how re-usable it is depends on how complex a method you create (more complex, more re-usable), but don't overdo it :)

  4. Make the process fun!

    Generating a password does not have to be boring. In fact, the funnier you make it, the more likely you are to actually remember the password.

    But don't get me wrong: don't make it funny at the cost of uniqueness. Try to use that kind of funny which is funny only to you (ask your brain).

    (Oh and don't make it too funny--you don't want to giggle and blush every time you type your passphrase ;))

If done right, every bit of the above will give the Horse method a great advantage.

Answered by Alois Mahdal on January 9, 2021

The XKCD comic does not explicitly show that passphrases may contain separator symbols between the words. A natural choice is to add the same symbol between all words. If the app has a show-password option, the phrase can then be read easily. In theory that adds 5 bits of strength, downgraded to 4 bits; see The "troubador" method of explanation of the mathematics in this comic.

The exact downgrading also depends on how easy it is for an attacker to guess the symbol, or to try specific separator symbols first. That's why I use 3 distinct sets of symbols to calculate this type of strengthening: a set of 13 symbols from the iPhone number-symbols keyboard, a set of all 31 symbols, and a special set of 3 for the often used separator symbols: space - _

I use the following for the calculation (Excel notation, only if a separator symbol is being used):

strength = log( NumWordsInCurrentDict*((sepaClassSize^(SepratorLen)+1)*(NumWordsInCurrentDict)^(nWordInPhrase-1)) , 2)

(See below for a simplification.)

4 Diceware words give the following strengths: No separator: 51.7 bits; A space separator: 53.3 bits; and a separator from all 31 symbols like ^ : 56.7 bits

Edit and edit 2: forgot the log base 2; fixed.

The simplified formula for the above one is: log( ((sepaClassSize+1)^SepratorLen)*(NumWordsInCurrentDict^numWordsInPhrase) , 2 )
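As a rough check, the simplified formula can be written out in Python. This is only a sketch: the function and parameter names are illustrative, and the 7776-word dictionary size assumes the standard Diceware list.

```python
import math

def passphrase_bits(num_words_in_dict, num_words_in_phrase,
                    sepa_class_size=0, separator_len=0):
    """Strength in bits, following the simplified formula above.

    With separator_len=0 the separator factor is 1, so the result is
    just the entropy of the randomly chosen words.
    """
    combos = ((sepa_class_size + 1) ** separator_len) * (
        num_words_in_dict ** num_words_in_phrase
    )
    return math.log2(combos)

# Four Diceware words, no separator (assumes a 7776-word list).
no_separator = passphrase_bits(7776, 4)

# Same phrase with one separator drawn from the full 31-symbol set.
full_symbol_set = passphrase_bits(7776, 4, sepa_class_size=31, separator_len=1)

print(round(no_separator, 1), round(full_symbol_set, 1))
```

Under these assumptions the two printed values line up with the "no separator" and "all 31 symbols" figures quoted above.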

The XKCD comic depicts extensive modification/substitution options for passwords. The Diceware Passphrase Home Page mentions a special modification that is not in the passphrase part of the comic: insert just 1 random letter in just 1 of the words of the chosen phrase. That would add another 10 bits of entropy (see the section "Optional stuff you don't really need to know").

Answered by Dick99999 on January 9, 2021

Bruce Schneier loves long passphrases, but he has also been pointing out the practical difficulties of a long passphrase for many years. Passwords are not echoed on the screen when you type them. You see an asterisk or a big fat black dot on the screen per letter. Even when typing out 7-8 character passwords you occasionally wonder in the middle whether you have typed it right, so you backspace everything and start again. Occasionally we forget where we are in the middle of the password and count the number of characters already typed, comparing it mentally with the password to figure out where we are. It would be even more difficult to do this with a long passphrase if it's not being echoed. I think long passphrases will happen only after this problem has been effectively solved.

Answered by user93353 on January 9, 2021

I've wondered about this one as well, and I would like to analyze it not from a philosophical point (if users write down their passwords, it becomes something you have instead of something you know... 2-factor becomes 1-factor), but a mathematical and scientific standpoint.

I recently downloaded a GPU password cracking software to play around with. I'd like to crack both of these passwords using that (since it's my new toy) and determine which is better.

For a hypothesis, I would like to also throw out a possible variation--the attacker may know you only use dictionary words and don't enforce symbols and numbers (decreasing the key space).

Scenario 1

- Attacker knows you only used dictionary words. (Keyspace = 26 letters + 26 capitals + 1 space = 53).
- The password requirement is 4 dictionary words with a minimum total of 20 characters.

Scenario 2

- Same as Scenario 1, except Attacker doesn't know only dictionary words are used (keyspace increases).

These would be compared against 2 control groups in which the random passwords contain 2 numbers and 2 special characters and are 16 characters long.

Would this be an appropriate test? If not, let me know, and I'll edit the parameters.

Answered by Jeff on January 9, 2021

To add to Avid's excellent answer, the other key messages of the comic are:

  • the appropriate way to calculate the entropy of a password generation algorithm is to calculate the entropy of its inputs, not to calculate the apparent entropy of its outputs (as rumkin.com, grc.com etc. do)
  • minor algorithm variations such as "1337-5p34k" substitutions and "pre/append punct & digits" add less entropy than most users (or sysadmins) think
  • (more subtly) passphrase entropy depends on wordlist size (and number of words), not number of characters, but can still provide easily sufficient entropy to protect against "generation algorithm aware" brute force attacks

To those messages we might wish to add:

  • as a user you can't generally control whether the web site operator uses salting, bcrypt/scrypt/PBKDF2, keeps their password hashes safe, or even whether they hash passwords in the first place -- so you should probably choose passwords that matter on the basis that they don't (e.g. assume 10^9 guesses per second when sizing passwords/phrases, don't reuse passwords and don't use simple "append the site name" techniques) - which probably makes using LastPass/KeePass/hashpass inevitable
  • long complicated words don't add much to the entropy unless you use more than a couple of them (there are only ~500K words in English, which is only 19 bits -- just 8 bits more than a word from Randall's 2048-word list)
  • the "random words" need to be really random for this to work -- picking song lyrics/movie quotes/bible verses gives much lower entropy (e.g. even with perfectly random choice, there are only 700K words in the bible, so there are only ~4M 5-10 word bible phrases, which is only 22 bits of entropy)
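The figures in the last two bullets are easy to verify. The sketch below just takes base-2 logarithms of the counts quoted above; the only added assumption is that a "5-10 word bible phrase" is counted as one of roughly 700K starting positions times 6 possible lengths.

```python
import math

def bits(choices):
    """Entropy in bits of one uniform choice among `choices` options."""
    return math.log2(choices)

english_word = bits(500_000)   # one word from ~500K English words
randall_word = bits(2048)      # one word from a 2048-word list

# Assumption: a phrase is a start position (~700K) plus a length (5..10).
bible_phrases = 700_000 * 6
bible_phrase = bits(bible_phrases)

print(round(english_word), round(randall_word), round(bible_phrase))
```

So a single very obscure word buys about 8 bits over a common one, while a whole quoted phrase from a known text is worth only about two common random words.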

Answered by Misha on January 9, 2021

The Openwall Linux pwqgen tool generates passwords with a specific amount of entropy along these very lines.

However, instead of spaces, punctuation characters and digits are used to separate the words. Here are ten examples with 49 bits of entropy:

Cruise!locus!frame
tehran!Commit6church
Seller7Fire3sing
Salt&Render4export
Forget7Driver=Tried
Great5Noun+Khaki
hale8Clung&dose
Ego$Clinch$Gulf
blaze5vodka5Both
utmost=wake7spark

The words are harder to read without their spaces, but in the years that I've been using passwords generated with this tool, I haven't found the punctuation to be difficult to remember.

Answered by sarnold on January 9, 2021

As some people already stated (so I'm not going to repeat that), it depends on the mechanism of brute-force attacks and dictionary attacks being used.

First of all, the best way to keep an attacker from attacking is taking away the target in the first place. None of my servers have SSH running on port 22, and root login is almost always deactivated in the sshd configuration. But that's just an example. Don't give away the user name and you can save yourself a lot of trouble.

That's the simplest of math: Avoid attacks by others by hiding :)

So, for the rest: those who actually guess the username right and find your service will try very common brute-force attacks. Short passwords are always a bad idea, because no dictionary is needed. Cycling through all the alphanumeric combinations in both lowercase and uppercase, plus common 'salt' like commas, semicolons and so on, would take a few days to crack. That is based on my own experience (I had an old OpenBSD routing machine set up, but the internet provider password changed and I didn't have physical access to the machine). The password turned out to be [Firstname][Lastname][Number] of some celebrity.

I was curious, so I tried different cracking tools. A name-based one took only six hours to crack the same password. Guess it was cycling through common name/number combinations.

The trick with those brute-force attacks is to know what you're dealing with. A password that is based on something personal, that is encrypted with your own method is still safe from most dictionary attacks and can only be guessed by a simple brute-force attack, which would take years to cycle through all the possibilities.

To give you an example: My name is Andreas, so my password is kinda safe.

MyN4me,A->PwKndSf

According to rumkin, this is kinda safe :) 87.1 bits entropy. Wow. Not bad for a first try. I can actually remember that and most mechanisms will not attempt to 'guess' that kind of a password, because it doesn't make any sense to any of the systems.

Either it's short and complex, like L5q3CR,-F - which is kind of hard to remember but easy to guess, or it consists of variations of actually existing words. It's a human weakness, to help yourself remember things or go for something really simple, or common.

I know, this is a little bit off-topic, but: if you don't want to become a victim of a brute-force attack, lock most of the doors first, or even better: remove/hide the doors :)

  • don't offer the login mechanisms that crackers expect, if you can.
  • protect web-services with client authentication
  • if you're totally paranoid, filter access to your service by IP-address, too
  • secure and totally paranoid setup: (this is unfair to crackers :)) ) after a failed password attempt, for the following 5 seconds, every following attempt for the same user (even a correct one) will fail, too.
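The last item in the list can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the class name, the injectable clock, and the choice not to extend the window on attempts made during the lockout are all assumptions.

```python
import time

class FailureLockout:
    """After a failed attempt, reject every attempt for that user
    (even a correct one) for the next `lock_seconds` seconds."""

    def __init__(self, lock_seconds=5.0, clock=time.monotonic):
        self.lock_seconds = lock_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.locked_until = {}      # user -> time when the lock expires

    def attempt(self, user, password_ok):
        now = self.clock()
        if now < self.locked_until.get(user, float("-inf")):
            return False            # inside the window: always fail
        if not password_ok:
            self.locked_until[user] = now + self.lock_seconds
            return False
        return True
```

A guessing script that hammers the login gets no useful signal during the window, since a correct guess looks the same as a wrong one, while a legitimate user who mistypes only has to wait a few seconds.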

If somebody manages to get around all that, you're dealing with pros anyways :) but keep your password secure by doing something human, that nobody expects and no computer can guess or predict: do your own thing, just remember that your own thing has to be long enough to avoid the simple attacks and stay out of the dictionary for the most part. Use something, that only makes sense in YOUR brain and scatter in a few special characters.

For a cracker, a fast way to guess a password is only offered when you do something predictable, like use something short, that's easy to memorize or something that consists of common words, or combinations of letters that you find in dictionaries.

Stay away from those, and you can even stick with rumkin's calculations.

Answered by Andreas on January 9, 2021

A lot of the responses to this question raise the obvious point that even if you ask users to use several words separated by spaces for readability, too many users will choose words in "banal" phrases, like "i love my mom", which, if crackers were cognizant that such simple phrases were in common use, would be quickly cracked. I assume that some crackers may already be doing this -- they're not stupid after all! But all is not lost.

Too many websites that I have login relationships with require me to make my password more complex (using leetspeakish and other combination requirements for upper/lowercase, numerics and symbols). They enforce making my passwords hard to remember.

What if, instead of requiring complexity, a password validation check first checked the password against a recognized password dictionary and, once a dictionary attack was eliminated as a problem, calculated the strength against a brute-force attack, requiring only that a password be able to hold off a brute-force attack for a trillion years? In that instance, Tr0ub4dor&3 would not pass -- Steve Gibson's search space calculator at https://www.grc.com/haystack.htm says it could be brute-forced in only 1.83 billion centuries, but "hold wine fine cold" could withstand it for 1.43 billion trillion centuries.
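A rough sketch of such a check follows. Everything in it is an assumption for illustration: the tiny denylist stands in for a real password dictionary, the character-class sizes (26 lowercase, 26 uppercase, 10 digits, 33 symbols including space) are one plausible way to size the search space, and 1000 guesses per second is an assumed online attack rate. It is not the calculator's actual code.

```python
import string

SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600

def alphabet_size(password):
    """Size of the character set implied by the classes the password uses."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation or c == " " for c in password):
        size += 33  # 32 ASCII punctuation marks plus space
    return size

def search_space(password):
    """Count of all strings over that alphabet up to the password's length."""
    a = alphabet_size(password)
    return sum(a ** k for k in range(1, len(password) + 1))

def centuries_to_exhaust(password, guesses_per_second=1000):
    return search_space(password) / guesses_per_second / SECONDS_PER_CENTURY

def acceptable(password, denylist, min_years=1e12, guesses_per_second=1000):
    """Dictionary check first, then the brute-force threshold."""
    if password.lower() in denylist:
        return False
    years = centuries_to_exhaust(password, guesses_per_second) * 100
    return years >= min_years

denylist = {"password", "letmein", "i love my mom"}  # hypothetical stand-in
for candidate in ("Tr0ub4dor&3", "hold wine fine cold", "password"):
    print(candidate, acceptable(candidate, denylist))
```

With these assumptions the two example passwords land in the same range as the figures quoted above, and only the four-word phrase clears the trillion-year bar. Note that this check says nothing about how the password was chosen; a phrase like "i love my mom" has to be caught by the dictionary step.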

What I am trying to say here is to stop forcing users to come up with bizarre character combinations, and actually vet their passwords against dictionary and brute force attacks. I think security could only improve as a result.

Answered by Cyberherbalist on January 9, 2021

Randall is mostly correct here. A few additions:

Of course you have to choose the words randomly. The classic method is Diceware, which involves rolling 5d6, giving almost 13 bits of entropy per word, but the words are more obscure.
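For readers without dice at hand, the rolling step can be imitated with a cryptographic random source. This is a sketch only: it assumes a 7776-entry word list ordered by roll (so that 1-1-1-1-1 is the first entry), and the function names are made up for the example.

```python
import math
import secrets

def roll_dice(n=5):
    """Simulate n fair six-sided dice with a cryptographic RNG."""
    return [secrets.randbelow(6) + 1 for _ in range(n)]

def rolls_to_index(rolls):
    """Read the rolls as a base-6 number: 1-1-1-1-1 -> 0, 6-6-6-6-6 -> 7775."""
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)
    return index

def passphrase(wordlist, n_words=5):
    """Pick n_words independently; wordlist must have 6**5 entries."""
    if len(wordlist) != 6 ** 5:
        raise ValueError("expected a 7776-word list")
    return " ".join(wordlist[rolls_to_index(roll_dice())] for _ in range(n_words))

BITS_PER_WORD = math.log2(6 ** 5)  # just under 13 bits
```

The important property is that every word is chosen uniformly and independently; picking words "at random" in your head does not give the same entropy.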

There may be 2048 common words in English, but there aren't 2048 short common words in English. The Diceware list (which has 6^5 = 7776 words under 6 letters long) has some pretty obscure stuff in it, plus names, plurals, two-letter combinations, two-digit combinations, 19xx, etc., and I don't think the top quarter of that would be much better. If you just take the top 2k words in English, you get stuff like "multiplication", which is a bit long as part of a 4-word password. So I'd be interested to see Randall's suggested word list.

There are more obvious variations of "Tr0ub4dor&3"-type passwords than of Diceware ones, so in practice the former will have a couple more bits of entropy. Also, in my experience, "Tr0ub4dor&3" type passwords are not actually that hard to remember if you use them often. In the past, I've generated several passwords with strings /dev/urandom, and had no trouble using them as login passwords. Today, though, I couldn't tell you which of the letters were capitalized. On the other hand, I'm not sure I could recite some of my Diceware passwords without confusing homophones, pluralization, etc.

If the password database is stolen, a strengthener like PBKDF2 would add a word or more to the effective length of one of these passwords, but many sites don't use it. 5 words + one for the strengthener would yield some 66 bits, which is probably too big for a rainbow table. This puts you well out of range of casual attackers, so unless you have something really important on your account you should be fine.

In sum, Diceware-type passwords are ideal for things you type occasionally, but not necessarily every day. If you use a password every day, then Diceware would work, but a strings /dev/urandom password will be shorter and you should be able to memorize it anyway. If you log in rarely, then choose a password in any way you like, toss it into your password manager (which you should use for the more commonly-used passwords too), and forget it.

If a site has some odd restriction, like "no spaces, at least one upper-case letter and at least one number", then string the words together with 5s between them and cap the first letter. This loses epsilon entropy for Diceware and none for Randall's scheme.

Answered by Mike Hamburg on January 9, 2021

I think most of the answers here are missing the point. The final frame is talking about ease of memorization. correct horse battery staple (typed from memory!!) eliminates the fundamental danger of password security -- the Post-it note.

Using the first password, I've got a Post-it note in my wallet (if I'm smart) or in my desk drawer (if I'm dumb), which is a huge security risk.

Let's assume that the passphrase option is only as secure as the munged base word option; then I'm already better off, because I've eliminated the human failing in password storage.

Even if I wrote down the passphrase, it wouldn't look like a password. It might be a shopping list - Bread Milk Eggs Syrup. But 5t4ck3xCh4ng3 is very obviously a password. If I came across that, it would be the first thing I would try.

Answered by Chris Cudmore on January 9, 2021

The issue is still, sadly, a human one.

Which is safer: pushing users toward alphanumeric + punctuation passwords, or toward longer passwords?

If you tell them to use alpha + numbers, they will write their name + birthday. If you tell them to also use punctuation, they will replace an "a" with "@", or something similarly predictable.

If you tell them "use four simple words", they will write "i love my mother", "i love your mother", "thank god its friday" or something else banally predictable.

You just can't win. The advantage of 4 word passwords is, they can memorize it, so if you are going to force users to have strong passwords (which you generate) then at least they won't need to write it down on a post-it note, or email themselves the password, or something else stupid.

Answered by wisty on January 9, 2021

Looking at the XKCD comic, and at examples of real world passwords, we see that most users have passwords much much weaker than the XKCD example.

A bunch of users will do exactly as the first panel says - they'll take a dictionary word, capitalize the first letter, do some gentle substituting, then add a number and symbol to the end. That's quite bad, especially if they re-use that password (because they think it's strong) or if their account has privs.

As has been mentioned in comments, Diceware is a nice way to generate a passphrase. I'd like to see the easy-read version of a formal analysis of Diceware. (I suspect that even if the attacker has your dictionary and knows that your passphrase is 5 or 6 words long, Diceware is better than a bunch of other password generation systems.)

But, whatever password they have, many users can be persuaded to change it to a known value:

http://passwordresearch.com/stories/story72.html

During a computer security assessment, auditors were able to convince 35 IRS managers and employees to provide them with their username and change their password to a known value. Auditors posed as IRS information technology personnel attempting to correct a network problem.

Answered by DanBeale on January 9, 2021

I think the most important part of this comic, even if it were to get the math wrong (which it didn't), is visually emphasizing that there are two equally important aspects to selecting a strong password (or actually, a password policy, in general):

  • Difficulty to guess
  • Difficulty to remember

Or, in other words:

  • The computer aspect
  • The human aspect

All too often, when discussing complex passwords, strong policies, expiration, etc (and, to generalize - all security), we tend to focus overly much on the computer aspects, and skip over the human aspects.

Especially when it comes to passwords, (and double especially for average users), the human aspect should often be the overriding concern.
For example, how often does strict password complexity policy enforced by IT (such as the one shown in the XKCD), result in the user writing down his password, and taping it to his screen? That is a direct result of focusing too much on the computer aspect, at the expense of the human aspect.

And I think that is the core message from the sage of XKCD - yes, Easy to Guess is bad, but Hard to Remember is equally so.
And that principle is a correct one. We should remember this more often, AKA AviD's Rule of Usability:

Security at the expense of usability comes at the expense of security.

Answered by AviD on January 9, 2021

I love xkcd and agree with his basic point -- passphrases are great for adding entropy -- but I think he lowballed the entropy on the first password.

Let's go through it:

  • Random dictionary word. xkcd: 16 bits, Me: 16 bits. A random word from a dictionary with ~65000 words is lg(65000) ~ 16. Very reasonable
  • Adding in capitalization. xkcd: 1 bit, Me: 0 bits (dealt with under common substitutions). 1 bit means it's either there or not there, which seems very low for the complexity that capitalization adds to a password. Generally, when capitalization is added to a password it's in a random place, and I can think of many other possible capitalization schemes (capitalize everything, capitalize the last letter). I'm going to group this with common substitutions.
  • Common substitutions. xkcd: 3 bits, Me: 13 bits. Only 8 choices for leet speak substitutions, even when only sometimes used? I can think of on average ~2 ways to leet alter each letter (like a to @,4; b to 8,6; c to (,[,<). Add in the original or capitalized letter and, for a 9 letter password, I've increased the entropy by 18 bits (lg(4**9)=18) if I randomly choose for each letter whether to leet alter it or not. This is too high for this password, however. A more reasonable approach would be to say I randomly chose 4 letters for "common substitutions" (one of two leet options, plus capitalization, for three options for each letter being substituted). Then it comes to lg(nCr(9,4) * 3**4) ~ 13 bits.

  • Adding two random characters '&3' to the end. xkcd: 8 bits, Me: 15 bits. I'm considering it as two characters to one of ~4 places (say before the word, one before and one after, both after, or both smack in the middle of the word). I'll also let non-special characters be in these two added letters. So assuming an 88 character dictionary (56 characters+10 digits, plus 32 symbols) you add lg(4 * 80**2) ~ 15 bits of entropy.

So my calculation comes out not at 16+1+3+8=28 bits of entropy, but at 16+13+15=44 bits, very similar to his passphrase.
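The three terms can be recomputed directly. This sketch assumes the substitution term is meant as nCr(9,4) times 3 to the 4th power (4 of 9 positions, 3 options each), which is the reading that gives about 13 bits, and it uses the 80-symbol figure from the formula as written.

```python
import math

# One word chosen at random from a ~65000-word dictionary.
word_bits = math.log2(65000)

# Choose 4 of 9 letter positions, each with 3 substitution options.
substitution_bits = math.log2(math.comb(9, 4) * 3 ** 4)

# Two extra characters from an 80-symbol set, in one of ~4 placements.
added_char_bits = math.log2(4 * 80 ** 2)

total_bits = word_bits + substitution_bits + added_char_bits
print(round(word_bits), round(substitution_bits),
      round(added_char_bits), round(total_bits))
```

The rounded terms come out as 16, 13 and 15, and the unrounded sum also rounds to 44 bits.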

I also don't think 3 days at 1000 guesses/sec is by any means "easy" to guess, or the plausible attack mechanism. A remote web service should detect and slow down anyone trying more than 10000 guesses in a one-day period for any specific user.

Much more likely are other attacks (e.g., key loggers on public computers, a malicious admin at a service logging passwords and reusing them, eavesdropping if ssl not used, get access to the server somehow (e.g., SQL injection; break into server room; etc)).

I use a passphrase when it's necessary -- e.g., strong encryption (and not 44 bits, more like 80 bits -- typically an 8 word diceware passphrase plus two or three modifications -- e.g., misspell a word or substitute a word with a non-diceware word starting with the same two letters; e.g., if "yogurt" came up, maybe substitute "yomama" for it). For websites (with no money or security permissions involved), I don't care; trivial passwords are typically used.

I do notice that for often used passwords, I'm much, much better at typing passwords than I am at typing my passphrases (which gets annoying when you have to re-key a ~50 character sentence a few times). Also for passwords, I often prefer finding a random sentence (like a random song lyric -- from a song no one would associate with you and that's not particularly meaningful) and coming up with a password based on the words (like sometimes using the first letter, the last letter, or substituting a symbol for a word, etc.). E.g. L^#g&B9y3r from "Load up on guns and bring your" from Smells Like Teen Spirit.


TL;DR: Randall is right, if you (a) assume you can check passwords at 1000/s for three days without getting slowed down, and (b) can make many assumptions about the password: constructed based on a rare dictionary word, that is capitalized, has some leet substitutions for some commonly substituted letters, and has a symbol and number added at the end. If you only slightly generalize the allowed substitutions (like I did) and characters added at the end, you get a similar entropy to the passphrase.

In summary, both are probably secure for most purposes with a low threat level. You are much more vulnerable to other exploits, something Randall would likely agree with [538] [792]. In general, having password requirements like upper/lower/symbol/number is good, as long as long, high-entropy passphrases are also allowed. Force additional complexity for shorter passwords, but allow passwords over ~20 characters to be all lower case. Some users will choose poor passphrases just as they choose poor passwords (like "this is fun", which is idiotically claimed to be ridiculously secure here, or using their child's name or their favorite sports team). Requiring special characters may make a password non-trivial to guess (say by a factor of 100-1000 -- changing a password from being within 10 likely guesses to 10000 is very significant). Sure, it won't stop a bot on a weak web service that allows thousands of bad login attempts per second, but it forces the addition of a modest level of security, which would hinder efforts at sites that limit bad logins, or by the unsophisticated manually guessing the password. It's sort of like how a standard 5-pin house lock is fundamentally insecure, as anyone can learn to pick it in minutes (or break the glass window); however, in practice locking your door is good, as it provides some safety against the unsophisticated who don't have tools handy (and breaking windows comes with its own danger of alerting others).

Answered by dr jimbob on January 9, 2021

Totally wrong. There's more than math at work here. Human beings creating passwords out of human language != bots creating randomized strings out of buckets of char. If you start with that assumption, entropy is radically reduced as a factor in the time necessary to crack the password. As Don Corleone, the great philosopher, said: Think like the people who are around you.

Answered by yelvington on January 9, 2021

I'm not a security expert, but I think there's a mistake here.

The point of the comic seems to indicate that by increasing the length of the password, you will increase the complexity.

The complexity of a four word password is actually significantly reduced. Essentially you've swapped 8-11 semi randomly selected characters for 4 randomly selected words. The length of the pass-phrase is less important than the number of words. All that is required is that the attacker attempts various combinations of words from a dictionary. Words used are likely to be fairly common if they are to be remembered and are likely to be short if they must be typed.

Dictionary attacks are already used to great effect. We don't commonly use 260,000 words. We only know around 12,000 to 20,000 words. Of these we commonly use less than 8,000. I can't imagine regular users are going to create passwords which contain long and complicated words to type which further reduces the subset of possible words. Assuming that everyone were to start using this method of password selection, then dictionary attacks would actually become far more effective.
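To put rough numbers on that comparison, here is a small sketch. It assumes the best case for both schemes: words drawn uniformly at random from an 8,000-word vocabulary, and characters drawn uniformly from the 95 printable ASCII characters. Real "semi-random" passwords would score lower than the character figures shown.

```python
import math

def bits(choices, picks):
    """Entropy in bits of `picks` independent uniform choices."""
    return picks * math.log2(choices)

four_common_words = bits(8000, 4)    # attacker knows the 8,000-word list
eight_random_chars = bits(95, 8)     # fully random 8-character password
eleven_random_chars = bits(95, 11)   # fully random 11-character password

print(round(four_common_words, 1),
      round(eight_random_chars, 1),
      round(eleven_random_chars, 1))
```

Under these assumptions, four common words are roughly on par with 8 truly random characters and fall short of 11 truly random ones, so the outcome depends heavily on how random the "semi-random" characters really are.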

Now, a combination of the two may be more secure, although this defies the point of the exercise.

Answered by nullnvoid on January 9, 2021

I agree with Jeff Atwood. Also, I have taken a (not the) English dictionary I have here in MSSQL with 266,166 words in it (and also 160,086 German and 138,946 Dutch words) and taken a random selection of 10 words for each language.

These are the results for English:

  1. clever-handed
  2. wolframinium
  3. muth
  4. unvolcanic
  5. contradictorily
  6. desperadoism
  7. unpreternatural
  8. placability
  9. recondensation
  10. Remi

Now take any combination of any words that will give you enough entropy and you're good to go. But as you might see from this example, it's not very easy to make something memorable out of this word jumble. So entropy goes down a lot when you're trying to create "understandable" (not correct or logical) sentences.

For completeness' sake (random) results for German and Dutch:

German:

  1. torlos
  2. Realisierung
  3. hinterhaeltigsten
  4. anbruellend
  5. Orthograpie
  6. vielsilbigen
  7. lebensfaehig
  8. drang
  9. festgeklebtes
  10. Bauernfuehrer

Dutch:

  1. spats
  2. bijstander
  3. Abcoude
  4. vergunningsaanvraag
  5. schade-eis
  6. ammoniakuitstoot
  7. onsentimenteel
  8. ebstand
  9. radiaalband
  10. profielschets

Answered by RobIII on January 9, 2021

The two passwords, based on rumkin.com's password strength checker:

Tr0ub4dor&3

  • Length: 11
  • Strength: Reasonable - This password is fairly secure cryptographically and skilled hackers may need some good computing power to crack it. (Depends greatly on implementation!)
  • Entropy: 51.8 bits
  • Charset Size: 72 characters

and

correct horse battery staple

  • Length: 28
  • Strength: Strong - This password is typically good enough to safely guard sensitive information like financial records.
  • Entropy: 104.2 bits
  • Charset Size: 27 characters

It is certainly true that length, all other things being equal, tends to make for very strong passwords -- and we're seeing that confirmed here.

Even if the individual characters are all limited to [a-z], the exponent implied in "we added another lowercase character, so multiply by 26 again" tends to dominate the results.

In other words, 72^11 < 27^28.
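The inequality is easy to confirm with exact integers. Note that this is the naive "charset size to the power of length" count, which is an assumption of the sketch and not necessarily the model behind the entropy figures reported above.

```python
import math

short_complex = 72 ** 11   # 11 characters from a 72-character set
long_simple = 27 ** 28     # 28 characters from a 27-character set

print(short_complex < long_simple)
print(round(math.log2(short_complex), 1), round(math.log2(long_simple), 1))
```

Measured this way, the longer password has roughly twice as many bits, even though each of its characters comes from a much smaller set.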

Now, what is not clearly addressed:

  1. Will these passwords have to be entered manually? And if so, how difficult is it, mechanically, to enter each character of the password? On a keyboard it's easy, but on a smartphone or console... not so much.

  2. How easy are these passwords to remember?

  3. How sophisticated are the password attacks? In other words, will they actually attempt common schemes like "dictionary words separated by spaces", or "a complete sentence with punctuation", or "leet-speak numb3r substitution" as implied by xkcd? Crucially, this is how XKCD justifies cutting the entropy of the first password in half!

Point 3 is almost unanswerable, and I personally think such sophisticated attacks are highly unlikely in practice. I expect it will be braindead brute force all the way to get the low-hanging fruit, and the rest ignored. If there isn't any low-hanging password fruit (and oh, there always is), they'll just move on to the next potential victim service.

Therefore I say the cartoon is materially accurate in terms of its math, but the godlike predictive password attacks it implies are largely a myth. Which means, IMHO, that these specific two passwords are kind of a wash in practice and would offer similar-ish levels of protection.

Answered by Jeff Atwood on January 9, 2021

I agree that length is often preferable to complexity. But I think the controversy is less around that, and more around how much entropy you want to have. The comic says that a "plausible attack" is 1000 guesses/second:

"Plausible attack on a weak remote web service. Yes, cracking a stolen hash is faster, but it's not what the average user should worry about"

But I see more of a consensus that web site operators can't keep their hash databases secure over time against attackers, so we should engineer the passwords and hash algorithms to withstand stealing the hashes for offline attack. And an offline attack can be massive, as described at "How to securely hash passwords?"
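
To make the online/offline gap concrete, here is a small Python sketch that converts an entropy figure into the time needed to try every candidate. The 1000 guesses/second rate is the comic's; the offline rate of 10^10 guesses/second is a purely hypothetical number chosen for illustration, and the 28-bit and 44-bit figures are the comic's estimates for the two passwords.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def seconds_to_exhaust(entropy_bits: float, guesses_per_second: float) -> float:
    """Worst-case time to try all 2**entropy_bits candidates."""
    return 2 ** entropy_bits / guesses_per_second

ONLINE_RATE = 1_000      # the comic's "weak remote web service" figure
OFFLINE_RATE = 10 ** 10  # assumption: hypothetical rate against a fast, stolen hash

online_28_days = seconds_to_exhaust(28, ONLINE_RATE) / SECONDS_PER_DAY
online_44_years = seconds_to_exhaust(44, ONLINE_RATE) / SECONDS_PER_YEAR
offline_44_minutes = seconds_to_exhaust(44, OFFLINE_RATE) / 60

print(f"28 bits online:  {online_28_days:.1f} days")
print(f"44 bits online:  {online_44_years:.0f} years")
print(f"44 bits offline: {offline_44_minutes:.0f} minutes")
```

Under these assumptions the online figures land close to the comic's "3 days" and "550 years", while the same 44 bits fall in well under an hour offline, which is why the choice of hash function matters so much.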

This makes the problem even harder, and sites should really be looking at options besides requiring users to memorize their own passwords for each web site, e.g. via OpenID and OAuth. That way the user can get one good authentication method (perhaps even involving a hardware token) and leverage it for web access.

But good password hashing can also be done via good algorithms, a bit more length, and bookmarklet tools. Use the techniques described at the above question on the server (i.e. bcrypt, scrypt or PBKDF2), and the advice at "Is there a method of generating site-specific passwords which can be executed in my own head?" on the use of SuperGenPass (or SuperChromePass) on the user/client end.
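
For the server side, a minimal sketch of salted PBKDF2 hashing with Python's standard library might look like the following. The iteration count, salt length, and function names are assumptions made for this example, not recommendations taken from the linked question; bcrypt or scrypt would be used in a similar way through their own libraries.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumption: illustrative work factor, tune for your hardware
SALT_BYTES = 16       # assumption: illustrative salt length

def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected_digest, iterations=ITERATIONS):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("Tr0ub4dor&3", salt, stored))
```

The deliberately slow derivation is the point: every guess an offline attacker makes costs them the same number of iterations.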

Answered by nealmcb on January 9, 2021

Reading the comment, I have a few thoughts.

  1. The entropy count on the password is rated quite low. At the very least, you should add in another 1-3 bits for character substitutions. In this example, it seems like the exact formula is known, which seems a bit unlikely.
  2. The entropy count of the longer password seems correct to me.
  3. The base word seems to be "Troubador", if I'm deciphering the common substitutions correctly. I don't think that's a common word, so limiting the entropy of a dictionary based word to only 11 bits of entropy is a bit low. I'm guessing that it would still be in a dictionary, but should be expanded to at least a 15 bit entropy, or a selection of 32K words. That seems to be about the level it would be at using a dictionary based attack.

It is true that in principle, a long password composed of random words is at least as good a password as a short password with more characters, but the words must be random. If you start quoting a well known phrase, or even anything that could be a sentence or part of one, it severely limits the entropy.

Answered by PearsonArtPhoto on January 9, 2021

Here is a thorough explanation of the mathematics in this comic:

The little boxes in the comic represent entropy in a logarithmic scale, i.e. "bits". Each box means one extra bit of entropy. Entropy is a measure of the average cost of hitting the right password in a brute force attack. We assume that the attacker knows the exact password generation method, including probability distributions for random choices in the method. An entropy of n bits means that, on average, the attacker will try 2^(n-1) passwords before finding the right one. When the random choices are equiprobable, you have n bits of entropy when there are 2^n possible passwords, which means that the attacker will, on average, try half of them. The definition with the average cost is more generic, in that it captures the cases where random choices taken during the password generation process (the one which usually occurs in the head of the human user) are not uniform. We'll see an example below.
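
The "average cost" idea can be sketched in a few lines of Python. The function below computes the expected number of guesses for an attacker who knows the distribution and tries candidates from most to least likely; both the function name and the skewed distribution are made up for illustration. For a uniform choice among 2^n passwords the exact average is (2^n + 1)/2, i.e. essentially the 2^(n-1) quoted above.

```python
def expected_guesses(probabilities):
    """Average number of guesses when candidates are tried
    in order of decreasing probability."""
    ordered = sorted(probabilities, reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))

n = 10
space = 2 ** n

# Uniform case: every one of the 2**n passwords is equally likely.
uniform = [1 / space] * space
uniform_cost = expected_guesses(uniform)

# Hypothetical skewed case: one favourite password is chosen half the time,
# the remaining candidates share the other half equally.
skewed = [0.5] + [0.5 / (space - 1)] * (space - 1)
skewed_cost = expected_guesses(skewed)

print(uniform_cost, skewed_cost)
```

Both distributions cover the same 1024 candidates, yet the skewed one costs the attacker roughly half as many guesses on average, which is the sense in which non-uniform choices reduce strength.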

The point of using "bits" is that they add up. If you have two password halves that you generate independently of each other, one with 10 bits of entropy and the other with 12 bits, then the total entropy is 22 bits. If we were to use a non-logarithmic scale, we would have to multiply: 2^10 uniform choices for the first half and 2^12 uniform choices for the other half make up for 2^10 · 2^12 = 2^22 uniform choices. Additions are easier to convey graphically with little boxes, hence our using bits.

That being said, let's see the two methods described in the comic. We'll begin with the second one, which is easier to analyze.

The "correct horse" method

The password generation process for this method is: take a given (public) list of 2048 words (supposedly common words, easy to remember). Choose four random words in this list, uniformly and independently of each other: select one word at random, then select again a word at random (which could be the same as the first word), and so on for a third and then a fourth word. Concatenate all four words together, and voila! you have your password.

Each random word selection is worth 11 bits, because 2^11 = 2048, and, crucially, each word is selected uniformly (all 2048 words have the same probability of 1/2048 of being selected) and independently of the other words (you don't choose a word so that it matches or fails to match the previous words, and, in particular, you do not reject a word if it happens to be the same choice as a previous word). Since humans are not good at all at making random choices in their head, we have to assume that the random word selection is done with a physical device (dice, coin flips, computers...).

The total entropy is then 44 bits, matching the 44 boxes in the comic.
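
The generation process just described can be sketched in Python as follows. The word list here is a placeholder (2048 made-up tokens standing in for a real list of common words), the function names are hypothetical, and joining with spaces rather than plain concatenation is a choice made for this example. The `secrets` module plays the role of the "physical device": it draws uniformly and independently, and repeated words are allowed.

```python
import math
import secrets

# Placeholder list: a real setup would load 2048 actual common words.
WORDS = [f"word{i:04d}" for i in range(2048)]

def generate_passphrase(words, count=4, separator=" "):
    """Pick `count` words uniformly and independently (repeats allowed)."""
    return separator.join(secrets.choice(words) for _ in range(count))

def passphrase_entropy_bits(list_size, count=4):
    """Entropy of `count` independent uniform picks from `list_size` words."""
    return count * math.log2(list_size)

phrase = generate_passphrase(WORDS)
print(phrase)
print(passphrase_entropy_bits(len(WORDS)))
```

The entropy figure only holds if the first result is accepted as is; regenerating until the phrase "looks nice" breaks the uniformity assumption.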

The "troubador" method

For this one, the rules are more complex:

  1. Select a random word in a given big list of meaningful words.
  2. Decide randomly whether to capitalize the first letter, or not.
  3. For the letters which are eligible to "traditional substitutions", apply or not apply the substitution (decide randomly for each letter). These traditional substitutions can be, for instance: "o" -> "0", "a" -> "4", "i" -> "!", "e" -> "3", "l" -> "1" (the rules give a publicly known exhaustive list).
  4. Append a punctuation sign and a digit.

The random word is rated to 16 bits by the comic, meaning uniform selection in a list of 65536 words (or non-uniform in a longer list). There are more words than that in English, apparently about 228000, but some of them are very long or very short, others are so uncommon that people would not remember them at all. "16 bits" seem to be a plausible count.

Uppercasing or not uppercasing the first letter is, nominally, 1 bit of entropy (two choices). If the user makes that choice in his head, then this will be a balance between user's feeling of safety ("uppercase is obviously more secure !") and user's laziness ("lowercase is easier to type"). There again, "1 bit" is plausible.

"Traditional substitutions" are more complex because the number of eligible letters depends on the base word; here, three letters, hence 3 bits of entropy. Other words could have other counts, but it seems plausible that, on average, we'll find about 3 eligible letters. This depends on the list of "traditional substitutions", which are assumed to be a given convention.

For the extra punctuation sign and digit, the comic gives 1 bit for the choice of which comes first (the digit or the punctuation sign), then 4 bits for the sign and 3 bits for the digit. The count for digits deserves an explanation: this is because humans, when asked to choose a random digit, are not at all uniform; the digit "1" will have about 5 to 10 times more chances of being selected than "0". Among psychological factors, "0" has a bad connotation (void, dark, death), while "1" is viewed positively (winner, champion, top). In south China, "8" is very popular because the word for "eight" is pronounced the same way as the word for "luck"; and, similarly, "4" is shunned because of homophony with the word for "death". The attacker will first try passwords where the digit is a "1", allowing him to benefit from the non-uniformity of the user choices.

If the choice of digit is not made by a human brain but by an impartial device, then we get 3.32 bits of entropy, not 3 bits. But that's close enough for illustration purposes (I quite understand that Randall Munroe did not want to draw partial boxes).
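
The gap between a human-chosen and a machine-chosen digit can be illustrated with the Shannon entropy formula. In the Python sketch below, the uniform case gives the 3.32 bits mentioned above; the skewed distribution is entirely made up (it merely makes "1" ten times as likely as "0") and is not measured data.

```python
import math

def shannon_entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Impartial device: each digit 0-9 has probability 1/10.
uniform_digits = [0.1] * 10

# Hypothetical human preferences for digits 0..9 (illustrative numbers only).
human_digits = [0.02, 0.20, 0.12, 0.12, 0.08, 0.10, 0.08, 0.14, 0.08, 0.06]

uniform_bits = shannon_entropy_bits(uniform_digits)
human_bits = shannon_entropy_bits(human_digits)

print(round(uniform_bits, 2))
print(human_bits < uniform_bits)
```

Any departure from uniformity pushes the figure below log2(10), which is the effect the comic's rounded-down "3 bits" accounts for.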

Four bits for punctuation are a bit understated; there are 32 punctuation signs in ASCII, all relatively easy to type on a common keyboard. This would mean 5 bits, not 4. There again, if the sign is chosen by a human, then some signs will be more common than others, because humans rarely think of '#' or '|' as "punctuation".

The grand total of 28 bits is then about right, although it depends on the precise details of some random selections, and the list of "traditional substitutions" (which impacts the average number of eligible letters). With a computer-generated password, we may hope for about 30 bits. That's still low compared with the 44 bits of the "correct horse" method.
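
The tally can be written out as a short Python sketch. The first dictionary holds the comic's per-component figures as discussed above; the second swaps in uniform machine choices for the punctuation sign (32 ASCII signs) and the digit (10 digits). The dictionary keys are just labels chosen for this example.

```python
import math

# Per-component estimates from the comic, in bits.
comic_bits = {
    "base_word": 16,
    "capitalization": 1,
    "substitutions": 3,
    "order": 1,          # digit first or punctuation first
    "punctuation": 4,
    "digit": 3,
}

# Same scheme, but punctuation and digit picked uniformly by a machine.
machine_bits = dict(comic_bits, punctuation=math.log2(32), digit=math.log2(10))

comic_total = sum(comic_bits.values())
machine_total = sum(machine_bits.values())

print(comic_total)
print(round(machine_total, 2))
```

Because the components are chosen independently, their bits simply add, and even the more generous machine total stays well short of 44.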

Applicability

The paragraphs above show that the maths in the comic are correct (at least with the precision that can be expected in these conditions -- that's a webcomic, not a research article). It still requires the following conditions:

  • The "password generation method" is known by the attacker. This is the part which @Jeff does not believe. But it makes sense. In big organizations, security officers publish such guidelines for password generation. Even when they don't, people have Google and colleagues, and will tend to use one of about a dozen or so sets of rules. The comic includes provisions for that: "You can add a few more bits to account for the fact that this is only one of a few common formats".

    Bottom-line: even if you keep your method "secret", it won't be that secret because you will more or less consciously follow a "classic" method, and there are not that many of those.

  • Random choices are random and uniform. This is hard to achieve with human users. You must convince them to use a device for good randomness (a coin, not a brain), and to accept the result. This is the gist of my original answer (reproduced below). If the users alter the choices, if only by generating another password if the one they got "does not please them", then they depart from random uniformity, and the entropy can only be lowered (maximum entropy is achieved with uniform randomness; you cannot get better, but you can get much worse).

The right answer is of course that of @AviD. The maths in the comic are correct, but the important point is that good passwords must be both hard to guess and easy to remember. The main message of the comic is to show that common "password generation rules" fail at both points: they make hard to remember passwords, which are nonetheless not that hard to guess.

It also illustrates the failure of human minds at evaluating security. "Tr0ub4dor&3" looks more randomish than "correcthorsebatterystaple"; and the same minds will give good points to the latter only because of the wrong reason, i.e. the widespread (but misguided) belief that password length makes strength. It does not. A password is not strong because it is long; it is strong because it includes a lot of randomness (all the entropy bits we have been discussing all along). Extra length just allows for more strength, by giving more room for randomness; in particular, by allowing "gentle" randomness that is easy to remember, like the electric horse thing. On the other hand, a very short password is necessarily weak, because there is only so much entropy you can fit in 5 characters.

Note that "hard to guess" and "easy to remember" do not cover all that is to say about password generation; there is also "easy to use", which usually means "easy to type". Long passwords are a problem on smartphones, but passwords with digits and punctuation signs and mixed casing are arguably even worse.


Original answer:

The comic assumes that the selection of a random "common" word yields an entropy of about 11 bits -- which means that there are about 2000 common words. This is a plausible count. The trick, of course, is to have a really random selection. For instance, the following activities:

  • select four words randomly, then remember them in the order which makes most sense;
  • if the four words look too hard to remember, scrap them and select four others;
  • replace one of the words with the name of a footballer (the attacker will never guess that !);

... all reduce the entropy. It is not easy to get your users to actually use true randomness and accept the result.
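
How much entropy such habits cost can be estimated under pessimistic assumptions. The Python sketch below assumes the attacker can fully predict the "order which makes most sense" (so only the set of words matters, not their order) and knows exactly which phrases a user would reject; both are worst-case assumptions, and the names are illustrative.

```python
import math

LIST_SIZE = 2048
WORDS_PER_PHRASE = 4

# Ordered, independent picks: 4 * 11 = 44 bits.
ordered_bits = WORDS_PER_PHRASE * math.log2(LIST_SIZE)

# If the order is predictable, only the multiset of words counts:
# C(n + k - 1, k) selections with repetition.
unordered_count = math.comb(LIST_SIZE + WORDS_PER_PHRASE - 1, WORDS_PER_PHRASE)
unordered_bits = math.log2(unordered_count)
ordering_loss = ordered_bits - unordered_bits

def rejection_loss_bits(kept_fraction):
    """Bits lost when only `kept_fraction` of all outcomes is ever accepted,
    assuming a uniform choice among the accepted ones."""
    return -math.log2(kept_fraction)

print(round(ordering_loss, 2))
print(rejection_loss_bits(0.5))
```

Reordering alone can cost up to about log2(4!) bits, and each halving of the set of "acceptable" phrases costs another bit, so a few such habits together erase a noticeable share of the 44 bits.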

The same users will probably complain about the hassle of typing a long password (if the typing involves a smartphone, I must say that I quite understand them). An unhappy user is never a good thing, because he will begin to look for countermeasures which will make his life easier, such as keeping the password in a file and "typing" it with a copy&paste. Users can often be surprisingly creative that way. Therefore long passwords have a tendency to backfire, security-wise.

Answered by Thomas Pornin on January 9, 2021
