The Bcrypt Protocol… is kind of a mess
Table of Contents
While writing my own bcrypt library, being unsatisfied with the current Java de-facto standard implementation jBcrypt, I discovered a lot of odd things surrounding the bcrypt protocol (mind you: not the underlying cryptographic primitive ‘Eksblowfish’).
Bcrypt is a password hashing function designed by Niels Provos and David Mazières in 1999 which became popular as the default password hash algorithm for OpenBSD¹. In comparison to simple cryptographic hash functions (like SHA-256), the main benefit of using bcrypt is that a developer can set how expensive it is to calculate the hash. This is called key stretching and should be used with any (usually weak) user provided password to protect against brute force attacks (i.e. simple guessing).
Introduction to Bcrypt #
So how does it work? First you need a password and set the iteration count as a logarithmic work factor between 4–31, doubling the required computation every increment. So for example it could look like this:
bcrypt("secretPassword", 8)
which may output
This is the modular crypt format defined and used by OpenBSD. The first part $2a$ is the protocol version identifier. Historically bcrypt used to be $2$ but since nobody back then thought about Unicode handling, a new version had to be defined. The next part 08$ is the work factor as passed to the function. The next 22 ASCII characters (16 byte raw) 0SN/h83Gt1jZMR6924.Kd. represent the encoded salt. Most implementations automatically create a salt for the caller, which is nice. This measure protects against rainbow tables, i.e. using a list of pre-calculated hashes of commonly used passwords.
And finally the last 23 byte or encoded 31 ASCII characters are the actual bcrypt hash: HaK3MyTDt/W8FCjUOtbY3Pmres5rsma. This format is especially convenient for storing password hashes, since all the parameters apart from the actual password are included.
Issue 1: Non-Standard encoding #
One oddity of bcrypt is that it uses a non-standard Base64 dialect for encoding the salt and hash. This dialect is listed in Wikipedia’s “Radix-64 Applications” section, and is attributed to crypt(3), with modern usage limited to bcrypt. Although it uses the same alphabet as standard Base64, it is permutated, making it incompatible with RFC 4648. This makes implementing it unnecessarily harder and more error-prone, since practically all programming languages have an RFC 4648 implementation of Base64 or support for it.
Issue 2: Using 23 byte instead of the full 24 bytes hash #
As stated before, nearly all bcrypt implementations output a 23 byte long hash. The bcrypt algorithm however generates a 24 byte password hash by encrypting three 8 byte blocks using a password-derived blowfish key. The original reference implementation however choose truncate the hash output, it is rumored the reason is to cap it to a more manageable length of 60-character limit (a strange reason if you ask me). The consensus seems to be that the issue of cutting a hash byte off is not a meaningful degradation in security, so it stays an oddity inherited from the reference implementation.
Issues 3: Derivations of the White Paper vs Reference Implementation #
According to the BouncyCastle Javadoc the OpenBSD reference implementation derivatives from the algorithm described in the whitepaper:
In contrast to the paper, the order of key setup and salt setup is reversed: state <- ExpandKey(state, 0, key) state <- ExpandKey(state, 0, salt). This corresponds to the OpenBSD reference implementation of Bcrypt.
There is no reference to a potential security vulnerability because of that (and I wouldn’t expect one). Since basically everybody copied the behavior of the reference implementation, the specification seems to be superseded.
Issue 4: No handling for passwords longer than 56/72 bytes #
According to the whitepaper:
(…) the key argument is a secret encryption key, which can be a user-chosen password of up to 56 bytes (including a terminating zero byte when the key is an ASCII string).
Others have pointed out that the algorithm, internally, manages things as 18 32-bit words, for a total of 72 bytes (including a null-terminator byte). Mind that a UTF-8 character can take up to 4 bytes of space, which would limit the password with the reference limitation to 14 characters in the worst case.
The confusion about the actual limitation and the lack of specification on what to do on long passwords can create a lot of issues with compatibility as well as possibly creating a lot of different dialects of bcrypt.
Issue 5: Many Non-Standard Versions #
As stated earlier, bcrypt started out with version $2$ which lacked definition on how to handle non-ASCII characters, so the most prevalent version is $2a$ which fixed that. But of course other implementations have had issues too, so there is a $2x$ and $2y$ for the PHP version and there is talk to bump the version output of the original to $2b$. Most of these version changes address bugs in specific implementation and may not apply to others. This however makes it harder to achieve interoperability between different systems (most often databases which are used by PHP and other systems, e.g. this)
Issue 6: Slightly Inefficient Format #
The output format
$2a$08$0SN/h83Gt1jZMR6924.Kd.HaK3MyTDt/W8FCjUOtbY3Pmres5rsma
is clearly optimized to be user readable. It is also slightly inefficient to parse: first the whole string has to be read in as ASCII, then it has to be parsed character to character until the last $. After that the next 22 and 32 characters are decoded separately. Using a more compact message format and encoded only once, for example:
the storage demand is reduced from 60 bytes to ~56 byte (Base64 encoded). This is irrelevant in most use cases, but in the grant scale when storing millions or billions of password hashes this can make a difference slightly reducing storage demand and parsing computational time.
Issue 7: No Official Test Vectors #
When trying to implement bcrypt, the developer is faced with the lack of official test vectors (aka test cases) to verify the algorithm. Neither the white paper, nor a Google search reveals a lot of useful examples other than some random test cases. This makes it hard to verify compatibility with other implementations. That’s actually the reason I posted some of my own, trying to capture most of the edge cases; they can be found here.
Issue 8: Not well suited as Key Derivation Function #
Bcrypt on OpenBSD was designed for password storage. Many times however it is necessary to create a secret key from a user password requiring a key derivation function (KDF). The two lacking properties are the ability to set arbitrary out length to satisfy different key types and to just output the raw hash without the whole message format. Currently, one would need to parse the last 31 characters from the hash message and then expand it with a proper KDF like. HKDF.
Note: Bcrypt not being a KDF is not necessarily an issue with bcrypt (it never claimed to be). However, often times developers misuse it as such without really knowing about the potential problems. Giving an option to use as KDF would help.
Conclusion #
The lack of a strong specification by an authority, age and various quirks and bugs in many implementations make it hard to properly implement this time tested password hashing function.
Security is hard to get right. So even if some of the issues seem nitpicky (and maybe they are), there is no reason to strive for the most simple, straight forward implementation and API resulting in a reviewed specification which can be used as a base for an implementation. Trust me, I’ve been there.
Just to reiterate: I do not challenge the security strength of the underlying “Eksblowfish” (“expensive key schedule Blowfish”), those analyses should be left to the cryptographic experts. I would summarize however that bcrypt is still in the realm of recommended password hashing functions.
Stay tuned for part 2 where I propose a KDF based on bcrypt and an improved password hashing protocol.
A small plug: most of the issues explained can be overcome with my Java implementation of bcrypt (using the Eksblowfish algorithm from jBcrypt). The main goal of this lib is to be as straight forward as possible to make it hard for people not familiar with bcrypt to get it wrong, but still allowing as much flexibility as possible. Check it out, you maybe find it useful (it’s Apache v2):