Nicolas Polomack
My personal blog, with all my random thoughts
Handling authentication in Alexandrie
2019-10-14 - About the design process of the authentication flow of Alexandrie

Alexandrie has now been announced to the world (via Reddit), after quite some development time already.
Having it ready for the announcement was a main focus of these recent months.
Since Alexandrie is open-source and aims to be easily deployable, its security story needs to be solid, else we risk exposing personal user data (even though, today, 'personal user data' is limited to the user's password).

# First draft: Hashing (SHA-512)

The initial plan was the same one I always done for other little web projects: just hash the user's password before storing it in the database.

Here is some pseudocode illustrating how this would work:

def client_side (user provides: [email, password]):
    client_hash <- sha512(password)

def server_side (client transmits: [email, client_hash]):
    expected_hash <- from_database(users::passwd)
    if expected_hash == client_hash:
        authentication_success()
    else:
        authentication_failure()

Note that this pseudocode syntax is arbitrary and isn't meant to be real code.
It ressembles the Python syntax in order to enable some syntax-highlighting (for readability purposes).

At first, I was in the camp of computing the hash client-side to prevent an evedropper from being able to read the plaintext password in-transit.
But, having the server directly putting it in the database means that in the case of a database dump, where an attacker would get access to all the password hashes, the attacker can just make a manual POST request submitting that hash and he would successfully be logged in.

The solution for solving this would be to compute the hash server-side.
But then we would transport the plaintext password over-the-wire, which is not good if someone happens to somehow evedrop the connection.
Also, this would mean that the server would end up storing the user's password in memory for some time, before computing its hash.
If a vulnerability were to be found which allows the attacker to read the server's memory and send it back (kinda like the Heartbleed vulnerability), he could end up seeing other users' passwords while they're being processed.

# Second draft: Double hashing

To get the best of both worlds, we could try to hash twice, once client-side and once server-side.
This fixes all the issues we discussed above:

  • The plaintext password can't be seen in-transit on the connection.
  • In case of a database dump, the attacker can't log in as the user, as he would to know the intermediate hash (the client-side computed hash, which is not stored anywhere).
  • The server never ends up with the plaintext password in-memory, protecting against memory-dumping attacks.
def client_side (user provides: [email, password]):
    client_hash <- sha512(password)

def server_side (client transmits: [email, client_hash]):
    server_hash <- sha512(client_hash)
    expected_hash <- from_database(users::passwd)
    if expected_hash == server_hash:
        authentication_success()
    else:
        authentication_failure()

But there is another problem:
A hash is a deterministic one-way function, you cannot get the original from its hash but a hash is a signature (a witness) of the original data.
The same original data always results in the same hash.
This means that, if multiple users happen to use the same passwords, the hashes stored in the database for these users would be the same.
That could allow an attacker that succeeded to perform a database dump, to potentially guess common passwords based on how many users uses it.

# Third draft: Double hashing + salting

Let's now introduce salting into the mix.
When salting comes into play, no hashes will ever be the same anymore (literally).
Salting describes the act of adding extra data to the password before computing the hash.

def client_side (user provides: [email, password]):
    client_hash <- sha512(password)

def server_side (client transmits: [email, client_hash]):
    user_salt <- from_database(users::salt)
    server_salted_hash <- sha512(client_hash + salt)
    expected_salted_hash <- from_database(users::passwd)
    if expected_salted_hash == server_salted_hash:
        authentication_success()
    else:
        authentication_failure()

The salt should be unique (or close to it) for each user but it is not necessarily secret knowledge.
The only purpose of the salt is to mess with the hashing algorithm to have it output a different and unique hash.
This has the effect that two users who shares the same password don't share the same hash, thus increasing the entropy about the shared original plaintext password.

But (there's always a 'but'), this just moved the problem elsewhere, someone evedropping the connection will see the same intermediate hash for two different users who share their passwords (since the salt only happens server-side).
Maybe it sounds nitpicky and we shouldn't worry about this with HTTPS, but let's solve this anyway.

# Final draft: Double hashing + double salting (+ move to PBKDF2)

So here is the final draft, this is the one currently implemented in Alexandrie and I think it is quite good now. The idea is just to salt on both sides.
As the salt, we could use another server-generated salt and send it back to the client.
But, as the salt doesn't need any secrecy and needs to be unique per user, we can actually just use the user's email as the salt (the email is unique to each user).

In addition to salting on both sides, in order to slow the brute-forceability of the system, we could swap a single hash computation for something that takes a bit more time to compute.

For this, there exists the PBKDF2 algorithm, which is perfect for our use-case.
PBKDF2 stands for: Password-Based Key Derivation Function version 2. It takes a password, a salt, a number of iterations to perform, a stream cipher algorithm and a hashing algorithm.
It uses both the stream cipher and the hashing algorithm in a loop for the given number of iterations.
The output is a hash from the given hashing algorithm.

We'll be using PBKDF2 with the HMAC stream cipher and the SHA-512 hashing algorithm.

Here is the new authentication flow:

def client_side (user provides: [email, password]):
    derived_client_salted_hash <- pbkdf2(HMAC_SHA512, password, email, 5_000)

def server_side (client transmits: [email, derived_client_salted_hash]):
    user_salt <- from_database(users::salt)
    derived_server_salted_hash <- pbkdf2(HMAC_SHA512, derived_client_salted_hash, user_salt, 100_000)
    expected_salted_hash <- from_database(users::passwd)
    if expected_salted_hash == derived_server_salted_hash:
        authentication_success()
    else:
        authentication_failure()

Now, we have a lot of great properties:

  • It is almost impossible to get back at the plaintext password from the intermediate hash (thanks to the first salt).
  • It is almost impossible to get back at the intermediate hash from the server-side stored hash (thanks to the second salt).
  • The whole authentication routine takes about a second to compute, which is fast enough for logging-in as a legitimate user, but slow enough to significantly impact the potential bruteforce rate.
  • The server never ever get to see any thing that could be used to get the plaintext password back, the password is completely opaque to the server.
  • Not really compute-heavy on the client.

Notice that we do PBKDF2 for 5 000 rounds client-side and the server then does it for 100 000 rounds (which seems ridiculous at first, but this is to be slow-enough to compute, and instead of just waiting, why not increase the entropy of the hash by repeating it countless times while we have some time to spare). The client only does 5 000 rounds as its available resources aren't under our control, as opposed to the server.

For the performance, I am using Rust and wasm-pack to compile a WebAssembly module that can perform this key derivation on the client-side portably.

# Conclusion

I hope I could make this explaination of how Alexandrie's password management works and how it came to be this solution, clear and easily understandable.
This is personally the first time I implement the password management of a user-managing service this seriously and I find the final solution to be really neat (even the pseudocode is quite small).

# Identified a security vulnerability ?

If you've found a vulnerability in this system and wish to share it responsibly to get it fixed, I am pleased to be able to say that Alexandrie has access to Security advisories, so please contact me at nicolas@polomack.eu and I'll open an advisory with you so that we can work on the fix.