Saturday 12 February 2011

Passwords and server side storage.

OK, so some background reading here. I know it says I will post something about password lockers, but this is not it. Instead of considering where you store your password, we consider where they store you password. They being any website you have signed up for. 

Now, it's obvious that they need to store your password. If not, the whole exercise would be pointless. The initial idea was to store the password in plaintext in a file in the server. BAD IDEA! If an attacker were to somehow get a hold of that file, you are screwed. Royally! The attacker now knows all your users' login credentials and can run amok doing all sorts of bad things. So, there were 2 bright ideas to fix this: encryption and hashing.

We note that both work in a similar way at the high level, which we will abstract as a function f. With these, when a user registers, we get the plaintext input of the password, pass and compute f(pass). We then store f(pass) instead of pass. When a user attempts to login, we get the password they entered login_pass and compute f(login_pass). We then check to see if f(login_pass) == f(pass). If they are the same, we conclude the user has entered the right password and let them in. If not, then they get an annoying error message.

Now, we straight away notice a problem: Collisions! If person A and person B both have the same password, then the stored value f(pass) will be the same. This means that if the attacker knows person A's password, they know person B's password. There is a very very small possibility that f(passA) = f(passB) and passA != (not equal) passB, but we will ignore this possibility. So, now what?

Well, the answer is salt. Well not exactly that kind of salt, but what we call a "random salt." Basically, this is a random bit-string, which or may not have some structure. So, we now have the modified password storage protocol as follows: A user registers with password pass. We compute f(pass+salt) and store (f(pass+salt), salt). When a user attempts to login, we get the password they entered login_pass and the stored salt. We compute f(login_pass+salt). We then check to see if f(login_pass+salt) == f(pass+salt). Thus, even if 2 users have the same password, the collision is not apparent due the salt. Now, they are secure, right?

Not quite yet. This depends on your choice of the function f. Let's start by discussing the symmetric key crypto option. You need to use a secure crypto scheme, such as AES, with a large enough key size. Then, we need to consider the key storage. If we store the key with the passwords file, then there is no security. If the attacker can access the password file, its safe to assume he can access the encryption key. Once, he has the encryption key, it's game over! But if you store the key safely in another location, even if the attacker gets the password file, there is still some security.

Now, we look at hash functions. You would need to use a collision-resistant hash function, which is not broken. Since the hash functions are not really keyed, there is no concern about key storage, so that makes things a bit easier. However the trade-off is the due to the fact that hash functions are designed to be computed quickly, the attacker has quite an advantage as detailed here.

Of course, the increased computational power attacks can also make it faster to break the symmetric crypto algorithm. How quickly this is don depends on known attacks for the algorithm. So, there is still a threat if your encrypted password file is compromised. Ask Gawker, they will tell you. This particular attack revealed some interesting statistics about passwords, shown here. More about these later.

For now, I have but one humble request for all admins on all servers out there: please store my, and everybody else's passwords securely. Although a lot of them do it right, some don't. This needs to change.

2 comments:

  1. Interesting read :)

    To add - A complexity function like PBKDF2 (http://en.wikipedia.org/wiki/PBKDF2) should be wrapped around the inner choice of function/salt combo.

    The key deviation function basically joins n iterations of the inner function, with n=1000 recommended. This means that to calculate h = f(salt+pass) in a brute force search (or rainbow table generation), the attacker must run f() n times + some other computation bits to get the resulting hash. This makes the style of attack much less feasible (if it wasn't already) :-)

    ReplyDelete
  2. True. I decided to omit that detail as the math gets a bit funky for certain choices of f. But a very valid point

    ReplyDelete