buildgame

Why you should be hashing sensitive data

When most people think about security, they think about a login/registration system. They don’t tend to think about what you need to protect most: a user’s data.

Take, for example, the simple authentication system we built in the last two posts. In both of those cases, there was at least one line of code that would hash the user’s password before it was stored in the database. That’s because, when you’re building something people will be storing their data in, keeping that data safe is vitally important.

There are no ifs, ands, or buts about the situation. If you are storing any kind of sensitive data, be it passwords, credit card numbers, or anything else that can be considered ‘sensitive’ (even if it’s just the user’s mother’s maiden name or something), you need to either encrypt or hash the data.

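What’s the difference between hashing and encryption? The one that matters here is that encryption is built to be reversed. No matter how strong your encryption is, anyone who gets hold of the key can decrypt the data, and an attacker who can reach your database is often able to reach the key as well. A hash has no key and no way back. Plain and simple.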

That’s why, if the data isn’t something you’ll need to retrieve and display, you should hash it instead of encrypting it. To hash the data, you supply a ‘salt’ (an extra value mixed in so that identical inputs don’t produce identical hashes) and the value to hash, and it gets converted one-way into a string of characters. The process of hashing the data makes it irretrievable: if you were to hash, for example, ‘23skidoo’, you might get back something like ‘23Kub08nEFeSs’. Whatever data you hash is gone, and you can’t get it back.
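To make that concrete, here is a minimal sketch in Python (picked just for illustration; the idea is the same in any language). It assumes PBKDF2-HMAC-SHA256 from the standard library as the hash function, a random 16-byte salt, and 200,000 iterations. None of those choices are prescribed by this post, and the name hash_value is made up for the example.

```python
import hashlib
import os

ITERATIONS = 200_000  # assumed work factor, not a value from this post


def hash_value(value: str, salt: bytes) -> str:
    """Return a one-way, salted hash of `value` as a hex string."""
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, ITERATIONS)
    return digest.hex()


salt = os.urandom(16)            # a fresh random salt
hashed = hash_value("23skidoo", salt)
print(hashed)                    # 64 hex characters; differs on every run

# The same value with the same salt always gives the same hash,
# which is what makes comparisons possible later on.
print(hash_value("23skidoo", salt) == hashed)

# The same value with a different salt gives a different hash.
print(hash_value("23skidoo", os.urandom(16)) == hashed)
```

Notice there is no matching "unhash" function to write. The only thing you can do with the result is hash another value with the same salt and see whether the two match.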

But even though the hashed data is gone, that doesn’t mean it’s useless – you can still use it for comparisons. As an example, let’s look at some pseudocode for a registration and login system that uses hashing to protect user passwords. Here’s what would happen on the registration page:

if (username not taken and passwords match) {
    generate a random salt for this user
    insert user into database with username sent to us, salt, hash(salt, password)
}

Pretty simple, right? Here’s the logic for the login page:

look up the user in the database by the username sent to us
if (user exists and hash(user’s salt, password sent to us) equals the stored hashed password) {
    log the user in, and redirect them to the post-login page
} else {
    login error: password/username mismatch!
}

And it’s actually that simple. As long as you don’t need to retrieve the data and display it back to the user, hashing is the way to go for data protection.
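If you would like to see the registration and login logic as something you can actually run, here is a rough sketch in Python. It is not the code from the earlier posts: the users dictionary is a hypothetical stand-in for a real database table, the function names are invented for this example, and PBKDF2-HMAC-SHA256 with a per-user random salt is an assumed choice of hash.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor

# Hypothetical stand-in for a database table: username -> (salt, hashed password)
users = {}


def hash_password(password: str, salt: bytes) -> bytes:
    """One-way, salted hash of a password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str, confirm: str) -> bool:
    """Store the user only if the name is free and both passwords match."""
    if username in users or password != confirm:
        return False
    salt = os.urandom(16)  # new random salt for this user
    users[username] = (salt, hash_password(password, salt))
    return True


def login(username: str, password: str) -> bool:
    """Hash the submitted password with the stored salt and compare."""
    record = users.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(stored_hash, hash_password(password, salt))


print(register("alice", "23skidoo", "23skidoo"))  # True
print(login("alice", "23skidoo"))                 # True
print(login("alice", "wrong-guess"))              # False
```

The plaintext password only exists for the length of the function call. What ends up in storage is the salt and the hash, and that pair is all the login check needs.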

Now, you might be thinking “but wait, this is all well and good, but what if a user forgets their password and I want to e-mail it to them? If I hash their password, I can’t send it to them!” And you would be right. In that case, you have two options. You could encrypt their password instead of hashing it, and then decrypt it and send it to them, or you could simply auto-generate a new password for them, hash it and store it in the database, and then send them that new password. Which approach you choose is up to you, but keep in mind that only the second one keeps the stored password irretrievable.
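The second option might look something like the sketch below, again in Python and again under assumptions: the users dictionary stands in for your database, reset_password is a made-up name, and the hash settings are the same assumed PBKDF2 ones as before. Actually sending the e-mail is left out.

```python
import hashlib
import os
import secrets

ITERATIONS = 200_000  # assumed work factor


def reset_password(users: dict, username: str) -> str:
    """Generate a new random password, store only its hash, and return
    the plaintext once so the caller can send it to the user."""
    new_password = secrets.token_urlsafe(12)  # 16 URL-safe characters
    salt = os.urandom(16)                     # fresh salt for the new password
    hashed = hashlib.pbkdf2_hmac(
        "sha256", new_password.encode("utf-8"), salt, ITERATIONS
    )
    users[username] = (salt, hashed)          # overwrites the old salt and hash
    return new_password


# Hypothetical usage with an in-memory stand-in for the database
users = {}
temporary = reset_password(users, "alice")
print(len(temporary))  # 16
```

The old password is never read or decrypted at any point, so nothing about the one-way protection has to be given up to support forgotten passwords.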

Whatever you do, don’t store sensitive data in plaintext. It only takes one slip for an attacker to get access to your database and instantly be able to read every user’s password. By hashing your data, you can make sure that instead of seeing readable passwords, they see only strings of hashed characters, which will help you prevent the attacker from actually getting access to any user accounts. Hopefully this brief entry has shown you how easy it is to keep a user’s sensitive data safe, which will help protect both you and your users in the long run.